Educated Guesswork

Understanding Memory Management, Part 6: Basic Garbage Collection

Who's going to clean up this garbage (Spiderman)

This is the sixth post in my multipart series on memory management. You will probably want to go back and read Part I, which covers C, parts II and III, which cover C++, and parts IV and V which cover Rust. C++ RAII and Rust do a lot to simplify memory management but still force you to constantly think about how you are using memory (this is even more true with Rust). It's natural to ask why we have to do all this work and why the computer can't just figure things out. The answer is that it can—at a cost—which brings us to the other major approach to handling memory: automatic memory management, aka garbage collection.[1] In this post, I want to introduce the basic ideas behind garbage collection and describe the main algorithms which provide the basis for modern GC systems.

What behavior do we want? #

In C and C++, the programmer is responsible for two major memory management tasks:

  • Allocating new memory on the heap when it's needed
  • Returning unused memory back to the heap so it can be re-allocated later.

In C, these operations are both completely manual, using malloc() and free(). C++ provides manual memory management with new and delete, but also provides various mechanisms to automatically allocate and deallocate memory, such as container classes, RAII, and smart pointers; the programmer is still frequently responsible for explicitly allocating objects and has to understand their lifetimes. Similarly, Rust requires you to explicitly manage memory, but protects you from errors when you fail to do so.

In a wide variety of languages, ranging from Go to JavaScript, the system handles all of these operations completely automatically. The programmer just creates variables and objects and the language takes care of allocating memory as required and de-allocating it at an appropriate time. To see this in action, consider the following trivial C function and its JavaScript equivalent:

C #

void foo() {
  int a = 1;
  int b[2] = {1};
  size_t b_len = 1;
  b[b_len++] = 2;
}

JS #

function foo() {
  let a = 1
  let b = [1]
  b.push(2)
}

Both versions of this code do the same thing. First, they create two variables.

a
: An integer set to one
b
: A list consisting of the single integer 1

We then push another element onto b to make the list [1, 2].

The first line of the function is basically the same in both languages. Take a look at the second line, however, where we create b. This C code actually makes two memory-related decisions:

  1. It puts the memory on the stack (because we didn't call malloc())
  2. It allocates a fixed-size array of size 2.

Note that even though we only add one value (1) to b, the array is still of size 2. This is why we need a separate length field b_len to keep track of how many elements are actually in b. The syntax {1} tells C to initialize the array with 1 and then as many zeros as are required to fill the rest of the array. Except for the initialization, this should all be pretty familiar from part I.

Now compare the corresponding JS code, which looks fairly similar, but that's just because JS imitates C syntax. Just like in C, we make a local variable b that can hold a list of things, but we never explicitly tell JS whether it should be on the stack or the heap or how big it should be. So, what are the answers to those questions? Who knows? Who cares? None of your business! The JS engine will do whatever it thinks best, and may not even do the same thing every time. Specifically:

  • The JS engine will automatically grow b whenever you add new elements. In this respect it's like the C++ vector container.

  • Local variables can be stored on either the stack or the heap. In fact, they can be stored in one place and then moved to the other when conditions change; it's totally up to the JS engine.

Whatever the JS engine does, it's essentially transparent to the programmer, who just writes code and lets the language worry about it. The same thing applies to many other languages, like Python, Lisp, etc.
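To see the automatic growth in action, here's a tiny sketch in ordinary JS (nothing Memo-specific): you just keep pushing, and any reallocation happens invisibly inside the engine.

```javascript
// A JS array grows transparently as you push onto it; the engine
// handles any resizing or reallocation behind the scenes.
let b = [1];
for (let i = 2; i <= 1000; i++) {
  b.push(i);
}
console.log(b.length); // 1000
```

At no point did we say how much storage to reserve, or where; that's entirely the engine's business.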

In these languages, just as you don't have to worry about when you are allocating memory, you also don't have to worry about de-allocating it; the language automatically detects when you aren't using memory and de-allocates it, in a process called "garbage collection" (commonly, GC). You just write code![2]

Memo: A Tiny Language #

In this post, we'll be taking a slightly different approach than with previous posts. Because the internals of garbage collection are (mostly) invisible and real-world garbage collectors are very complicated, working with a real language isn't that useful for explanatory purposes. Instead, I've designed and implemented a tiny language called Memo (for "memory demo") that lets us look at the impact of various garbage collection approaches without being distracted by a lot of language mechanics.

Memo is deliberately not Turing complete—it has no conditionals or loops—and only contains a small number of operations for manipulating objects.

Data Types #

Memo comes with three native types. The first two of these are familiar:

  • Integers representing bare numeric values.
  • Pointers to objects in memory (i.e., the address of those objects).

The only other data type in Memo is the Tuple, which represents an ordered set of values, each of which can be either an integer or a pointer. Tuples are wrapped in parentheses, as in (1 2 3). Unlike Python lists or JS arrays—but like Rust or Python tuples—you can't extend tuples, so a tuple of length X can't be turned into a tuple of length Y, though of course you can create a new tuple of the right size.[3]

This isn't a particularly rich type system, but it's actually sufficient to construct a variety of data structures, as anyone who has worked with Lisp will recognize. For example, you can make a list of integers out of 2-valued tuple objects, with the first value of each tuple being an integer and the second value being a pointer to the next tuple.
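As a sketch of that idea in plain JavaScript—with object references standing in for Memo pointers, and the cons/toArray names being my own invention:

```javascript
// Sketch: a Lisp-style linked list built from 2-element "tuples".
// Each cell holds a value and a reference to the next cell;
// null marks the end of the list.
function cons(value, next) {
  return { value, next }; // stands in for the 2-tuple (value next)
}

// The list (1 2 3) as a chain of cells.
const list = cons(1, cons(2, cons(3, null)));

// Walking the chain recovers the values in order.
function toArray(cell) {
  const out = [];
  for (let c = cell; c !== null; c = c.next) {
    out.push(c.value);
  }
  return out;
}

console.log(toArray(list)); // [ 1, 2, 3 ]
```

In Memo itself you'd write the same structure as nested tuples, e.g. (1 (2 (3 null))).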

Variable Naming and Addressing #

Like any language, Memo has named variables. All variables are global and variables are automatically created upon assignment without the need for let or var (simple, remember?). It's not legal to read variables which haven't been assigned to yet.

Variables in Memo are named in the usual fashion, as strings of letters and numbers starting with a letter, e.g., tmp1.

So, for instance, the following code:

a = 20

Creates a variable a and assigns the value 20.

You create a tuple in the way you expect:

a = (1 2 3)

This code automatically allocates a tuple of size 3 on the heap and assigns the address to a. It's also legal to have tuples contain other tuples, which really just means that one of the elements of the tuple is the address of another tuple.

a = (1 2 3 (4 5))

Variables themselves aren't typed, so it's perfectly legal to do:

a = (1 2)      # a is a pointer to the tuple (1 2)
a = 20 # a is the integer 20

In order to make this work, internally, Memo keeps track of whether a given value is a pointer or an integer (see below). This should be familiar if you've used other dynamically typed languages like Python or JavaScript. The inner values in a tuple are addressed with the notation a.0, a.1, and so on.

Variables never go out of scope, but you can assign them to null, which has most of the same effect, except that they're still floating around in the namespace.

Demo #

The whole implementation of Memo is in JavaScript, so you can run it directly in your browser (which is why I did it). I've embedded a "read-eval-print-loop" (REPL) window so you can play with it:

A simple Memo REPL

Note that all of the operations we've looked at so far are "expressions" which is to say that they return the result of the operation, which then gets printed in the console (this is the "print" part of the REPL). For instance, if we set a=20 and then type a we will get Integer(20). This is how you get the value of an object, because there's no print() function.

It should be perfectly safe to type anything in this window: it doesn't interact with anything in the rest of this post or on your computer.[4] If you violate Memo's syntax, it will just complain to you and you can enter a new instruction.

The Allocator Interface #

Next we need to look at how memory allocation works in Memo, because we're going to need that to understand how the GC system works.

The Heap #

Any memory allocator needs a pool of memory blocks (the heap) to implement from. JS doesn't really allow for raw memory access, so instead we are going to emulate the heap as a single big JS ArrayBuffer—which is just JS's way of conveniently handling an array of bytes—encapsulated in a Memory object. Internally, we create the heap like so:

let memory = new Memory(10000); // Make a heap of size 10000 bytes

Addresses in the heap are just integers in the range [0, heapSize), so we can read and write with obvious-looking interfaces:

let byte = memory.readUInt8(1);    // Read a byte from address 1.
let word = memory.readUInt32(32);  // Read a 32-bit word from address 32.
memory.writeUInt8(1, byte + 1);    // Write byte + 1 to address 1.

And so on. There aren't any rules about aligned access in this implementation, so you can read a 32-bit word from address 3 or whatever. Similarly, you can read the pieces of a word byte by byte and then put it back together into a word. Many real systems actually do have alignment requirements.
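The post doesn't show the Memory class itself, but here is a minimal sketch of what such a wrapper might look like. The method names match the interface shown above; the internals—a DataView over the ArrayBuffer, little-endian words—are my assumptions, not necessarily what Memo actually does.

```javascript
// Sketch of a Memory wrapper over an ArrayBuffer. DataView happens to
// permit unaligned access, which matches the no-alignment-rules behavior
// described in the text.
class Memory {
  constructor(size) {
    this._view = new DataView(new ArrayBuffer(size));
  }
  readUInt8(addr) {
    return this._view.getUint8(addr);
  }
  writeUInt8(addr, value) {
    this._view.setUint8(addr, value);
  }
  readUInt32(addr) {
    return this._view.getUint32(addr, true); // little-endian (assumed)
  }
  writeUInt32(addr, value) {
    this._view.setUint32(addr, value, true);
  }
}

const memory = new Memory(10000);
memory.writeUInt32(3, 0xdeadbeef);            // unaligned write is fine
console.log(memory.readUInt32(3).toString(16)); // "deadbeef"
```

You could equally read those four bytes back one at a time with readUInt8 and reassemble the word, which is the byte-by-byte access the text mentions.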

Allocation #

Objects are allocated using the Allocator class, which has a simple interface. For instance, this code allocates an object of type Simple and returns its address (i.e., the index of the first byte of the object on the heap).

addr1 = allocator.allocate("Simple");

As a practical matter, the only allocated type in Memo is the tuple, but when I wrote this code originally I expected to have more than one kind of type (e.g., structures with named members) and so we have a more flexible system in which you can have objects of arbitrary types, which is more than we need for Memo. In Memo, however, each size tuple is its own type, automatically named something like (5) for a tuple of length 5.

This is all hidden from the programmer, with the consequence that when you write something like a = (1 2 3) internally the language runtime does:

allocator.allocate("(3)");

and then assigns the internal values.

Object Layout #

The diagram below shows the situation after running a trivial program which is shown below. This program allocates five tuples:

  • The tuple (1 2 3) which is assigned to the global variable a.
  • The tuple (3 4) which is assigned to the first element in a.
  • The tuple (8 9) which is at memory address 44 but is not assigned to any global variable.
  • The tuple (5 6 7 Pointer(44)) which points to the previous tuple.
  • The tuple () which is assigned to the global variable c.

You can use the "Previous" and "Next" buttons to step through this program line by line and see how the various tuples get made. The highlighted line of code is the one that is about to run (just like with a normal debugger) rather than the one that has just run.

[Interactive widget: heap layout after running the program below — tuples at addresses 16, 32, 44, 56, and 76, with roots a → 16, b → 56, and c → 76.]
1: a = (1 2 3)
2: a.0 = (3 4)
3: b = (5 6 7 (8 9))
4: c = ()
Memo memory layout

The global variables are shown in the top row. These would typically live on neither the stack nor the heap, but of course they are somewhere in memory, so I'm just showing them here for convenience. Memory address 0 starts at the left of the gray box marked "Reserved", so the first allocatable address is 16. This is actually reasonably realistic because the allocator typically needs to reserve some space for its own bookkeeping (e.g., the address of the last allocated block). In my implementation, these values are just stored in JS variables outside of the allocated memory region, and I decided to block off the first 16 octets for a more prosaic reason: I wanted the value of the null pointer (the one that doesn't point anywhere) to be 0, and for that to work 0 cannot be a valid address for a real object. Note that there is actually nothing that requires the null pointer to have a memory representation of all 0 bits, even in C, but it's convenient and common.

Each pointer is drawn with an arrow that shows the object it points to. You'll notice that each object starts out with a single 4-byte word which is where I store the type of the object (in this case, just the number of elements in the tuple). This word is also used for some other metadata, as we'll see later. The result is that even an empty tuple like () consumes a minimum of 4 bytes of storage. What these words currently show is the address of the object (e.g., @16) and the length of the tuple in parentheses (e.g., (3) for a tuple of length 3). Note: this is the representation with a specific type of garbage collector (mark-sweep). Other garbage collectors will have slightly different layouts, as seen below.

This is a very simple allocator, in which each object is allocated directly after the end of the previous object, with the result that all the objects are contiguous. This is what's called a "bump allocator" and is very easy to implement because we just need to store one piece of extra data, the address of the next object to be allocated, which is right after the end of the last object that was allocated. When you allocate an object of size S you then just "bump" this value up by S. Here is the actual code for our bump allocator, which fits neatly in your head.

bump_allocate(size) {
  let address;

  if (this._end + size > this._context.heapSize()) {
    throw new Error("Out of memory");
  }
  address = this._end;
  this._end += size;

  return address;
}

One interesting thing to note is that the tuple (8 9) appears in memory before the tuple (5 6 7 Pointer(44)) which points to it, despite the tuples being introduced in the opposite order, as in:

b = (5 6 7 (8 9))

What's going on here is that Memo's interpreter works from the bottom up, which means that it needs to first allocate the memory for (8 9) and then it can stuff the address in the other newly-created tuple. This is a common design for this kind of simple parser.

Reading Objects in Memory #

Importantly, everything we have stored on the heap is self-describing, which allows us to decode the contents of the heap without ancillary storage, as long as we know the address of the first allocated object in memory, in this case 16. This works because each object has a common prefix in the first word, as noted above:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|M|F|  RESERVED |                TypeId (24 bits)               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Here we have a 32-bit (1 word on 32-bit processors) prefix which starts with a flags byte. The first two bits in this byte are assigned to a M (for marked) and F (for free) flag; we'll cover these later. The rest are reserved for future use.

The rest of the prefix contains the TypeId, which is an identifier for the type of the object. This identifier can be implemented in a number of ways, including:

  • A pointer to a type description.
  • An index into a table of type descriptions.

In either case, the type description will store the number of pointers in the object and their memory locations within the object. Importantly, every instance of an object needs to be laid out the same way, so that once you have the object pointer and the type, you know everything else about every element in the type.

The object decoding process for an object at address addr proceeds as follows:

  1. Read the first 32-bit word and use that information to extract the type ID, which, as above, is stored in the low order 24 bits.
  2. Look up the type using the type ID and from there determine the number of elements in the object.
  3. Iterate over the elements and decode them. As noted above, elements are always of type pointer or type integer, but any given element can be either and element types can change. In order to address this we steal a bit from the top of each element to use for the type (this is often called "coloring" the pointer). For pointers, the high bit is 0 and for integers, the 0x80000000 bit is set. This allows you to immediately see whether a given element is a pointer or an integer, but at the cost that you can only express values up to 2³¹.

Because all elements are the same size, the type also tells us the size of the object and so we can skip over to the next object, which, as noted above, starts immediately after the current object (we'll get to holes created by fragmentation later).
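The coloring scheme in step 3 can be sketched in plain JS. The INT_BIT name and the helper functions are my own; the bit assignment—high bit set for integers, clear for pointers—is as described above. The `>>> 0` dance is needed because of how JS treats the sign bit, which is the subject of the aside on binary flags.

```javascript
// Sketch of element "coloring": the top bit of each 32-bit element
// distinguishes integers (bit set) from pointers (bit clear).
const INT_BIT = 0x80000000;

function encodeInteger(n) {
  // Set the tag bit; >>> 0 forces an unsigned 32-bit result.
  return (n | INT_BIT) >>> 0;
}

function encodePointer(addr) {
  return addr; // valid heap addresses already have the high bit clear
}

function isInteger(element) {
  return ((element & INT_BIT) >>> 0) === INT_BIT;
}

function decodeInteger(element) {
  return element & ~INT_BIT; // clear the tag bit, leaving the value
}

const e = encodeInteger(42);
console.log(isInteger(e));                 // true
console.log(decodeInteger(e));             // 42
console.log(isInteger(encodePointer(16))); // false
```

The cost, as the text says, is that integers are limited to 31 bits of value; the 32nd bit is spent on the tag.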

Aside: Dealing with binary flags in JS #

As an aside, it's a giant pain dealing with binary flags in JS because there's really only one number type (float) and JS has decided that the literal 0x80000000 is a positive number, but bitwise operators work on signed 32-bit integers, where the same bit pattern is a negative number. As a result you get this kind of thing.

> flag = 0x80000000
2147483648
> a = 3
3
> b = a | flag
-2147483645
> (b & flag) == flag
false

LOLWAT?

I'm not a JS wizard but according to Gemini the fix is to add a bunch of 0-sized unsigned right shifts, like so:

> b = (a | flag) >>> 0
2147483651
> (b & flag) >>> 0
2147483648
> ((b & flag) >>> 0) == flag
true

So when you see these scattered all over the code you know why.

What is garbage? #

After that warmup, we're now ready to talk about garbage collection. As already stated, there's no way to explicitly tell Memo that we're not using a piece of memory, but we don't want to just have the amount of memory we use grow monotonically, so we need some way for Memo to reclaim that memory when it's no longer in use, hence garbage collection.

Conceptually, we have three kinds of memory:

  1. Un-allocated (or freed/de-allocated) memory
  2. Memory which has been allocated and is in use
  3. Memory which is allocated but is not in use ("garbage")

In C and C++, the allocator (malloc()/free()) knows what memory has been allocated and what has not but it does not know which allocated memory is in use and which is not; it leaves that responsibility to the programmer, who must free memory when it is no longer in use. Automatic memory management requires a mechanism to identify which allocated memory is garbage and collect it.

Defining "in use" is a somewhat tricky proposition: if I allocate some memory at time T and just keep it around until the program ends, is it in use?[5] It's entirely possible that under other circumstances I might have used it. For instance, suppose my browser loads some math fonts and then I never go to a page that renders math. But I might have, and obviously both I and the programmer would be unhappy if the language decided that the fonts would never be needed and just deallocated them, with the program crashing when I went to a site that used math!

Pretty much every system I am familiar with uses unreachability as the definition of garbage. Specifically, the assumption is that there are a set of "root" pointers which aren't themselves on the heap such as local variables (on the stack) or global variables. A piece of memory is defined as in use if you can reach it by following pointers from one of those root variables, e.g., root → B → C → D. If it's not reachable from one of the roots, then it's not in use. Because the language already knows which data is allocated and which isn't, it can now distinguish all three types of memory:

  1. Un-allocated (or freed/de-allocated) memory
  2. In-use memory is allocated and reachable
  3. Garbage is memory that is allocated but unreachable

Note that this excludes some data which is morally garbage in the sense that the programmer knows they will never use it, but the language has no way of knowing that. The reachability definition defines garbage as memory which the language can prove the program can't use because there's no way to reference it. The figure below provides a simple example: allocations A–G are all reachable from either root1 or root2. Allocations H–K are unreachable and are therefore garbage.

In-use and garbage memory

Some in-use memory and some garbage
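The reachability rule can be sketched as a simple graph traversal in plain JavaScript, with ordinary objects standing in for heap allocations (all names here are invented for illustration):

```javascript
// An object is "in use" iff it can be reached by following pointers
// from some root. Plain JS objects stand in for heap objects; each has
// a `pointers` array of outgoing references.
function reachable(roots) {
  const seen = new Set();
  const queue = [...roots];
  while (queue.length > 0) {
    const obj = queue.pop();
    if (obj === null || seen.has(obj)) continue;
    seen.add(obj);
    queue.push(...obj.pointers); // follow every outgoing pointer
  }
  return seen;
}

// Tiny heap: root1 -> B -> C, while D points at C but nothing points
// at D, so D is unreachable and therefore garbage.
const C = { name: "C", pointers: [] };
const B = { name: "B", pointers: [C] };
const D = { name: "D", pointers: [C] };
const root1 = { name: "root1", pointers: [B] };

const live = reachable([root1]);
console.log(live.has(B), live.has(C)); // true true
console.log(live.has(D));              // false
```

Notice that D points *at* live data but is still garbage: reachability only counts pointers flowing out from the roots, not pointers into the live set.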

Now that we understand what garbage is, let's take a look at how to collect it.

Reference Counting #

As noted above, the simplest form of garbage collection is reference counting, which we already saw in part III. Reference counting works similarly in a garbage collected language as it does in C++, except that all of the machinery is hidden under the hood:

  • Every pointer is a reference counted pointer. This can be done with an intrusive pointer style design because we're starting from scratch and so can just insist that every object have an embedded reference count.

  • It's not possible to unbox pointers, so you never have to worry about any aliasing issues such as a raw pointer and a shared pointer pointing to the same object.

The language runtime automatically takes care of incrementing and decrementing reference counts as appropriate, and freeing objects when the reference count goes to zero. After that, things just work without you having to think about it.
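Here is a sketch of the bookkeeping the runtime does on every assignment, using plain JS objects as a stand-in for Memo's real internals (the object shape and helper names are my invention):

```javascript
// On every pointer assignment the runtime bumps the new target's count,
// drops the old target's, and frees any object whose count hits zero.
function incRef(obj) {
  if (obj !== null) obj.refCt++;
}

function decRef(obj) {
  if (obj === null) return;
  obj.refCt--;
  if (obj.refCt === 0) {
    obj.freed = true;
    // Freeing an object releases its references to its children.
    for (const child of obj.pointers) decRef(child);
  }
}

function assign(slot, name, newObj) {
  incRef(newObj);     // increment first, in case old === new
  decRef(slot[name]);
  slot[name] = newObj;
}

// Model a = (P 2 3) where P points at another tuple.
const inner = { refCt: 0, freed: false, pointers: [] };
const outer = { refCt: 0, freed: false, pointers: [inner] };
incRef(inner); // outer's first element references inner

const globals = { a: null };
assign(globals, "a", outer); // a -> outer
assign(globals, "a", null);  // a = null: both objects become garbage
console.log(outer.freed, inner.freed); // true true
```

The increment-before-decrement order in assign is the classic defensive move: if the old and new targets are the same object, decrementing first could free it out from under you.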

The following widget shows reference counting in action with a simple memo program. You can use the previous and next buttons to step through the program one line at a time.

1: a = (1 2 3)
2: a.0 = (4 5 6)
3: b = (7 8 (9 10 11))
4: a = null
Freeing memory when refct goes to zero

In the first three lines we build up a structure with four tuples:

  • The tuple (P1 2 3) pointed to by a
  • The tuple (4 5 6) pointed to by a.0 (P1).
  • The tuple (7 8 P2) pointed to by b.
  • The tuple (9 10 11) pointed to by b.2 (P2).

Note that in Memo it's not possible to create an object that isn't pointed to by anything, so we can ignore that case. In line 4, we set a to null, which turns the object it points to into garbage, as well as the (4 5 6) tuple that that object points to. Once you execute line 4, both objects will be freed, leaving only the (7 8 P2) tuple pointed to by b and the (9 10 11) that P2 points to.

Note that the memory layout is slightly different than in the previous example in that we have an extra RefCt field between the type word and the elements. As suggested by the name, this field stores the reference count value. We have 32 bits for the reference count, which means that we can have up to 2³² references, which should be enough given that we can only have 2³¹ objects (because our pointers are 31 bits long). The result, however, is that when we use reference counting, we consume more memory per object than with some other kinds of garbage collection. This kind of efficiency concern can be a big deal in some systems, but not in a toy implementation like ours.

The key thing to notice is that what makes automatic memory management straightforward is that the system doesn't give you a choice. C and C++ were originally built with manual memory management and so every attempt to add automatic memory management has to contend with the old semantics. If you just build a language with automatic memory management from the ground up, things are a lot simpler.

Freed Memory #

I said above that when the reference count goes to zero, we free an object. What does that mean in practice? We've got one big contiguous memory region, so it's not like we can return the memory associated with a single object. Instead, freeing an object is a bookkeeping operation in which we note that that region is no longer in use. We do this by setting the F (for free) bit in the first word. Of course, even though the object is not in use, we still need to know how big it is. We address this by using the rest of the first word to store the object size.[6] Because even unused memory regions are structured this way, we can understand the entire memory layout just by starting at the bottom of the heap and working forward one object at a time until we get to the top. For instance, here is a simple function which counts all the live objects:

function ct() {
  let counter = 0;
  let scan = this._start;

  while (scan < this._end) {
    const flags = ObjectManager.getFlags(this._context, scan);
    const len = ObjectManager.getSize(this._context, scan);
    scan += len;

    if ((flags & FREE_BIT) === 0) {
      counter++;
    }
  }

  return counter;
}

The only extra information you need is the address of the start of the heap (the start of the lowest allocated object) and the end of the heap (the last allocated byte). From there, you can just scan the entire heap, stopping when you get to the end.

Circular References #

Unfortunately, reference counting has a number of disadvantages that prevent most languages from using it as the sole form of garbage collection. The most important of these, as discussed in part III, is that it deals badly with circular references, as shown in the example below:

1: a = (1 (2 null))
2: a.1.1 = a
3: a = null
Circular references

As expected, when we have two objects which point at each other, neither will be freed even if we delete the reference from global variable a.
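You can model why neither count ever reaches zero with a few hand-maintained counters (this is a model of the bookkeeping, not Memo's actual implementation):

```javascript
// Why reference counting leaks cycles: once two objects point at each
// other, each holds the other's count above zero even after the last
// outside reference is gone.
const x = { refCt: 0 };
const y = { refCt: 0 };

x.next = y; y.refCt++; // x -> y
y.next = x; x.refCt++; // y -> x (the cycle)

let a = x; x.refCt++;  // the root reference from global variable a

a = null; x.refCt--;   // line 3: drop the root

// Neither count is zero, so a refcounting collector frees neither
// object, even though nothing outside the cycle can reach x or y.
console.log(x.refCt, y.refCt); // 1 1
```

The counts are "correct"—each object really is referenced once—but the only remaining references come from inside the garbage itself.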

In part III we showed how to break reference cycles using weak pointers, but weak pointers require that the programmer explicitly tag some references as weak and some as strong, which undercuts the "it just works" value proposition of the garbage collector. Some languages do have support for weak pointers, but you really don't want the programmer to have to pay attention to this every time they create a reference cycle, which happens all the time. Less important, but still relevant, is that there is performance overhead from constantly having to increment and decrement the reference count of objects whenever you pass them around.

For these reasons, most garbage collected languages use another form of garbage collection, either on its own or in combination with reference counting. This nearly always means one of a broad class of algorithms called "tracing garbage collection".

Tracing Garbage Collection #

The basic idea behind a tracing garbage collector is to start from the root pointers and follow each pointer until you've enumerated every reachable object. Every other allocated object is garbage and can be freed. This is a simple idea, but doing it well is hard.

Mark-Sweep #

Let's start with the most elementary[7] type of tracing garbage collector: mark-sweep. A mark-sweep collector proceeds in two passes:

  1. Trace all the objects from the roots, recording which objects are reachable.
  2. Scan over the entire heap, examining each object and freeing those which were not recorded as reachable.

Marking #

The first problem is how to trace all the reachable objects. Conceptually this is just a standard graph traversal problem, where you want to start at the roots and touch every node connected by an edge. You may be familiar with algorithms for traversing trees, and this is a similar problem with two additional complications:

  1. You don't just start from the single root of the tree but from multiple roots, which may point to some of the same nodes.

  2. This isn't necessarily an acyclic graph, in that you can have reference cycles; recall that this is why we can't just use reference counting.

Nevertheless, this isn't particularly complicated. Here's a simplified version of the marking algorithm from our code (I've removed some of the JavaScript generator machinery that we use to step through the GC one piece at a time.)

function *mark_incremental(roots) {
  this._work_queue = [];

  for (let root of roots) {
    if (!isPointer(root) || root == NULL_POINTER) {
      continue;
    }
    ObjectManager.mark(this._context, root);
    this._work_queue.push(root);
  }

  while (this._work_queue.length > 0) {
    // Pop first, then process
    const current = this._work_queue.pop();
    const num_ptrs = ObjectManager.getNumValues(this._context, current);

    for (let i = 0; i < num_ptrs; i++) {
      const ptr = ObjectManager.getValue(this._context, current, i);

      if (isPointer(ptr) && ptr !== NULL_POINTER) {
        const flags = ObjectManager.getFlags(this._context, ptr);

        if (!(flags & MARK_BIT)) {
          ObjectManager.setFlags(this._context, ptr, flags | MARK_BIT);
          this._work_queue.push(ptr);
        }
      }
    }
  }
}

The basic logic is that every time we encounter a pointer to a new object we add it to the work_queue. Initially the queue is populated by the pointers stored in the roots (global variables), but then as we chase them we encounter new pointers stored in objects on the heap, which are themselves added to the work queue. We continue to pop objects off the work queue until the work queue is empty, at which point the marking process is done.

This design needs some way to know which objects we have already seen before. Otherwise, if we have a loop where A points to B and B points to A we'll just go around that loop indefinitely. Unlike the pointer/integer distinction, this "seen" information cannot be stored in the pointers themselves because you might have two pointers to the same object, and if you first reach the object via pointer A you want to know that it was marked when you reach it again via pointer B; instead, it has to be stored along with the object, just as the reference count was. In this case, we have plenty of space in the type word, so we use the M bit to store a "marked" value, which indicates that the object has already been seen. When we encounter an object, we only add it to the work queue if the "marked" bit is clear (i.e., 0).

Sweeping #

Once we've completed the marking phase, we move on to the sweeping phase. As described above, we can just move through memory from the bottom of the heap one object at a time, using the object type field to know the size of an object and thus where one object ends and another begins.

When we encounter a new object, we first check the free bit. If that is set, the object is free and we move on to the next object. This can happen if the object was freed in a previous pass. We then check the mark bit.[8]

  • If the mark bit is set, we clear it so that it can be marked in a future GC pass.
  • If the mark bit is clear, we set the free bit.

We then move on to the next object, continuing until we get to the end of the heap. Here's a slightly cleaned up version of the JS code Memo uses for mark-sweep.

*gc_incremental(roots) {
  // Mark.
  this.mark_incremental(roots);

  let scan = this._start;

  while (scan < this._end) {
    const flags = ObjectManager.getFlags(this._context, scan);
    const len = ObjectManager.getSize(this._context, scan);

    // This is already free.
    if (flags & FREE_BIT) {
      scan += len;
      continue;
    }

    const nextscan = scan + len;

    // This is not marked, so free it.
    if ((flags & MARK_BIT) === 0) {
      ObjectManager.setFlags(this._context, scan, FREE_BIT);
      this._free_bytes += len;
    } else {
      ObjectManager.unmark(this._context, scan);
    }

    // Skip to the next entry.
    scan = nextscan;
  }
}

You can use the following widget to step through the entire mark-sweep process. This is the same code as we saw above with reference counting with one small change which I'll get to shortly. As before, you can step through the code and watch how memory changes.

1: a = (1 2 3)
2: a.0 = (4 5 6)
3: b = (7 8 (9 10 11))
4: a = null
5: #gc
Mark sweep

The first thing to notice is that after line 4 executes, we have the same two pieces of garbage (4 5 6) and (Pointer(32) 2 3), but unlike with reference counting they haven't been freed, but instead are just lurking around. This is because unlike reference counting, tracing garbage collectors don't free memory as soon as it becomes garbage; instead you have to explicitly run the garbage collection algorithm. The objects are still unreachable, so there's no way for them to be accessed, but they're still there taking up space:

[Heap after line 4, before GC runs: @16: (Pointer(32) 2 3); @32: (4 5 6); @48: (9 10 11); @64: (7 8 Pointer(48)); roots: a = null, b = Pointer(64)]

In order to actually garbage collect these objects, we need to run the mark-sweep algorithm. Ordinarily this is something that the system would do automatically, but to make things simple I've added a pseudo-instruction that invokes the garbage collector in the form of #gc. I say this is a pseudo-instruction because from the perspective of Memo it's a comment; instead the widget notices that you've asked for GC and runs the garbage collector externally.[9]

Once we get to the GC instruction, you can then step through the GC algorithm one step at a time. The orange "scan" pointer shows which object we are examining now, and at appropriate times the "work queue" will be shown. For instance, here is the situation when the algorithm is examining the object at 64 and has just marked the object at 48 ((9 10 11)) and added it to the work queue.

[Mark-sweep marking phase: scan is at 64; the objects at 48 and 64 are marked; work queue: Pointer(48)]

Note that the marking phase doesn't proceed in any particular order through memory, because it's just tracing out the graph of object relationships. That's why we look at 64 first (because it's pointed to by b) and then 48 (because it's pointed to by 64). By contrast, the sweeping phase proceeds linearly through memory. Below, you can see partway through the sweeping phase, after we have freed 16 and right before we free 32.

[Mark-sweep sweeping phase: the object at 16 has been freed; scan is at 32, which is about to be freed; the marked objects at 48 and 64 will be retained]

At the end of the sweep process, all of the unreachable objects will be freed, leaving only the reachable objects, just as with reference counting.

[Mark-sweep complete, with holes at 16 and 32: @48: (9 10 11); @64: (7 8 Pointer(48)); roots: a = null, b = Pointer(64)]

Reclaiming Memory #

Now consider what happens if we do a new allocation as in:

c = (12 13 14)

At this point, there are two things that can happen:

  1. We can continue to bump allocate, and put the new object above the last object in memory, in this case at address 80.

  2. We can reuse one of the regions we've freed, in this case probably at address 16.

There's a tradeoff here in that bump allocation is generally faster—you just need to increment one pointer—but eventually we have to start reusing free memory or there was no point in garbage collecting at all. The basic challenge is fragmentation: once the program has run for a while you end up with a lot of "holes", which is to say small free regions interspersed with allocated regions, and you can end up with plenty of total free memory but no single region big enough for a new allocation. Reusing aggressively conserves the open region at the top of the heap for big allocations, but at the cost of having to search for a region that will fit each new allocation rather than just incrementing the next pointer. Your allocation strategy needs to strike a compromise between these two.

Memo's allocator uses a fairly simple compromise strategy: it bump allocates until either:

  1. The top of the heap is about halfway through the heap size.
  2. About half of the region that has been used is holes.

After that it tries to reuse free space; this means that you get to use the fast allocator when there is no memory pressure, but you start trying to reuse while you still have plenty of room for allocations that don't fit into any existing hole. Memo will coalesce adjacent free regions as necessary, so multiple small holes can be merged into a single bigger hole.
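As a rough illustration, here is a toy version of this kind of policy (a sketch with hypothetical names, not Memo's actual allocator): bump allocate while there is headroom, then fall back to first-fit reuse of holes, coalescing adjacent holes on free.

```javascript
// Toy allocator sketch (hypothetical names, not Memo's actual code).
// Bump allocate while the heap is under ~50% used; after that, try
// first-fit reuse of holes, falling back to bump allocation.
class ToyAllocator {
  constructor(size) {
    this.size = size;
    this.top = 0;    // bump pointer: next never-used address
    this.holes = []; // free regions as { addr, len }, kept in address order
  }

  alloc(len) {
    // Fast path: bump allocate while there is no memory pressure.
    if (this.top < this.size / 2 && this.top + len <= this.size) {
      const addr = this.top;
      this.top += len;
      return addr;
    }
    // First-fit: take the first hole big enough, shrinking it in place.
    for (let i = 0; i < this.holes.length; i++) {
      const hole = this.holes[i];
      if (hole.len >= len) {
        const addr = hole.addr;
        hole.addr += len;
        hole.len -= len;
        if (hole.len === 0) this.holes.splice(i, 1);
        return addr;
      }
    }
    // No hole fits: bump allocate if there is still headroom.
    if (this.top + len <= this.size) {
      const addr = this.top;
      this.top += len;
      return addr;
    }
    return null; // out of memory: time to garbage collect
  }

  free(addr, len) {
    // Insert the hole in address order, then coalesce adjacent holes.
    this.holes.push({ addr, len });
    this.holes.sort((a, b) => a.addr - b.addr);
    for (let i = 0; i + 1 < this.holes.length; ) {
      if (this.holes[i].addr + this.holes[i].len === this.holes[i + 1].addr) {
        this.holes[i].len += this.holes[i + 1].len;
        this.holes.splice(i + 1, 1);
      } else {
        i++;
      }
    }
  }
}
```

With a policy like this, a run of adjacent frees collapses into one large hole, and allocations stay on the fast path until the heap starts to fill up.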

There are a lot of fancier strategies one can use, especially for figuring out which hole to put new allocations into. For instance, you can keep a table of the free regions of a given size, or "bucket" allocations into a small number of sizes so that it's easier to find an appropriate location (at the cost of wasting space when an allocation is just over the size of one bucket and a lot smaller than the next biggest bucket). None of these eliminate fragmentation, but they can reduce it. Whatever strategy you use, fragmentation is just something you have to live with under mark-sweep or reference counting.

The reason we have fragmentation is that we don't get to choose which objects will be freed. Suppose we have three objects at 16, 32, and 48 of size 16. If we then free the objects at 16 and 48, we now have 32 bytes worth of free memory, but we can't allocate a 32-byte object because that memory is discontinuous; instead we need to use the bump allocator. Eventually this process results in lots of fragmentation. But what if we could instead slide the object at 32 over to 16, leaving the whole 32–64 region free? This isn't possible in C-like languages where the pointers are directly exposed to the programmer, but if you don't let the programmer look at pointers, you have a lot more freedom to operate.
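The scenario above can be written out directly (a toy illustration, not Memo code): the total free space is enough for the new object, but no single hole is.

```javascript
// Fragmentation in miniature: free 16-byte regions at 16 and 48 give 32
// bytes of total free space, but no single hole can hold a 32-byte object
// because the two regions are not adjacent.
const holes = [
  { addr: 16, len: 16 },
  { addr: 48, len: 16 },
];

const totalFree = holes.reduce((sum, h) => sum + h.len, 0); // 32 bytes free
const fits32 = holes.some((h) => h.len >= 32);              // but no hole fits

// If we could slide the live object at 32 down to 16, the two holes would
// become one contiguous 32-byte region; that is exactly what compaction does.
```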

Mark-Compact #

The next garbage collector we'll be looking at is what's called mark-compact. As the name suggests, a mark-compact collector starts with a marking phase just like mark-sweep, but once marking is complete, instead of just leaving holes it slides every object as far towards the left (low memory) as possible, eliminating all the holes.

The following two diagrams show the situation before and after the GC pass.

[Before GC with mark-compact: @16: (Pointer(36) 2 3); @36: (4 5 6); @56: (9 10 11); @76: (7 8 Pointer(56)); roots: a = null, b = Pointer(76)]
[After GC with mark-compact: @16: (9 10 11); @36: (7 8 Pointer(16)); roots: a = null, b = Pointer(36)]

As you can see, the tuples (9 10 11) and (7 8 Pointer) have moved from their original positions at 56 and 76 to 16 and 36 respectively. As a result, all the allocated memory is now contiguous and so you can just bump allocate all the time.

This seems great because allocation is now super fast, but the cost is complexity in the GC phase. Specifically, we need to rewrite every pointer—or at least every pointer which points to an object we are keeping—to point to the location where the object will eventually end up. There are a number of algorithms for making this work, but Memo uses the relatively simple "Lisp 2" algorithm,[10] which works in three passes.

  1. Scan through memory one object at a time computing the eventual location of each live object. This effectively simulates bump allocation because each live object will just be right after the previous one. This information is stored in a new "Moved" field in each object, which is now one word larger than with mark-sweep (but the same size as in reference counting, in our implementation). Note that we need a separate word here because the object has to remain intact until we copy it.

  2. Scan through memory one object at a time. For each pointer in each object, go to the object it points to, read its "Moved" value, and rewrite the pointer with that value (see the diagram below).

  3. Scan through memory one object at a time, copying each live object to its new location.

The figure below shows an early part of phase 1, in which the moved pointer for the object at 56 has been set to its new location at 16.

[mark-compact with the Moved value set: the objects at 56 and 76 are marked; the Moved field of the object at 56 has been set to its eventual location, 16]

A simplified version of Memo's code for this is below.

  function gc_incremental(roots) {
    // First mark.
    this.mark_incremental(roots);

    let free_ptr = this._start;

    // Step 1. Set the future addresses for each object
    // we are retaining.
    for (let scan = this._start; scan < this._end; ) {
      const size = ObjectManager.getSize(this._context, scan);
      if (ObjectManager.isMarked(this._context, scan)) {
        ObjectManager.setXword(this._context, scan, free_ptr);
        free_ptr += size;
      }
      scan += size;
    }

    // Step 2. Update references for each marked object.
    for (let scan = this._start; scan < this._end; ) {
      const size = ObjectManager.getSize(this._context, scan);
      if (ObjectManager.isMarked(this._context, scan)) {
        const num_ptrs = ObjectManager.getNumValues(this._context, scan);

        // Iterate over all the pointers and update each one to whatever
        // is in the moved field of the pointed-at object.
        for (let i = 0; i < num_ptrs; i++) {
          const ptr = ObjectManager.getValue(this._context, scan, i);
          if (!isPointer(ptr) || ptr === NULL_POINTER) {
            continue;
          }

          const new_ptr = ObjectManager.getXword(this._context, ptr);
          ObjectManager.setValue(this._context, scan, i, new_ptr);
        }
      }
      scan += size;
    }

    // Update the roots.
    let new_roots = [];

    for (let root of roots) {
      new_roots.push(ObjectManager.getXword(this._context, root));
    }

    // Step 3. Move all objects into their expected locations.
    let end = this._start;
    for (let scan = this._start; scan < this._end; ) {
      const size = ObjectManager.getSize(this._context, scan);
      if (ObjectManager.isMarked(this._context, scan)) {
        const target = ObjectManager.getXword(this._context, scan);
        Memory.memmove(this._heap, target, this._heap, scan, size);
        // Unmark the new copy so it can be GCed later.
        ObjectManager.setXword(this._context, target, NULL_POINTER);
        ObjectManager.unmark(this._context, target);
        end += size;
      }
      scan += size;
    }
    this._end = end;
    return new_roots;
  }

The widget below will let you watch the mark-compact process in action.

[Interactive widget: mark-compact, stepping through the same program as before]

Mark-compact collectors minimize fragmentation, but at a modest cost in memory overhead (due to the "Moved" field) and a performance cost from multiple passes over memory. There are fancier two-pass mark-compact algorithms (one mark pass, one compaction pass) that use ancillary storage for the forwarding addresses (see § 3.4 of the Garbage Collection Handbook for one such example). If you're willing to really go wild with memory consumption, however, you can have an even simpler GC phase. This is the idea behind a copying (also called "semispace") collector.

Copying Garbage Collectors #

The idea behind a copying collector is that you have two heaps, A and B.[11] Each of these heaps is about the same size as your normal heap, so this consumes twice as much memory.[12] You initially allocate your objects in A using a standard bump allocator, and then when it is time to perform garbage collection you copy all the live objects into B and abandon anything left in A. Because B is compact, you can continue to use a bump allocator, and then when you GC, you copy from B into A, and so on. This can all be done in a single pass because you're using the source heap as temporary storage while you copy into the destination heap.
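Before looking at Memo's code, here is a minimal skeleton of the two-heap structure (a sketch with hypothetical names; the copying itself is elided):

```javascript
// Toy semispace skeleton (a sketch, not Memo's implementation): two
// equal-sized heaps; we bump allocate in the active one, and flip()
// swaps the roles at GC time.
class Semispace {
  constructor(size) {
    this.heapA = new Uint8Array(size);
    this.heapB = new Uint8Array(size);
    this.active = this.heapA;   // where new allocations go
    this.top = 0;               // bump pointer into the active heap
  }

  alloc(len) {
    if (this.top + len > this.active.length) return null; // time to GC
    const addr = this.top;
    this.top += len;
    return addr;
  }

  flip() {
    // The old active heap becomes from-space; survivors get copied into
    // the new active heap (copying omitted here), and the bump pointer
    // resets so allocation stays a single pointer increment.
    this.active = this.active === this.heapA ? this.heapB : this.heapA;
    this.top = 0;
  }
}
```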

Memo's copying algorithm is shown below, but it's helpful to walk through it.

  process_ptr(address) {
    if (ObjectManager.isMarked(this._context, address, this.#from_heap)) {
      // The first word is overloaded for the forwarding address.
      return [
        ObjectManager.readHeaderWord(this.#from_heap, address) & ~MARK_BIT,
        false,
      ];
    }

    // Not moved yet.
    const size = ObjectManager.getSize(this._context, address, this.#from_heap);
    const new_address = this._end;
    Memory.memmove(this._heap, new_address, this.#from_heap, address, size);
    this._end += size;

    // Now overwrite the first word to point to the new location.
    ObjectManager.writeHeaderWord(this.#from_heap, address, new_address);
    ObjectManager.mark(this._context, address, this.#from_heap);
    this.#movedList[address] = { labels: [["Moved", new_address]], size };
    return [new_address, true];
  }

  *gc_incremental(roots) {
    this.#movedList = {};
    this.#inGc = true;
    this.flip();

    this._work_queue = [];
    let new_roots = [];
    // First process the roots.
    for (let root of roots) {
      if (!isPointer(root) || root === NULL_POINTER) {
        continue;
      }
      let [address, todo] = this.process_ptr(root);
      new_roots.push(address);
      if (todo) {
        this._work_queue.push(address);
      }
    }

    // Now trace through all objects, copying as we go.
    while (this._work_queue.length) {
      const current = this._work_queue.pop();

      const num_ptrs = ObjectManager.getNumValues(this._context, current);
      for (let i = 0; i < num_ptrs; i++) {
        const pointer = ObjectManager.getValue(this._context, current, i);
        if (!isPointer(pointer) || pointer === NULL_POINTER) {
          continue;
        }
        let [address, todo] = this.process_ptr(pointer);
        if (todo) {
          this._work_queue.push(address);
        }
        ObjectManager.setValue(this._context, current, i, address);
      }
    }

    this.#inGc = false;
    return new_roots;
  }

Like mark-sweep and mark-compact, a copying GC works by tracing objects from the roots. Every time you encounter a pointer p for the first time you do the following (this is mostly in process_ptr()):

  1. Copy the object into the destination address space ("to-space"). Call the new address n.
  2. Overwrite the first word (at p) with the new address (n). This makes the object invalid, because valid objects have the type in the low-order three bytes of the first word, but this is safe because you have already copied the object so you can use the original (in "from-space") as scratch space. We don't need a separate word in each object like we do for mark-compact.
  3. Set the mark bit in the first word (at p) so you can tell you have seen it.[13]
  4. Add the object to the work queue.

Every time you pull an object off the work queue (by definition, all these objects are already copied) you look at each pointer p. If the pointer p is new, you copy the object (as above) and remember n. Otherwise, you just look at the type field to get n. You then overwrite the pointer with n, leaving this object with correct pointers. Once you have finished with the work queue, you have (1) copied every live object and (2) updated all their pointers and you are done. You can then abandon the source heap, which will become the destination heap the next time around.

This is a simple algorithm but can be a bit confusing, so it's helpful to go through this step by step.

[Copying the roots: the object at from:64, (7 8 Pointer(48)), has been copied to to:16; from:64 now holds the tombstone Moved 16; work queue: Pointer(16)]

The above figure shows the result of the first GC step, where we have processed the object pointed at by the first root, which was at address 64. As this was the first object processed (the only one pointed to by the roots), it got copied to the lowest address in the other half of the heap (to-space). Note that it was copied as-is, which means that its internal pointers all still refer to objects that have not been copied yet (i.e., they point into from-space). This will have to be patched up later, which is why this object had to be added to the work queue. We used the original copy of the object (in from-space) to store a tombstone indicating where the object was moved to. The rest of the object has the original contents, but those will never be examined and could in principle be invalid.

Now that we've exhausted the roots, we move to process the work queue, which means processing the object at to:16. We iterate through all the pointers in that object, copying the objects into to-space (again, as-is), and then patch up the pointer in to:16 to match the new location, as shown below:

[Patching up pointers: (9 10 11) has been copied from from:48 to to:32, and the pointer in to:16 has been updated from 48 to 32; work queue: Pointer(32)]

Now, the work queue has the object we just copied, which is stored at to:32. Next we scan through that object looking for pointers, but there aren't any, so once we've scanned it the GC process is complete, and we can just abandon from-space and all its objects.

The widget below will let you walk through this all one step at a time if you want.

[Interactive widget: copying GC, stepping through the same program as before]

One thing to notice here is that unlike mark-compact, which just slides all the allocations to the left, a copying GC does not necessarily preserve the relative order of allocations on the heap. For example, in the line

b = (7 8 (9 10 11))

the internal tuple is allocated first (at lower memory) and the external tuple second (at higher memory). However, when we trace from the roots, we encounter the external tuple first and so it gets copied first, ending up at lower memory, as seen below.

[After GC: @16: (7 8 Pointer(32)); @32: (9 10 11)]

This won't have a correctness impact, but may have a performance impact depending on the original layout and memory access patterns. Note that on a second copy, this order will be preserved, because we access the outer tuple first.
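You can see the reordering in a toy trace (hypothetical structures, not Memo's heap format): the inner tuple is allocated at the lower address, but the trace from the root reaches the outer tuple first, so the outer tuple is copied first.

```javascript
// Allocation order vs. copy order in a copying collector.
const heap = {
  16: { values: [9, 10, 11] },          // inner tuple, allocated first
  32: { values: [7, 8, { ptr: 16 }] },  // outer tuple, allocated second
};
const root = { ptr: 32 };

// Work-list trace from the root, recording the order objects are reached,
// which is the order they would be copied into to-space.
const copyOrder = [];
const queue = [root.ptr];
const seen = new Set();
while (queue.length) {
  const addr = queue.shift();
  if (seen.has(addr)) continue;
  seen.add(addr);
  copyOrder.push(addr);
  for (const v of heap[addr].values) {
    if (typeof v === "object") queue.push(v.ptr);
  }
}
// copyOrder is [32, 16]: the outer tuple is copied first even though it
// was allocated second.
```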

Next Up: Advanced Garbage Collection #

The algorithms described in this post are the foundation of basically every modern GC, but I've only described them in their simplest form. In the next post, I'll be covering some of the complexities in making GC deployable, especially for a high performance interactive system (e.g., a Web browser). Importantly, all of these complexities are (nearly) completely hidden from the programmer, because they mostly have performance impacts in terms of when and how fast the GC runs. This allows the language implementor to improve the GC in their runtime without the language user having to do anything to get the benefits of the new implementation. This isn't to say that they aren't important, however: GC can have a huge impact on the performance of a system.


  1. There's also a minor approach where you can't do any memory allocation, like in old school FORTRAN, but we can ignore that. ↩︎

  2. Mostly. Note that this doesn't mean you don't need to think about whether you are doing deep or shallow copies because they have different programming semantics. You don't have to worry about whether there is memory being allocated, though. ↩︎

  3. Some languages have nice idioms for this, like the JS ... spread operator, but Memo is deliberately minimalist, and so there's not even a way to do this generically without knowing the length. However you can use Lisp-style lists. ↩︎

  4. Don't trust me, trust the Web security model. ↩︎

  5. Occasionally you'll see someone propose a system for avoiding memory leaks that comes down to just keeping a pointer to all allocated memory, with the result that that data is morally leaked but not formally leaked. I can never tell if these people are serious. ↩︎

  6. I went back and forth on whether to just keep the type field, which also carries the size, but eventually decided it was better to store the size. The reason for this is slightly subtle: when we get to mark-compact later, it is possible to temporarily have holes which are smaller than any valid object (because the smallest object will be two words and the hole can be one word). This isn't an issue for the garbage collector itself, which doesn't need to skip over them, but it messes up the code I'm using to draw the heap. Storing the length in the first word always works and doesn't have this problem. ↩︎

  7. Or, as I once heard Frank Jackson, who had worked extensively on the Smalltalk 80 garbage collector, call it: "the second lamest form of garbage collection". ↩︎

  8. Note that it's not possible for the mark bit to be set on a freed object because otherwise it wouldn't have been freed. ↩︎

  9. I could have added a GC instruction to Memo, but I didn't for two reasons. First, this would be unusual because GC is usually automatic. Second, I wanted to let you step through the GC one operation at a time and that wouldn't work if the instruction were processed by the Memo interpreter directly. ↩︎

  10. See § 3.2 of the Garbage Collection Handbook. ↩︎

  11. In our implementation, we just have two heaps that share the same address space starting at 0, and internally I keep track of which heap is in use. This is fine because the addresses are just indexes into a table. In a system which was closer to the metal, you might instead tag the addresses using the higher order bits, as we have been doing so far. ↩︎

  12. The other way of looking at it is that you have one heap which is split into two "semispaces". ↩︎

  13. This is all a bit fiddly because in other contexts the same bit (0x80000000) means that the field is an integer rather than a pointer, but in this case we know it's a pointer so we can overload the meaning. ↩︎