Educated Guesswork

A look at the Dutch vaccine passport system

Most of the widely deployed vaccine passport systems (New York, California, EU, New Zealand) are signed attestations to a person's name and vaccination/COVID test status. These have non-ideal privacy properties because the relying party (the person checking the passport) can use the credential to track the user. As I discussed earlier, it seems to be quite difficult to significantly improve privacy here, so I was very interested to learn about the Dutch CoronaCheck system, which has privacy as an explicit part of the design.

Note: I've not been able to find a complete specification of the system. This description is based on the documents found here, which provide a broad overview but not enough to implement the system, and on some examination of the issuer code. It's especially hard to tell what is actually deployed. With that said, here is what seems to be going on:

Basic Design

As with all the other systems, the basic unit of the system is a signed credential. However, this credential has two main differences from what I've seen before:

  1. It contains far less identity information.

  2. It is signed with a special cryptographic algorithm that provides unlinkability.

Let's look at each of these pieces in turn.

Identity Minimization

The first piece is fairly straightforward. A typical vaccine passport contains full identifying information for the subject, such as their full name and birthday (I believe the Israeli ones also contain a national ID number). This information can then be compared against some form of photo identification (e.g., a driver's license) to physically authenticate the person. The Dutch version just contains the person's initials and their birth month and day. This superficially seems like a privacy improvement, but I'm not sure how much of one it really is.

The basic problem is that the system is only k-anonymous, and for a fairly small k. It's a bit difficult to precisely determine the number of bits of information here, but we can approximate it as follows:

  • There are 12 birth months: ~3.6 bits ($\log_2 12$)
  • There are ~30 birth days: ~4.9 bits ($\log_2 30$)
  • There are 26 letters for each initial, but they're not evenly distributed, so let's say 4 bits each: 8 bits

This gives the relying party 16.5 bits of entropy, dividing the population into about 100,000 groups. The population of the Netherlands is about 18 million, so this gives us an anonymity set of around 200. Moreover, when combined with side information like apparent age and gender, the anonymity set becomes a lot smaller. Also, as I noted earlier, it's made worse by the fact that people's behavior isn't random. For instance, if we have four authentications for the initials ER within an hour with two at outdoor stores in Mountain View and two in bars in Los Angeles, it's likely that the first two are one person and the second two are another. This kind of constraint solving problem is something computers are very good at; you might not get a complete record of someone's behavior, but you'll learn a lot.
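To make the arithmetic above concrete, here's a quick Python sketch. The first part just reproduces the entropy estimate using the same rough assumptions (~30 days per month, ~4 effective bits per initial, 18 million people). The second part is a toy version of the linking problem: it assumes the relying parties have already pooled their logs and filtered them down to a single quasi-identifier (initials ER plus one birth month/day), and it greedily groups the sightings into tracks that one person could physically have produced. The 100 km/h maximum travel speed, the coordinates, and the helper names are all made up for illustration; a real linking attack would use much more side information.

```python
import math

# Rough estimates from the list above.
BIRTH_MONTHS = 12
BIRTH_DAYS = 30           # months have 28-31 days; 30 is close enough
EFFECTIVE_INITIALS = 16   # ~4 bits per initial, since letters aren't uniform
POPULATION = 18_000_000   # rough population figure used in the text

groups = BIRTH_MONTHS * BIRTH_DAYS * EFFECTIVE_INITIALS ** 2
bits = math.log2(groups)
anonymity_set = POPULATION / groups
print(f"{bits:.1f} bits, {groups} groups, ~{anonymity_set:.0f} people per group")
# prints: 16.5 bits, 92160 groups, ~195 people per group

# Toy linking: sightings that all share one quasi-identifier.
# Coordinates are illustrative (Mountain View and Los Angeles).
EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 100.0     # assumption: nobody moves faster than this between checks

sightings = [
    {"label": "MV store 1", "hour": 0.00, "where": (37.386, -122.084)},
    {"label": "LA bar 1", "hour": 0.25, "where": (34.052, -118.244)},
    {"label": "MV store 2", "hour": 0.75, "where": (37.400, -122.110)},
    {"label": "LA bar 2", "hour": 1.00, "where": (34.100, -118.330)},
]


def distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def link(observations):
    """Greedily assign each sighting to the first track it could physically extend."""
    tracks = []
    for obs in sorted(observations, key=lambda o: o["hour"]):
        for track in tracks:
            last = track[-1]
            elapsed = obs["hour"] - last["hour"]
            if elapsed > 0 and distance_km(last["where"], obs["where"]) <= MAX_SPEED_KMH * elapsed:
                track.append(obs)
                break
        else:
            # No existing track is reachable in time, so this is (probably) someone else.
            tracks.append([obs])
    return tracks


for track in link(sightings):
    print([s["label"] for s in track])
```

Even this naive heuristic splits the four ER sightings into the two people you would expect.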

Digital Signatures

Of course, minimizing the data in the passport doesn't prevent tracking if you show the same passport every time. The problem here isn't the data in the passport, which we'll assume is k-anonymous as described in the previous section, but the signature, which is high entropy[1] and therefore unique. In my earlier post I described a brute-force way to address this, in which the user gets a big pile of tokens, each with a separate signature. The Dutch system instead uses a special digital signature scheme (Camenisch-Lysyanskaya signatures). The details are beyond the scope of this post, but the basic idea is that the credential issuer performs a single signature which the subject can then use to prove the validity of their credential to a relying party without revealing the signature itself. Each proof is based on unique random data and so can't be linked to a subsequent proof.

I know that language was a bit technical, but for our purposes it's enough to think of this as a system in which the signer makes one signature and the subject gets to make as many equivalent but distinct and unlinkable signatures as it wants. This is functionally equivalent to the "pile of tokens" approach but a lot more efficient (though the Dutch system also uses a pile of tokens for a different reason).
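For the curious, here's a toy Python sketch of the re-randomization trick behind the RSA-based variant of Camenisch-Lysyanskaya signatures (the variant used in Idemix-style credential systems). I'm not claiming this is what the CoronaCheck code does: the parameters are absurdly small, the range checks on e and m are omitted, and it only shows the algebra. The issuer signs a message m by finding A such that Z = A^e * S^v * R^m mod N, which requires knowing the factorization of N. The holder can then pick a fresh random r and compute A' = A * S^r and v' = v - e*r, which still satisfies the same equation. In a real presentation the holder would send A' plus a zero-knowledge proof that they know e, v', and m, rather than revealing them; here we reveal them just to check the equation, so this toy version is *not* actually unlinkable (e is the same every time). The encoding of m is hypothetical.

```python
import math
import secrets

# Toy parameters: tiny safe primes, for illustration only (no security at all).
P, Q = 1019, 1187
N = P * Q
PHI = (P - 1) * (Q - 1)
SMALL_PRIMES = [101, 103, 107, 109, 113, 127, 131, 137, 139, 149]  # all coprime to PHI


def random_qr(n):
    """Return a random quadratic residue mod n (a unit other than 1)."""
    while True:
        x = secrets.randbelow(n - 3) + 2
        if math.gcd(x, n) == 1 and pow(x, 2, n) != 1:
            return pow(x, 2, n)


# Issuer public key (N, S, R, Z); the factorization of N is the private key.
S, R, Z = random_qr(N), random_qr(N), random_qr(N)


def sign(m):
    """Issue (A, e, v) such that Z == A^e * S^v * R^m (mod N)."""
    e = secrets.choice(SMALL_PRIMES)    # real scheme: random prime in a fixed range
    v = secrets.randbelow(N * N)
    d = pow(e, -1, PHI)                 # e-th root exponent; needs the factorization
    base = Z * pow(pow(S, v, N) * pow(R, m, N), -1, N) % N
    return pow(base, d, N), e, v


def verify(m, A, e, v):
    """Check the CL verification equation (negative v is fine: pow() inverts S)."""
    return pow(A, e, N) * pow(S, v, N) * pow(R, m, N) % N == Z


def randomize(A, e, v):
    """Holder-side re-randomization: A' = A * S^r, v' = v - e*r."""
    r = secrets.randbelow(N * N)
    return A * pow(S, r, N) % N, e, v - e * r


m = 1  # hypothetical encoding of "holder has a valid credential"
A, e, v = sign(m)
first, second = randomize(A, e, v), randomize(A, e, v)
print(verify(m, *first), verify(m, *second), first[0] != second[0])
```

The point is that the issuer did one signing operation, but each call to randomize() yields a different A' that verifies against the same public key. This needs Python 3.8 or later, since it relies on pow() with a negative exponent for modular inverses.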

Remember, you have to show ID

These are all understandable design choices, but it's not clear to me how they help. The problem, as I noted previously, is that the vaccine passport isn't a standalone form of proof but rather is embedded in a system in which you have to show identification to bind the credential to you. Even though the credential only has your initials and partial birthday, the other form of identification contains your full name, your picture, and (probably) your birthday, which means that the relying party has those.

It's true that the relying party that scans the vaccine passport doesn't have to scan that form of identification (though they might anyway), but even if they don't, your privacy now depends essentially on their not being able to remember and record any information from it. For instance, if they just record your birth year, your gender, and your first name, then combined with the initials and birth month and day in the passport, that's probably enough to uniquely identify most people.[2] It's not clear what prevents this form of attack, though it's probably more challenging in high-throughput areas where the verifier would have less opportunity to record the data.

Note that we're also assuming an incredibly weak threat model when we restrict ourselves to people's memories. Just because the verifier isn't obviously scanning your ID doesn't mean they aren't surreptitiously doing so. It's not at all difficult to conceal a small camera at the verification point and hold the ID up to it for recording. Of course, at that point the vaccine passport isn't needed for tracking at all, because the identification alone is enough, but then why go to all the trouble to make the vaccine passport quasi-anonymous?

Concealing Health Status

Even if we give up on preventing tracking, typical credentials still leak a fair amount of information. For instance, the California credential contains:

  • The vaccine type (I think)
  • The lot number
  • Where it was performed
  • The date of injection

This can of course be used for tracking (see above) but it also might be something that the subject doesn't want people to know. For instance, the designers of the Dutch system argue that the credential shouldn't distinguish between various forms of "safety" (e.g., a negative test, recovery from COVID, or vaccination).

You could just remove all this information, as the NZ system does, and have the semantics of the credential be "this person is OK", but this presents the problem that different kinds of credentials should be acceptable for different periods. For instance, in the Dutch system they want a negative test to be usable for 40 hours, vaccination for 365 days, and recovery for 180 days. But if you just have a credential with a fixed validity period starting from the initiating event, its expiry leaks both the type of the event and the time it happened (see my writeup on the New Zealand system for some of the problems with that). The Dutch system deals with this by providing the subject with multiple credential "strips", each of which is only good for 24 hours. Strips are issued for 28 days at a time (obviously fewer in the case of a test), and the subject just presents the currently valid strip (with a randomized signature, as described above) when they need to authenticate.[3]
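Here's a rough sketch of how strip issuance might work, based only on the parameters described above (40 hours, 365 days, and 180 days of validity; 24-hour strips; 28-day batches). The function names, and the choice to start strips at issuance time and clip the last one at expiry, are my assumptions rather than details of the real system. In particular, clipping the final strip leaks the expiry time, which is exactly the kind of thing the randomization mentioned in footnote 3 is meant to hide.

```python
from datetime import datetime, timedelta

# Validity periods described in the post.
VALIDITY = {
    "negative_test": timedelta(hours=40),
    "vaccination": timedelta(days=365),
    "recovery": timedelta(days=180),
}
STRIP_LENGTH = timedelta(hours=24)
BATCH_LENGTH = timedelta(days=28)


def issue_strips(kind, event_time, issue_time):
    """Return (start, end) windows covering the next 28 days or until expiry.

    In this sketch each window stands for a separately signed strip; a verifier
    sees only the window, not `kind` or `event_time`.
    """
    expiry = event_time + VALIDITY[kind]
    batch_end = min(issue_time + BATCH_LENGTH, expiry)
    strips = []
    start = issue_time
    while start < batch_end:
        end = min(start + STRIP_LENGTH, batch_end)  # clipping the last strip leaks expiry
        strips.append((start, end))
        start = end
    return strips


def current_strip(strips, now):
    """Return the strip valid at `now`, or None if the holder needs new strips."""
    for start, end in strips:
        if start <= now < end:
            return start, end
    return None


vaccinated = issue_strips("vaccination", datetime(2021, 6, 1), datetime(2021, 12, 1))
tested = issue_strips("negative_test", datetime(2021, 12, 1, 10), datetime(2021, 12, 1, 10))
print(len(vaccinated), len(tested))  # prints: 28 2
print(current_strip(tested, datetime(2021, 12, 2, 12)))
```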

This design is something of a compromise: it doesn't require the subject to be online all the time, but they do need to be online somewhat regularly in order to get a new set of strips. It's also not really compatible with people who print out their credentials. In that case, the strips are simply valid for 4 weeks for vaccination/recovery and 40 hours for negative tests (which leaks whether the credential comes from a test or not), which reduces the burden somewhat. Even so, printing out a new strip every 28 days sounds like kind of a pain.

One obvious problem, as with the NZ design, is flexibility. What happens if you issue 28 days' worth of strips on day 1 and then on day 5 you discover that it's necessary to treat different vaccination statuses differently? This isn't a hypothetical scenario: it seems that the various vaccines may provide different levels of protection against Omicron, and even within a vaccine family there is probably a lot of difference between people who received two doses and those who have been boosted, as seen with Pfizer. In this case, you might want to start treating boosted people differently, but that's a problem if the credentials are good for 28 days. The Dutch system does have a way of dealing with this, which is effectively to invalidate all credentials (by incrementing the minimum version number field), but obviously this is going to cause a lot of disruption, especially for those who have printed out credentials, which will suddenly become invalid.
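A minimal sketch of the verifier side of that invalidation mechanism, assuming each strip carries a version number and each verifier has a configured minimum version (the Strip fields and the verifier_accepts helper are hypothetical). Bumping the minimum by one rejects every outstanding strip at once, regardless of what it attests to, which is why it's such a blunt instrument:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Strip:
    version: int            # hypothetical credential-format version field
    valid_from: datetime
    valid_until: datetime


def verifier_accepts(strip, now, min_version):
    """Accept a strip only if it is current and at least the verifier's minimum version."""
    return strip.version >= min_version and strip.valid_from <= now < strip.valid_until


# 28 days' worth of version-2 strips issued on day 1.
day1 = datetime(2021, 12, 1)
strips = [Strip(2, day1 + timedelta(days=i), day1 + timedelta(days=i + 1)) for i in range(28)]

# On day 5 the policy changes and verifiers bump the minimum version to 3.
day5 = day1 + timedelta(days=4)
accepted_before = any(verifier_accepts(s, day5, min_version=2) for s in strips)
accepted_after = any(verifier_accepts(s, day5, min_version=3) for s in strips)
print(accepted_before, accepted_after)  # prints: True False
```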

Another problem is that there are probably settings even now in which you would want to distinguish between different credential types. For instance, in case of a close contact, the Palo Alto schools require that students show two negative COVID tests (on days 1 and 5), but if you just have a credential that indicates that the subject had either a vaccination or a negative test without telling you which, there is no way to use it to fulfill this requirement. This seems like a pretty common scenario, and one that's difficult to handle with any kind of system in which the "what is acceptable" logic is central (and uniform) and verifiers just get a yes/no answer.

Note that it is possible to do better here: you can build credential systems in which the subject proves not only that they have a valid credential but also specific properties attested to in that credential, without revealing the whole thing. For instance, you might imagine a system in which the credential contained all the information found in a typical system but where you only disclosed the minimum amount of information required for a given scenario (e.g., that you had a booster more than two weeks ago). That would allow you to keep the acceptance logic in the verifier but still limit disclosure of information. It's true that these systems typically involve some fancy crypto (zero-knowledge proofs), but that crypto is reasonably well understood. This system is already using a lot of crypto, and indeed Camenisch-Lysyanskaya signatures are often used in precisely this kind of application, so it's not clear to me why this design doesn't do that.
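As a rough illustration of selective disclosure, here's a sketch of the simplest approach: salted hash commitments to each attribute (the same basic idea as SD-JWT). Note that this is a different and much weaker technique than the zero-knowledge approach described above: the holder reveals the exact booster date rather than just proving it is more than two weeks old, and the digests and issuer tag are identical on every presentation, so it's fully linkable. I've also used an HMAC as a stand-in for the issuer's signature to keep the sketch self-contained; a real system would use a public-key signature so that verifiers couldn't mint credentials. The attribute names and values are made up.

```python
import hashlib
import hmac
import json
import secrets
from datetime import date, timedelta

# Stand-in for the issuer's signing key (HMAC, NOT a public-key signature).
ISSUER_KEY = secrets.token_bytes(32)


def _digest(salt, name, value):
    """Salted hash commitment to one attribute."""
    return hashlib.sha256(json.dumps([salt, name, value]).encode()).hexdigest()


def _tag(digests):
    return hmac.new(ISSUER_KEY, json.dumps(digests).encode(), hashlib.sha256).hexdigest()


def issue(attributes):
    """Issuer: commit to every attribute and 'sign' the sorted list of digests."""
    disclosures = {name: (secrets.token_hex(16), value) for name, value in attributes.items()}
    digests = sorted(_digest(salt, name, value) for name, (salt, value) in disclosures.items())
    return {"digests": digests, "tag": _tag(digests)}, disclosures


def present(credential, disclosures, names):
    """Holder: reveal only the chosen attributes (with their salts)."""
    return credential, {name: disclosures[name] for name in names}


def verify(credential, revealed):
    """Verifier: check the tag, then check each revealed attribute against a digest."""
    if not hmac.compare_digest(_tag(credential["digests"]), credential["tag"]):
        return None
    for name, (salt, value) in revealed.items():
        if _digest(salt, name, value) not in credential["digests"]:
            return None
    return {name: value for name, (_salt, value) in revealed.items()}


credential, disclosures = issue({
    "initials": "ER", "birth_day": "05-17", "event": "booster",
    "event_date": "2021-11-01", "lot": "hypothetical-lot",
})
shown = verify(*present(credential, disclosures, ["event", "event_date"]))
booster_ok = (shown is not None and shown["event"] == "booster"
              and date.fromisoformat(shown["event_date"]) <= date(2021, 12, 1) - timedelta(days=14))
print(shown, booster_ok)
```

Even this crude version lets the verifier apply its own policy ("booster at least two weeks ago") while never seeing the initials, birthday, or lot number.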

Summary

I am glad to see an attempt to do something new here rather than just another trivial variant of the "signed credential" design, and it does suggest that there might be some room to improve the privacy of vaccine credentials. With that said, I'm kind of skeptical of the particular design choices. In particular, I don't think it's that useful to try to conceal the subject's identity given that the subject has to identify themselves in order to use the passport. It's possible that it's useful to conceal the details of what the credential is attesting to (vaccination, test, etc.), but the strip mechanism seems kind of clunky and inflexible, so I'm not sure that's the right design either. I know I'm repeating myself, but it would be a lot better if, instead of everyone inventing their own thing, we had some kind of multistakeholder effort that would arrive at clear requirements and then try to converge on a single design that did a good job of meeting them.

  1. This might not be immediately obvious, but if it weren't high entropy it would be trivial to forge by just generating candidate signatures and seeing if they verify.

  2. See Latanya Sweeney's Simple Demographics Often Identify People Uniquely for more on this. For instance, she reports that "It was found that 87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}."

  3. There is also some fancy randomization to prevent a test credential from revealing the time of the test, though this seems like overengineering to me. The 40-hour number seems pretty arbitrary, so you could just have the last strip expire at the first midnight that was at least 40 hours after the test.
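The simpler alternative suggested in footnote 3 is easy to sketch. This assumes naive local datetimes and ignores daylight-saving transitions, and the function name is hypothetical. Every test taken in the same 24-hour window ends up with the same expiry, so the expiry reveals only which window the test fell in, not its exact time.

```python
from datetime import datetime, time, timedelta

MIN_TEST_VALIDITY = timedelta(hours=40)


def rounded_test_expiry(test_time, min_validity=MIN_TEST_VALIDITY):
    """Return the first midnight at or after test_time + min_validity."""
    earliest = test_time + min_validity
    midnight = datetime.combine(earliest.date(), time.min, tzinfo=earliest.tzinfo)
    if midnight < earliest:
        midnight += timedelta(days=1)
    return midnight


for t in (datetime(2021, 12, 1, 8, 0), datetime(2021, 12, 1, 10, 0), datetime(2021, 12, 2, 7, 59)):
    print(t, "->", rounded_test_expiry(t))
```

The last two tests, taken almost a day apart, get the same expiry; the cost is that some tests stay valid for up to 64 hours instead of 40.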
