Privacy for license plates

Posted by ekr on 28 Nov 2021

Here at EG we spend a lot of time on privacy and obviously one of the big concerns is avoiding people tracking you, whether in person or on the Internet. From that perspective, I've always found license plates kind of anomalous. If it was illegal to leave your house without wearing a label with your social security number printed on it, we'd all recognize this as privacy-invasive--heck, I get upset when I have to show my ID at the airport--but for some reason when the identifier is bolted to your car people think that's fine.

I suspect a lot of what we're seeing here is just status quo bias: license plates have been around for a long time and people are used to them. But it's also true that there has been a not-that-well-publicized change in how easy license plate-based tracking is due to ubiquitous deployment of Automatic Number-Plate Recognition (ANPR) technologies--essentially cameras which record license plate numbers. For instance, in 2016, London had 1666 ANPR cameras deployed and I expect that there are more now. The result is that this enables an enormous amount of driver surveillance.^[1] A natural question here is whether we can build something that fulfills the legitimate purposes of license plates while having better privacy properties.

Most of the following is by way of a thought experiment: I don't actually expect license plates to be replaced, but it's useful practice in designing this kind of system to think about how the privacy properties of the system and how to improve them in a system this constrained.

Requirements/Constraints #

It seems that we have two primary functional requirements for license plates:

Identifying vehicles which are of interest for some reason. For instance, we might observe a vehicle committing some kind of violation and use the license plate to track down the owner.
Tracking vehicles which have been previously identified. For instance, a vehicle might have been stolen and we want to find it, or a suspect might be driving a given vehicle and we want to be notified if it appears.

On the privacy side, we want it to be difficult to use them for mass surveillance. Specifically:

It should not be surreptitiously possible to determine a person's identity from their license plate. In generally, the public should not be able to do this at all, and law enforcement should require some auditable process.
It should not be possible to use the license plate to follow arbitrary vehicles for an extended period of time^[2] However, we do want it to be possible to track specific vehicles of interest (e.g., driven by a suspect). Here too, the process for tracking specific vehicles should be auditable.

We can't significantly change the form factor of the license plate: it has to fit in roughly the same location it does now, be readable with the naked eye, and have a short enough number (say <12 digits) that a human can read and remember it. On the other hand, we aren't committed to it beng some kind of metal plate. That's good because any static identifier is going to have bad tracking properties. Realistically our new license plate is going to need to change and so will need to be some kind of smart screen, but I think it does have to work without being online all the time which eliminates some designs.

Designs #

Instead of presenting a finished design, I'm going to work my way up to it, starting by designing something that won't really work and then refining it into something that might. This helps get at the key ideas but also is a useful demonstration of how to work through a problem like this and the tradeoffs you have to make.

An Infeasible Design #

Let's start by relaxing the constraint that the plates have to be consumable by humans. This gives us a system that's a lot easier to design and let's us work out some of the problems; then we can look at re-adding that constraint. We start by taking two shortcuts:

Identifiers which are infeasibly long.
Identifiers which can be interpreted by machines rather than humans.

We start by assigning each vehicle $i$ a unique identifier $I_i$ which is associated with the vehicle registration. $I_i$ is never displayed on the vehicle, however. Instead, every so often (every minute?) a vehicle's plate changes with vehicle $i$'s current license plate at time $t$ denoted as $P(i, t)$.

As a first attempt, we can just use public key encryption. Each jurisdiction $j$ (e.g., California) has an asymmetric key pair $(K_j^{pub}, K_j^{priv})$ and the current plate number is the encryption of $I_i$.^[3] I.e.,

$$P(i, t) = E(K_j^{pub}, I_i || t)$$

This seems like it meets the rest of our requirements: if you don't know $K_j^{priv}$ it's not possible to determine $I_i$ and you can't link up $P(i, t)$ and $P(i, t')$. On the other hand, law enforcement can use the private key to decrypt any given plate and then can determine $I_i$, thus identifying vehicles and tracking them as necessary. Auditability comes from restricting access to the private key and we can put as much ceremony around that access as we want. However, the problem is that this works badly for tracking. Every time you want to determine if a new vehicle is the same as an old one you need to use the private key, which means that it can't be that onerous to do the decryption, thus making auditability more difficult.

We can, however, improve the situation fairly easily by making the plate generation algorithm somewhat more complicated. Instead of just having it be the public key encryption of the identifier, we add a pseudorandom value unique for each vehicle. To make this work, each vehicle $i$ creates a random key $L_i$. This value is not known to the authorities. It then generates its plate number using the following three values:

The encryption of the identifier $I_i$
The encryption of the linkage key $L_i$
A pseudorandom value based on $L_i$

I.e.,

$$P(i, t) = [E(K_j^{pub}, I_i || t), E(K_j^{pub}, L_i), PRF(L_i, t)]$$

In this case, the values are all fixed length so you can just concatenate them and the receiver can sort it out. If they were variable length, you would need some kind of separator or length prefixed encoding or something.

The way this gets used in practice is that when you see a new plate number you want to identify, you decrypt $P(i, t)$ to recover $I_i$ as before. This is only sufficient to identify the vehicle. However, if you also want to track it, you also decrypt $L_i$, which you can use to predict the last piece of $P(i, t)$ (i.e., $PRF(L_i, t)$ just by computing the pseudorandom function. This wasn't possible before because $L_i$ was secret to the vehicle, but once you know $L_i$ it's straightforward. This allows you to separate the functions of identification from tracking, but you only need to use the private key at most twice per vehicle (once to identity it and once to track it) which allows for tight audit control.^[4] Once you have recovered $L_i$ you can then track the vehicle indefinitely by predicting the PRF for time $t$.

This design is cryptographically straightforward and has the desired privacy and functional properties. The only problem is that it's not usable by humans: it requires identifiers which are too large to memorize or transcribe and you need a computer to predict the future identifiers. Maybe when we're all wearing Apple's AR headset, it can automatically process these smart plates, but let's see if we can do better in the meantime.

A more usable design #

If we're going to get down to human scale identifiers, we're going to need to jettison sending a public key encrypted value because it inherently requires values that are too long for humans to memorize. This limits the design space quite a bit because we have to be able to generate a sequence $P(i, t)$ that has the following properties:

It's known to the vehicle (user)
It can be generated by the authorities once the vehicle is identified.
It cannot be generated by third parties.

One obvious thing to do is to just replace the asymmetric key pair with a symmetric key and use format preserving encryption to keep the ciphertext small. The problem is that the encryption key then needs to be known by every user or at least every smart license plate, and so it's possible to extract the key and track other users, thus violating property (3). I don't know of any design that involved just encrypting $I_i$ or any derivative. Instead the best I can do requires the authorities to compute each candidate $P(i, t)$. For instance, suppose we say:

$$P(i, t) = Truncate(PRF(L_i, t))$$

Assuming that the authorities know all values of $L_i$ (note that this is a change from the previous design where they do not), they can just compute all potential values of $P(i, t)$ for any time window and look up the value of interest. This is the kind of computation that would ordinarily be impractical on a cryptographic scale, but any jurisdiction will probably have only a few million vehicles (California has about 30 million registered cars), and doing $10^8$ PRF executions is quite cheap. This is especially true because the time windows need to be much longer in order for this to be useful. For instance, if we want to tell people "be on the lookout for plate number 12345" that number probably has to be valid for at least a day or two.

It's a little unfortunate to have the authorities exhaustively search all $P(i, t)$ values, both because it's expensive and because it requires the $L_i$ database to be online. However, we can do better by precomputation. The way this works is that for each time window $t$ you precompute the encryption table from our previous design. I.e.,

$$[E(K_j^{pub}, I_i || t), E(K_j^{pub}, L_i), P(i, t)]$$

for each vehicle and then you store the table (this is on the order of a few gigabytes of data a day). You then use the transmitted $P(i, t)$ value to look up the right database entry and then $K_j^{priv}$ to decrypt $I_i$. As before, you can now identify the vehicle. If you want to track the vehicle, you also decrypt $L_i$. From then you can run the PRF forward to compute the sequence of values.

It's important that after the authorities build the table they discard the $L_i$ values and then shuffle the table so^[5] it's not possible to know which entries correspond to each other across time windows. If you do this, then the table itself isn't enough to link up multiple plate values^[6]. Instead, you need the private key, which gives us the same properties as before.

Taking a step back, this is almost the same as our pure public key system, except that we've replaced public key encryption on the vehicle with precomputed public key encryption by the authorities and now use the (truncated) pseudorandom sequence as a lookup key.

Collisions #

One somewhat unfortunate property of this design is that it is susceptible to collisions. If we generate a large number of random values out of a smallish space, we will get repeats (see birthday paradox). If you have $V$ vehicles, you need more than $V^2$ possible plates to keep the chance of collisions low. In a state like California, this means something like $2^{60}$ possible value. Each digit of the plate encodes about 5 bits so that means 12 digits, which is far more than any current plate scheme.^[7]

It's possible to do quite a bit better by permuting rather than computing a pseudo random value. For instance, we could assign each vehicle a short identifier $S_i$ and then compute

$$P(i, t) = Encrypt(K_{permute}, S_i)$$

where Encrypt is some sort of format preserving encryption scheme. This will provide unique values with a much smaller number space, but at a cost. For the reasons mentioned above, the vehicle can't know $K_{permute}$, so the authorities need to precompute all $P(i, t)$ values and distribute them to each vehicle. This has a bunch of logistical difficulties (e.g., $K_{permute}$ needs to be online in order to issue the plates).^[8] Moreover, it seems likely that the system needs to work with only partial plates, in which case you'll get collisions in the plate values anyway.^[9] Anyway, this is not clearly better, but instead reflects the kind of design tradeoff that you have to make when building something under these constrained circumstances.

Attacks #

It's worth noting that this system is obviously subject to a variety of attacks. In particular, nothing stops me from making a fake smart plate that shows random identifiers. However, not much stops me from doing that now, either with the low tech way of having multiple real plates that I swap (ever see The Transporter?) or a smart plate that looks like a real plate; I'm pretty sure I could make one of these that looks pretty real, especially behind a license plate holder. The basic premise here is that people aren't working too hard to cheat the system.

Summary #

As I said at the beginning, I'm under no illusions that we're going to replace our current license plate system with a system of smart plates. However, it's still important to look at the systems we've built and see how/whether they can be used and how they can be improved. In this case, we have a system which originally had so-so privacy properties and due to modern technology in the form of ANPR now has quite bad ones. When designing new systems we need to be careful not to reproduce this situation in the future.

Update: 2021-11-29: Added some clarifying material around the initial construction and also fixed some holdover text where I assumed that the plate had only two components, not three.

This pattern, in which technology turns a theoretical privacy violation into a practical one, has become quite common. See also DNA-based forensics and facial recognition. ↩︎
We would typically formalize this as saying that it is not possible to do significantly better than guessing in distinguishing seeing the same vehicle twice from seeing two similar vehicles. ↩︎
It's best if you have some kind of randomized encryption like HPKE so that attacker's who somehow learns $I_i$ can't trial encrypt. ↩︎
You actually probably want to have two key pairs so that you can separately audit their use and to prevent attacks where decrypting $L_i$ is confused with decrypting $I_i$. ↩︎
For instance by sorting them by the encrypted $I_i$ values. ↩︎
More nerdsniping, there's probably some multiparty computation way of constructing the table so that it's never linkable. ↩︎
Yes, I know that plate numbers are structured, which reduces the space further. ↩︎
We can still precompute the table I mentioned in the previous section, as that provides better privacy on the lookup side. ↩︎
Though we could probably use some of the digits we've saved to add some sort of error correcting code. ↩︎

Educated Guesswork