What does the NeuralHash collision mean? Not much
Posted by ekr on 19 Aug 2021
In today's Apple CSAM scanning news, it appears that Apple platforms already have a NeuralHash API built in, and Asuhariet Ygvar (apparently a pseudonym) has reverse engineered the algorithm and built a tool to convert it to the Open Neural Network Exchange (ONNX) format. Based on that work, Cory Cornelius has constructed a pair of images with the same hash, aka a "collision". The coverage of this is kind of confusing, and there seems to be a bit of a sense that this is news of a vulnerability (though note that Jonathan Mayer is quoted in the Register article making a number of the points I make below). From my perspective, this isn't surprising and doesn't really change the situation.
Threat Model #
- **Evasion:** Perturbing an existing CSAM image so that it has a different hash from the one in the database, so that you could then distribute that image undetected.
- **Forgery:** Creating an innocuous image that has a hash that's already in the database and distributing it to someone innocent so that they are flagged by the scanning system (and potentially subject to some sort of legal action).
Note that both attacks require knowing the NeuralHash algorithm, but the latter also requires knowing the hash of at least one entry in the database.
It's important to recognize that these attacks depend on opposed properties of the hash. With something like a cryptographic hash, in which any change in the input changes the output with high probability, the evasion attack is trivial and doesn't require knowing the details of the hash algorithm: just change a single pixel and you're done. The purpose of a perceptual hash like NeuralHash is to make it so that small changes to the input don't change the output. That's why mounting the evasion attack requires knowing the details of the algorithm: you need to be able to tell which perturbations actually change the hash value.
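To make the contrast concrete, here is a toy sketch (not Apple's algorithm): a cryptographic hash over the raw pixel bytes versus a crude "average hash," where `average_hash` is a simplified stand-in for a perceptual hash like NeuralHash. A one-level change to a single pixel flips the cryptographic hash but leaves the perceptual hash unchanged.

```python
import hashlib

# Toy 8x8 grayscale "image" as a flat list of 0-255 pixel values.
image = [16 * (i % 16) for i in range(64)]

def crypto_hash(pixels):
    """Cryptographic hash: any single-pixel change flips the output."""
    return hashlib.sha256(bytes(pixels)).hexdigest()

def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set if above the image mean.
    Small perturbations that don't cross the mean leave it unchanged."""
    mean = sum(pixels) / len(pixels)
    return "".join("1" if p > mean else "0" for p in pixels)

perturbed = list(image)
perturbed[0] += 1  # change one pixel by one gray level

assert crypto_hash(image) != crypto_hash(perturbed)    # evasion is trivial here
assert average_hash(image) == average_hash(perturbed)  # perceptual hash unchanged
```

Evading the perceptual hash requires larger or better-targeted perturbations, which is exactly where knowing the algorithm matters.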
By contrast, the forgery attack depends on it being relatively easy to generate an image with a given hash value. The structure of perceptual hash functions makes this comparatively easy to do. The result is that if the attacker has a hash that corresponds to an entry in the database, they can make an image that has that hash. Less obviously, it's also possible to make an image that looks nothing like the original image and still has the same hash, as Cornelius's example collision demonstrates.
It's not obvious to me that this is a necessary property -- consider the case of a hash that's just an 8x8 bitmap of the image -- but it seems to be a property of NeuralHash and similar constructions; that's certainly what I -- and the other analyses I have seen -- have assumed. This is important because the purpose of the attack is to frame someone by sending them images which they keep on their machine and upload to iCloud, and this doesn't work if those images are obviously CSAM.
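The "find an image with a target hash" step can be sketched with the same toy average hash used as a stand-in above. This is a greedy hill climb, not the gradient-based attacks used against the real NeuralHash, and all the parameters (image size, step size, step budget) are made up for illustration: keep random single-pixel perturbations that do not increase the Hamming distance to the target hash.

```python
import random

def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set if above the image mean."""
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)

def hamming(a, b):
    """Number of differing bits between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))

def climb_toward(pixels, target, steps=20000, seed=1):
    """Greedy search: accept single-pixel changes that don't increase
    the Hamming distance to the target hash."""
    rng = random.Random(seed)
    current = list(pixels)
    best = hamming(average_hash(current), target)
    for _ in range(steps):
        if best == 0:
            break
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] = min(255, max(0, candidate[i] + rng.choice([-8, 8])))
        d = hamming(average_hash(candidate), target)
        if d <= best:
            current, best = candidate, d
    return current, best

rng = random.Random(42)
start = [rng.randrange(256) for _ in range(64)]                # attacker's cover image
target = average_hash([rng.randrange(256) for _ in range(64)]) # hash to forge
start_dist = hamming(average_hash(start), target)
forged, dist = climb_toward(start, target)
assert dist <= start_dist  # the search never moves away from the target hash
```

Against a real perceptual hash the search space is larger and the model is differentiable, which makes gradient-guided versions of this search far more effective than random flips.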
It's not clear that Apple has any countermeasures for the evasion attack. The primary one I can think of would be to keep NeuralHash secret, thus making it hard to know whether a given perturbation actually changed the hash. Apple hasn't published the details of NeuralHash, and we don't know that the version we're seeing here is the final version, but unless they make some real effort to conceal it -- which, again, would undercut the verifiability claims they have been making -- we should assume that it will eventually become known to attackers.
This isn't an ideal property, but the whole design of the current system assumes that there aren't any real attempts at evasion. After all, Apple only scans images that are uploaded to iCloud; if people don't want to be detected, all they have to do is turn off photo sharing to iCloud, so evasion is fairly straightforward anyway.
Apple's system includes three countermeasures against forgery attacks (and false positives):

1. The hash database itself is secret (blinded with a key known to Apple).
2. They screen potential CSAM images using a second perceptual hash and only forward those which match for human review (this was not initially announced but was published last week).
3. They do human review to see if images are actually CSAM.
The first countermeasure is intended to prevent attackers from knowing which hashes they should be targeting, as the vast majority of hashes will not be in the database. Note, however, that if an attacker knows a piece of CSAM that is in the database, they can compute the hash themselves if they know the NeuralHash algorithm, so we should expect that at least some of the hashes will get out.
The second countermeasure seems like a good idea, but I'm not sure how robust it's going to turn out to be. In order for it to work, we need the secondary hash outputs to be distributed independently of the on-device hash, in the sense that two different images which have the same NeuralHash value are unlikely to have the same value in the secondary hash. I don't know enough about the design of Apple's secondary hash to know if this is true. My wild guess would be that it has a similar structure to NeuralHash but just uses different features. In any case, it would increase confidence in this process for Apple to publish statistics about the overlap between these two hashes, even if they can't publish the details (which they can't because this countermeasure requires the secondary hash to be secret).
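As a minimal illustration of why independence matters (and not just difference), suppose the secondary hash were merely derived from the first; here SHA-1 stands in for that derivation and a made-up 96-bit value stands in for a NeuralHash output. The outputs look unrelated, but any collision in the first hash carries over automatically.

```python
import hashlib

def h2_from_h1(h1_bits: bytes) -> str:
    """A secondary hash derived from the first: H2 = SHA-1(H1).
    Outputs look unrelated to H1, but are fully determined by it."""
    return hashlib.sha1(h1_bits).hexdigest()

# Hypothetical 96-bit H1 value shared by two different images,
# i.e., an H1 collision between an original and a forgery.
h1_original = bytes.fromhex("deadbeefcafe0123456789ab")
h1_forgery = bytes.fromhex("deadbeefcafe0123456789ab")

# The collision carries over: a derived H2 cannot separate the two images.
assert h2_from_h1(h1_original) == h2_from_h1(h1_forgery)
```

A useful secondary hash has to be computed from the image itself, with features different enough that an engineered NeuralHash collision is unlikely to also collide there.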
The human review is obviously the final backstop against forgery attacks. It probably does a pretty good job of preventing false reports to law enforcement, but it won't work well if forgeries generate a huge volume of images requiring human review.
One more thing... #
The reverse-engineering writeup includes this caveat:

> Note: Neural hash generated here might be a few bits off from one generated on an iOS device. This is expected since different iOS devices generate slightly different hashes anyway. The reason is that neural networks are based on floating-point calculations. The accuracy is highly dependent on the hardware. For smaller networks it won't make any difference. But NeuralHash has 200+ layers, resulting in significant cumulative errors.
This actually seems like a minor operational problem: the CSAM scanning system depends on the NeuralHash values matching exactly, so Apple will either need to make the API produce consistent results or insert all the potential results into the database.
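The "insert all the potential results" option can be sketched directly: for each database entry, enumerate every hash value within a few bit flips (a Hamming ball) and insert them all, so a near-miss hash produced by a slightly different device still matches exactly. The radius of 2 here is an assumption for illustration; nothing is known about what tolerance would actually be needed.

```python
from itertools import combinations
from math import comb

def hamming_ball(h: int, bits: int = 96, radius: int = 2) -> set:
    """All hash values within `radius` bit flips of h, for a `bits`-bit hash.
    Inserting every member of this ball into the database tolerates
    device-to-device floating-point drift of up to `radius` bits."""
    out = {h}
    for r in range(1, radius + 1):
        for positions in combinations(range(bits), r):
            flipped = h
            for p in positions:
                flipped ^= 1 << p
            out.add(flipped)
    return out

ball = hamming_ball(0x0)
# Ball size: 1 + C(96,1) + C(96,2) = 1 + 96 + 4560 = 4657 entries per hash.
assert len(ball) == 1 + comb(96, 1) + comb(96, 2)
```

The ball grows combinatorially with the radius, so this only works if the drift really is limited to a few bits; otherwise approximate (Hamming-distance) matching on the server side would be needed instead.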
As I said above, the only really surprising thing here is that a version of NeuralHash is already out there in Apple devices. Both the evasion and forgery attacks are pretty obvious and Apple has some -- albeit imperfect -- countermeasures in place, so I don't think this materially changes the situation.
It's easy to see that it's just high probability and not certainty because the number of possible inputs is much bigger than the number of hash values, and so there must be at least two inputs with the same hash value. ↩︎
"Relatively" here means with a complexity significantly less than 2b-1 where b is the length of the hash in bits. In this case, the hash seems to be 96 bits, so much less than 296, which is an impractically large number of computations. ↩︎
As I understand it, the intuition here is that these hashes are designed so that similar images have similar hashes (a low Hamming distance), but this means that you can use optimization algorithms to find your way from one hash to another by making changes that progressively move you closer to the hash you want. I'm not sure if it's possible to design a perceptual hash without this feature. ↩︎
Note that this is not the same as the hashes being different. For instance, it's easy to design two hashes H1 and H2 where the hashes tend to be different, just by doing H2 = SHA-1(H1). This wouldn't solve the problem here, because hash collisions in H1 would still be collisions in H2. ↩︎