Perceptual versus cryptographic hashes for CSAM scanning
Posted by ekr on 24 Aug 2021
As I discussed earlier, there has been a lot of talk about collisions in the NeuralHash perceptual hash used for CSAM detection. While I don't think these collisions are necessarily that serious, and Apple has proposed some countermeasures for dealing with them, it's worth asking whether this is the best design.
To recap: a cryptographic hash such as SHA-256 is designed to make it prohibitively expensive to create two inputs with the same hash output (a collision).[1] However, for the same reason that it's hard to find a collision, it's also trivial to create two inputs that are very perceptually similar but have different hashes (in general, a change of a single bit will do it). Importantly, you don't need to know anything about the internal structure of the hash algorithm to do this; it's just a basic property of cryptographic hashes. The result is that if you have a CSAM detection system that's based on checking against a list of cryptographic hashes of known images, it's easy for an attacker to alter a given CSAM image so that its hash no longer matches without changing the image in any meaningful way, e.g., by changing the color of a single pixel slightly.
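To make the single-bit point concrete, here is a minimal sketch using Python's standard `hashlib`. The "image" is just made-up bytes standing in for pixel data, and the helper names (`flip_lowest_bit`, `differing_bits`) are my own for illustration; nothing here is specific to any real scanning system.

```python
import hashlib

def flip_lowest_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with the lowest bit of one byte flipped."""
    changed = bytearray(data)
    changed[index] ^= 0x01
    return bytes(changed)

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Assumption: made-up 64x64 grayscale data, one byte per pixel.
original = bytes((x * y) % 256 for y in range(64) for x in range(64))
tweaked = flip_lowest_bit(original, 1000)

h_original = hashlib.sha256(original).digest()
h_tweaked = hashlib.sha256(tweaked).digest()

# Exactly one input bit differs between the two "images".
print(differing_bits(original, tweaked))
# The two digests are unrelated; typically about half of the 256 bits differ.
print(h_original.hex())
print(h_tweaked.hex())
print(differing_bits(h_original, h_tweaked))
```

Note that the tweak required no knowledge of how SHA-256 works internally, which is the property the paragraph above relies on.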
Perceptual hashes attempt to address this issue by accepting easier forgery in exchange for harder evasion. They're designed so that similar-looking images have the same hash, which means that it's much harder to alter a given image to look the same but still have a different hash (if you don't know the algorithm). The price of this is that it's also much easier to alter a given non-CSAM image to have a given hash (but only if you do know the algorithm). This tradeoff makes sense when you realize that in conventional systems such as Bing, Gmail, Facebook, etc. the hashing is done on the server side and so the algorithm (usually PhotoDNA) can be kept secret. Apple's system, by contrast, requires NeuralHash to be run on the client, which means that -- as we have seen -- it's inherently at much higher risk of exposure. Once the hash algorithm is publicly known, the situation changes significantly and it becomes trivial for an attacker to either:
- Alter an image so it has a different hash in order to evade detection.
- Create an innocuous image with a hash that's in the database (assuming they already know such a hash) in order to frame someone else.
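The evasion half of this can be sketched with a toy "average hash" (one bit per pixel, set when the pixel is brighter than the image mean). This is an assumption-laden stand-in chosen for brevity: it is not NeuralHash or PhotoDNA, the 8x8 gradient image is made up, and `average_hash` and `evade` are hypothetical helpers. It only illustrates the first bullet; forgery is not shown.

```python
def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, 1 if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)

def evade(pixels):
    """Use knowledge of the algorithm: nudge the pixel nearest the mean
    one step at a time until the hash changes."""
    original = average_hash(pixels)
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    target = min(range(len(flat)), key=lambda k: abs(flat[k] - mean))
    step = -1 if flat[target] > mean else 1
    y, x = divmod(target, len(pixels[0]))
    altered = [row[:] for row in pixels]
    while average_hash(altered) == original:
        altered[y][x] += step
    return altered

# Assumption: a made-up 8x8 grayscale gradient standing in for an image.
image = [[32 * x + y for x in range(8)] for y in range(8)]

# A uniform brightness shift (a non-adversarial edit) leaves the hash alone.
brighter = [[p + 1 for p in row] for row in image]
print(average_hash(brighter) == average_hash(image))  # True

# A targeted edit to a single pixel changes it.
evaded = evade(image)
print(average_hash(evaded) != average_hash(image))  # True

changes = [(y, x, image[y][x], evaded[y][x])
           for y in range(8) for x in range(8)
           if image[y][x] != evaded[y][x]]
print(changes)  # [(0, 4, 128, 115)]
```

The targeted edit works only because `evade` knows the threshold rule; someone who could not see the algorithm would have no particular reason to pick that pixel.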
It seems like there are two main classes of modified images that a perceptual hash can detect that a cryptographic hash does not:
- Those which have been altered for some non-adversarial purpose (e.g., cropped)
- Those which have been altered for the purpose of evasion
Apple's system will of course catch the first type of modified image, but because it's relatively straightforward to create an altered image which will evade NeuralHash, it's not clear how effective it will be at detecting the second type. As noted above, it's not going to be effective against people who specifically altered the images to evade NeuralHash, but that doesn't mean it won't be effective at all. For instance, there might be images which were altered to evade some other hash algorithm, or Apple could periodically modify NeuralHash. This isn't something that they can do that often, but when they do, it would presumably sweep up a number of images which had been altered to evade the previous version.
With that said, it's not clear how much alteration for evasion there is really going to be. In general, it's important to note that the whole system as currently designed is quite easy to evade: just don't upload your images to iCloud. Admittedly, the people constructing the altered images might be sophisticated, which would let their recipients evade detection even if those recipients aren't careful enough to avoid iCloud, but it also seems like the word not to use iCloud is likely to get out pretty fast.
Another way to get at that question is to ask what happens now. Specifically: what fraction of images that are flagged by PhotoDNA or similar systems are bit-for-bit identical to the original image? If this number is very high -- in an environment where evasion is quite a bit harder -- then it suggests that there isn't a lot of alteration, whether adversarial or not (though of course it might also be the case that the perceptual hash is so good that it's not worth trying to evade; perhaps looking at a historical baseline from before the perceptual hash was rolled out would help). In any case, if there aren't a lot of altered images, then it might be worth reconsidering a cryptographic hash, which would have effectively no risk of forgery[2] thus making a bunch of Apple's back-end machinery (the second hash and the visual inspection) unnecessary.
I don't know if there's any public data on this -- I don't have any -- but it seems like it might be useful input to this kind of design question.
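For what it's worth, the measurement itself would be simple if someone did have the data. A minimal sketch, assuming you hold the flagged files and the known originals as byte strings (the function name and the sample data are made up for illustration):

```python
import hashlib

def exact_match_fraction(flagged_images, known_originals):
    """Fraction of flagged images that are byte-for-byte identical
    to some known original, compared via SHA-256 digests."""
    known_digests = {hashlib.sha256(img).digest() for img in known_originals}
    if not flagged_images:
        return 0.0
    exact = sum(
        1 for img in flagged_images
        if hashlib.sha256(img).digest() in known_digests
    )
    return exact / len(flagged_images)

# Assumption: placeholder byte strings standing in for image files.
originals = [b"image-A", b"image-B"]
flagged = [b"image-A", b"image-A (recompressed)", b"image-B", b"image-B"]

print(exact_match_fraction(flagged, originals))  # 0.75
```

A value close to 1.0 would suggest that a cryptographic hash catches nearly everything the perceptual hash does today.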
[1] Technical note: The jargon here is that finding two inputs of any type that have the same hash is called a collision. Finding a second input that has the same hash as an existing input is called a second preimage and finding an input that has a given hash without knowing the message is called a first preimage. For obvious reasons, the difficulty goes first preimage > second preimage > collision.
[2] Greg Maxwell suggests that it might be possible to create a sort-of-perceptual hash with a low risk of forgery but also some resistance to evasion, but the design he proposes sounds pretty evasion-friendly, so it's not clear how useful that is here.