Why getting voting right is hard, Part III: Optical Scan
Posted by ekr on 05 Jan 2021
This post originally appeared on the Mozilla Blog
This is the third post in my series on voting systems. For background see part I. As described in part II, hand-counted paper ballots have a number of attractive security and privacy properties but scale badly to large elections. Fortunately, we can count paper ballots efficiently using optical scanners (opscan). This will be familiar to anyone who has taken paper-based standardized tests: instead of just checking a box, next to each choice there is a region (typically an oval) to fill in, as shown in the examples below. These ballots can then be machine read using an optical scanner which reports the result totals.
Optical scan systems come in two basic flavors: "precinct count" and "central count". In a precinct count system, the optical scanner is located at the precinct (or polling place) and the voters can feed their ballots directly into it. Sometimes the scanner will be mounted on a ballot box which catches the ballots after they are scanned. When the polls close, the scanner produces a total count, typically recorded on a memory card, printed on a paper receipt, or both. These can be sent back to election headquarters, together with the ballots, where they are aggregated.
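As a rough illustration of the aggregation step at election headquarters, here is a minimal Python sketch. The precinct names, candidate names, and the `aggregate_precinct_totals` helper are all hypothetical; real election management systems use their own formats and procedures.

```python
from collections import Counter


def aggregate_precinct_totals(precinct_reports):
    """Sum per-precinct tallies into jurisdiction-wide totals.

    precinct_reports maps a precinct name to a {candidate: votes} dict,
    standing in for the memory card or paper receipt from each scanner.
    """
    totals = Counter()
    for tally in precinct_reports.values():
        # Counter.update adds counts rather than replacing them.
        totals.update(tally)
    return dict(totals)


# Hypothetical reports from three precinct scanners.
reports = {
    "precinct-1": {"Alice": 120, "Bob": 95},
    "precinct-2": {"Alice": 80, "Bob": 110},
    "precinct-3": {"Alice": 60, "Bob": 40, "Carol": 5},
}
totals = aggregate_precinct_totals(reports)
print(totals)
```

Because the paper ballots travel back along with the totals, the aggregated numbers can still be checked against the paper later.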
In a central count system, the optical scanner is located at election headquarters. These scanners are typically quite a bit larger and faster. Ballots are collected at the precinct and then sent back there for counting. Some scanners are self-contained units that do all the tabulating and some just connect to software on a commodity computer which does a lot of the work, but of course this is all invisible to the voter. It's of course possible to have scanners at both the precinct and election central -- this could help detect tampering with the ballots in transit -- but I'm not aware of any jurisdiction which does that.
Because optical scan ballots are just paper ballots counted via a different method, the voter experience is basically the same, both in good ways (secrecy of the ballot, easy scaling at the polling place) and in bad ways (accessibility). In fact, in case of equipment breakdown or concerns about fraud you can just hand count the ballots without negatively impacting the voter experience (or in fact without voters noticing). The two important ways in which optical scanning differs from hand counting are (1) it's much faster and (2) it's less verifiable.
Speed and Scalability #
The big advantage of optical scanning is that it's more efficient than hand counting. A hand counting team can process on the order of 6-15 contests per minute. This is much slower than even the slowest optical scanners: to pick a vendor whose technical specs were easy to find, ES&S sells central count scanners that count from 72 to 300 double-sided ballots per minute, depending on the model. This is quite an improvement over hand counting when we consider that each ballot will likely have several contests. As an example, the first sheet of a recent Santa Clara sample ballot has 3 contests on one side and 4 on the other, so we're talking about being able to count about 2000 contests a minute on the high end.
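The arithmetic behind that estimate can be sketched in a few lines of Python. The figures are the ones quoted above; the assumption that every sheet carries the same 7 contests as the sample ballot is a simplification for illustration, and the function name is made up.

```python
def contests_per_minute(ballots_per_minute, contests_per_sheet):
    """Contests counted per minute if every sheet has the same number of contests."""
    return ballots_per_minute * contests_per_sheet


# 3 contests on one side plus 4 on the other (the sample ballot above).
CONTESTS_PER_SHEET = 3 + 4

# Scanner range quoted above: 72 to 300 double-sided ballots per minute.
low = contests_per_minute(72, CONTESTS_PER_SHEET)
high = contests_per_minute(300, CONTESTS_PER_SHEET)

# Upper end of the hand-counting range quoted above (contests per minute).
FASTEST_HAND_COUNT = 15

print(low, high)                  # 504 2100
print(low / FASTEST_HAND_COUNT)   # 33.6
print(high / FASTEST_HAND_COUNT)  # 140.0
```

So under these assumptions even the slowest scanner in that range is more than 30 times faster than the fastest hand counting team.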
Precinct count scanners typically aren't particularly fast; they're comparable to typical consumer-grade scanning hardware and just need to be fast enough that they mostly keep up with the rate at which voters fill in their ballots. Even low-end desktop scanners can scan tens of pages a minute, so it's not generally a problem to have one or two scanners handling even a modest-sized precinct, given that it typically takes voters more than a minute to fill in their ballot and that you can't check in more than a few voters a minute. Additionally, because voters scan their ballots as they vote, you get results as soon as the polls close without having to have extra staff to count the ballots; the poll workers just need to supervise the scanning process (as well as the rest of the tasks they would have to do with hand-counted ballots, such as maintaining custody of the materials, checking in voters, etc.).
Optical scanning is also a lot cheaper. In the Washington recount studied by Pew, the cost of optical scanning was $290,000 as opposed to $900,000 for the hand count. This is actually an underestimate of the advantage of optical scanning because, as noted above, that was just the cost to hand count a single contest, whereas the scanning process counts multiple contests at once.
Security and Verifiability #
Optical scanning introduces a new security threat: the scanner is a computer and computers can be compromised. If compromised, the computer can produce any answer the attacker wants, which is obviously an undesirable property, but one we take the risk of whenever we put computers in the critical path of the voting process. This isn't just a theoretical risk: there have been numerous studies of the security of voting machines and in general the results are extremely discouraging: in past studies, if an attacker was able to get physical access to a machine, they were usually able to compromise the software.[1] Most of the work here was done in the early 2000s, so it's possible that things have improved, but the available evidence suggests otherwise. Moreover, there are limits to how good a job it seems possible to do here, which I hope to get to in a future post.
The impact of an attack depends on the machine type. In the case of precinct-count machines, this means that voters might be able to attack the machines in their precinct, and potentially through them the entire jurisdiction[2]. This is a somewhat difficult attack to mount because you need unsupervised access to the machine for long enough to mount the attack. It's not uncommon for these devices to have some sort of management port (you need some way to load the ballot definitions for each election, update the software, etc.) though how accessible that is to voters depends on the device and how it's deployed in practice.
In the case of central count machines, attacks might be limited to voting officials, but as noted in Part I, it's important that a voting system be immune even to this kind of insider attack. Precinct count machines are susceptible to insider attack too: anyone who has access to the warehouse where the machines are stored could potentially tamper with them. In addition, it's not uncommon for voting machines to be stored overnight at polling places before the election, where you're mostly relying on whatever lock the church or school or whatever has on its doors.
The general consensus in the voting security community is that our goal should be what's called software independence. Rivest and Wack describe this as follows:
A voting system is software-independent if an undetected change or error in its software cannot cause an undetectable change or error in an election outcome.
What this means in practice is that if you are going to use optical scan voting then you need some way to verify that the scanner is counting the votes correctly. Fortunately, once you've scanned the ballots, you still have them available to you, with the exception of any which have been folded, spindled or mutilated by the scanner. This means you can do as much double checking as you want.
Naively, of course, you could just recount the ballots by hand. This often happens in close races, but obviously doing it all the time would obviate the point of using optical scanners. What's needed is some way to check the scanner without counting every ballot by hand. What's emerging as the consensus approach here is what's called a Risk Limiting Audit. I'll cover this in more detail later, but the basic idea is that you randomly sample ballots and hand count them. You can then use statistics to estimate the chance that the election was decided incorrectly. You keep counting until either (1) you have high confidence that the election was counted correctly or (2) you have counted all the ballots by hand.[3]
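To give a feel for the "sample until confident" loop, here is a minimal Python sketch of a ballot-polling audit in the style of the published BRAVO method, which uses a sequential probability ratio test. It is a simplification under loud assumptions: a two-candidate contest, no invalid ballots, and a list of strings standing in for humans reading paper ballots. The function name and parameters are made up for illustration, and this is not necessarily the exact procedure any particular jurisdiction uses.

```python
import random


def ballot_polling_audit(ballots, reported_winner, reported_share,
                         risk_limit=0.05, max_draws=None, seed=0):
    """Simplified two-candidate ballot-polling audit (BRAVO-style sketch).

    ballots: list of candidate names; each lookup stands in for a human
        pulling and reading one paper ballot.
    reported_share: the winner's share of the vote as reported by the scanner.
    Returns (confirmed, draws). confirmed=False means the sample never became
    convincing and the audit should escalate to a full hand count.
    """
    if not ballots:
        raise ValueError("no ballots to audit")
    if not 0.5 < reported_share < 1.0:
        raise ValueError("reported_share must be strictly between 0.5 and 1")
    # A real audit would need a publicly verifiable source of randomness;
    # a fixed seed is used here only to keep the sketch reproducible.
    rng = random.Random(seed)
    if max_draws is None:
        max_draws = len(ballots)
    threshold = 1.0 / risk_limit
    ratio = 1.0  # evidence for the reported result versus a tie
    for draws in range(1, max_draws + 1):
        ballot = ballots[rng.randrange(len(ballots))]  # sample with replacement
        if ballot == reported_winner:
            ratio *= reported_share / 0.5
        else:
            ratio *= (1.0 - reported_share) / 0.5
        if ratio >= threshold:
            return True, draws
    return False, max_draws


# Hypothetical contest: the scanner reported that "A" won with 60% of the vote.
paper = ["A"] * 600 + ["B"] * 400
confirmed, draws = ballot_polling_audit(paper, "A", 0.60)
print(confirmed, draws)
```

The wider the reported margin, the fewer ballots need to be examined before the ratio crosses the threshold; when the true result is close to a tie, the loop tends to run out of draws, which corresponds to falling back to a full hand count.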
In really close races, you basically have to do a full recount by hand. The reason for this isn't so much that the machines might have been tampered with but that they might have made mistakes. Even the best optical scanners sometimes mis-scan and it's not reasonable to expect them to do a good job with the kind of ambiguous ballots that you see in the wild. Ideally, of course, the scanner would kick those ballots back for manual processing, but you don't want to kick back too many and so there's ambiguity about which ballots are ambiguous and so on. In most elections this stuff doesn't matter, but in a really close one it does, and so if you're working with hand-marked ballots there eventually comes a point where you need to fall back to hand counting. The main value of optical scanning is to reduce the need for routine hand-counting when elections aren't close, which is fortunately most of the time.
Write-Ins, Scanning Errors, Overvotes, and Other Edge Cases #
Of course, unlike humans, optical scanners aren't very smart -- and for security reasons, you don't really want them doing smart stuff -- so there are a number of situations that they handle badly.
For instance, it's common to allow "write-in" votes in which the candidate's name does not appear on the ballot but instead the voter writes in a new name. Write-in candidates don't usually win -- although Lisa Murkowski famously won as a write-in candidate in 2010 -- but you still need to process their ballots. As shown in the example at the top, the natural way to handle this is to have a choice for each contest which has a blank name: the voter fills in the bubble associated with the space and then writes the name in the space.[4]
It's also common to have ballots which can't be read for one reason or another. For instance, the voter might have used the wrong color pen or not completely marked the bubble. Voters also sometimes vote for more than one candidate in a given election ("overvoting"). The general way to handle these cases is to have the machine reject these ballots and set them aside for further processing by hand.[5] If the number of rejected ballots is less than the margin of victory then you know that it can't affect the result and while you do eventually want to process them for complete results, you don't need to for purposes of determining the winner. If there are more rejected ballots than the margin of victory you of course need to process them immediately, but as rejected ballots are typically a small fraction of the total this is much more feasible than a full hand count.
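The margin check described here is simple enough to sketch in Python. The function name and the vote counts are hypothetical, and treating "exactly equal to the margin" as needing immediate processing is a conservative choice made for this sketch (those ballots could produce a tie).

```python
def rejected_ballots_could_matter(tallies, rejected_count):
    """Return True if the rejected ballots could change or tie the winner.

    tallies: {candidate: machine-counted votes} for one contest.
    rejected_count: ballots the scanner set aside for hand processing.
    """
    if not tallies:
        raise ValueError("tallies must contain at least one candidate")
    ranked = sorted(tallies.values(), reverse=True)
    runner_up = ranked[1] if len(ranked) > 1 else 0
    margin = ranked[0] - runner_up
    return rejected_count >= margin


# Hypothetical machine counts: the margin of victory is 300 votes.
machine_counts = {"Alice": 5200, "Bob": 4900}

print(rejected_ballots_could_matter(machine_counts, 120))  # False
print(rejected_ballots_could_matter(machine_counts, 450))  # True
```

In the first case the rejected ballots can wait until after the winner is announced; in the second they have to be processed by hand before the result is known.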
There are of course some edge cases that optical scanners aren't able to even reject reliably. A good example here is "undervoting" in which a voter doesn't vote in certain contests. This could be a sign of marking error or it could be intentional; it's actually quite common for voters in the US to just vote the presidential contest and then skip the downballot races. Because this is common, you don't really want the scanner rejecting all undervoted ballots. Instead you keep a tally of the number of undervotes in a given contest and if it's large enough to potentially affect the election you can go back and hand count the whole election.
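As an illustration of keeping an undervote tally alongside the vote counts, here is a minimal Python sketch for a single vote-for-one contest. The data layout (one set of marked candidate names per ballot) and the `tally_contest` helper are assumptions made for this example, not a description of how any real scanner stores its results.

```python
from collections import Counter


def tally_contest(ballot_marks, candidates):
    """Tally one vote-for-one contest, tracking undervotes and overvotes.

    ballot_marks: one set per ballot holding the candidates whose bubbles
        were read as filled in for this contest.
    Returns (counts, undervotes, overvotes).
    """
    counts = Counter({name: 0 for name in candidates})
    undervotes = 0
    overvotes = 0
    for marks in ballot_marks:
        if len(marks) == 0:
            undervotes += 1   # no choice marked: counted, not rejected
        elif len(marks) > 1:
            overvotes += 1    # too many choices: set aside for hand review
        else:
            counts[next(iter(marks))] += 1
    return dict(counts), undervotes, overvotes


# Hypothetical scan results for six ballots.
marks = [{"Alice"}, {"Bob"}, set(), {"Alice", "Bob"}, {"Alice"}, set()]
counts, undervotes, overvotes = tally_contest(marks, ["Alice", "Bob"])

margin = counts["Alice"] - counts["Bob"]
hand_count_advisable = undervotes >= margin
print(counts, undervotes, overvotes, hand_count_advisable)
```

In this toy example there are two undervotes against a margin of one, so the undervotes alone are numerous enough that a hand count would be worth considering.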
It's important to understand that a risk limiting audit ensures that none of these anomalies can affect the election result, so at some level it doesn't matter how the scanner handles them; it's just a matter of setting the right tradeoff in terms of efficiency between the automated and manual counting stages. However, if -- as is far too common -- you are not doing a risk limiting audit, it's important to be fairly conservative about having the scanner note ambiguous cases rather than arbitrarily deciding them for one candidate or another.
Up Next: Vote By Mail #
So far in this series I've talked about paper ballots as if they are cast at the polling place, but that doesn't have to be the case. They can just as easily be sent to voters who return them by mail. Depending on the situation this is referred to as "vote by mail" (VBM) or "absentee ballots". VBM brings some special challenges which I'll be covering in my next post.
[1] See, for instance, the reports of the 2007 California Top-to-Bottom Review.
[2] A number of studies have found "viral" attacks in which you compromised one machine and then used that to attack the election management systems, which were then used to infect all the machines in the jurisdiction.
[3] You might be wondering if this is really the best we can do. RLAs are the best known method that is totally software independent, but if you're willing to rely on your own software that is independent of the voting machine software, then one option would be to arrange to video-record the ballots during counting and then use computer vision techniques to independently do a recount. I collaborated on a system to do this about 10 years back. It worked reasonably well -- and would surely work far better with modern computer vision techniques -- but never got much interest.
[4] Actually, the whole idea of having pre-printed ballots is less universal than many Americans think. The Wikipedia article on the so-called Australian Ballot makes fascinating reading.
[5] One advantage of precinct-level counting is that you can detect this kind of error and give the voter an opportunity to correct it.