Surprise, blockchains won't fix Internet voting
Our democracy shouldn't depend on the security of mobile devices
Posted by ekr on 09 Jan 2023
You'll notice that in my post on end-to-end voting I never mentioned the word "blockchain". However, there's been quite a bit of interest in the "crypto" community around somehow using the blockchain to "fix" voting. For instance, here's Binance CEO Changpeng Zhao arguing back in 2020 that it will lead to more secure elections with faster results:
If there is a blockchain based mobile voting App (with proper KYC of course), we won't have to wait for results, or have any questions on its validity. Privacy can be protected using a number of encryption mechanisms.— CZ 🔶 Binance (@cz_binance) November 5, 2020
And here's Ethereum founder Vitalik Buterin endorsing the idea:
The technical challenges with making a secure cryptographic voting system are significant (and often underestimated), but IMO this is directionally 100% correct. https://t.co/J0qHiN2bbk— vitalik.eth (@VitalikButerin) November 5, 2020
See also Buterin's more extensive defense of this position here, which argues for the blockchain-as-bulletin board design. I address some but not all of his points below.
Spoiler alert: I think this is wrong, in two separate ways.
First, blockchains are not really a useful element in Internet voting: they don't solve the basic security problems in the system, and are worse than the existing technologies they would replace.
Second, the basic premise that we need Internet voting in order to fix our existing voting systems is largely misguided: it's true that we see a lot of problems with those systems in practice, but it's also quite possible to use paper-based systems to run an election that produces quick results which can be independently verified. To a great extent, the operational problems that have gotten so much press are the result of conscious decisions made by policymakers. Moreover, at our current level of technology Internet voting has serious vulnerabilities that we just have no real idea how to overcome.
Blockchains are not the solution to Internet voting
Let's dispose of the obvious point first: the big problems in the security of Internet voting stem from the need to secure software (and keying material) on voters' devices. A blockchain doesn't really do anything to address this. Moreover, the fact that we fairly routinely see successful attacks on crypto infrastructure, as well as theft of cryptocurrency, including from crypto investors whom you would expect to be sophisticated (and maybe even from core Bitcoin developers???), does not exactly suggest that the cryptocurrency community has discovered the secrets to key management and to building secure cryptographic software. And even if they had, that software has to run on commodity platforms, which have their own security problems; if end-user devices are compromised, then you can't trust the cryptographic voting software on top of them even if that software is perfect.
The difficulty of getting ordinary people to use cryptography correctly isn't some surprising piece of news. There are decades of papers on how hard cryptographic software is to use (see here and then here). In fact, here's Zhao just last month saying that 99% of people can't adequately manage their own keying material for their crypto:
“For most people, for 99% of people today, asking them to hold crypto on their own, they will end up losing it.”
“Most people are not able to back up their security keys; they will lose the device [...] They will not have the proper encryption for their backup; they will write it on a piece of paper, someone else will see it, and they will steal those funds,” he explained.
But this is precisely what we are asking people to do in order to do any kind of Internet voting (with or without a blockchain). The security of these systems depends critically on the security of the keying material used to authenticate each user. If people can't safely do that for the keys to manage their money, then why should we expect them to do so for a key they only have to use twice a year?
They aren't even a useful element
OK, so blockchains don't solve the basic security problem with Internet voting, but maybe they are a useful component? Again, I think the answer is "no". The obvious place you might want to use a blockchain is as the "bulletin board" for an E2E system. The bulletin board needs to (1) be publicly accessible and (2) have public consensus on its contents. Given that the point of a blockchain is to provide consensus about which coins have been spent, this seems like a natural fit. The idea here would be that you would submit your ballot as a record on the blockchain (just as you would a record of a spending transaction). Any records which had been included as of the date of the election (or presumably some other deadline) would then be treated as "on the bulletin board" for the purposes of the rest of the protocol. You'd of course need all the rest of the apparatus of end-to-end verifiable voting, like the provable mix, but maybe the blockchain would be useful as the bulletin board.
While possible in theory, this doesn't really get you much in practice. First, the verifiability properties of a blockchain do not map well onto what you need for an election. Second, using a blockchain in this context raises a number of practical problems, as discussed in a quite thorough report by MIT researchers Park, Specter, Narula, and pioneering cryptographer (and co-inventor of the RSA public key algorithm) Ron Rivest.
The distinguishing feature of blockchain-type systems is that they are designed to be "zero-trust", in the sense that you don't need to trust a central authority to maintain the integrity of the log. Specifically, the blockchain guarantees that everyone has consensus on:
- Which transactions are in the log
- What order they occurred in
The details of how it accomplishes this are out of scope for this post (I've been working on a post about this, but I'm not happy with it yet), but the key insight is that you need this kind of system because the transactions in the log do not themselves provide all the information you need to verify them. Specifically, while they are typically digitally signed, so you can verify that they are authentic, you need the blockchain to tell you what order they occurred in and to ensure that people don't conceal transactions.
E2E voting is similar in that you don't trust the voting authority but different in that all of the information it publishes is self-authenticating, so you don't need some separate mechanism to ensure it was correctly recorded. Specifically:
- You can verify that all the input votes are valid by checking their signatures (this is true of cryptocurrency systems too).
- You can verify that the mixing was conducted correctly by checking the proofs of shuffling.
- You can verify that the votes were decrypted correctly by checking their proofs.
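As a concrete illustration of the third check, here is a minimal sketch of one common construction, assuming exponential ElGamal encryption and a Chaum-Pedersen proof of correct decryption made non-interactive with a Fiat-Shamir hash. None of this is taken from a particular voting system: the function names are hypothetical, and the group parameters (p = 2039, q = 1019, g = 4) are toy values that are completely insecure. The point is that `verify_decryption` needs only public values (the public key, the ciphertext, the claimed decryption share, and the proof), so anyone can run it without trusting the authority.

```python
import hashlib
import secrets

# Toy parameters (INSECURE, illustration only): p = 2q + 1 is a safe prime,
# and g = 4 generates the subgroup of prime order q.
P, Q, G = 2039, 1019, 4


def challenge(*values: int) -> int:
    """Fiat-Shamir challenge: hash the proof transcript into Z_q."""
    data = ",".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def keygen() -> tuple[int, int]:
    x = secrets.randbelow(Q - 1) + 1  # secret key in [1, q-1]
    return x, pow(G, x, P)            # public key h = g^x


def encrypt(h: int, vote: int) -> tuple[int, int]:
    """Exponential ElGamal: encode the vote as g^vote, then mask it with h^r."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), pow(G, vote, P) * pow(h, r, P) % P


def decrypt_with_proof(x: int, h: int, ct: tuple[int, int]):
    """Authority side: compute d = a^x and prove log_g(h) == log_a(d) without revealing x."""
    a, b = ct
    d = pow(a, x, P)                     # decryption share
    w = secrets.randbelow(Q)             # fresh random nonce
    t1, t2 = pow(G, w, P), pow(a, w, P)  # commitments
    c = challenge(G, h, a, b, d, t1, t2)
    s = (w + c * x) % Q                  # response
    return d, (t1, t2, s)


def verify_decryption(h: int, ct: tuple[int, int], d: int,
                      proof: tuple[int, int, int]) -> bool:
    """Anyone: check the proof using only public values."""
    a, b = ct
    t1, t2, s = proof
    c = challenge(G, h, a, b, d, t1, t2)
    return (pow(G, s, P) == t1 * pow(h, c, P) % P
            and pow(a, s, P) == t2 * pow(d, c, P) % P)


def recover_vote(ct: tuple[int, int], d: int, num_choices: int) -> int:
    """Strip the mask (b / d = g^vote), then find the vote by trying each choice."""
    m = ct[1] * pow(d, -1, P) % P
    for v in range(num_choices):
        if pow(G, v, P) == m:
            return v
    raise ValueError("plaintext does not encode a valid choice")
```

For example, after `x, h = keygen()`, `ct = encrypt(h, 2)`, and `d, proof = decrypt_with_proof(x, h, ct)`, the call `verify_decryption(h, ct, d, proof)` returns `True` and `recover_vote(ct, d, 3)` returns `2`. If the authority instead publishes a false share such as `d * G % P` (which would decrypt to a different vote), verification fails except with very small probability.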
The only thing you can't directly verify from this information is that votes weren't incorrectly excluded from the original input set, but a blockchain doesn't really assist you here, because it's just a record of what people claimed happened. Instead, what you need is for the authority to publish the input set in some way that everyone can see and that allows people to challenge the input set. Specifically, the authority publishes the set of signed encrypted ballots to the bulletin board and then:
- Voters who believe that their votes were improperly excluded can challenge that exclusion.
- Observers who believe that a vote was improperly included (e.g., the signature is invalid, or the voter is ineligible) can challenge that vote.
This does require that everyone agree on the contents of each bulletin board, but you don't need the blockchain to provide it because the election officials can just post it on their Web site. Well, mostly.
Partitioning Attacks
The reason for the "mostly" is that you can't check whether all the votes that are supposed to be present actually are, because you don't know who voted. Rather, you are counting on other people having checked that their votes appear on the bulletin board (or people checking for them). If that bulletin board is just a Web site then it's theoretically possible to mount what's called a partition attack.
Suppose the election officials want to suppress Alice's vote. If they just exclude it from the bulletin board, then Alice might catch them. Instead, they selectively exclude it, by creating two copies of the bulletin board:
- The main one they use for the actual count that excludes Alice.
- A bogus bulletin board that includes Alice.
When Alice goes to check her vote, the election officials send Alice the bogus version, and so her checks succeed. However, when anyone else checks the bulletin board, they send the real copy.
This is actually a very hard attack to mount in practice because any number of things can go wrong. First, if Alice checks the final totals, she'll see that they don't match. Even if she doesn't bother to do that, the attack depends on being able to perfectly detect when Alice is checking as opposed to someone else; as there is no reason to authenticate this transaction, that's difficult. You could use the IP address, but what if Alice votes from her phone and checks from her laptop?
Moreover, this attack is easy to defeat as long as you have any consensus mechanism at all. You certainly don't need anything as fancy as a blockchain, though, because we already have numerous mechanisms for election officials to communicate authoritatively with the public in ways that ensure that everyone gets the same information (e.g., by having that information broadcast on television or published in the newspaper). All they need to do is publish the hash of the bulletin board via one of these mechanisms, and then everyone can verify that they have the same bulletin board contents.
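The hash check can be sketched in a few lines. The ballot strings, the canonical encoding (the sorted list of ballots serialized as JSON), and the function name `board_digest` are assumptions for illustration; a real system would have to specify its own encoding precisely so that everyone computes the same digest. Once officials publish one digest through a broadcast channel, a copy of the board tailored for Alice, as in the partition attack described above, no longer matches it.

```python
import hashlib
import json


def board_digest(ballots: list[str]) -> str:
    """Hash a canonical serialization of the bulletin board.

    Sorting first means the digest depends only on *which* ballots are
    present, not on the order a particular server happened to list them.
    """
    canonical = json.dumps(sorted(ballots)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Hypothetical ballots; a real board would hold signed ciphertexts.
bob, carol, alice = "ballot:bob:9f3a", "ballot:carol:51c2", "ballot:alice:77e0"

real_board = [bob, carol]          # copy used for the count (Alice suppressed)
alices_view = [carol, alice, bob]  # tailored copy shown only to Alice

published = board_digest(real_board)  # e.g., printed in the newspaper

# Everyone else's copy matches the published digest...
print(board_digest([carol, bob]) == published)  # True
# ...but Alice's copy does not, so her check exposes the partition.
print(board_digest(alices_view) == published)   # False
```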
The point is that this is not a situation which needs distributed consensus; it just needs regular consensus. The whole system has to be centrally operated anyway, and that central authority is a natural mechanism for establishing consensus.
Practical Problems
The details of how blockchains work are outside of the scope of this post, but briefly, a blockchain is a public list of transactions, with every transaction appearing in (or at least attested to by) the blockchain. It is maintained by a set of servers that are responsible for checking the validity of transactions and appending them to the public log. In what's called a "permissionless" blockchain, these servers are just operated by ordinary people (at least in theory; in practice, of course, it takes a lot of resources to be relevant) and there aren't any special trust relationships with those servers. At a very high level the process looks something like this:
- The user (voter) generates a candidate record that it wants incorporated into the blockchain.
- The user's software then sends the record to some set of other network nodes.
- Those nodes propagate that record to other nodes until all—or at least most—of the other nodes in the network have a copy.
- One or more network elements select a set of outstanding records and incorporate them into the blockchain. Note that I've totally omitted how this happens. For our purposes, it's magic.
- The extended blockchain is propagated to the rest of the network.
The result is that everyone knows by looking at the blockchain which records are in the consensus and which are not (this part is magic too).
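The propagation step, and the first failure mode Park et al. describe, can be illustrated with a toy flooding model. Everything here is an assumption for illustration: the topology, the node names, the function `propagate`, and the idea that a censoring node simply declines to forward a record. Real peer-to-peer networks are far more complicated, but the dependence on whichever nodes the voter happens to reach is the same.

```python
from collections import deque


def propagate(graph: dict[str, list[str]], entry_points: list[str],
              censors: set[str]) -> set[str]:
    """Flood a record through the network, breadth-first.

    Every node that receives the record is counted as reached; honest nodes
    forward it to their neighbors, while nodes in `censors` quietly drop it.
    """
    reached = set(entry_points)
    queue = deque(entry_points)
    while queue:
        node = queue.popleft()
        if node in censors:
            continue  # received but never forwarded
        for neighbor in graph[node]:
            if neighbor not in reached:
                reached.add(neighbor)
                queue.append(neighbor)
    return reached


# Hypothetical topology: the voter's phone can only reach relay1 and relay2,
# and the nodes that build blocks ("miner*") sit behind them.
network = {
    "relay1": ["relay2", "miner1"],
    "relay2": ["relay1", "miner2"],
    "miner1": ["relay1", "miner2"],
    "miner2": ["relay2", "miner1"],
}

honest = propagate(network, ["relay1", "relay2"], censors=set())
censored = propagate(network, ["relay1", "relay2"], censors={"relay1", "relay2"})
print(sorted(honest))    # ['miner1', 'miner2', 'relay1', 'relay2']
print(sorted(censored))  # ['relay1', 'relay2']: no block producer ever sees the vote
```

With only `relay1` censoring, the record still reaches both miners through `relay2`, which is why this attack depends on controlling the particular nodes a given voter connects to.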
As Park et al. observe, there are a number of things which can go wrong here. For instance:
- The nodes that the user submits their record to could decide not to propagate it to other nodes, thus preventing a given user from voting.
- The nodes responsible for selecting the set of outstanding records could omit a specific record, either unintentionally (because it gets lost) or maliciously (to suppress a given user's vote).
- An attacker could attempt to mount a denial-of-service attack on the network to prevent it from coming to consensus. Park et al. suggest a specific attack scenario which exploits the fact that in some networks the user has to pay to have their transactions included in the blockchain, and the nodes have discretion about which transactions to include (and can favor the higher bidding ones) at times when the incoming transaction rate exceeds the throughput of the network. If the network is shared with other applications like financial transactions, an attacker could potentially flood the system with transactions in an attempt to starve out legitimate votes.
- An attacker might be able to exploit defects in system elements or the associated protocols to globally or selectively mount denial-of-service attacks on an election.
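The fee-based starvation scenario can be illustrated with a toy model of block assembly. The assumptions are for illustration only and are not taken from the report: block producers greedily take the highest-fee pending records up to a fixed per-block capacity, and the capacity, fees, record names, and the helper `build_block` are made up. Real fee markets, and the rules of any particular chain, differ in the details.

```python
from dataclasses import dataclass


@dataclass
class PendingRecord:
    name: str
    fee: int  # what the submitter offered to pay for inclusion


def build_block(mempool: list[PendingRecord], capacity: int) -> list[PendingRecord]:
    """Greedy block assembly: include the `capacity` highest-fee records.

    sorted() is stable (even with reverse=True), so records with equal fees
    keep their arrival order.
    """
    return sorted(mempool, key=lambda rec: rec.fee, reverse=True)[:capacity]


votes = [PendingRecord(f"vote{i}", fee=1) for i in range(3)]

# Normal conditions: demand is below capacity, so every vote gets in.
normal = build_block(votes, capacity=5)

# Attack: flood the shared network with higher-paying junk transactions.
spam = [PendingRecord(f"spam{i}", fee=2) for i in range(5)]
attacked = build_block(votes + spam, capacity=5)

print([r.name for r in normal])    # ['vote0', 'vote1', 'vote2']
print([r.name for r in attacked])  # ['spam0', 'spam1', 'spam2', 'spam3', 'spam4']
```

In this model the attacker has to keep outbidding the voters for every block until the voting deadline, so the attack is a question of cost rather than of breaking any cryptography.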
The bigger picture here is that blockchains don't provide a guaranteed level of service and that the actual delivered level of service depends on network elements which are untrustworthy and potentially malicious. This opens up a lot of opportunities for attackers to interfere with election outcomes even if they aren't able to actually forge votes. They don't need to be completely successful, either; they just need to have a big enough impact to swing a close election. Of course, some of these attacks are possible with centrally operated systems, but at least in those systems you know who to blame for outages (and remember, I'm not saying that Internet voting is good, even with centralized systems!).
I could go on here, but if you're really interested, you should read the MIT report. The authors do a valiant job of trying to design a blockchain-based voting system using coins as votes, but honestly it's just a mess, with all the problems I've described here and more (this isn't a critique of the authors; their point is that it's a bad idea, so it's a proof by contradiction). The bottom line is that blockchain technology just isn't a good fit for this application.
Solving the wrong problem
Finally, the whole argument here kind of rests on a misdiagnosis of the situation, namely that the problem with conventional voting systems is that they are inherently (1) slow to get results and (2) open to questions of validity, and hence that we need Internet voting to solve these problems.
It's entirely possible for conventional voting systems to produce rapid results (though in all fairness, not as fast as an Internet-only system). It's true that there have been a number of recent elections where it took a number of days to determine the winner, as more votes trickled in. In some cases, candidate A looked like a winner early but was the eventual loser when all the votes were in, which has caused a lot of suspicion among people who didn't understand what was happening. However, many jurisdictions actually are able to resolve elections quickly. For instance, Florida mostly got same-day results in 2022.
To understand what causes delay, it helps to understand the logistics of voting. The consensus best choice in the voting security community is optically scanned (opscan) paper ballots. These can be counted in one of two ways:
- Precinct count: The ballots are fed into a machine in the precinct which counts them immediately and then can report the results.
- Central count: The ballots are sent back to election central where they are scanned.
Precinct count systems can deliver results immediately upon poll closure, with some potential risk to voter privacy (you have to trust the machine not to record the order of ballots and their contents). With systems like this, you can get a count on election night (pending verification, as below). Central count machines obviously take longer to report values, but modern central count scanners can count hundreds of ballots per minute, so it's not implausible that you could get an election night count with an acceptable cost, as Florida already does.
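The handwaving numbers in the scanner footnote can be checked with a few lines of arithmetic. The sketch assumes, purely for illustration, that every one of Los Angeles's roughly 6 million registered voters returns a one-page ballot (overstating turnout, but possibly understating ballot length), and it uses the footnote's figures of 300 pages per minute and $200,000 per scanner. The helper names are hypothetical, and the model ignores setup time, jams, and everything else that slows real counting down.

```python
import math

# Figures from the footnote (approximate); the one-page-ballot and
# full-turnout assumptions are simplifications added for illustration.
REGISTERED_VOTERS = 6_000_000
PAGES_PER_BALLOT = 1
SCANNER_PAGES_PER_MINUTE = 300
SCANNER_COST_USD = 200_000


def hours_to_scan(num_scanners: int) -> float:
    """Wall-clock hours to scan every page with scanners running in parallel."""
    pages = REGISTERED_VOTERS * PAGES_PER_BALLOT
    minutes = pages / (num_scanners * SCANNER_PAGES_PER_MINUTE)
    return minutes / 60


def scanners_needed(deadline_hours: float) -> int:
    """Smallest number of scanners that finishes within the deadline."""
    pages = REGISTERED_VOTERS * PAGES_PER_BALLOT
    return math.ceil(pages / (SCANNER_PAGES_PER_MINUTE * deadline_hours * 60))


print(round(hours_to_scan(40), 2))  # 8.33 (hours with 40 scanners)
print(40 * SCANNER_COST_USD)        # 8000000 (dollars of scanners)
print(scanners_needed(10))          # 34 (scanners to finish in 10 hours)
```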
There are a number of reasons why elections can be slow to resolve, but one of the main ones is absentee/mail-in ballots. For instance, in California, ballots can be postmarked on election day, so you need to wait days for all of the ballots that were mailed to be delivered. In some jurisdictions, you can't even start counting absentee ballots until election day, which means you need to count a lot of ballots right away. A number of jurisdictions have both of these problems: in Mississippi ballots can be processed up to 5 days after election day if they are postmarked on election day and you're not even allowed to start checking the signatures on them until election day! As noted above, if you have the right policies you can get answers reasonably quickly.
It's certainly true that ballots received over the Internet could be tallied instantly, so in that respect we would expect Internet voting to be faster, but this only works if we require everyone to vote over the Internet, which has the potential to really disenfranchise a lot of people (people who can't afford modern devices, those who aren't comfortable with new technologies, etc.). If a significant number of people still vote mail-in with paper ballots, then you still have the problem. The bottom line here is that if we want to prioritize rapid election results at the cost of making it harder to vote remotely (and while for many people an app would be easier, for some it would be harder), then we know how to do it; it's a choice to have slow election results.
It's also important to note that this is all about preliminary results. Full verification takes time, both with paper-based systems and for end-to-end verifiable systems. For paper-based systems, this is because the risk-limiting audit or hand count is manual. In end-to-end verifiable systems, the cryptographic pieces can be checked immediately, but you need to give time for people to challenge the initial vote input set (and specifically to object that their vote was not included). Until that's happened, you have no way of knowing that the voting system didn't just exclude a lot of voters.
Disputes about validity
From a technical perspective, election validity comes down to the ability to demonstrate to a third party—ideally to any third party, but in practice to some set of third parties that are collectively trusted by the electorate—that each phase of the election was correctly conducted, or at least that the inevitable errors were insufficiently large to affect the final result.
For ordinary elections, verifiability is provided by a combination of observability—at least in principle—for the manual processes and double-checking for the inherently unverifiable electronic processes (if any). This second feature is typically described using the concept of software independence (SI), defined by Rivest and Wack as follows:
A voting system is software-independent if an undetected change or error in its software cannot cause an undetectable change or error in an election outcome.
The intuitive reason for SI is that we know computers to be very insecure—and multiple reviews of electronic voting systems have found serious vulnerabilities—and that their operations are opaque, so any voting system shouldn't depend on trusting them.
With a hand-marked paper ballot system, you have some set of processes to ensure that only registered voters vote, but you still need to verify that the tabulation is performed correctly. If you count the ballots by hand, we're back to observability, but if you count them by machine, then you need a double check. This can be provided by using a risk-limiting audit (RLA), in which a sample of the ballots is publicly counted. Of course, if there is real doubt or the margin is very close then you can do a full hand count, but in either case the entire counting process can be made verifiable (though in practice, RLAs are nothing like universal). The key point here is that if you follow the right practices, then even a complete compromise of the scanner will not lead to the wrong result. If you use ballot marking devices (BMDs) instead of hand-marking the ballots, then this does not completely provide SI: if the BMD is compromised then the attacker can have it record the wrong result; some voters will check and catch the error, but others won't, and for those voters the attack will succeed. The counting process is still verifiable, of course.
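To give a flavor of how a risk-limiting audit decides when to stop, here is a minimal sketch of a ballot-polling test in the style of BRAVO, restricted, as a simplifying assumption, to a two-candidate contest in which every sampled ballot is a valid vote for one of the two. The function name and sample data are hypothetical, and real audits also handle multiple contests, blank ballots, and how the sample is drawn. The test keeps a running likelihood ratio comparing the reported result with a tie and stops once that ratio reaches 1/α, where α is the risk limit.

```python
def bravo_two_candidate(reported_winner_share: float, sample: list[str],
                        risk_limit: float = 0.05) -> tuple[str, int]:
    """Sequential ballot-polling test for a two-candidate contest.

    `sample` is the sequence of hand-read ballots, each "W" (reported winner)
    or "L" (reported loser). Returns ("confirmed", n) once the likelihood
    ratio against "the reported winner actually lost or tied" reaches
    1 / risk_limit, or ("escalate", n) if the sample runs out first, meaning
    the audit should keep sampling or go to a full hand count.
    """
    if not 0.5 < reported_winner_share <= 1:
        raise ValueError("reported winner must have a majority")
    threshold = 1 / risk_limit
    ratio = 1.0  # likelihood ratio: reported result vs. a tie
    for n, ballot in enumerate(sample, start=1):
        if ballot == "W":
            ratio *= reported_winner_share / 0.5
        else:
            ratio *= (1 - reported_winner_share) / 0.5
        if ratio >= threshold:
            return "confirmed", n
    return "escalate", len(sample)
```

With a reported winner share of 60% and a 5% risk limit, a sample in which every ballot favors the reported winner confirms the outcome after 17 ballots (1.2^17 ≈ 22.2 ≥ 20), whereas a 52% reported share with a sample that alternates between the candidates never reaches the threshold, so the audit escalates.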
Similarly, end-to-end verifiable systems provide SI for tabulation by making it possible, at least in theory, for someone to write their own system from scratch that will verify the election. However, if users are voting on their own devices, then a compromise of a user's device can completely compromise that user's vote, and there's no plausible way to detect or recover from this form of attack, which is even worse than with BMDs. Imagine what happens in an election where it's discovered that even a small number of user devices had been compromised; how would you have confidence in the result? As noted above, using a blockchain doesn't help with this at all.
Even if we confine our attention to the parts of the system that are independently verifiable, actually convincing yourself that the election was correctly conducted can be a pretty challenging proposition. A full hand count is directly verifiable if you watch the whole thing, and while the idea behind a risk-limiting audit is simple, knowing how many ballots to count involves some reasonably complicated math. The situation with any end-to-end verifiable system is dramatically worse in that not only is the math very complicated, but even the logic takes thousands of words to explain. It's pretty hard to see how explaining that votes are correct because they are digitally signed and then mixed in a way you can check by verifying a zero-knowledge proof is going to put to rest any questions of validity.
You'll note that above I said that from a technical perspective validity disputes come down to third-party verifiability. The bigger problem here is that many election disputes don't come down to technical questions at all, because most people aren't going to research the details of how elections are run (how many people still think that there was tabulation fraud in Georgia, even after a full hand count?) and end up making decisions on other grounds, using motivated reasoning or based on who they trust more. It's hard to see how any set of technical mechanisms will really convince everyone, though I'm especially skeptical that arguments based on fancy cryptography will do the job.
The Bigger Picture
As I said in my original post on end-to-end verifiable voting, voting isn't just a technical problem: it's embedded in a system of social practices, and it's those social practices which make the problem complicated (again, I encourage anyone interested in voting to actually go serve as an election worker). It's of course possible to improve voting technology, but most proposals for how we could radically improve everything using new technology X fall down when you realize that X doesn't take into account those existing operational realities. This is largely the case with Internet voting. The problem with using blockchains for Internet voting is simpler, though: it doesn't solve any problem that can't be solved with other, simpler technology. Of course, that could also be said of a number of other proposed applications of blockchains, which, to quote Mark Nottingham, are not magical.
The scare quotes are here because there is of course a pre-existing use of the term "crypto" to mean "cryptography". ↩︎
The reason this is important is that you need to prevent "double-spending" attacks where people use the same cryptographic token to pay two people. ↩︎
The analogous check in a blockchain-based cryptocurrency system is that the payee verifies that a transaction is recorded on the blockchain before they believe they have been paid. ↩︎
This is actually how pre-Bitcoin timestamping systems were designed. ↩︎
The Bitcoin maximum transaction rate is famously low, though other networks do better. ↩︎
Handwaving alert: The Interscan Hipro can scan 300 pages per minute and costs under $200,000. Los Angeles is probably the biggest county in the US, with almost 6 million registered voters: if you had about 40 scanners you could do all these counts in less than 10 hours at a capital cost of less than $10 million (of course, there are lots of other costs to consider). ↩︎
Remember that many registered voters don't actually vote, so you need some way of distinguishing the case where people didn't vote from the case where their votes were discarded. ↩︎
By which I mean that for the vast majority of voters, there is at least one verifier they trust, even if not all voters trust the same verifier. ↩︎
Outside the US, hand counting is common, but in the US, it's pretty much necessary to use machine counting for logistical reasons. ↩︎