What the heck is going on in New York's election?
Posted by ekr on 01 Jul 2021
If you've been following the already bizarre NYC mayoral election, you've no doubt heard that the NY Board of Elections (BOE) has had to withdraw their partial tallies because they accidentally counted some test ballots. The root of this problem seems to just be simple human error, but the situation is vastly complicated by NY's use of what's called Ranked Choice Voting (RCV), also called Instant Runoff Voting (IRV).
How it Usually Works: First Past the Post and Runoffs
Many people tend to think of voting as simple: you vote for your preferred candidate and whoever gets the most votes wins. This model, usually called "first past the post", is certainly common but by no means universal, and has some obvious problems which emerge if there are more than two candidates. Consider the case where we have three candidates, Alice, Bob, and Charlie and 12 voters. We run the election with the following results:
So, Alice wins, right? But here's the thing: what if everyone who preferred Charlie actually preferred Bob to Alice? This system just ignores that fact and hands the election to Alice. But if Charlie had dropped out, then Bob would have gotten those votes and would have won instead of Alice, with a comfortable margin, like so.
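To make the spoiler effect concrete, here is a minimal Python sketch. The 5/4/3 split is a hypothetical one that I picked only because it fits the story above (12 voters, Alice ahead, every Charlie voter preferring Bob to Alice); the function name `plurality_winner` is likewise just for illustration.

```python
from collections import Counter

def plurality_winner(first_choices):
    """Return (winner, tally) under first past the post. Ties are not handled."""
    tally = Counter(first_choices)
    winner, _ = tally.most_common(1)[0]
    return winner, tally

# Hypothetical 12-voter split, chosen only to match the scenario in the text.
with_charlie = ["Alice"] * 5 + ["Bob"] * 4 + ["Charlie"] * 3
# If Charlie drops out and all three of his supporters vote for Bob instead:
without_charlie = ["Alice"] * 5 + ["Bob"] * 7

print(plurality_winner(with_charlie)[0])     # winner with Charlie in the race
print(plurality_winner(without_charlie)[0])  # winner once Charlie drops out
```

Under these assumed numbers, Alice wins the three-way race 5-4-3 but loses the two-way race 7-5, even though no voter changed their mind about Alice versus Bob.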
This situation strikes many people as fundamentally unfair: If you support a candidate with a low chance of winning but you also have a preference between the leading candidates, you have to decide between voting for that candidate or actually influencing the outcome of the election in the direction you prefer. It also means that third party candidates can potentially change the outcome by being in the race (the disparaging term here is "spoiler"). It's not like this can't happen in the real world, either: in several recent US presidential elections (1992, 2000, 2016) third party candidates have received enough votes that it could in principle have changed the outcome. In any case, having people who have no chance of winning not affect the election seems like a desirable property.
One way to address this is by having what's called a "runoff" election. The general way that a runoff works is that if no candidate gets more than a given threshold percentage of the vote then you run a new election with some of the lower-ranked candidates omitted. A particularly consequential example of this is that of the Georgia 2020 Senate races, in which you had to get more than 50% of the vote in order to win. However, in both the regular election (for a full term) and the special election (to fill the remaining two years of a term), no candidate got over 50%, so a runoff election was run two months later with just the top two candidates. In one of those races (the special election) the first-ranked candidate in November (Raphael Warnock) eventually won, but in the other, David Perdue had the most votes in November but eventually lost to Jon Ossoff in January, giving the Democrats a 50-50 Senate with VP Kamala Harris as the tie breaker.
Instant Runoff Voting (aka Ranked-Choice Voting)
Runoff elections have an obvious appeal in that the eventual winner actually receives a majority of the vote, not just a plurality, and you can be confident that they actually were the preferred choice between the two candidates. However, they also have a number of undesirable properties. First, it's expensive and inconvenient to run another election months after the first one. Moreover, that election is run under different conditions than the first, so there is time for politicking and you don't know you're getting the same outcome you would have gotten from a runoff done on election night.
It's possible to avoid those costs using Instant Runoff/Ranked Choice Voting (RCV). The idea behind RCV is to simulate a series of runoff elections without actually having to run them. To make this work, instead of listing only their top candidate, voters rank the candidates on the ballot. A typical version of the election decision procedure works like this:
1. Count up the votes for everyone's top choice.
2. Eliminate the candidate with the lowest number of votes.
3. If only one candidate is left, they are the winner; otherwise go to 1.
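The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not how any real election authority tabulates: the function name `instant_runoff`, the ballot format (a list of names, most preferred first), and the alphabetical tie-break for last place are all my assumptions, and the five ballots are hypothetical.

```python
def instant_runoff(ballots):
    """Repeatedly count top choices and eliminate the last-place candidate.

    ballots: list of rankings, each a list of candidate names, best first.
    Returns (winner, rounds), where rounds holds the tally from each round.
    Assumption: ties for last place are broken alphabetically (arbitrary).
    """
    remaining = {name for ballot in ballots for name in ballot}
    if not remaining:
        raise ValueError("no candidates on any ballot")
    rounds = []
    while len(remaining) > 1:
        # Step 1: each ballot counts for its highest-ranked surviving candidate.
        tally = {name: 0 for name in remaining}
        for ballot in ballots:
            top = next((name for name in ballot if name in remaining), None)
            if top is not None:  # ballots with no surviving candidate are exhausted
                tally[top] += 1
        rounds.append(tally)
        # Step 2: eliminate the candidate with the fewest votes.
        loser = min(remaining, key=lambda name: (tally[name], name))
        remaining.remove(loser)
        # Step 3: loop back to step 1 unless only one candidate is left.
    return remaining.pop(), rounds

# Hypothetical ballots for illustration only.
ballots = [
    ["Alice", "Bob", "Charlie"],
    ["Alice", "Charlie", "Bob"],
    ["Bob", "Alice", "Charlie"],
    ["Bob", "Charlie", "Alice"],
    ["Charlie", "Alice", "Bob"],
]
winner, rounds = instant_runoff(ballots)
print(winner)
for tally in rounds:
    print(tally)
```

Note that the original ballots are never modified; each round just skips over eliminated candidates, which has the same effect as crossing them off every ballot.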
For instance, suppose we have the following ballots with three candidates.
In round 1, we count up all the first choices (Alice: 2, Bob: 2, Charlie: 1). So, we have a tie between Alice and Bob with Charlie as the last-place candidate. We remove Charlie from the election, changing Harold's ballot to be "Alice, Bob", making it a vote for Alice and giving her the win. In this particular case, the first round had a tie, but RCV can also change the results. Consider what would have happened if there were 49 ballots for Alice, 51 for Bob, and 3 for Charlie and then Alice. In a first-past-the-post system, Bob would have won, but in an RCV system, Charlie is eliminated, his 3 ballots go to Alice, and Alice wins 52-51.
I just want to note for the moment that there is a lot of debate about whether RCV is actually a good voting system from a political perspective (i.e., does it produce the "right" outputs?). I'd just like to bracket that discussion for now, and talk about the logistical properties in the context of what we're seeing in New York.
RCV Logistics in Practice
The core thing to recognize about RCV is that unlike first-past-the-post systems the running tallies of the "first choice" don't capture the entire state of the tally, and in many cases don't do a very good job at all. Consider the case where even though there are three candidates, voters only have three sets of preferences (this is unrealistic, but just convenient for analysis):
If you just look at the running tallies before the RCV elimination round, it looks like Alice is way in the lead, but actually most voters prefer either of Charlie or Bob to Alice, so once you've eliminated Bob, Charlie is going to win with 60% of the votes.
A related problem is that relatively small numbers of ballots can change the eventual winner even if the gaps between the leaders are quite large. Consider the election directly above, but with the people who prefer Bob preferring Alice to Charlie rather than Charlie to Alice (I've bolded the changed preferences).
So, in this current election, Bob is eliminated first, his votes go to Alice and she wins 69-31. But if we shift 2% of votes from the third to the first row, giving us:
In this case, Charlie is eliminated, his votes go to Bob, and Bob wins 60-40. So, just by moving 2% of votes (2 votes) from one candidate to another we've changed a landslide win for Alice to a landslide win for Bob.
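Here is a small Python sketch of that flip. The ballot counts (40/29/31 out of 100) are my reconstruction from the 69-31 and 60-40 results quoted above rather than numbers taken from the tables, Alice's lower preferences are a guess (they never matter because she is never eliminated), and the helper names are hypothetical.

```python
def count_top_choices(profile, remaining):
    """Tally each ranking's ballots for its highest-ranked surviving candidate."""
    tally = {name: 0 for name in remaining}
    for ranking, ballots in profile.items():
        top = next((name for name in ranking if name in remaining), None)
        if top is not None:
            tally[top] += ballots
    return tally

def irv_final_round(profile):
    """Eliminate last-place candidates until two remain; return the final tally.

    profile maps a ranking (tuple of names) to the number of ballots with it.
    Assumption: ties for last place are broken alphabetically (arbitrary).
    """
    remaining = {name for ranking in profile for name in ranking}
    while len(remaining) > 2:
        tally = count_top_choices(profile, remaining)
        remaining.remove(min(remaining, key=lambda name: (tally[name], name)))
    return count_top_choices(profile, remaining)

# Assumed counts, reconstructed from the results described in the text.
before = {
    ("Alice", "Bob", "Charlie"): 40,
    ("Bob", "Alice", "Charlie"): 29,
    ("Charlie", "Bob", "Alice"): 31,
}
# Move two ballots from the Charlie-first group to the Bob-first group.
after = {
    ("Alice", "Bob", "Charlie"): 40,
    ("Bob", "Alice", "Charlie"): 31,
    ("Charlie", "Bob", "Alice"): 29,
}
print(irv_final_round(before))  # Alice vs. Charlie
print(irv_final_round(after))   # Alice vs. Bob
```

Alice's first-choice total is 40 in both runs; the only thing that changes is which of the other two is eliminated first, and that alone decides the winner.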
The key point here is that in an RCV election just looking at the top-line numbers is super misleading. Instead, you need to think of the election as consisting of a bunch of different possibilities depending on who gets eliminated and when. In order to do this, you need not just the raw tallies for every candidate in each position, but actually the number of ballots with each possible ranking of candidates. This can be quite a bit of data: as I understand it, the New York City election has 13 candidates and you get to pick 5, so that means that you have over 100,000 different potential slates that people could have voted for, and you need to see how many votes each of those got in order to understand the state of the election.
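As a rough check on that "over 100,000" figure, here is a short Python calculation. It assumes a ballot is an ordered list of between one and five distinct candidates out of 13, with no write-ins and no repeated names; those are my simplifying assumptions, not the BOE's rules, and `possible_rankings` is a hypothetical helper.

```python
from math import perm

def possible_rankings(candidates, max_ranked):
    """Count non-empty ordered rankings of up to max_ranked distinct candidates."""
    return sum(perm(candidates, k) for k in range(1, max_ranked + 1))

full_ballots = perm(13, 5)             # exactly five candidates ranked
any_ballots = possible_rankings(13, 5)  # one to five candidates ranked
print(full_ballots, any_ballots)       # 154440 173485
```

So there are 154,440 ways to fill in all five slots and 173,485 if shorter rankings are allowed, which is the number of separate counts you would need to describe the ballots fully under these assumptions.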
So, part of what's confusing in New York is that you're seeing the top-line numbers of how many votes each candidate has based on the current ballots that have been counted, but there are a lot of absentee ballots (~125000) that haven't been counted yet, and there are still at least three viable candidates (Adams, Garcia, and Wiley). The gaps between them are very small: ~15000 between Adams and Garcia after all the elimination rounds, but only 350 between Garcia and Wiley, so you need to do a bunch of what-ifs based on what the contents of those absentee ballots might be and based on the precise composition of the already counted ballots. (See this NYT article for more on this.) So, it's not just a simple matter of saying that Garcia needs 15000 more votes than Adams in order to win. What if, for instance, Garcia got 15000 more votes than Adams but Wiley got 500 votes more than Garcia? In principle, there may even be enough absentee ballots to put Yang back in the race because he was about 80,000 ballots behind Garcia!
To make matters worse, NYC inadvertently posted ballot tallies that included a number of test ballots. Those tallies were quickly taken down, but it's obviously another source of confusion.
Take home
I do want to emphasize at this point that it's quite possible to run RCV-based elections efficiently. In fact, Australia routinely runs a similar system called single transferable vote. It's a little more mathematically complicated to do a risk-limiting audit with IRV but there's now some exciting work showing how to do it efficiently in practice. What we're seeing here is the result of a combination of a particularly contested election, a large number of absentee ballots, the desire to post preliminary results, and a pretty serious ballot handling error with the test ballots.
There is debate about the impact of third party candidates in the 1992 and 2016 elections, but this was certainly something people were worried about at the time. It seems fairly clear that poor ballot design caused a number of people in Florida to inadvertently vote for Buchanan rather than Gore, in numbers large enough to have shifted the election to Bush. See Wand et al. for more. Thanks to Joseph Lorenzo Hall for this reference. ↩︎
Wikipedia has the background here, but briefly: usually US Senate terms run 6 years and the elections are staggered, but in this case the sitting senator resigned and so they had to run a special election to fill the rest of the term. ↩︎
People often say that you need a list of all the ballots, but that's not actually required. ↩︎
As an aside, I feel compelled to point out that there is a simpler way: approval voting is a simple modification of first-past-the-post in which you are allowed to vote for multiple candidates and whichever candidate has the most votes in total wins. This is much simpler to reason about but at the cost of not letting voters differentiate between candidates other than between "acceptable" and "not acceptable". The debates about approval versus RCV are heated and technical (see here for an overview), and I won't get into them here. ↩︎