Yesterday, I wrote about the problem at the Democratic caucuses and asked what would be a good algorithm for tabulating a lot of votes quickly with relatively little technology.
First, an update on the parameters: 320 people were voting for 19 delegates (9 men, 9 women, plus 1 extra delegate of either gender, namely the candidate with the highest remaining vote count), plus 9 alternates chosen in a similar manner. This means we needed to tally more than 6,000 individual votes, split into two halves, one for men and one for women. Voting was by candidate number.
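To make the selection rule concrete, here is a minimal Python sketch. It assumes the men's and women's ballots have already been tallied into per-candidate totals and that candidate numbers are plain integers. The function names `rank` and `select_delegates` are hypothetical, ties are broken by lower candidate number only so the output is deterministic (the caucus's actual tie-break rule isn't modeled), and the alternates aren't modeled either.

```python
from collections import Counter


def rank(votes):
    """Return (candidate, total) pairs, highest total first.

    Ties go to the lower candidate number purely so the output is
    deterministic; the caucus's real tie-break rule is not modeled here.
    """
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


def select_delegates(men_votes, women_votes, per_gender=9, extra=1):
    """Pick the top `per_gender` men and women, then fill `extra` seat(s)
    with the highest remaining vote-getters of either gender.

    Returns (men, women, extras); extras are (candidate, "M"/"W") pairs.
    """
    men = rank(men_votes)
    women = rank(women_votes)
    # Everyone who missed their gender's top group competes for the extra seat(s).
    leftovers = [(c, n, "M") for c, n in men[per_gender:]]
    leftovers += [(c, n, "W") for c, n in women[per_gender:]]
    leftovers.sort(key=lambda t: (-t[1], t[0]))
    return (
        [c for c, _ in men[:per_gender]],
        [c for c, _ in women[:per_gender]],
        [(c, g) for c, _, g in leftovers[:extra]],
    )


# Tiny example: 2 seats per gender plus 1 extra, instead of 9 + 9 + 1.
men_totals = Counter({1: 50, 2: 40, 3: 35})
women_totals = Counter({10: 60, 11: 30, 12: 38})
print(select_delegates(men_totals, women_totals, per_gender=2, extra=1))
# → ([1, 2], [10, 12], [(3, 'M')])
```

The extra seat is decided by comparing the leftover men's and women's totals directly, which is how the rule reads: whoever has the highest remaining vote count gets it, regardless of gender.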
What happened is that teams of people entered the ballots into very simple spreadsheets -- basically, just the candidate numbers. This was good because it was both simple and parallelizable. Like most developers, I type numbers very quickly. I was paired with another software guy (one person has to read, one has to tally) and we breezed through 85 ballots in 30 minutes (935 entries, an average of about an entry every 2 seconds). But the fact that we did 85 ballots is indicative of a problem -- there weren't enough laptops and teams, and other teams weren't as fast. Still, all the data entry was done in under an hour.
Then, something inexplicable happened. I don't actually know what. With all the data in spreadsheets, it should have been a simple matter to pull it into one spreadsheet, run a COUNTIF formula for each of the potential candidate numbers, and then sort the results by total. Even if that spreadsheet hadn't been created in advance, this is roughly a 5-minute operation. Instead, it took more than an hour to turn all of the tallies into results. The people doing the totaling vanished into another room, so I don't know what went wrong. Beforehand, I heard something about Microsoft Access being used, but I don't know why it would have been. Access isn't the best application to use when all you want to do is count data, especially when equivalent data has been entered in multiple columns.
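For comparison, the merge-and-COUNTIF approach described above amounts to very little code. Here is a minimal Python sketch, assuming each team's spreadsheet is just rows of candidate numbers (one row per ballot, one column per vote, blank cells as `None`). The `tally` function and the sample data are hypothetical, and the men's and women's ballots would each get their own call.

```python
from collections import Counter


def tally(sheets):
    """Merge every team's spreadsheet and count votes per candidate number.

    `sheets` is a list of spreadsheets, one per data-entry team; each is a
    list of rows (one per ballot) whose columns hold candidate numbers, with
    None for a blank cell. Counting every column at once does the same work
    as one COUNTIF per candidate over the merged sheet.
    """
    counts = Counter()
    for sheet in sheets:
        for ballot in sheet:
            counts.update(c for c in ballot if c is not None)
    # (candidate, total) pairs sorted by total, highest first.
    return counts.most_common()


team_a = [[3, 7, 12], [3, None, 7]]
team_b = [[7, 5, None], [12, 7, 3]]
print(tally([team_a, team_b]))
# → [(7, 4), (3, 3), (12, 2), (5, 1)]
```

Because every column is counted the same way, it doesn't matter which column a given candidate number was typed into, which is exactly the situation that makes a database table awkward here.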
Net result: it took more than two hours for the results of the vote to be known.
Everybody was well-intentioned, but I think a number of factors contributed to a less-than-desirable end result:
- Unreasonable restrictions (initial statements that computers couldn't be used, or that no more than one computer could be used).
- A single plan for a situation that didn't happen (200 candidates), rather than multiple plans for different situations. The plan was inflexible.
- Not enough parallelism.
- A final step that was overly complicated.
And it's those last two that are the biggest problems -- no matter how much we optimize the system, if we have a bottleneck somewhere, all our optimizations will be for naught.
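One way to keep the final step from becoming that bottleneck is to have every team compute its own subtotals as soon as its data entry is done, so the central step only has to add a handful of small tables. Here is a minimal Python sketch under the same assumption that each team's spreadsheet is rows of candidate numbers; the names `team_subtotal` and `combine` and the sample data are hypothetical, and this is an illustration of moving work into the parallel phase, not what the caucus actually did.

```python
from collections import Counter


def team_subtotal(sheet):
    """Run by each team on its own laptop: count only that team's ballots.

    `sheet` is a list of rows (one per ballot) of candidate numbers; None
    marks a blank cell.
    """
    return Counter(c for ballot in sheet for c in ballot if c is not None)


def combine(subtotals):
    """The only central step: add the teams' subtotals together.

    Its cost grows with the number of teams and candidates, not ballots.
    """
    return sum(subtotals, Counter())


team_a = [[3, 7, 12], [3, None, 7]]
team_b = [[7, 5, None], [12, 7, 3]]
totals = combine([team_subtotal(team_a), team_subtotal(team_b)])
print(totals.most_common())
# → [(7, 4), (3, 3), (12, 2), (5, 1)]
```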
Of course, it would be much simpler if Washington used a primary, with systems that are already in place to count votes, but that's a whole 'nother story.