From Prof. Sam Wang of Princeton University.
This page is available online at synapse.princeton.edu/~sam/pollcalc.html
Real electoral result: Kerry 252 EV, Bush 286 EV. (outcome map) (comparison with pre-election polls)
Thursday, December 2, 4:30PM: After chewing over the Berkeley group's analysis (Hout et al.) and corresponding with some of you, it appears that there is no good evidence for a real county-level anomaly. The essential problem is that the largest e-voting counties have large populations and have no counterpart for comparison. Hout et al. have done fits that extrapolate from small counties to large counties. Considering the urban/rural divide, this is not permissible. This problem seems to be insurmountable by statistical analysis. As stated before, Broward and Palm Beach Counties may be worth looking at further, but only by means such as direct counting, ideally as done by the Miami Herald.
Thursday, December 2, 12:00 noon: Here's a story in the Seattle Times (quoting a Miami Herald story) on recounts in three Florida counties (two complete and one partial). The upshot is that the two complete counts indicate no irregularities, and the third one seems OK (with some gap, perhaps because sampled precincts were not representative). For further commentary see this thread.
Monday, November 22, 10:00AM: Andrew Gelman and Bruce Shaw have taken a closer look at the Hout et al. report, and comment here. They point out that most of the anomaly is concentrated in two counties, Broward and Palm Beach. This suggests that if attention is focused anywhere, it should be there. This has also been commented upon by Kevin Drum at Washington Monthly.
Thursday, November 18, 10:30PM: There's an interesting report from a statistical analysis group at Berkeley about Florida voting patterns. After reading it over, I can't find any big problems with it. It is the best study yet done on the question of possible irregularities. This does not mean that irregularities occurred. However, it does mean that something unusual happened in Florida. Also, note that their estimate of voting discrepancy (130,000 to 260,000) is similar to my estimate based on pre-election polls (270,000).
What they have done is use voting patterns by county from 2000 and 1996, income by county, total population, and Hispanic population to try to explain voting patterns in 2004. These variables encompass many arguments made for why this year's results are surprising. For instance, it has been argued that rural voters liked Bush better this year. But since the Berkeley group used county population as a variable, this should stand as a proxy for population density. Note that because they are using regression analysis, they are not claiming that Bush should have gotten the same number of votes as in previous years. This would not be true: we all know that Clinton-Dole 1996 turned out very differently from Bush-Gore 2000. Instead, they are using those variables to help account for variations among counties, for instance if one county tends to be more Republican than another.
Their analysis indicates that even when all these variables are accounted for, a significant difference remains between counties that used electronic voting and counties that used optical scanning or paper ballots (see also a previous analysis by Jeff Chambers).
But - in which group did something strange happen? They say electronic voting, but one could also imagine that optical scan and paper ballots were both off. This is less parsimonious, but can one rule it out? Maybe.
An additional piece of evidence is available: pre-election polls. In my own analysis (see below), I find that pre-election polls predicted a margin of Bush over Kerry by 1.4%, or about 105,000 votes. Instead the reported result was Bush over Kerry by 5.0%, or about 375,000 votes. The discrepancy is 270,000 votes, which is consistent with their model of misattribution, i.e. Kerry votes being counted as Bush votes, which predicts a discrepancy of 262,000 votes. The other model, "ghost votes," is also possible, though it only predicts a 131,000 vote discrepancy. Therefore, assuming that electronic voting had problems relative to other methods leads to an estimated discrepancy that is very similar to one made using pre-election polls.
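As a rough check on the arithmetic above, here is a minimal sketch that converts the percentage margins into vote margins. The total of about 7.5 million Florida votes is an assumption inferred from the figures quoted (1.4% corresponding to about 105,000 votes), not an official count, and the function name is only illustrative.

```python
# Assumed total votes cast, inferred from "1.4%, or about 105,000 votes".
TOTAL_VOTES = 7_500_000

def margin_to_votes(margin_pct, total_votes=TOTAL_VOTES):
    """Convert a percentage-point margin into an approximate vote margin."""
    return margin_pct / 100.0 * total_votes

predicted = margin_to_votes(1.4)     # Bush margin implied by pre-election polls
reported = margin_to_votes(5.0)      # Bush margin in the reported result
discrepancy = reported - predicted   # gap between reported result and polls

# Discrepancies quoted from the Berkeley report's two models.
berkeley_models = {"misattribution": 262_000, "ghost votes": 131_000}

print(f"poll-based discrepancy: {discrepancy:,.0f} votes")
for name, votes in berkeley_models.items():
    print(f"{name}: differs from poll-based estimate by {discrepancy - votes:,.0f}")
```

Under these assumptions the poll-based gap sits within a few thousand votes of the misattribution figure and well above the ghost-vote figure, which is the comparison made in the paragraph.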
A more innocuous explanation may exist, but their work (and my analysis) put bounds on what that explanation might be. A change in sentiments among rural voters seems unlikely, since the Berkeley team found no anomaly in Ohio (and also used county population as a variable). Another possibility is the fall hurricanes, since these did not happen in other states; however, how this would work is unclear. In any event, this story is not over.
Tuesday, November 16, 11:45PM: Below is a graph of all state polls in Florida, Ohio and Pennsylvania in the weeks preceding the election. (Comparisons for other states are available here.) The gray band indicates 1 SEM surrounding the average for the last seven days; this is about half the width of the 95% confidence band. The arrowheads indicate the final announced outcomes. Ohio and Pennsylvania polls were consistent with the final outcome, but Florida polls were not consistent.
Pennsylvania polls indicated a Kerry win by 2.1 ± 0.7%; the final result was Kerry by 2.2%.
Ohio polls indicated a Bush win by 1.0 ± 0.7%; the final result was Bush by 2.5%. This could be accounted for by the trend toward Bush in the last five days.
Florida polls indicated a Bush win by 1.4 ± 0.9%; the final result was Bush by 5.0%. This final result is off by 4 SEM. It is also in the same direction as the claims of voting fraud made based on county-level data.
Although not definitive, these data are at least consistent with the suggestion that something unusual happened in Florida, either in the 33 polls conducted in the final weeks of the campaign, or in actual voting.
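The "off by 4 SEM" figure can be reproduced with a short sketch: take the difference between the final margin and the poll average, and divide by the SEM. The numbers are the ones quoted above; the sign convention (positive means a margin for Bush) and the function name are choices made here for illustration.

```python
# (state, poll margin, SEM, final margin), all in percentage points.
# Positive values are margins for Bush; Kerry leads are entered as negative.
polls = [
    ("Pennsylvania", -2.1, 0.7, -2.2),
    ("Ohio", 1.0, 0.7, 2.5),
    ("Florida", 1.4, 0.9, 5.0),
]

def deviation_in_sem(poll_margin, sem, final_margin):
    """How far the final margin lies from the poll average, in SEM units."""
    return (final_margin - poll_margin) / sem

results = {state: deviation_in_sem(p, s, f) for state, p, s, f in polls}

for state, z in results.items():
    print(f"{state}: {z:+.1f} SEM toward Bush")
```

Pennsylvania comes out well within 1 SEM and Florida at about 4 SEM. Ohio lands near 2 SEM, which is the gap the text attributes to the late trend toward Bush.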
Saturday, November 6, 9:00AM: Here is a pre-election article by Kerry consultant Mark Mellman that predicted Bush's popular vote share to within 0.1%. Going by the article, the factors that went into the calculation included job approval, the economy, war, and right track/wrong track sentiment. Putting aside the talk about values (based on a single poorly-worded exit poll question), what about the relatively simple hypothesis that the economy's not that bad combined with loyalty to a wartime president?
Despite all this, the true margin of victory was about 150,000 votes in Ohio, and even smaller margins in Wisconsin and New Hampshire. As I said during the campaign, Electoral College mechanisms (essentially because of increased clustering of Bush supporters) gave Kerry an approximately 2% advantage compared with the popular vote. This accounts for the difference between the Meta-margin above and the popular vote margin.
Friday, November 5, 6:00PM: I continue to be deluged by email on the subject of the anomalies in Florida voting in small counties. As I said below, these data sort fairly well by the rural/urban divide. A graph by Jeff Chambers suggests that there may be some small remaining anomaly. However, this could be ballot spoilage. Here's an analysis demonstrating the size of the anomaly. I don't think this is going anywhere. The most constructive thing at this point is to redirect energies to voting reforms, such as those advocated by the Open Voting Consortium. This has the advantage of serving all Americans.
In the coming days I will revisit polling data to see what the turning points were in the election, as measured through the Electoral College. I suspect that some of the shifts in my data that I could not explain may be explained by campaign moves that were not obvious at the level of national media. The electoral vote calculation is low-noise and captures swings well, so this is a perfect use for it.
Thursday, November 4, 9:00PM: A number of you have mailed in a plausible explanation for the seemingly anomalous results in optical-scan voting counties in Florida. Essentially, the apparent disproportionate voting for Bush occurred in less populated counties. These counties, being non-urban and having fewer resources, are still using optical-scan ballots. Therefore three variables are correlated here: smaller populations, voting technology, and Democratic crossover for Bush. As a result there is a correlation between the last two variables, but the real variable of interest is population. This creates an ambiguity of interpretation, and is an example of the dictum that correlation is not causation.
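To make the confounding argument concrete, here is a toy sketch with entirely synthetic counties (invented for illustration, not Florida data). Crossover voting is set by county size alone, and small counties mostly use optical scan. The raw comparison by technology then shows a gap that vanishes once counties of similar size are compared.

```python
from statistics import mean

# Synthetic counties, invented for illustration: (size, technology, crossover rate).
# Crossover depends only on size; technology merely tends to go with size.
counties = (
    [("small", "optical", 0.30)] * 4 + [("small", "evote", 0.30)]
    + [("large", "optical", 0.10)] + [("large", "evote", 0.10)] * 4
)

def mean_crossover(rows, tech, size=None):
    """Average crossover rate for one technology, optionally within one size class."""
    return mean(c for s, t, c in rows if t == tech and (size is None or s == size))

# Raw comparison: ignores county size, so the size effect shows up as a "technology" gap.
raw_gap = mean_crossover(counties, "optical") - mean_crossover(counties, "evote")

# Stratified comparison: same technologies compared within each size class.
within_gaps = {
    size: mean_crossover(counties, "optical", size) - mean_crossover(counties, "evote", size)
    for size in ("small", "large")
}

print(f"raw gap (optical - evote): {raw_gap:.2f}")
print("gap within size class:", within_gaps)
```

The real data are of course messier than this, and the sketch says nothing about which explanation holds in Florida; it only shows how a technology gap can appear when population is the variable doing the work.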
The reason population is of interest is that it is a stand-in for rural vs. urban dwellers, as Andy Royle and others point out. Royle made a graph of population size and Bush victory margin, graphed by county. This graph suggests to me that in rural counties, many people are registered Democrats but cross over to vote for Republicans. This could be because Democratic party allegiance is a holdover from times when rural voters were more likely to be Democrats. Many new registrants in these counties are Republicans, which supports the idea.
This speaks to the idea going around that heartland and rural voters are turning to Republicans on "values" issues. In my view this conflicts with a natural link between them and the stated policy goals of the Democratic Party. It suggests a disconnect between stated Democratic values and how the party is perceived. For an interesting exposition I recommend What's The Matter With Kansas? by Thomas Frank, which addresses the dominance of the GOP in the heartland.
Wednesday, November 3, 7:45PM: A few items of business. First, note that without my optimistic assumptions, I nailed the Electoral College tally. This, even in the face of single-state probabilities that give a different map and EV total. Poll margins are also quite close: see these validations of the method (still under modification but viewable). Now there's a testament to meta-analysis (and an indictment of the mindless mainstream horserace coverage of polls this season). Note, however, the large margin of error on my decided-voter estimate. A one-percent swing in Ohio or Wisconsin would have changed the total EV count - in the case of Ohio, to great effect.
This site. I am considering what to do. It is highly disorganized because I started off HTML-coding by hand. If readership continues, I may transfer to regular blogging software. However, this requires time or help. Another question is what topic(s) to cover. Let's first see if traffic continues. Your letters of support are heartening. I can't reply to all, but please continue to write. If your comments are for general consumption, here is a public thread.
Exit polls. Much of my mail today concerns exit polls, with calls for analysis to check for widespread fraud. Think about the lessons you have learned here. A more plausible possibility is that exit polls themselves are biased, for instance by the identity of the questioner or the temperament of the respondent. See my analysis of exit polls.
The incumbent rule. After some thought, I realize that we simply don't know what factors drove the result yesterday. The problem is that all the factors sum to give final voting, and are therefore hard to distinguish. A piece in the New Republic suggests that turnout was either symmetric or went against my assumption.
Fraud in Florida? This is an interesting question, and my evidence suggests that further investigation is warranted. Of all the battleground states, Florida was one of the most surprising in terms of deviating from the outcome expected from polls (and to a lesser extent, exit polls). The deviation favored Bush. It's too bad about this year's exit poll problems, because they would have provided support. My analysis of this is in the validation of 2004 results.
Wednesday, November 3, 11:30AM: To summarize points so far this morning: So far, the electoral outcome matches pre-election polling data very closely, with the possible exception of Florida. Therefore the electoral count looks a lot like the decided-voters median listed above. However, my final predictions were wrong. It appears that my add-on assumption about undecideds (the rule that they break against the incumbent) was wrong; this may have been because of the war, as suggested by the Mystery Pollster. My turnout assumption was also wrong. Exit polls do match my projection, which is surprising to me. This could be because those data are somehow non-representative, for instance because of gender bias.
Specific comparisons: victory margins were predicted well by polls. Out of 23 battlegrounds, the direction of the outcome was predicted in 22. The exception was Wisconsin, where the polling margin was 0.4% for Bush and the actual margin was about 0.4% for Kerry. Quantitatively, 12 victory margins were within one standard error and 17 were within the 95% confidence interval. Not perfect, but not bad.
Wednesday, November 3, 10:30AM: In Ohio, many provisional ballots are left to be counted. In the meantime, here is some general analysis. Overall, pre-election polls, exit polls and actual voting are mostly correlated. An exception occurs in Florida, suggesting that something unusual might have happened there, either in voting or in exit and opinion polling. The effect is probably smaller than Bush's margin.
Voting margins track pre-election polls - with exceptions. Voting margins were more favorable to Bush by 0.9 ± 0.6% (median ± SEM; SD, 3.1%) than pre-election polls. This is very near to no difference at all. All discrepancies greater than 5 percentage points (AR, HI, NC, WV) occurred in states with few recent polls. The next largest discrepancy was in Florida, 3.6% towards Bush. Since Florida had so many polls, this is 4 SEM away from zero. Otherwise the match is quite good. Overall, 12 out of 23 pre-election estimates were within 1 SEM of the voting outcome, less than the expected 16 but not bad. I conclude that in most cases this year, five or more likely-voter polls taken in the week before the election gave an estimate that strongly correlated with final voting.
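The "expected 16" follows from assuming that poll errors are normally distributed: about 68% of estimates should then fall within 1 SEM of the outcome. A minimal sketch of that calculation, with the normality assumption stated explicitly and a helper name chosen here for illustration:

```python
import math

def normal_coverage(k):
    """Probability that a normal variable lies within k standard errors of its mean."""
    return math.erf(k / math.sqrt(2.0))

N_STATES = 23  # battleground states compared in the text

expected_1sem = N_STATES * normal_coverage(1.0)   # within 1 SEM
expected_95 = N_STATES * normal_coverage(1.96)    # within the 95% interval

print(f"expected within 1 SEM: {expected_1sem:.1f} of {N_STATES}")
print(f"expected within 95% interval: {expected_95:.1f} of {N_STATES}")
```

Under the same assumption about 22 of 23 would be expected inside the 95% interval, so the observed counts reported here (12 within 1 SEM, 17 within the 95% interval) are somewhat below the ideal in both cases.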
Voting margins and exit polls differ systematically. Exit polls were more favorable to Kerry by 3.0 ± 1.5% (median ± SD) than real voting. This is tentative since I do not have the most complete exit polling data. Currently, in FL I have an exit poll margin of Kerry +1 and a real voting margin of Bush +5%; the discrepancy, 6 points, is again somewhat extreme. This is consistent with the discrepancy noted above. In OH I have an exit poll margin of Kerry +1 and a real voting margin of Bush +2; the discrepancy, 3 points, is right in the middle of the range. I don't know why exit polls would differ systematically, though one obvious possibility is the gender gap in respondents.
Graphs and further analysis will follow shortly.
Wednesday, November 3, 2:15AM: I was very wrong about FL and about the overall offset. The outcome is somewhere between the decideds-only and the with-undecideds medians. However, for months I have been saying that it was about Ohio, and it is. The margin there seems headed for about 1%, without absentee or provisional ballots. Another squeaker - though things are not entirely over yet.
Wednesday, November 3, 12:45AM: Possibly K259, B254 (assuming MN, IA, HI, MI). Then Ohio (20EV) - returns here. May come to Cuyahoga County (includes Cleveland) and absentees. Also NV (5EV), but at this point that doesn't matter.
Wednesday, November 3, 12:15AM: Consistent with the gender imbalance, exit polls seem to be biased toward Kerry relative to final totals. Not crunching numbers yet but will try later.
Tuesday, November 2, 10:00PM: Zogby's projections, which resemble mine, are here. Here is a plot of afternoon exit polls against my last poll margins. As you can see, most of the data points fall around Kerry +3%, the assumption that went into my final projection. However, the gender breakdown is 59F-41M, suggesting a biased sample, and thus indicating a problem with my assumption. However, as you all know, much depends on FL and OH. The partial returns for FL (77% of precincts reporting) indicate Bush - does anyone know if these include early voting, absentee and overseas? Overseas and absentee may not be counted until Thursday. Early votes - probably counted.
Tuesday, November 2, 5:45PM: There are so many of these exit polls - just like the campaign season all over again. Time for meta-analysis. I am plotting all that I can find on my graph. Nearly all of them are above the no-bias line. Looking at all of them at once, the median bias seems to be about +3%. This suggests to me that if overall, these polls reflect total voting, OH and FL will end up for Kerry, with margins of 2% by night's end. (Famous last words...)
Tuesday, November 2, 3:00PM: Early exit polls on Drudge (above the title). If you plot them on my brochure graph, their median is around +5% bias. This may regress as Republicans get to the polls, but I think this is a telling sign. Not sure if I will be blogging the returns - just a heads-up for you. I wonder if my prediction was too cautious! That's what I get for paying too much attention to my email.