Margin of Σrror

It’s Good to be Average

Last week, we examined the accuracy of several presidential forecasts. For those familiar with statistics and probability theory, the results proved unsurprising: the forecasts came reasonably close to the state-level outcomes, but the average forecast outperformed them all.

Put another way, the aggregate of aggregates performed better than any of its parts.

This year’s Senate races provide us another opportunity to test our theory. Today, I gathered the Senate forecasts from several prognosticators and compared them to the most recent Election Day returns. As before, I also computed the RMSE (root mean squared error) to capture how accurate each forecaster was on average.

We must note one modest complication: not every forecaster posited a point estimate for every Senate race. Nate Silver put forward a prediction for every race, but Sam Wang of Princeton University released predictions for only 10 competitive races.

We accordingly compute two different RMSEs. The first, RMSE-Tossups, computes the RMSE only over those races for which every forecaster put forward a prediction. (There are nine races that fall into this category: Arizona, Connecticut, Massachusetts, Missouri, Montana, Nevada, North Dakota, Virginia and Wisconsin.)

The other calculation, RMSE-Total, shows each forecaster's RMSE over all of the predictions that forecaster made. Wang, for example, is evaluated by his accuracy on the ten predictions he released, while Silver is evaluated on all 33 races.

Forecast            RMSE-Tossups   RMSE-Total
Wang                4.7            4.6
Silver              5.1            8.0
Pollster            3.8            5.8
RealClearPolitics   5.4            5.1
TalkingPointsMemo   3.9            8.0
Average Forecast    4.4            5.4

The numbers in the above table give us a sense of how accurate each forecast was. The bigger the number, the larger the error. So what can we learn?

Lo and behold! The average performs admirably yet again. It’s not perfect, of course; for some races, there are precious few forecasts to average over: Delaware, for instance, has only the 538 prediction.

To begin accounting for this, we weight the RMSE by the share of forecasts used to compute the average. If we limit our evaluation of the average to only those races with three or more available forecasts, the RMSE drops to 4.8.
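
For readers who want to replicate this, here is a minimal Python sketch of the averaging step; the data structures and numbers below are hypothetical placeholders, not the actual forecasts or returns:

```python
import math

# Hypothetical data: predicted Democratic margin by forecaster, keyed by race.
# A forecaster simply has no entry for races it declined to call.
forecasts = {
    "ND": {"Wang": 1.0, "Silver": -5.0, "Pollster": 0.5, "RCP": 0.5, "TPM": 1.5},
    "DE": {"Silver": 30.0},  # only one forecast available
    # ... remaining races ...
}
actual = {"ND": 1.0, "DE": 30.0}  # actual margins (placeholder values)

def average_forecast_rmse(forecasts, actual, min_forecasts=1):
    """RMSE of the race-by-race average forecast, optionally restricted to
    races with at least min_forecasts individual predictions."""
    sq_errors = []
    for race, preds in forecasts.items():
        if len(preds) < min_forecasts or race not in actual:
            continue
        avg = sum(preds.values()) / len(preds)
        sq_errors.append((avg - actual[race]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

print(average_forecast_rmse(forecasts, actual))                   # all races
print(average_forecast_rmse(forecasts, actual, min_forecasts=3))  # 3+ forecasts only
```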

What else emerges from the table? For one, the poll-only forecasts — especially the Wang, RCP and Pollster forecasts — perform better than Nate’s mélange of state polls and economic fundamentals.

North Dakota, where Democrat Heidi Heitkamp bested Republican Rick Berg, provides a case in point. Pollster and RealClearPolitics both predicted a narrow win for Ms. Heitkamp. The 538 model considered the same polls upon which Pollster and RCP based their predictions, but the fundamentals in Mr. Silver’s model overwhelmed the polls. As a result, the 538 model predicted that Mr. Berg would win by more than five points.
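
To see how that can happen mechanically, consider a stylized precision-weighted blend of a poll average and a fundamentals prior: when the prior is tight enough, the blend lands on the opposite side of zero from the polls. All numbers below are invented for illustration; this is not Mr. Silver's actual model.

```python
# Toy precision-weighted blend of a poll average and a fundamentals prior.
# All numbers are invented for illustration; this is not the 538 model.
poll_mean, poll_sd = 1.0, 4.0    # polls: Heitkamp up 1, but noisy
fund_mean, fund_sd = -9.0, 3.0   # fundamentals: Berg up 9, fairly confident

w_poll = 1 / poll_sd ** 2
w_fund = 1 / fund_sd ** 2
blend = (w_poll * poll_mean + w_fund * fund_mean) / (w_poll + w_fund)

print(f"blended margin: Heitkamp {blend:+.1f}")  # negative means Berg favored
```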

In sum, however, all of the forecasts did reasonably well at calling the overall outcome. We can chalk this up to another victory for (most) pollsters and the quants who crunch the data.

November 12, 2012 at 10:19 pm
No comments
Brice D. L. Acree

Forecasting the End of the World

As many readers may note, we at Margin of Error tend to think of the world within a Bayesian framework. That’s not exactly unique: most of the prominent forecasters think of probability more as Bayesians than as Frequentists.

This morning, XKCD decided to wade into the debate:

[Embedded XKCD comic]

I know which bet I would take. How about you?

November 9, 2012 at 10:18 am
2 comments
Brice D. L. Acree

Aggregating the Aggregates

Thanks to data compiled by Kevin Collins at Princeton, we can examine the accuracy of some of the state-level forecasting models. Nate Silver’s 538 model performs marginally better than the pack. But the best predictor: an average of the forecasts.

To assess accuracy, we calculate the Root Mean Squared Error (RMSE). To do so, we take the actual result in state $i$, $y_i$, and a forecaster’s prediction in state $i$, $\hat{y}_i$, and calculate:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

As you can see, higher values indicate that the forecaster made bigger errors. Put another way, the number shows us how badly each forecaster missed on average. 

Alex Jakulin, a statistician at Columbia, helpfully pointed out that a more useful metric may be the RMSE weighted by the importance of each state. We would expect misses to be larger in small states and should correct for that. Accordingly, we present the RMSE for each forecaster, and the RMSE weighted by the proportion of electoral votes controlled by each state.
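
Before turning to the table, here is a rough sketch of how both numbers can be computed; the arrays are hypothetical placeholders rather than the actual forecast data:

```python
import numpy as np

# Placeholder inputs: actual and predicted Democratic two-party vote share
# by state, and each state's electoral votes (not the real data).
actual    = np.array([52.0, 48.5, 51.2, 54.0])
predicted = np.array([53.1, 47.0, 50.4, 55.5])
ev        = np.array([18, 29, 13, 20])

errors = actual - predicted

# Plain RMSE: every state counts equally.
rmse = np.sqrt(np.mean(errors ** 2))

# Weighted RMSE: squared errors weighted by each state's share of electoral votes.
weights = ev / ev.sum()
weighted_rmse = np.sqrt(np.sum(weights * errors ** 2))

print(round(rmse, 2), round(weighted_rmse, 2))
```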

Forecast            RMSE   Wtd RMSE
Silver 538          1.93   1.60
Linzer Votamatic    2.21   1.63
Jackman Pollster    2.15   1.71
DeSart / Holbrook   2.40   1.79
Margin of Σrror     2.15   1.98
Average Forecast    1.67   1.37

All told, the forecasts did quite well. But look at what worked better: averaging over the forecasts. This makes good statistical sense: as Alex points out with a fun Netflix example, it makes more sense to keep as much information as possible. In a Bayesian framework, why pick just one “most probable” parameter estimate, instead of averaging over all possible parameter settings, with each weighted by its predictive ability?
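
The intuition is easy to verify numerically: if several forecasts are roughly unbiased and their errors are not perfectly correlated, their average has a smaller RMSE than the typical individual forecast. Here is a toy simulation, with made-up noise levels and fully independent errors (more favorable to averaging than real forecasters who share the same polls):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: five forecasters each observe the truth plus independent noise.
truth = rng.normal(50.0, 5.0, size=1000)                   # "true" margins in 1,000 races
forecasts = truth + rng.normal(0.0, 2.0, size=(5, 1000))   # each forecaster's predictions

def rmse(pred):
    return np.sqrt(np.mean((pred - truth) ** 2))

print([round(rmse(f), 2) for f in forecasts])   # each close to 2.0
print(round(rmse(forecasts.mean(axis=0)), 2))   # close to 2.0 / sqrt(5), about 0.9
```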

During the Republican primaries, Harry Enten published a series of stories on this blog doing precisely that; and then, as now, the “aggregate of the aggregates” performed better than any individual prediction on its own.

[Figure]

On the whole, all the forecasting models did quite well. As the figure above shows, critics of these election forecasts ended up looking pretty foolish.

I now only wonder whether 2016 will see a profusion of aggregate aggregators; and if so, how much grief Jennifer Rubin will give them.

November 8, 2012 at 4:22 am
19 comments
Brice D. L. Acree

How Well Did Polls Do?

The polls, and the forecasters using them, performed pretty well. Harry Enten posted for the Guardian this morning about the overall success of pollsters in the 2012 cycle, and John Sides put up this great post/figure over at The Monkey Cage. (The figure was also picked up by Ezra Klein.)

I wanted to see the estimated error margins around the final pollster predictions. So here’s my take with 95 percent confidence bands:

[Figure: predicted minus actual Obama margin by pollster, with 95 percent confidence bands]

The figure shows the difference between the predicted and the actual Obama margin. Positive numbers mean that the pollster overstated Mr. Obama’s margin of victory; negative numbers mean the opposite.

Almost all of the polls came within a reasonable distance from the outcome. In fact, most contained the true outcome in their margins of error.
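
For anyone who wants to construct something similar, here is a minimal sketch of a 95 percent band for a single poll's Obama-minus-Romney margin, assuming a simple random sample and ignoring design effects; the inputs are hypothetical, and the figure's bands around pollster predictions are of course more involved than this:

```python
import math

def margin_confidence_band(p_obama, p_romney, n, z=1.96):
    """95 percent band for the Obama-minus-Romney margin from one poll,
    assuming a simple random sample with no design effects."""
    d = p_obama - p_romney
    # Variance of a difference of two proportions estimated from the same sample.
    var = (p_obama + p_romney - d ** 2) / n
    half_width = z * math.sqrt(var)
    return d - half_width, d + half_width

# Hypothetical final poll: Obama 49 percent, Romney 47 percent, n = 1,000.
low, high = margin_confidence_band(0.49, 0.47, 1000)
print(f"95 percent band: [{low * 100:+.1f}, {high * 100:+.1f}] points")
```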

November 7, 2012 at 8:33 pm
1 comment
Brice D. L. Acree

Updating Your Prior Beliefs

We’re not live-blogging the presidential race, but here is an interesting tidbit:

There is not much to update in the model, but we can update our expectations given current returns. Put another way: we can ditch simulations with results we know to be incorrect.

Once we update our beliefs, Mr. Romney’s chances of winning drop from 32 percent to only 12 percent. As you can see in the figure, Mr. Romney’s distribution of electoral votes has become much more certain (as we would expect), but he has lost most of the area under the curve to the right of 270 electoral votes.
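
Mechanically, that update is just filtering: keep only the simulation draws consistent with the states already called, and recompute the win probability over what remains. Here is a minimal sketch with hypothetical simulation output, not the actual model's draws:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical simulation output: one row per simulation, one column per
# contested state; 1 means Mr. Romney carries the state, 0 otherwise.
states = ["OH", "FL", "VA", "PA"]
ev = np.array([18, 29, 13, 20])
n_sims = 10_000
romney_carries = (rng.random((n_sims, len(states))) < 0.4).astype(int)

# Romney's electoral votes: a hypothetical 206 from safe states plus the rest.
romney_ev = 206 + romney_carries @ ev

# States called so far on election night (hypothetical calls):
called = {"PA": 0}  # Mr. Obama carries Pennsylvania

# Keep only the simulations consistent with those calls.
keep = np.ones(n_sims, dtype=bool)
for state, outcome in called.items():
    keep &= romney_carries[:, states.index(state)] == outcome

before = np.mean(romney_ev >= 270)
after = np.mean(romney_ev[keep] >= 270)
print(f"P(Romney wins): {before:.2f} before the calls, {after:.2f} after")
```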

[Figure: updated distribution of Mr. Romney’s simulated electoral votes]

The door on Mr. Romney’s probability of winning is closing, and quickly.

November 7, 2012 at 3:49 am
No comments
Brice D. L. Acree

Back to Fundamentals

As we published yesterday, the Margin of Error forecast predicts Mr. Obama to secure reelection with 303 electoral votes to Mr. Romney’s 235. This translates roughly into a 68 percent chance for Mr. Obama to win, leaving a nonnegligible 32 percent chance of an upset victory by Mr. Romney.

One of the more interesting elements of the model is that it’s agnostic to state-level polling. Most of the highly-trafficked forecasting models (the gold standard is Nate Silver’s 538 model) use various methodologies for aggregating state- and national-level polling.

The MoE forecast, on the other hand, uses very few variables. The national popular vote is predicted using late-season approval data; the state-level votes are forecast using (a) previous election results; (b) August-November change in unemployment; (c) home state advantage; and (d) a regional dummy variable. No polls.
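
As a rough illustration of what such a specification can look like, here is a schematic sketch in Python; the variable names, file names and the use of ordinary least squares are assumptions made for the sketch, not the actual MoE model:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical training data, one row per state per past election:
#   dem_share      - Democratic share of the two-party vote
#   prev_dem_share - Democratic two-party share in the previous election
#   unemp_change   - August-November change in the state unemployment rate
#   home_state     - 1 for a candidate's home state, 0 otherwise
#   region         - categorical region indicator
past = pd.read_csv("state_fundamentals_past.csv")   # placeholder file name

model = smf.ols(
    "dem_share ~ prev_dem_share + unemp_change + home_state + C(region)",
    data=past,
).fit()

# Predict state-level shares from this cycle's fundamentals.
current = pd.read_csv("state_fundamentals_2012.csv")  # placeholder file name
current["pred_dem_share"] = model.predict(current)
print(current[["state", "pred_dem_share"]].head())
```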

Despite the stark difference in methodology, the forecast comes in well in line with most quantitative models. The most notable gap between our model and many of the poll-aggregation models is, in fact, that ours is far more conservative in its statement of uncertainty.

So, what’s the deal with this fundamentals-based model, anyway? In my opinion, a fundamentals-based forecast brings some distinct advantages and disadvantages.

November 6, 2012 at 10:39 pm
No comments
Brice D. L. Acree

Final Forecast: Mr. Obama Favored

Over the summer, we published a forecast for the 2012 presidential election based solely on election-year fundamentals. On the evening before Election Day, I am republishing the forecast. The only changes since summer come in the form of updated economic variables and approval numbers.

The model predicts Mr. Obama to win 51.5 percent of the national two-party vote to Mr. Romney’s 48.5 percent. Propagating these predictions to state-level results, the model forecasts Mr. Obama to secure 303 electoral votes to Mr. Romney’s 235.

Over 10,000 simulations, Mr. Obama wins the election 68 percent of the time.
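
The simulation step itself is straightforward. Here is a minimal sketch, with made-up predicted shares and error scales, and a simpler error structure (one national shock plus independent state noise) than the full national/regional/subregional/state structure the model uses:

```python
import numpy as np

rng = np.random.default_rng(2012)

# Made-up inputs: predicted Democratic two-party share and electoral votes
# for a handful of contested states, plus a hypothetical safe-state total.
states        = ["OH", "FL", "VA", "CO", "NC"]
pred_share    = np.array([0.515, 0.498, 0.503, 0.530, 0.492])
ev            = np.array([18, 29, 13, 9, 15])
obama_safe_ev = 247  # placeholder electoral votes from uncontested states

n_sims = 10_000
national_err = rng.normal(0.0, 0.02, size=(n_sims, 1))            # shared shock
state_err    = rng.normal(0.0, 0.03, size=(n_sims, len(states)))  # state noise
sim_share    = pred_share + national_err + state_err

obama_ev = obama_safe_ev + (sim_share > 0.5).astype(int) @ ev
print(f"P(Obama >= 270 EV): {np.mean(obama_ev >= 270):.2f}")
print(f"Median electoral votes: {np.median(obama_ev):.0f}")
```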

The model rests pretty comfortably in line with other social scientific models.

As I will expand on in a later post, one of the notable elements of the model is that it does not account for horserace polls. Unlike the models that get the most attention, this one only incorporates economic fundamentals for state-level predictions.

Accordingly, the model likely understates Mr. Obama’s advantage in Ohio, where the auto bailout seems to have made a dent in Mr. Romney’s chances. Working in the opposite direction, the model seems to overestimate Mr. Obama’s chances of winning Colorado, which most forecasts predict to be exceptionally close while the model gives Mr. Obama 53 percent of the two-party vote. For Colorado in particular, the model seems to be overly reliant on Mr. Obama’s favorable result there in 2008 and the state’s decent unemployment trajectory.

On the flip side, the model does not suffer from what’s undoubtedly haunting many Democrats: the chances that polls, with low response rates and undersampled cell-only households, are systematically incorrect. Put simply: without polling data included, the model doesn’t require polls to be accurate.

The model still comes with considerable uncertainty. In fact, the model may be overly cautious. Nate Silver’s model, for example, gives Mr. Obama a winning probability roughly 20 percentage points higher than ours. That stems from the multiple levels of uncertainty we have added to the model: at the national, regional, subregional and state levels.

Though we can coerce the model into predicting a binary outcome in each state, we must note that several states are really too close to make a meaningful prediction. In particular, Virginia, North Carolina and Florida are forecast to come within a razor-thin margin of 1 point or less. Those states are, according to this naïve model, way too close to call.

All told, the model holds pretty closely to what we are seeing from other prominent models. Mr. Obama holds a definitive lead going into Election Day. Indeed, even were all too-close-to-call states to fall into Mr. Romney’s column, Mr. Obama would still secure reelection.

November 5, 2012 at 11:30 pm
2 comments
Brice D. L. Acree