Using Algorithms to Predict Elections: A Chat With Drew Linzer
Drew Linzer is an Assistant Professor of Political Science at Emory University and a 2012-13 Visiting Assistant Professor at Stanford University. He received his Ph.D. from UCLA in 2008. A statistician and expert in public opinion, Dr. Linzer launched the website votamatic.org during the 2012 U.S. presidential campaign; as early as June, his model correctly forecast the outcome in all 50 states.
This interview was conducted by George Hill and published in the Big Data Innovation Magazine.
What kind of reaction has there been to your predictions?
Most of the reaction has focused on the difference in accuracy between those of us who studied the public opinion polls and the "gut feeling" predictions of popular pundits and commentators. On Election Day, data analysts like me, Nate Silver (New York Times FiveThirtyEight blog), Simon Jackman (Stanford University and Huffington Post), and Sam Wang (Princeton Election Consortium) all placed Obama's reelection chances at over 90%, and correctly foresaw 332 electoral votes for Obama as the most likely outcome. Meanwhile, pundits such as Karl Rove, George Will, and Steve Forbes said Romney was going to win -- and in some cases, easily. This has led to talk of a "victory for the quants," which I hope will carry through to future elections.
How do you evaluate the algorithm used in your predictions?
My forecasting model estimated the state vote outcomes and the final electoral vote, on every day of the campaign, starting in June. I wanted the assessment of these forecasts to be as fair and objective as possible -- and not leave me any wiggle room if they were wrong. So, about a month before the election, I posted on my website a set of eight evaluation criteria I would use once the results were known. As it turned out, the model worked perfectly. It predicted over the summer that Obama would win all of his 2008 states minus Indiana and North Carolina, and barely budged from that prediction even after support for Obama inched upward in September, then dipped after the first presidential debate.
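The interview does not describe how state-level estimates become an electoral-vote forecast. One common way to illustrate that step is Monte Carlo simulation: repeatedly "hold" the election by drawing each state's winner from its forecast win probability, then tally electoral votes. The sketch below is a generic illustration, not the votamatic.org model; it assumes state outcomes are independent (real forecasting models typically allow for correlated polling errors across states), and the function names and example states are hypothetical.

```python
import random
from collections import Counter


def simulate_electoral_votes(states, n_sims=10_000, seed=0):
    """Monte Carlo distribution of one candidate's electoral-vote total.

    states: dict mapping state name -> (win_probability, electoral_votes).
    Simplifying assumption: each state's outcome is drawn independently.
    Returns a Counter mapping electoral-vote total -> number of simulations.
    """
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_sims):
        # The candidate carries a state when the uniform draw falls below its win probability.
        ev = sum(votes for p, votes in states.values() if rng.random() < p)
        totals[ev] += 1
    return totals


def summarize(totals, threshold):
    """Return (probability of reaching `threshold` votes, most frequent total)."""
    n = sum(totals.values())
    p_win = sum(count for ev, count in totals.items() if ev >= threshold) / n
    most_likely = max(totals, key=totals.get)
    return p_win, most_likely


if __name__ == "__main__":
    # Hypothetical three-state "map" with 20 electoral votes in total.
    toy_states = {"A": (0.9, 10), "B": (0.6, 5), "C": (0.1, 5)}
    p_win, most_likely = summarize(simulate_electoral_votes(toy_states), threshold=11)
    print(f"P(win) ~ {p_win:.3f}, most likely total: {most_likely}")
```

In this toy map the exact win probability is 0.9 × (1 − 0.4 × 0.9) = 0.576, so the simulated value should land close to that. The "most likely outcome" reported by forecasters corresponds to the most frequent total in such a distribution.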
The amount of data used throughout this campaign, by both independent analysts and campaign teams, has been huge. What implications does this have for data usage in 2016?
The 2012 campaign proved that multiple, diverse sources of quantitative information could be managed, trusted, and applied successfully towards a variety of ends. We outsiders were able to predict the election outcome far in advance. Inside the campaigns, there were enormous strides made in voter targeting, opinion tracking, fundraising, and voter turnout. Now that we know these methods can work, I think there's no going back. I expect reporters and campaign commentators to take survey aggregation much more seriously in 2016. And although Obama and the Democrats currently appear to hold an advantage in campaign technology, I would be surprised if the Republicans didn't quickly catch up.
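In its simplest form, "survey aggregation" pools several polls of the same race into a single, more precise estimate. The sketch below is a minimal, generic illustration, not the method used by votamatic.org or the other forecasters mentioned, which rely on more elaborate statistical models. It assumes each poll reports one candidate's share of the two-party vote and its sample size, and treats every poll as a simple random sample of the same population, ignoring pollster "house effects," timing, and non-sampling error. The `Poll` type and `pooled_estimate` function are hypothetical names.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Poll:
    share: float      # candidate's share of the two-party vote, between 0 and 1
    sample_size: int  # number of respondents


def pooled_estimate(polls):
    """Sample-size-weighted average of poll shares, with a binomial standard error.

    Equivalent to pooling all respondents into one large sample, which is only
    valid under the simple-random-sampling assumption stated above.
    """
    n_total = sum(p.sample_size for p in polls)
    if n_total == 0:
        raise ValueError("need at least one poll with respondents")
    share = sum(p.share * p.sample_size for p in polls) / n_total
    se = sqrt(share * (1 - share) / n_total)
    return share, se


if __name__ == "__main__":
    # Hypothetical polls of a single state.
    polls = [Poll(0.52, 800), Poll(0.50, 1200)]
    share, se = pooled_estimate(polls)
    print(f"pooled share {share:.3f} +/- {1.96 * se:.3f} (approx. 95% interval)")
```

Because the pooled sample is larger than any single poll, the standard error shrinks, which is the basic reason aggregated polls tend to be more stable than individual surveys.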
Do you think the success of this data-driven campaign means that campaign managers now need to be analysts as well as strategists?
Campaign managers may not need to be analysts themselves, but they should have a greater appreciation for how data and technology can be harnessed to their advantage. Campaigns have always used survey research to formulate strategy and measure voter sentiment. But now there is a range of other powerful tools available: social networking websites, voter databases, smartphones, and email marketing, to name only a few. And that is in addition to recent advances in polling methodologies and statistical opinion modeling. There is a lot of innovation happening in American campaign politics right now.
You managed to predict the election results six months beforehand. What do you think is the realistic maximum timeframe for accurately predicting a result using your analytic techniques?
Four or five months is about as far back as the science lets us go right now, and even that is pushing it a bit. Before that, the polls just aren't sufficiently informative about the eventual outcome: too many people are either undecided or haven't started paying attention to the campaign. The historical economic and political factors that have been shown to correlate with election outcomes also start to lose their predictive power beyond roughly four to five months. Fortunately, that still gives the campaigns plenty of time to plot strategy and decide how to allocate their resources.