Polls of voting intentions are published regularly, often several a week. I’ve built a little mathematical model that looks at several weeks’ worth of opinion poll results together and tries to give a more accurate picture of the current voting intention of the British public. This should be a more reliable indication of voting intentions than relying on any one (often cherry-picked) poll.
The graph above shows the “Poll of Polls” result for 22 December 2014 (the date of the last opinion poll done before Christmas). I will aim to update this “Poll of Polls” regularly as new polling results are published.
For more info on how I created this model see below.
Purpose of the “Poll of Polls”
UK political opinion polls are published several times a week. Obviously the results of the polls will vary over time as public opinion varies. But even with polls taken at the same time there are significant variations. These variations are usually because of:
- differences in sampling methodology across polling companies;
- polling results that fall at the upper or lower edge of the margin of error for that particular poll;
- differences across pollsters in the way the polling questions are asked.
Partisan supporters of political parties cherry-pick the polls they tweet or publicise, as do newspapers, which invariably give undue prominence to the polls they have commissioned whilst ignoring others. This “Poll of Polls” tries to give a more balanced picture of the results of the opinion polls.
I don’t believe the method I have used is the only method, or the best method. But I do believe it is valid. Others may find fault with what I’ve done or suggest better models. I’m happy to hear constructive comments.
In the spirit of openness I should disclose I am a trade union official and a Labour Party member (and activist), so I desperately want Labour to win the 2015 general election. But I am also a proud geek and like to see the correct use of numbers and data, and I am committed to this being a statistically robust piece of work that will give a fair picture of the state of voting intentions in the UK.
Hence, I’ve outlined what I’ve done and how, so that anyone can comment on this work and provide constructive criticism.
Inspiration for this “poll of polls”
The US statistician and sports journalist Nate Silver created a stir by correctly predicting the result in 49 of the 50 states in the 2008 presidential election. He achieved this remarkable feat by using a Bayesian analysis of the opinion polls in the lead-up to the election. A Bayesian analysis makes a prediction based not just on the most up-to-date information at hand (e.g. the latest opinion poll) but also on any prior knowledge you may have (e.g. past opinion polls).
My inspiration for this “Poll of Polls” comes from the work Nate Silver did, but I am in no way comparing myself to him. My knowledge of statistics is quite rudimentary, hence the model I have developed is quite basic. But I do have high-level (non-VBA) Excel skills and am happy to do the data wrangling required to get a basic analysis together.
So this work should be viewed as that of an “enthusiastic amateur” and not an expert. I am sure a more complex model could be built, but increasing complexity does not always mean increasing accuracy.
The list is regularly updated, and as it gets updated I will update my “Poll of Polls.”
You can download the spreadsheet that did the calculations by clicking here.
How the analysis was done
The analysis is a form of Bayesian analysis in which the prediction of voting intentions is made by looking not just at the most up-to-date opinion poll but at past opinion polls as well.
There is well-established support for this being a legitimate thing to do. Firstly, Nate Silver’s own analysis of the 2008 presidential election shows that a Bayesian analysis can be more accurate. Secondly, political pundits often comment that it is not any one particular opinion poll that matters, but the (average) movement in the polls.
This analysis uses a “weighted average” of opinion polls, with two fixed parameters that control how the weighted average is calculated. The two parameters are:
- The period (in days) the weighted average is done for – the analysis uses a period of 60 days, i.e. only polls going back 60 days from the most recent poll are used in the averaging.
- Half-life for each poll’s results – the analysis uses a half-life of 30 days, i.e. each poll’s figures are scaled down according to how many days before the most recent poll it was carried out, halving for every 30 days.
Note the spreadsheet has a sheet called “Model Inputs” where the values for these two parameters can be varied and the graphs and data sheets are automatically updated.
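For readers who prefer code to spreadsheets, here is a minimal Python sketch of how the two parameters could turn a poll's age into a weight. The function name `poll_weight` and its argument names are invented for illustration, and the spreadsheet may arrange the same calculation differently.

```python
def poll_weight(age_days, half_life=30.0, window=60):
    """Weight for a poll carried out `age_days` before the most recent poll.

    Assumed behaviour (a sketch, not the spreadsheet's exact formula):
    the weight halves every `half_life` days, and polls older than
    `window` days are ignored altogether.
    """
    if age_days < 0:
        raise ValueError("age_days is measured back from the most recent poll")
    if age_days > window:
        return 0.0  # outside the averaging period
    return 0.5 ** (age_days / half_life)


# Show how quickly older polls lose influence with the default settings.
for age in (0, 15, 30, 60, 61):
    print(f"{age:>2} days old -> weight {poll_weight(age):.3f}")
```

With the default values, a poll 30 days old counts half as much as the most recent poll, a poll 60 days old counts a quarter as much, and anything older is dropped.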
Because the figures from older polls are scaled down, the weighted averages for the parties add up to less than 100 per cent, when of course they should add up to 100 per cent. So each party’s weighted average is then expressed as a percentage of the sum of all the parties’ weighted averages, which gives the “true” weighted average poll rating for each party over the period in question. If you don’t understand this bit, don’t worry. If you do understand it, then checking the maths should show you it is a reasonable thing to do.
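The rescaling step is easier to follow with numbers. The sketch below works through two polls in Python. The poll figures are hypothetical, made up purely for illustration, and the plain average of the decayed figures is an assumption about how the spreadsheet combines them.

```python
# Hypothetical figures for illustration only; not real polling data.
# Each entry: (days before the most recent poll, {party: share in per cent}).
polls = [
    (0,  {"Con": 32, "Lab": 34, "LD": 8, "UKIP": 15}),
    (30, {"Con": 30, "Lab": 35, "LD": 7, "UKIP": 17}),
]
HALF_LIFE = 30.0
parties = ["Con", "Lab", "LD", "UKIP"]

# Step 1: scale each poll's figures by its half-life weight, then take a
# plain average per party (assumed; the spreadsheet may differ in detail).
decayed = {
    p: sum(shares[p] * 0.5 ** (age / HALF_LIFE) for age, shares in polls) / len(polls)
    for p in parties
}
raw_total = sum(decayed.values())  # well below 100 because of the scaling

# Step 2: express each party's figure as a percentage of that total.
final = {p: 100 * v / raw_total for p, v in decayed.items()}

print("decayed averages:", decayed, "sum:", raw_total)
print("rescaled shares: ", {p: round(v, 1) for p, v in final.items()})
```

In this sketch the rescaled figures for the four listed parties add up to exactly 100, so each figure is best read as that party's share of the combined support for those four parties.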
A further note on the methodology
According to the Wikipedia entry about Silver’s 538 blog, the polls used in the predictions are weighted with a half-life of thirty days using the formula 0.5^(P/30), where ‘P’ is the number of days elapsed since the median date that the poll was in the field. The formula is based on an analysis of 2000, 2004, 2006 and 2008 state-by-state polling data.
I have not had time to do a similar analysis of past election results versus past polling data to determine a UK specific half-life. So in the absence of any other data I have chosen a half-life of 30 days as well. If I get time to do the analysis of past results I will alter this.
I have also chosen a period of 60 days (twice the half-life) over which to carry out the weighted average calculation. I do not know whether Nate Silver uses a cut-off as my model does, but it seems sensible to me to have some form of cut-off beyond which past voting intentions are ignored.
I am sure that Nate Silver’s predictive model is more sophisticated than mine, hardly surprising as he is a professional statistician and I am not. But I do believe my model is valid and gives a far better prediction of voting intentions than just looking at the most recent poll.
Because both parameters can be varied on the spreadsheet’s “Model Inputs” sheet, a sensitivity analysis on the predictions can be done by changing them and seeing how much the results move.
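The same kind of sensitivity check can be sketched outside the spreadsheet. The Python below recomputes the result for a small grid of half-lives and averaging periods. The poll figures are hypothetical, and `poll_of_polls` is an illustrative function written for this sketch rather than a copy of the spreadsheet's formulas.

```python
# Hypothetical figures for illustration only; not real polling data.
# Each entry: (days before the most recent poll, {party: share in per cent}).
POLLS = [
    (0,  {"Con": 32, "Lab": 34, "LD": 8, "UKIP": 15}),
    (10, {"Con": 31, "Lab": 33, "LD": 9, "UKIP": 16}),
    (25, {"Con": 33, "Lab": 35, "LD": 7, "UKIP": 14}),
    (45, {"Con": 30, "Lab": 36, "LD": 8, "UKIP": 17}),
    (70, {"Con": 34, "Lab": 37, "LD": 9, "UKIP": 12}),
]


def poll_of_polls(polls, half_life=30.0, window=60):
    """Decay-weighted share per party, rescaled so the parties sum to 100."""
    # Keep only polls inside the averaging period. The most recent poll has
    # age 0, so this list is never empty for a non-negative window.
    recent = [(age, shares) for age, shares in polls if age <= window]
    parties = list(recent[0][1])
    # Sum of decayed figures per party. Dividing by the number of polls is
    # unnecessary here because the rescaling below would cancel it out.
    decayed = {
        p: sum(shares[p] * 0.5 ** (age / half_life) for age, shares in recent)
        for p in parties
    }
    total = sum(decayed.values())
    return {p: 100 * v / total for p, v in decayed.items()}


# Vary both parameters and compare the resulting shares side by side.
for half_life in (15, 30, 60):
    for window in (30, 60, 90):
        result = poll_of_polls(POLLS, half_life, window)
        summary = ", ".join(f"{p} {v:.1f}" for p, v in result.items())
        print(f"half-life {half_life:>2}, window {window:>2}: {summary}")
```

If the shares barely change across the grid, the result does not depend much on the exact parameter choices. If they swing noticeably, the choice of half-life and period deserves more scrutiny.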
Why no figures for the Greens or SNP?
I’ve not excluded the Greens and the SNP for partisan reasons; it is solely because the historic data I’ve used from the UK Polling Report website only gives the parties listed in this analysis. Remember I need historic data as the prediction relies upon past opinion polls as well as the most recent ones.
If anyone can point me to historic data (that I can easily copy and use) giving figures for a wider range of parties, I’m happy, time permitting, to try to update my model.
This won’t predict the outcome of the General Election in 2015
Predicting the outcome of the 2015 General Election is a very tricky business given the definite shift away from traditional three-party politics. UKIP, the SNP and the Greens will get a significant number of votes, and in some constituencies could either win the seat or “steal votes from other parties” in a way that affects which party wins it.
The results in Scotland and the 100 or so key marginals will have a huge impact on who forms the next government. To get a view on the key marginals it is worth checking out the periodic polling that Lord Ashcroft does in these seats.
Despite the issues outlined above, I believe this “Poll of Polls” is useful because a major part of the political narrative focusses on the overall vote share for each of the four parties used in this model. Whatever its shortcomings, this model gives a more balanced view of likely voting intentions than looking at any one particular poll.