Are we being shortchanged in our Celebrations tubs?


My final, new and improved bar chart with a different tub of Celebrations. Note the purpose-made graph paper.

Yesterday I posted a tongue-in-cheek picture on Facebook of a bar chart (not the one above) that I made out of the sweets in a Celebrations tub. It was a riposte to a pie chart that Simon Brew had done here. As someone who likes good data visualisation, I find pie charts are nearly always worse than bar charts – if you want to know why, read Edward Tufte or this or this.

Below, on the left is Simon’s pie chart, and on the right is the bar chart I did with my own tub of Celebrations. My bar chart shows I’m clearly being short-changed on the Maltesers (which I love) and that there are far too many Bounty bars (who likes Bounty bars?!?).

Simon’s pie chart

My initial alternative bar chart

As a trade unionist I believe in fairness and social justice, and in the case of sweets I believe in #FairCelebrations. This simple analysis throws up some serious questions about whether or not we are being short-changed.

Although Simon did an analysis of a few tubs of Celebrations (he did not say how many), it’s clear from his post that it was not enough to be statistically valid. We need the right sample size to be confident in any assessment of whether or not we have #FairCelebrations in our tubs.

If, for the sake of argument, we assume one million individual Celebrations sweets are eaten this Christmas, how many tubs do we have to check to be 95% certain (the confidence level) of the proportion of each type of sweet to within ±1% (the margin of error, which the calculator calls the confidence interval)?

This website allows you to plug in the numbers, and it gives 9,513 sweets. Simon determined from his analysis that there are, on average, 82.4 sweets in each tub, so this gives us 115 tubs to check. Even though I’m a tubby guy with a sweet tooth, this is too many tubs for me to eat. But I am hoping to gather some crowd-sourced data from you and others, dear reader.
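For anyone who wants to check the arithmetic without the website, the figure can be reproduced with the standard sample-size formula plus a finite-population correction. Here is my own quick sketch of it – I’m assuming the calculator uses a 95% confidence level, a worst-case proportion of 0.5 and a ±1% margin of error, and the names below are mine, not the calculator’s:

```python
def sample_size(population, z=1.96, margin=0.01, p=0.5):
    """Sample size for estimating a proportion, with finite-population correction.

    z=1.96 corresponds to a 95% confidence level; p=0.5 is the most
    conservative (worst-case) proportion; margin is the +/-1% margin of error.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2      # infinite-population sample size (~9,604)
    return n0 / (1 + (n0 - 1) / population)        # shrink slightly for a finite population

sweets_needed = sample_size(population=1_000_000)  # ~9,513 sweets
tubs_needed = sweets_needed / 82.4                 # using Simon's average of 82.4 sweets per tub
print(f"{sweets_needed:.0f} sweets, or about {tubs_needed:.0f} tubs")
```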

I propose we start a citizen science project to analyse 115 tubs and check whether we have #FairCelebrations. If we can get the data for 115 tubs we can be 95% certain of our findings. And if we find an unfair distribution, we will have the evidence to take on the manufacturer and demand #FairCelebrations for all.

I’ve actually improved the way the bar chart is constructed by creating some purpose-made graph paper. Take a look at the picture at the top of this post and you will see the contents of a Celebrations tub placed neatly on the graph paper. I think this looks far better than my first attempt.

So, to the crowd sourced Celebrations data…

  1. Buy a tub of Celebrations
  2. Download the Word document that has the graph paper here. You will need some scissors and tape to make the full continuous sheet. This whole step is optional but it will make things easier. Alternatively you could create a grid on the back of some wrapping paper.
  3. Empty the tub and count out each of the Celebrations.
  4. Then put them on the graph paper, starting on the left with the sweet you have the most of and moving to the right in descending order. Doing it like this makes the graph quicker and easier to read – check out the picture at the start of this post.
  5. Take a picture of it.
  6. Then either tweet the picture or post it in the comments section below. When you tweet the picture please use the protocol below.
  7. Eat the tub of Celebrations
  8. Return to point 1 above and restart the whole process

The Twitter protocol…

When you tweet your picture please start it with “My #FairCelebrations data for @RaviSubbie” and then post your picture.

Your data will have been openly published, which means we will be applying good open data principles that allow the data to be checked. I will put your data into a spreadsheet and publish the results on this blog, where they will be subject to the peer review of the hive mind of the internet.
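If it helps to picture the analysis stage, here is a rough sketch (in Python rather than a spreadsheet, and with made-up counts purely for illustration) of how the crowd-sourced tub counts could be totted up and drawn as a bar chart:

```python
import pandas as pd
import matplotlib.pyplot as plt

# One row per reported tub, one column per sweet type.
# These numbers are placeholders, not real data.
tubs = pd.DataFrame([
    {"Maltesers": 8, "Bounty": 15, "Mars": 12, "Milky Way": 12,
     "Snickers": 11, "Galaxy": 10, "Galaxy Caramel": 8, "Twix": 6},
    # ...one row per tub reported, up to the 115 we need...
])

totals = tubs.sum().sort_values(ascending=False)   # descending order, like the graph paper layout
totals.plot(kind="bar", title="#FairCelebrations tally so far")
plt.ylabel("Number of sweets")
plt.tight_layout()
plt.show()
```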

This is important work. But someone has to do it.

Spreadsheet to calculate which Labour Leadership candidate to vote for

I was intrigued by this blog post from Benjamin Studebaker that proposed a utility theory based model to help people to choose which candidate to go for in the Labour Leadership contest. So just for fun I knocked up a little spreadsheet to do the sums, using Studebaker’s model.

This is only really of interest to those people who are trying to balance the competing factors of electability versus each candidate’s policy offer. If you have strong views on the candidates or the issues, the model is not really for you. But if you are undecided it is worth a look. Before playing with the model, I’d suggest you read Studebaker’s blog post here.

All you need to do is enter your own values in the boxes with red text below; the sums will be done for you and the graph will update automatically.
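For the curious, the sums themselves are simple. Here is a sketch of the kind of expected-utility calculation the spreadsheet does, based on my reading of Studebaker’s approach – the candidate labels, probabilities and scores below are all placeholders for your own values:

```python
# Expected-utility sketch in the spirit of Studebaker's model.
# Replace these placeholder values with your own judgements,
# just as you would in the red cells of the spreadsheet.
candidates = {
    # name: (your estimate of their chance of winning the next general
    #        election, how much you value their policy offer on a 0-10 scale)
    "Candidate A": (0.30, 9),
    "Candidate B": (0.45, 6),
    "Candidate C": (0.40, 5),
    "Candidate D": (0.50, 3),
}
value_if_they_lose = 0  # how much you value the alternative (e.g. a Conservative government)

for name, (p_win, policy_value) in candidates.items():
    expected_utility = p_win * policy_value + (1 - p_win) * value_if_they_lose
    print(f"{name}: expected utility = {expected_utility:.2f}")
```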

If you want to use it on a phone or tablet, you need to double-tap one of the red cells to bring up the keyboard.

Scorecard on my general election predictions is D- and the lesson learned is GIGO

The other day I made a series of general election predictions, and in the spirit of self-flagellatory repentance I will score myself. So here goes…

Vote Share – what I said about Labour and Tories: Labour vote share will be 0.5% higher than the Poll of Polls to reflect the better ground operations of Labour, and the Conservatives will be correspondingly 0.5% lower. [This would have given Lab = 33.8% and Con = 33.2%]

The actual vote share was: Lab = 30.9% and Con = 36.9%. This is a fail and gets no marks.

On Ukip vote share I said: Ukip vote share will go up on the Poll of Polls vote share because of the “shy Ukippers” who don’t like to tell pollsters they will vote Ukip. But they will also lose vote share because of their much poorer GOTV operation. Overall these two things will cancel out and mean their vote share is as per the Poll of Polls. [This is 13.4%]

The actual vote share was: Ukip = 12.6%. This is not a total fail and is worth a mark.

On Scotland I said: As with the “shy Ukipper” effect in England there will be a “shy Labour” effect in Scotland (similar to the “shy no” in the independence referendum) which, along with tactical voting, will mean Labour won’t suffer a total wipeout and will get between 3 and 7 seats.

The actual result was: Labour won one seat. Not a total fail, but only just worth a generous half a mark for it not being a total wipeout for Labour.

Shock #1 predicted was Alex Salmond: Alex Salmond will be run very close in Gordon. I predict whoever wins will have a majority of less than 1,000.

What happened: He won convincingly by over 6,000 votes, increasing his share of the vote. My prediction was way off and gets no marks.

Shock #2 predicted on the Lib Dems: Clegg will hold his seat but end up being removed as Lib Dem leader after the Lib Dems end up with fewer than 25 seats and what remains of his party disagrees with his attempts to form a coalition with the Conservatives.

What happened: Clegg held his seat, the Lib Dems dropped to 8 seats, and Clegg is no longer leader but for different reasons to my prediction. I’ll be very generous and award myself half a mark.

Shock #3 with Farage: Farage will not win his seat.

What happened: The only prediction I got totally right.

I would give myself a very generous D- for getting a few of the more minor predictions close, but on the big prediction, the Lab-Con vote share, it was an abject fail. So overall it has to be a fail, hence the D-.

Before becoming a trade union official I worked for 10 years as an engineer. I became quite proficient at monitoring engineering data and then analysing it through different mathematical models. One acronym I always kept in mind (one that all data analysts will know) is GIGO – Garbage In, Garbage Out.

I believe the methodology of the Poll of Polls mathematical model was sound, but the information put into the model (the results of the opinion polls) was clearly garbage, and I should have remembered GIGO as a possible source of error.

The only thing that spares my blushes is that so many other people got it wrong too. But this guy actually did predict the Shy Tory phenomenon based on looking at opinion polls and actual election results going back 50 years. He identified a number of issues, one of the most crucial being a tendency of the opinion polls to overstate Labour and understate the Conservatives.

His blog post (done before the election was held) is a must read for anyone wanting to understand what went on.

For my part, I will stick to my general maxim: when a predictive model is shown to be this badly wrong, I will only put faith in it again if there is strong evidence that suitable steps have been taken to correct the fundamental flaws. Given that the true test of a predictive model is the real outcome it is trying to predict, it’s going to be hard to believe any opinion polls until they are judged against the outcome of the 2020 general election.

Feynman on Knowing versus Understanding

The title of this blog is a nod to something the great physicist and Nobel laureate Richard Feynman once said. Here is a great clip from one of his lectures where he talks about the difference between knowing and understanding. It gives a good insight into how a theoretical physicist can hold several different theories in their head to explain a phenomenon, but not know which is the correct explanation.

He also gives a great example of the ancient Mayan astronomers, who were able to predict astronomical phenomena very accurately but had no underlying theory of the physical reality that produced those phenomena, e.g. they did not know what the moon was, but they could accurately predict its phases.

Please excuse his assumption that physicists are men – he was very much a product of a different time.