Sportsbet’s line for David Warner’s first innings score against Pakistan is 38.5 runs.
A punter who bets based on recent form might look to his average over the past ten innings, which is 39.8 runs.
Based on that, backing Warner to cover the line is probably a little too close for comfort for most punters.
Look a little deeper, however. Those ten innings have resulted in scores of 42, 41, 11, 68, 97, 35, 1, 45, 11 and 47.
So he’s covered the 38.5 line in six of the ten innings: a 60% chance, which equates to odds of about $1.67.
Too close for comfort? Far from it. That $1.88 now looks decidedly juicy, although ten innings is a very small sample size.
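For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation above. The variable names are purely illustrative, and "fair odds" here simply means the reciprocal of the observed strike rate, with no bookmaker margin built in.

```python
# Warner's last ten innings, as listed above.
scores = [42, 41, 11, 68, 97, 35, 1, 45, 11, 47]
LINE = 38.5  # the line being bet on

# Count the innings in which the score cleared the line.
covers = sum(1 for s in scores if s > LINE)
hit_rate = covers / len(scores)

# Decimal odds implied by a probability are its reciprocal.
fair_odds = 1 / hit_rate

print(f"Covered {covers} of {len(scores)}: {hit_rate:.0%}")
print(f"Fair odds: ${fair_odds:.2f}")
```

Any price longer than that fair figure suggests value, at least on this very small sample.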
That’s just one quick illustration of the inherent limitations of averages as a measure.
Averages are simple. Everybody understands what an average is and how it works: it’s just the total divided by the number of events. Warner has scored 398 runs over those ten innings, hence the 39.8 average.
To illustrate further, let’s compare the data set to two possible alternative sets. All three are sorted in ascending order to make them easier to compare:
Actual: 1, 11, 11, 35, 41, 42, 45, 47, 68, 97.
Alternate A: 2, 2, 6, 8, 24, 36, 37, 74, 96, 113.
Alternate B: 0, 0, 11, 39, 40, 46, 61, 61, 67, 73.
Three sets of data, all completely reasonable as an example of a batsman’s last ten innings. All totalling 398, all with an average of 39.8.
Using averages? There’s no difference whatsoever between the three sets.
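A quick way to confirm that the three sets really are indistinguishable by average is to run them through a few lines of Python. This is only a sketch; the `datasets` dictionary and its labels are just a convenient way of holding the numbers listed above.

```python
from statistics import mean

# The three sets of ten innings from the article.
datasets = {
    "Actual": [1, 11, 11, 35, 41, 42, 45, 47, 68, 97],
    "Alternate A": [2, 2, 6, 8, 24, 36, 37, 74, 96, 113],
    "Alternate B": [0, 0, 11, 39, 40, 46, 61, 61, 67, 73],
}

# Total runs and average (total divided by number of innings) per set.
summary = {name: (sum(s), mean(s)) for name, s in datasets.items()}

for name, (total, avg) in summary.items():
    print(f"{name}: total {total}, average {avg:.1f}")
```

Every set comes out at 398 runs and an average of 39.8.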
But what about clearing the 38.5 line?
Actual: 6/10 = 60%
Alternate A: 3/10 = 30%
Alternate B: 7/10 = 70%
Three very different results there.
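The same comparison can be sketched in Python. The helper name `cover_rate` is made up for this example; it simply counts how often a score finished above the line.

```python
# The three sets of ten innings from the article.
datasets = {
    "Actual": [1, 11, 11, 35, 41, 42, 45, 47, 68, 97],
    "Alternate A": [2, 2, 6, 8, 24, 36, 37, 74, 96, 113],
    "Alternate B": [0, 0, 11, 39, 40, 46, 61, 61, 67, 73],
}


def cover_rate(scores, line):
    """Share of innings in which the score finished above the line."""
    return sum(s > line for s in scores) / len(scores)


rates = {name: cover_rate(s, 38.5) for name, s in datasets.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
```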
Alternate A is skewed with a large number of smaller scores, supplemented with a couple of larger ones. Alternate B is the opposite. But of course, the average doesn’t pick any of that up.
So what to do?
Advanced punters with a stats focus often use a range of statistical measures to give themselves a better view of actual probability.
Time to rewind to high school maths… and you thought you’d never use this stuff!
Check out two other popular measures, the median and the mode.
The median is the value that lies in the middle of a dataset when it’s arranged in ascending or descending order, as ours are. We have an even number of scores, so there are two middle scores. In this circumstance, the median is the midpoint between the two middle scores.
The mode is the number which appears most often. In this context, we’re dealing with a wide range of possible scores, so most will only occur once. So to make it more meaningful, we instead allocate the scores to ranges of ten and take the most common range.
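As a rough sketch, Python’s standard `statistics.median` function applies exactly this rule, averaging the two middle values when the count is even. The `datasets` dictionary is just a holder for the three sets of scores.

```python
from statistics import median

# The three sets of ten innings from the article.
datasets = {
    "Actual": [1, 11, 11, 35, 41, 42, 45, 47, 68, 97],
    "Alternate A": [2, 2, 6, 8, 24, 36, 37, 74, 96, 113],
    "Alternate B": [0, 0, 11, 39, 40, 46, 61, 61, 67, 73],
}

# With ten scores, the median is the midpoint of the 5th and 6th values.
medians = {name: median(s) for name, s in datasets.items()}

for name, value in medians.items():
    print(f"{name}: median {value}")
```

The medians come out at 41.5, 30 and 43 respectively, so unlike the average, the median already separates the three sets.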
|Set|Average|Median|Mode (range)|
|Actual|39.8|41.5|40 – 49|
|Alternate A|39.8|30|0 – 9|
|Alternate B|39.8|43|60 – 69|
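The range-based mode can be sketched in Python as follows. The function name `modal_range` and the default width of ten are choices made for this example. Note also that if two ranges tie for most common, this version simply returns whichever it encountered first, which is a simplification.

```python
from collections import Counter

# The three sets of ten innings from the article.
datasets = {
    "Actual": [1, 11, 11, 35, 41, 42, 45, 47, 68, 97],
    "Alternate A": [2, 2, 6, 8, 24, 36, 37, 74, 96, 113],
    "Alternate B": [0, 0, 11, 39, 40, 46, 61, 61, 67, 73],
}


def modal_range(scores, width=10):
    """Return (low, high, count) for the most common range of scores."""
    # Map each score to the lower bound of its range, e.g. 47 -> 40.
    bins = Counter((s // width) * width for s in scores)
    low, count = bins.most_common(1)[0]
    return low, low + width - 1, count


modes = {name: modal_range(s) for name, s in datasets.items()}

for name, (low, high, count) in modes.items():
    print(f"{name}: {low} to {high} ({count} innings)")
```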
Look at the three measures together, and you get a much better view of the differences between the three sets.
And that’s just the beginning… dust off those school textbooks!