Since I don’t gamble, you might wonder why I care. Well, I do have a competitive streak, and I enjoy the challenge of trying to generate good football predictions. However, I soon discovered that there are some on the web (I won’t name names) who lean heavily on the betting line to “improve” the points accuracy of their predictions. The net effect is that various metrics commonly used to evaluate a set of predictions (such as the “mean square error” and “percent correct” metrics on Prediction Tracker) tend to look pretty good when one’s predictions are essentially a “fuzzed up” version of the betting line. Not surprisingly, the only metric that does not benefit from this tactic is one’s win percentage “against the spread.” However, since most predictions are no better than a coin toss against the line anyway, I realized that it is actually rather difficult to tell who is good at making predictions and who isn’t. So, I began to wonder whether, buried in all the noise, there was some indicator that certain predictions did, in fact, add value over and above the betting line. The answer turned out to be “yes.”
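For concreteness, here is a minimal Python sketch of the three metrics mentioned above. It assumes that predictions, lines, and outcomes are all spreads in the same home-minus-away convention, that “percent correct” means picking the straight-up winner, and that pushes are simply dropped. The function name `evaluate` is hypothetical; this is not Prediction Tracker’s actual code.

```python
def evaluate(predictions, lines, outcomes):
    """Return (mean squared error, percent correct, ATS win fraction).

    Assumption: every value is a spread, home score minus away score.
    """
    n = len(outcomes)
    # Mean squared error of the predicted margin versus the actual margin.
    mse = sum((p - s) ** 2 for p, s in zip(predictions, outcomes)) / n

    # Percent correct: did the prediction pick the straight-up winner?
    # Tied games and predictions of an exact tie are skipped.
    picks = [(p, s) for p, s in zip(predictions, outcomes) if p != 0 and s != 0]
    correct = sum((p > 0) == (s > 0) for p, s in picks)
    pct_correct = correct / len(picks) if picks else float("nan")

    # Against the spread: did the prediction fall on the same side of the
    # line as the actual result? Pushes and predictions equal to the line
    # are skipped.
    ats = [(p - l, s - l) for p, l, s in zip(predictions, lines, outcomes)
           if p != l and s != l]
    ats_wins = sum((a > 0) == (b > 0) for a, b in ats)
    ats_pct = ats_wins / len(ats) if ats else float("nan")

    return mse, pct_correct, ats_pct
```

Note that a prediction that simply copies the line never lands on either side of it, so it earns no ATS record at all, while a lightly “fuzzed” copy of the line can score well on the first two metrics and still hover near 50% on the third.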
Anyway, I began to run with the aforementioned model and soon found that I could generate an expected probability of winning “against the spread” that was a better long-term indicator than the actual win percentage itself. If you want to know more about this, email me. While the expected win probability was interesting, it failed to account for the number of games and didn’t produce a true “significance measure.” So, I translated it into the following metric, which I call a “significance score”:
where L is the betting line, P are the predictions, S are the actual outcomes (expressed as spreads, i.e., home score minus away score), N is the number of games, angle brackets indicate averages over the N games, and the function denoted by the Greek letter phi is the standard normal cumulative distribution function (CDF). The equation is an approximation (albeit a very good one) that assumes the difference between P and L is small compared with the other two differences, S - L and S - P.
If a set of predictions is essentially random noise (or any mix of the betting line and noise), the argument of the CDF above will tend to behave as a standard normal random variable (mean zero, standard deviation one). As a result, the significance score will be uniformly distributed between zero and one (0% and 100%). If, however, a set of predictions manages to eliminate a source of error not corrected by the line, then the argument of the CDF will tend to be increasingly positive, and the score will tend to be higher than 50%, potentially much higher.
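The uniformity claim is an instance of the probability integral transform: passing a standard normal variable through its own CDF yields a variable that is uniform on [0, 1]. The Python sketch below simulates that directly. It does not reproduce the significance-score formula itself; it simply draws the CDF argument from a normal distribution, with a hypothetical `shift` parameter standing in for a system that adds real value over the line.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_scores(trials, shift=0.0, seed=0):
    """Map draws of a normal argument (mean `shift`, sd 1) through the CDF.

    shift=0.0 models a system with no edge over the line; a positive shift
    is a hypothetical stand-in for a system that removes extra error.
    """
    rng = random.Random(seed)
    return [normal_cdf(rng.gauss(shift, 1.0)) for _ in range(trials)]


def decile_fractions(scores):
    """Fraction of scores falling in each tenth of [0, 1]."""
    counts = [0] * 10
    for p in scores:
        counts[min(int(p * 10), 9)] += 1
    return [c / len(scores) for c in counts]


null_scores = simulate_scores(100_000)
skilled_scores = simulate_scores(100_000, shift=2.0)

# No edge: roughly 10% of scores in every decile, i.e. uniform on [0, 1].
print([round(f, 3) for f in decile_fractions(null_scores)])
# With an edge: most scores pile up above 50%.
print(sum(p > 0.5 for p in skilled_scores) / len(skilled_scores))
```

With no edge, each decile holds close to 10% of the simulated scores; with the illustrative shift of 2, roughly 98% of scores land above 50%.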
OK, so what about the results? I pulled in the data from Todd Beck’s Prediction Tracker for the period from Week 5 of the 2007 season through the end of the 2009 season. I excluded the first four weeks of the 2007 season because I had just had Atomic Football added to Prediction Tracker and was still monkeying with the algorithm during that period. Since Week 5 of 2007, my algorithm has changed relatively little. Note that the numbers that follow were computed against the opening lines. Again, I wanted that for comparison since I publish Atomic Football’s predictions on Sunday (sometimes very early) and generally do not update them during the week. Most other participants on Prediction Tracker also publish early in the week and don’t update thereafter. So, here you are…
System P
Stat Fox 99.9948%
Atomic Football 99.9945%
Edward Kambour 99.971%
Nutshell Sports 99.76%
Stephen Kerns 99.71%
Nutshell Sports Retro 98.6%
Born Power Index 98.0%
Pigskin Index 97.9%
Moore Power Ratings 97.8%
Sagarin Predictive 96.6%
System Median 96.4%
Keeper 96.3%
System Average 95.7%
Super List 95.2%
Dokter Entropy 93.9%
Dunkel Index 93.7%
Lee Burdorf 93.6%
Dave Congrove 93.1%
CPA Rankings 93.1%
Ashby AccuRatings 90.4%
Covers.com 89.1%
Least Squares 89.0%
Bassett Model 87.6%
Bihl System 86.3%
Harmon Forecast 85.1%
Massey BCS 80.8%
Tom Benson 80.6%
Laz Index 71.1%
Howell 66.8%
Massey Consensus 65.5%
Beck Elo 63.8%
Marsee 63.7%
Hank Trexler 59.8%
Logistic Regression 57.2%
Least Squares w/ HFA 57.1%
Sagarin 53.8%
The P column gives each system’s significance score, which indicates how difficult it would be to achieve that performance strictly by chance. For example, a score of 99% indicates a level that chance alone would exceed only one time out of one hundred. Oh, and if you’re curious, the top six systems all scored above 92% against the updated (“Saturday morning”) line, with the next closest scoring below 82%. By the way, the average score across all the systems (including those not listed above) against the updated line was 44.5%, suggesting that the average computer prediction may be even worse than a coin toss come game day.
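Because the score is uniform under pure chance, the probability of exceeding a given score by luck alone is one minus that score. The short Python sketch below turns table entries into “one chance in N” odds. `chance_odds` is a hypothetical helper, and the inputs are the rounded percentages from the table.

```python
def chance_odds(p):
    """Return N such that chance alone exceeds score p about once in N tries.

    Under the no-skill hypothesis the score is uniform on [0, 1], so the
    probability of exceeding p by luck is 1 - p.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return 1.0 / (1.0 - p)


for system, p in [("Stat Fox", 0.999948),
                  ("Sagarin Predictive", 0.966),
                  ("Sagarin", 0.538)]:
    print(f"{system}: about 1 in {chance_odds(p):,.0f}")
```

Stat Fox’s 99.9948%, for instance, corresponds to roughly one chance in 19,000.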
As I stated at the outset, I don’t gamble. One reason is that a lot of other very interesting things come out of this model. I won’t go into details here except to say that even when one can “expect” (in the statistical sense) a positive net return, there is a serious problem with managing risk. In the end, managing the volatility means limiting returns to the point that the stock market still looks better. So, I still prefer to bet on teams like Walmart or Apple.