Tuesday, November 25, 2008

Why Ranking Violations are a Flawed Metric

I hate to ever criticize anything without being prepared to offer an alternative. So, I'll let you know up front, I will offer an alternative (at the end of this post).

But first, for those who might have no idea what "ranking violations" are, here is a very brief tutorial...

Let's say John Doe has made his own football rankings. Is there an easy way to see if they make sense? A popular approach is to calculate the frequency of "ranking violations." A ranking violation occurs when a loser in a played game is ranked higher than a winner. Now why, might you ask, would any rankings ever do that? The answer is, once you're about halfway into the season, there's no way around it. There is simply no way to rank teams such that winners are always ranked above losers. Eventually, some 2-5 team beats a 5-2 team and making the ranking violations go away becomes impossible. If you would like to see some ranking violation stats in action, check out Ken Massey's College Football Ranking Comparison page (scroll to the bottom of http://www.mratings.com/cf/compare.htm).
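To make the definition concrete, here is a minimal sketch (in Python, with made-up teams and results) of how one might count ranking violations for a set of rankings:

```python
# Count ranking violations: a violation is any game whose loser is
# ranked higher (a numerically smaller rank) than its winner.

def ranking_violations(rank, games):
    """rank: dict mapping team -> position (1 = best);
    games: list of (winner, loser) pairs."""
    return sum(1 for winner, loser in games if rank[winner] > rank[loser])

# Hypothetical teams and results for illustration.
games = [("X", "Y"), ("Y", "Z"), ("Z", "X")]  # a beats-cycle among X, Y, Z
rank = {"X": 1, "Y": 2, "Z": 3}
print(ranking_violations(rank, games))  # Z beat X but is ranked below X -> 1
```

Note the cycle in the example data: no matter how X, Y, and Z are ordered, at least one violation remains.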

OK, enough on what ranking violations are. If you're still unclear, google it. Next...

Now, if we can't make ranking violations go away, then it would seem to make sense to rank teams so as to keep them to a minimum, right? That way, we don't have to listen to folks invoke the "head to head" argument. I think I preached on that in another post, so I won't go down that road here. The short answer to "should we minimize ranking violations?" is... "No."

So, I've made the beginnings of an argument in support of minimizing ranking violations and now I'm suggesting it's a bad idea. Why? The reason is that it's almost, but not quite, the best metric. The problem is a little complicated, so bear with me.

Let's take a sample problem. It's not terribly realistic, but it's been designed to make a point. We have three teams in a conference -- A, B, and C. Teams A and B play each other ten times during the regular season and A wins every time. I know this wouldn't happen in the real world; I'm only making the point that A is clearly better than B. If you have a problem with this, then the alternative is that A and B play a common set of opponents. Team A wins all of their games and B loses all of theirs. Better? OK, now introduce team C. C plays two games, beating team A and losing to team B.

Now it's time to rank the teams. Obviously, we rank A ahead of B. But what about C? We can minimize the ranking violations by ranking them above A (first in the conference) or below B (last in the conference). Strange: our minimum-ranking-violations approach has clearly shown us that team C is probably either the best or the worst team in its conference, but probably not in the middle. If this makes sense to you, then quit now -- there's no hope. Otherwise, read on...
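Under the stated assumptions (A beats B ten times, C beats A, B beats C), a quick enumeration confirms the claim. This is just an illustrative sketch mirroring the example above:

```python
from itertools import permutations

# Games from the example: A beats B ten times, C beats A, B beats C.
games = [("A", "B")] * 10 + [("C", "A"), ("B", "C")]

def violations(order, games):
    # order lists teams best-to-worst; position in the tuple is the rank
    rank = {team: i for i, team in enumerate(order)}
    return sum(1 for w, l in games if rank[w] > rank[l])

for order in permutations("ABC"):
    print("".join(order), violations(order, games))
```

The minimum (one violation) occurs only for the orderings ABC and CAB: minimizing ranking violations pushes C to first or last in the conference, never between A and B.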

OK, it would seem reasonable (both subjectively and from a "maximum likelihood" viewpoint -- we won't dive into the math on that here) that team C probably belongs between A and B, but how can we express that mathematically? The solution I propose is an alternative to ranking violations that I've dubbed "record violations" (I have also referred to it as schedule violations). It goes like this...

Team C's record is 1-1. By ranking them between A and B, one opponent is ranked higher (what I'll call the "higher") and one is ranked lower (the "lower"). Thus, their lower/higher is 1-1. Because their W/L (win-loss) matches their L/H (lower/higher), they have zero record violations.* You can check out our L/H numbers on our Atomic Football ranking page.
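One way to formalize this (a sketch, with the formula inferred from the description above rather than taken from the author's own code) is to let each win over a higher-ranked opponent cancel a loss to a lower-ranked opponent:

```python
# Record violations for a single team: a win over a higher-ranked
# opponent cancels a loss to a lower-ranked opponent; whatever
# cannot be cancelled remains as a record violation.

def record_violations(team, rank, games):
    wins_vs_higher = sum(1 for w, l in games
                         if w == team and rank[l] < rank[team])
    losses_vs_lower = sum(1 for w, l in games
                          if l == team and rank[w] > rank[team])
    return abs(wins_vs_higher - losses_vs_lower)

# Rank C between A and B, as suggested: C beat A (higher) and lost
# to B (lower), so the two would-be violations cancel.
rank = {"A": 1, "C": 2, "B": 3}
games = [("A", "B"), ("C", "A"), ("B", "C")]
print(record_violations("C", rank, games))  # 0
```

With this definition, C's 1-1 record against one higher-ranked and one lower-ranked opponent yields zero record violations, exactly as described.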

I first proposed this metric to Ken Massey in late 2006, and I'm hopeful he will find the time to add it to his comparison page. Here is the text from my original message:



I wanted to suggest a variant on the ranking violation metric.

Consider a team that has beaten #13, #15, #17, and #19 and lost to #1, #3, #5, and #6**. In addition, the team has beaten #9 and lost to #11. Being 5-5 against teams of average rank #10, 1-4 against teams ranked #1-#9 and 4-1 against teams ranked #11-#20, it would seem reasonable to rank this team #10.

However, doing so yields two ranking violations. One of the violations could be alleviated by moving the team up to #8 or down to #12. This is obviously a counterintuitive situation (and one I discussed in my recent paper). Now consider an alternative metric.

If we retain the #10 ranking, then this hypothetical team is 5-5 against 5 teams that are ranked higher and 5 teams that are ranked lower.

Thus if Wins-Losses is the same as Lower-Higher (lower being the number of teams*** ranked lower and higher being the number ranked higher), then we would say that we have zero "Record Violations" (if you have an alternative name, please let me know). In other words, with this metric we will allow a ranking violation corresponding to a win against a higher ranked team to cancel a ranking violation corresponding to a loss against a lower ranked team. Thus, for this team we find:

Rank   RankingViolations   RecordViolations
#8     1                   1
#10    2                   0
#12    1                   1

As you can see, ranking violations have two local optima, whereas record violations do not.

To put things on the same percentage scale as our traditional ranking violations, we will continue to normalize by the number of games, since the maximum number of record violations for a given team is equal to the number of games played by that team.

Obviously, record violations will always be equal to or less than ranking violations, since we begin with the ranking violations but allow some to cancel out others. The purpose of this metric is to prevent the obviously nonsensical situation mentioned above in my opening example. For this reason, I think it is a slightly superior metric. I would certainly love to see the results of it on your comparison page by year's end. If you do choose to employ this metric, I would also appreciate a reference. Lastly, I did not get a reply from you on my previous message. I know this is a busy time for you, so I understand...

Thanks for all your hard work in this most important field of endeavor (I say this tongue in cheek, of course).



*For those who might run with the math, yes, if you consider the record violations for all three teams, you get a minimum of two violations for any of these orderings -- ABC, ACB, or CAB. The point is, record violations, unlike ranking violations, don't force you to one of the extremes.
**This was supposed to say #7.
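For the hypothetical team in the message above, both metrics can be checked numerically. The record used here (wins over #9, #13, #15, #17, #19; losses to #1, #3, #5, #7, #11) is the split consistent with the 1-4 and 4-1 breakdowns quoted in the message; the code itself is just an illustrative sketch:

```python
# Opponent ranks for the hypothetical team in the message above.
wins = [9, 13, 15, 17, 19]    # opponents this team beat
losses = [1, 3, 5, 7, 11]     # opponents this team lost to

def metrics(my_rank):
    beat_higher = sum(1 for r in wins if r < my_rank)
    lost_to_lower = sum(1 for r in losses if r > my_rank)
    ranking_v = beat_higher + lost_to_lower      # traditional metric
    record_v = abs(beat_higher - lost_to_lower)  # cancellation allowed
    return ranking_v, record_v

for my_rank in (8, 10, 12):
    print(my_rank, metrics(my_rank))
# -> 8 (1, 1); 10 (2, 0); 12 (1, 1): ranking violations have local
#    minima at both #8 and #12, while record violations have a single
#    minimum at #10.
```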

Saturday, November 15, 2008

The "Best" Team

How often do we hear fans complaining because the "best" team(s) didn't get to play in the conference championship, or the "best" team wasn't ranked number one, or the "best" team didn't make a BCS bowl. Hmmm. What does it mean to be the "best?" The problem is, if you don't think about it too much, it seems pretty easy. It's obvious. The "best" is the "best," right? How hard can it be?

If you're content to have "best" be simple and obvious, skip the rest of this. Otherwise, read on...

Is the "best" the team that on average did better than any other over the entire season? Is an opening loss as bad as losing the last game of the season? What if your team has Heisman contenders at QB and RB and they both get injured in the waning seconds as your team wins the final game of the regular season? Better yet, what if they went undefeated against the toughest schedule in the country? Are they still the "best" team -- right now, that is? Have they earned the right to play for the national championship anyway, even if their star players will be watching from the sidelines? And what about consistency? Team A plays a very tough schedule and beats every opponent by less than a touchdown. Team B plays the same schedule, whips every opponent by four touchdowns except one who beats them by a field goal in overtime. Which one is best, A or B? If scoring matters, then can you make up for a loss one week by running up the score next week? If it doesn't matter, then why do we invoke it so often when trying to prove our case about who is better? Why do we appeal to it as a "tiebreaker" when W/L and SoS aren't enough? Lots of questions. How about some answers...

The bottom line, in my opinion, is that there should be a standard. Otherwise, you have something like this...

You're taking a class at school. Your teacher informs you that in the upcoming test, problem #1 will count 90% of your grade. On test day, you skip #1 and work all the other problems. When the graded test comes back, you have a 10% grade -- you aced all of the problems you worked. Now you complain -- "but I got ALL BUT ONE of the problems right." "Doesn't matter," the teacher says, "the standard is what it is." So would it be better to have no standard? You have no idea what the teacher wants. He might only give credit for spelling your name right, or maybe you'll get points for turning in a blank test so that he can reuse it next year. You really have no idea what it is you're supposed to do.

When there is a standard, at least things are fair, and no one really has a right to complain. To strive to achieve in areas the standard does not emphasize is simply to fail.

Let's look at it another way... In each football game, we have a standard -- the team with the most points wins. There are no points for yards, takeaways, completed passes, fewest penalties, etc. The standard is clear -- most points wins. To do anything else and then complain about it is ridiculous.

Before I take the next step, let me state this clearly -- the BCS has been a huge step in the right direction. "Yeah, but wouldn't a playoff be better?" you ask. Well, the BCS IS A PLAYOFF. Think about it. Playoffs are when you select (by whatever standard) some number of the "best" teams and let them "play-off" until only one remains. Before the BCS, that number was ZERO. With the BCS, it is now TWO. That's a step in the right direction, right? Would four be better? I think so. Eight? Maybe. The top ten where six get a bye? I'd consider it.

Now back to the standard. We're talking about COLLEGE football, right? Colleges. Places where there are supposed to be a lot of smart people, right? Couldn't all of those smart people figure out some absolute standard they could all agree to? One that's full and open. Granted, it wouldn't be quite as simple as what we have in an individual game (most points wins), but if we had a "formula," if you will, that everyone agreed to, then there would be no questions about what needed to be done. The college computer science departments could run what-if scenarios and know ahead of time who needed to beat whom to achieve a certain rank, or make the playoffs, or make the championship game. I could go on, but I'll resist the temptation and stop here.

Tuesday, November 11, 2008

Overtime Alternative

Before I start... a warning.  Your first reaction to this suggestion will probably not be a good one.  Let me suggest you chew on it a little before rejecting it outright.  Here goes...

If the clock runs out in regulation with the score tied, turn the clock off, continue playing (i.e., no coin toss and kickoff), and play to sudden death (first to score wins).  There, I told you that you wouldn't like it.  Now let's kick it around a bit.

Situation A:  One minute left.  Team A has the ball, is trailing by 3 points, and chooses to play for a tie.  If they tie the game (in regulation), their opponent will NOT have to work against the clock -- they can take as long as they like to try to make the go-ahead score.  Once the clock runs out, first to score wins.

Situation B:  One minute left.  Team A has the ball, is trailing by 3 points, and chooses to play for the win.  If they take the lead (in regulation), their opponent must tie or go ahead before the clock runs out.

What are the advantages?  First of all, by removing the coin toss to start overtime, you remove a random and arguably unfair element of the game.  In regulation, each team got to receive once -- fair enough.  If winning the coin toss gives such an advantage, why not just use the coin toss alone to decide the winner?  Second, you have new elements of strategy to consider (see above).  Third, in a tie, the team with the ball at the end of regulation gets to keep it -- why not reward them for having possession at that point?

Feedback is welcome, but be sure to mull it over a bit first.

Friday, June 20, 2008

Simple Head-to-Head

An irate fan calls into the local sports talk radio station Monday evening so angry he can barely get the words out. Between the coughing, gagging, and spitting, he manages to say: “This is ridiculous. It’s nonsense. How can they rank the Cartersville Skunks #5 ahead of my Waynesboro Lemmings? The Lemmings beat the Skunks 17-14 the third week of the season. The rankings are stupid. It’s simple head-to-head. Simple head-to-head. I’ve got nothing more to say.” Click.

Amazingly, there are more than a few fans who think they’ve got it all figured out. The problem is, they’ve never taken a pencil and tried to do what they insist makes so much sense – just rank the teams so that the winners in each game are ranked higher than the losers.

Funny thing is, early in the season this is still possible, and yet fans fuss at the rankings because they don’t want to believe that the few games played are actually representative of how good (or bad) their team is. However, by midseason, this kind of ranking is no longer possible. Sooner or later, team A beats team B, who beats team C, who beats team A. Or some team beats a 10-1 team and loses to a 1-10 team.
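The cycle claim is easy to verify by brute force. Here is a small sketch with hypothetical teams:

```python
from itertools import permutations

# Once a cycle appears (A beats B, B beats C, C beats A), no ordering
# of the three teams can place every winner above every loser.
games = [("A", "B"), ("B", "C"), ("C", "A")]

def has_violation(order):
    rank = {team: i for i, team in enumerate(order)}
    return any(rank[w] > rank[l] for w, l in games)

print(all(has_violation(order) for order in permutations("ABC")))  # True
```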

As geeks who do computer rankings, we do understand the frustration. In fact, in the “ranking community,” we even have a lingo to describe all this stuff – “retrodiction,” “ranking violations,” “ranking by pairwise comparison,” and the like.

Before you’re tempted to call your local sports talk radio program, let me offer up a slightly different way to assess a set of rankings. Suppose a team is 12-0 at the end of the season. It would be reasonable for them to be ranked somewhere above all twelve of their opponents. Another team goes 0-12. It would seem reasonable for them to be ranked somewhere below all twelve of their opponents. Consider another team that goes 6-6. Would it not seem reasonable for them to be ranked above six of their opponents and below six others? Here’s the catch. Would this not seem reasonable even if this team actually beat one or two of their higher ranked opponents while losing to one or two of their lower ranked opponents? After all, teams have good days and bad days. Upsets are what makes football exciting, right?

For what it’s worth, any decent ranking method will do approximately just this. Why not exactly this, you might ask? Well, consider this one example of why it can’t always be done. Suppose no one goes undefeated. Someone must still be ranked #1. Since they have one loss, they are ranked above a team to whom they lost.

There is another problem with this scheme. If a team goes 12-0 against a schedule that includes no top 25 opponents, exactly how high should they be ranked? Somewhere between #1 and their best opponent, but where?

The bottom line is that ranking football teams is a really hard problem. Probably harder than any other sport because the teams play so few games.

Back to the irate caller. He’s got it all figured out. We just move the Lemmings up to #4. Of course, the Lemmings lost to the 4-6 Hedgehogs, so we’ll have to move the Hedgehogs up to #3. But the Hedgehogs lost to six other teams, and we don’t have enough slots for them, so we’ll have to move the Hedgehogs, Lemmings, and Skunks down to make room. Wait, one of those teams was the Skunks. I thought this was SIMPLE head-to-head.

Thursday, June 19, 2008

On the Current BCS System

The current BCS system.  Exactly what is it?

I would characterize it as a two-team playoff system.  As such, it's better than the old pure poll-based system of days gone by (AP, UPI, etc.) where we never really knew who the national champion was, but it's not nearly as much fun as the playoffs enjoyed by the other divisions and associations (FCS, II, III, and NAIA).

So, as much as most FBS fans call for a "real" playoff system (i.e., four or more teams) and criticize the BCS in the same breath, I still think the BCS is a step in the right direction.  After all, a two-team playoff is better than no playoff at all.

Monday, January 21, 2008

Rankings and Predictions

College Football Rankings.  What else evokes such emotions among people and yet is so relatively unimportant in the grand scheme of things?  I happen to be a recovering college football fan.  No longer am I sick for a couple of days when my team loses.  I suppose I have replaced one passion with another.  Through some series of events over the last ten to fifteen years, I have developed an obsession(?) with ranking teams and, more recently, predicting game outcomes.  With the help of coworker and friend, Paul Colvert, Atomic Football was born several years ago.

Our website, Atomic Football, has become a labor of love.  Not exactly tons of dynamic content -- it is just a hobby and both Paul and I have families that come first -- but you will find some unique stuff.  There are two items on our web site of which we are extremely proud.  The first is our win-loss rankings.  To our knowledge, they are the only win-loss rankings where the emphasis on strength of schedule is calculated, not set as some subjective input.  If you would like the details, we have documented them in nauseating detail in a technical paper available on arXiv.  The second item is our predictions.  This season (2007) marked the first year we became very serious about producing good quality predictions.  After getting a few things ironed out in the first four weeks, we finished strong.  On Todd Beck's "The Prediction Tracker," we became only the second competitor in the last eight years to best "the line" in points accuracy (as measured by mean square error) for the second half of the season.  Not too bad for our first try.  Furthermore, if you threw in weeks 5, 6, and 7, we would have still been first.  Anyway, more later.  My kids need help with their school work.