Football Outsiders Information
Methods To Our Madness
THE SHORT VERSION:
DVOA is a method of evaluating teams, units, or players. It takes every single play during the NFL season and compares each one to a league-average baseline based on situation. DVOA measures not just yardage, but yardage towards a first down: Five yards on third-and-4 are worth more than five yards on first-and-10 and much more than five yards on third-and-12. Red zone plays are worth more than other plays. Performance is also adjusted for the quality of the opponent. DVOA is a percentage, so a team with a DVOA of 10.0% is 10 percent better than the average team, and a quarterback with a DVOA of -20.0% is 20 percent worse than the average quarterback. Because DVOA measures scoring, defenses are better when they are negative. For more detail, read below.
Please feel free to contact us with questions and comments about our original statistics using the contact form.
- DVOA EXPLAINED
- DYAR EXPLAINED
- SPECIAL TEAMS STATS
- DRIVE STATS
- ADJUSTED LINE YARDS and other offensive/defensive line stats
One running back runs for three yards. Another running back runs for three yards. Which is the better run? This sounds like a stupid question, but it isn’t. In fact, this question is at the heart of nearly all of the analysis on Football Outsiders.
Several factors can differentiate one three-yard run from another. What is the down and distance? Is it third-and-2 or second-and-15? Where on the field is the ball? Does the player get only three yards because he hits the goal line and scores? Is the player’s team up by two touchdowns in the fourth quarter, and thus running out the clock; or down by two touchdowns, and thus facing a defense that is playing purely against the pass? Is the running back playing against the porous defense of the Raiders, or the stalwart defense of the Bears?
Conventional NFL statistics value plays based solely on their net yardage. The NFL determines the best players by adding up all their yards no matter what situations they came in or how many plays it took to get them. Now, why would they do that? Football has one objective -- to get to the end zone -- and two ways to achieve that -- by gaining yards and achieving first downs. These two goals need to be balanced to determine a player’s value or a team’s performance. All the yards in the world won’t help a team win if they all come in six-yard chunks on third-and-10.
The popularity of fantasy football only exacerbates the problem. Fans have gotten used to judging players based on how much they help fantasy teams win and lose, not how much they help real teams win and lose. Typical fantasy scoring further skews things by treating the yard between the one and the goal line as 61 times as valuable as any other yard on the field (each yard is worth 0.1 points, and a touchdown is worth 6). Let’s say Larry Fitzgerald catches a pass on third-and-15 and goes 50 yards but gets tackled two yards from the goal line, and then Beanie Wells takes the ball on first-and-goal from the two-yard line and plunges in for the score. Has Beanie Wells done something special? Not really. When an offense gets the ball on first-and-goal at the two-yard line, it is expected to score a touchdown five out of six times. Wells is getting credit for the work done by the passing game.
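The 61-to-1 figure and the Fitzgerald/Wells comparison are easy to check. A minimal Python sketch, assuming only the typical scoring quoted above (0.1 points per yard, 6 points per touchdown); the function and constant names are illustrative, not taken from any fantasy site:

```python
# Typical fantasy scoring described above: 0.1 points per yard, 6 per touchdown.
POINTS_PER_YARD = 0.1
POINTS_PER_TD = 6.0


def fantasy_points(yards: int, touchdowns: int = 0) -> float:
    """Fantasy points for one play under the typical scoring above."""
    return yards * POINTS_PER_YARD + touchdowns * POINTS_PER_TD


# Fitzgerald: 50-yard catch, tackled at the 2. Wells: 2-yard touchdown plunge.
fitzgerald = fantasy_points(50)
wells = fantasy_points(2, touchdowns=1)

# The last yard (0.1 for the yard + 6 for the touchdown) relative to any other yard.
last_yard_ratio = fantasy_points(1, touchdowns=1) / fantasy_points(1)

print(f"Fitzgerald {fitzgerald:.1f}, Wells {wells:.1f}, last-yard ratio {last_yard_ratio:.0f}")
```

Under this scoring the two-yard plunge earns 6.2 points and the 50-yard catch earns 5.0, which is exactly the distortion described above.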
Doing a better job of distributing credit for scoring points and winning games is the goal of DVOA, or Defense-adjusted Value Over Average. DVOA breaks down every single play of the NFL season, assigning each play a value based on both total yards and yards towards a first down. The approach builds on work done by Pete Palmer, Bob Carroll, and John Thorn in their seminal book, The Hidden Game of Football. On first down, a play is considered a success if it gains 45 percent of needed yards; on second down, a play needs to gain 60 percent of needed yards; on third or fourth down, only gaining a new first down is considered success.
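The basic success test is simple enough to write down directly. A sketch in Python, assuming "gains 45 percent" means at least 45 percent of the yards needed (the text does not say how a gain exactly at the threshold is handled) and ignoring the success-point refinements described in the next paragraph; `is_success` is a hypothetical name:

```python
def is_success(down: int, yards_to_go: float, yards_gained: float) -> bool:
    """Basic success test from the thresholds above.

    1st down: gain at least 45% of the yards needed.
    2nd down: gain at least 60% of the yards needed.
    3rd/4th down: only a new first down counts.
    """
    if down == 1:
        return yards_gained >= 0.45 * yards_to_go
    if down == 2:
        return yards_gained >= 0.60 * yards_to_go
    if down in (3, 4):
        return yards_gained >= yards_to_go
    raise ValueError(f"invalid down: {down}")


# Five yards on third-and-4, first-and-10, and third-and-12.
for down, to_go in [(3, 4), (1, 10), (3, 12)]:
    print(f"down {down}, {to_go} to go, 5 yards gained: success={is_success(down, to_go, 5)}")
```

The binary test separates five yards on third-and-4 (success) from five yards on third-and-12 (failure); finer distinctions come from the success-point values and the situational baselines described below.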
We then expand upon that basic idea with a more complicated system of “success points,” improved over the past few years with a lot of mathematics and a bit of trial and error. A successful play is worth one point and an unsuccessful play zero points, with fractional points in between (e.g., eight yards on third-and-10 is worth 0.54 “success points”). Extra points are awarded for big plays, gradually increasing to three points for 10 yards (assuming those yards result in a first down), four points for 20 yards, and five points for 40 yards or more. Losing three or more yards is -1 point. Interceptions occurring on fourth down during the last two minutes of a game incur no penalty whatsoever, but all others average -6 points, with an adjustment for the length of the pass and the location of the interception (since an interception tipped at the line is more likely to produce a long return than an interception on a 40-yard pass). A fumble is worth anywhere from -1.7 to -4.0 points depending on how often a fumble in that situation is lost to the defense, no matter who actually recovers the fumble. Red zone plays get a bonus: 20 percent for team offense, five percent for team defense, and 10 percent for individual players. Touchdowns also earn a bonus, which acknowledges that the goal line is significantly more difficult to cross than the previous 99 yards (although this bonus is nowhere near as large as the one used in fantasy football).
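To make the point scale concrete, here is a deliberately coarse Python sketch. It uses only the anchor values quoted above and makes several simplifying assumptions of its own, flagged in the docstring: no fractional credit, big-play credit as a step function rather than a gradual ramp, big-play tiers only on plays that reach the line to gain, the red zone bonus applied as a plain multiplier, and turnovers left out entirely. It is an illustration, not FO's formula, and the names are hypothetical.

```python
# Red zone multipliers quoted above, keyed by whose rating is being computed.
RED_ZONE_BONUS = {"team_offense": 1.20, "team_defense": 1.05, "individual": 1.10}

# Share of the needed yards that counts as success, by down.
SUCCESS_SHARE = {1: 0.45, 2: 0.60, 3: 1.00, 4: 1.00}


def simple_success_points(down: int, yards_to_go: float, yards_gained: float,
                          red_zone: bool = False,
                          perspective: str = "team_offense") -> float:
    """Coarse success-point value for a play with no turnover.

    Simplifications (not FO's): no fractional credit for near-misses, big-play
    credit as a step function instead of a gradual ramp, big-play tiers only on
    plays that reach the line to gain, and the red zone bonus applied as a plain
    multiplier to whatever value the play earned.
    """
    success = yards_gained >= SUCCESS_SHARE[down] * yards_to_go
    first_down = yards_gained >= yards_to_go

    if yards_gained <= -3:
        points = -1.0
    elif first_down and yards_gained >= 40:
        points = 5.0
    elif first_down and yards_gained >= 20:
        points = 4.0
    elif first_down and yards_gained >= 10:
        points = 3.0
    elif success:
        points = 1.0
    else:
        points = 0.0

    if red_zone:
        points *= RED_ZONE_BONUS[perspective]
    return points


print(simple_success_points(1, 10, 5))    # ordinary success
print(simple_success_points(1, 10, 25))   # big play with a first down
print(simple_success_points(3, 10, 8))    # near-miss: 0 here, 0.54 in the full system
```

Eight yards on third-and-10 comes out as 0 here, whereas the full system credits it with 0.54 points; capturing near-misses like that is exactly what the fractional points are for.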
(Our system is a bit more complex than the one in Hidden Game thanks to our subsequent research, which added larger penalties for turnovers, the fractional points, and a slightly higher baseline for success on first down. The reason why all fumbles are counted, no matter whether they are recovered by the offense or defense, is explained in FO Basics.)
Every single play run in the NFL gets a “success value” based on this system, and then that number gets compared to the average success values of plays in similar situations for all players, adjusted for a number of variables. These include down and distance, field location, time remaining in game, and the team’s lead or deficit in the game score. Teams are always compared to the overall offensive average, as the team made its own choice whether to pass or rush. When it comes to individual players, however, rushing plays are compared to other rushing plays, passing plays to other passing plays, tight ends to tight ends, wideouts to wideouts, and so on.
Going back to our example of the three-yard rush, if Player A gains three yards under a set of circumstances in which the average NFL running back gains only one yard, then Player A has a certain amount of value above others at his position. Likewise, if Player B gains three yards on a play on which, under similar circumstances, an average NFL back gains four yards, then Player B has negative value relative to others at his position. Once we make all our adjustments, we can evaluate the difference between this player’s rate of success and the expected success rate of an average running back in the same situation (or between the opposing defense and the average defense in the same situation, etc.). Add up every play by a certain team or player, divide by the total of the various baselines* for success in all those situations, and you get VOA, or Value Over Average.
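Read literally, the last step is a ratio of totals. A minimal sketch of that reading, expressing the result relative to the baseline so that 0% is average; the per-play values and baselines below are made up, and the real baselines come from FO's situational averages, which are not reproduced here:

```python
def voa(play_values: list[float], baselines: list[float]) -> float:
    """Value Over Average as a fraction (0.10 = 10 percent better than average).

    play_values: success value earned on each play.
    baselines:   average success value for plays in the same situations.
    """
    if len(play_values) != len(baselines):
        raise ValueError("need exactly one baseline per play")
    total_baseline = sum(baselines)
    # Sum of values divided by sum of baselines, re-centered so that average = 0.
    return sum(play_values) / total_baseline - 1.0


# Hypothetical three plays: 4.0 success points earned where 3.0 were expected.
print(f"VOA: {voa([1.0, 0.0, 3.0], [0.8, 0.6, 1.6]):.1%}")
```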
The biggest variable in football is the fact that each team plays a different schedule against teams of disparate quality. By adjusting each play based on the opposing defense’s average success in stopping that type of play over the course of a season, we get DVOA, or Defense-adjusted Value Over Average. Rushing and passing plays are adjusted based on down and location on the field; passing plays are also adjusted based on how the defense performs against passes to running backs, tight ends, or wide receivers. Defenses are adjusted based on the average success of the offenses they are facing. (Yes, technically the defensive stats are actually “offense-adjusted.” If it seems weird, think of the “D” in “DVOA” as standing for “opponent-Dependent” or something.)
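The text does not give the exact form of the opponent adjustment. One simple way to picture it, assuming a multiplicative scaling that is our illustration rather than FO's method: raise the baseline for a play against a defense that is generous against that play type, and lower it against one that is stingy. The function name and numbers below are hypothetical.

```python
def opponent_adjusted_baseline(baseline: float, defense_voa_allowed: float) -> float:
    """Scale a situational baseline by the opponent's record against this play type.

    defense_voa_allowed is the defense's season VOA against this kind of play
    (e.g. passes to tight ends) as a fraction: +0.15 means it allows 15 percent
    more value than average, so an average offense should expect 15 percent more.
    This multiplicative form is an illustrative assumption.
    """
    return baseline * (1.0 + defense_voa_allowed)


play_value = 1.0   # hypothetical success value earned on the play
baseline = 0.8     # hypothetical unadjusted baseline for the situation

for label, allowed in [("porous defense", 0.15), ("stingy defense", -0.15)]:
    expected = opponent_adjusted_baseline(baseline, allowed)
    print(f"{label}: {play_value / expected - 1.0:+.1%} over expectation")
```

The same one-point play ends up worth more against the stingy defense, because the adjusted expectation it is measured against is lower.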
The final step in calculating DVOA involves normalizing each year's ratings. As you may know, offensive levels in the NFL have gone up and down over the years. Right now, the overall level of offense in the league is probably at its highest level of all time. Therefore, we need to ensure that DVOA in a given season isn't skewed by league environment.
For teams, DVOA is normalized so that league averages for offense and defense are 0%. (However, because pass plays are more efficient than run plays, league averages for team passing and team rushing are not zero.) For players, DVOA is normalized separately for individual passing, individual rushing, and the three individual receiving groups (wide receivers, tight ends, and running backs) so that the league average for each is 0%.
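Normalizing so that the league average is 0% amounts to shifting every rating by the league mean. A sketch, assuming an additive shift and a play-weighted average (the text does not specify either detail); team names and numbers are hypothetical:

```python
def normalize(ratings: dict[str, float], plays: dict[str, int]) -> dict[str, float]:
    """Shift one season's ratings so the play-weighted league average is exactly 0."""
    total_plays = sum(plays.values())
    league_avg = sum(ratings[team] * plays[team] for team in ratings) / total_plays
    return {team: rating - league_avg for team, rating in ratings.items()}


# Hypothetical raw offensive ratings in a high-scoring season (average is +3%).
raw = {"Team A": 0.12, "Team B": 0.02, "Team C": -0.05}
plays = {"Team A": 1000, "Team B": 1000, "Team C": 1000}
normalized = normalize(raw, plays)
print({team: f"{value:+.1%}" for team, value in normalized.items()})
```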
Of course, one of the hardest parts of understanding a new statistic is interpreting its scale. To use DVOA, you have to know what numbers represent good performance and what numbers represent bad performance. We’ve made that easy. In all cases, 0% represents league average. A positive DVOA represents a situation that favors the offense, while a negative DVOA represents a situation that favors the defense. This is why the best offenses have positive DVOA ratings (in 2011, Green Bay led the league at +33.8%) and the best defenses have negative DVOA ratings (Baltimore ranked first in 2011 at -17.1%). In most years, the best and worst offenses tend to rate around ± 30%, while the best and worst defenses tend to rate around ± 25%. For starting players, the scale tends to reach roughly ± 40% for passing and receiving, and ± 30% for rushing. As you might imagine, some players with fewer attempts will surpass both extremes.
DVOA has three main advantages over more traditional ways to judge NFL performance. First, by subtracting defense DVOA from offense DVOA (and adding in special teams DVOA, which is described below), we can create a set of team rankings that's based on play-by-play efficiency rather than total yards. Because DVOA does a better job of explaining past wins and predicting future wins than total yards, it gives a more accurate picture of how much better (or worse) a team really is relative to the rest of the league.
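Because good defenses carry negative DVOA, the defensive rating is subtracted rather than added. A one-line sketch of the combination described above, with a hypothetical team and a hypothetical function name:

```python
def total_dvoa(offense: float, defense: float, special_teams: float) -> float:
    """Total team DVOA in percent: offense minus defense plus special teams."""
    return offense - defense + special_teams


# Hypothetical team: +20.0% offense, -5.0% defense (good), +2.0% special teams.
print(total_dvoa(20.0, -5.0, 2.0))
```

A strong defense at -5.0% adds five points to the total, which is why subtraction is used.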
This advantage also extends to situational team rankings, because DVOA compares each play only to plays in similar circumstances. The list of top DVOA offenses on third down, for example, is more accurate than the conventional NFL conversion statistic because it takes into account that converting third-and-long is more difficult than converting third-and-short, and that a turnover is worse than an incomplete pass because it eliminates the opportunity to move the other team back with a punt on fourth down. The same could be said about plays on fourth down or in the red zone.
Second, unlike formulas based on comparing drives rather than individual plays, DVOA can be separated into a myriad of splits (e.g., by down, by week, by distance needed for a first down, etc.). Therefore, we're able to break teams and players down to find strengths and weaknesses in a variety of situations. All Pittsburgh third downs can be compared to how an average team does on third down. Kevin Kolb and John Skelton can each be compared to how an average quarterback performs in the red zone, or with a lead, or in the second half of the game. This doesn't just give us a better idea of which team or player is better. More importantly, it helps us understand why they're better, and therefore allows us to offer prescriptions for improvement in the future.
Finally, a third advantage of DVOA is that normalization makes our comparisons of current teams and players to past teams and players (going back to 1991) more accurate than those based on traditional statistics like wins or total yards, as well as those based on more sophisticated metrics that aren't normalized (e.g., expected points added, passer rating differential, etc.). For instance, which team had the better offense: the 2011 New Orleans Saints or the 1998 Denver Broncos? Going by total yardage (7,474 vs. 6,092) or even yards per play (6.7 vs. 5.9), it's not even a contest. The Saints were clearly better. However, this ignores the fact that the average NFL offense was much more pass-oriented, and thus more efficient, in 2011 than in 1998. If we take the difference in offensive environment into account by using DVOA, it turns out that Denver's offense was slightly better relative to the rest of the league (34.5% to 33.0%).
*It should be noted that certain plays are included in DVOA for offense but not for defense. Other plays are included for both, but scored differently. This leads to separate baselines on each side of the ball. For instance:
- Only four total penalties are included. Two penalties count as pass plays on both sides of the ball: intentional grounding and defensive pass interference. The other two penalties are included for offense only: false starts and delay of game. Because these two offense-only penalties add a group of negative plays that count as neither passes nor runs, the league averages for pass offense and run offense are higher than the league averages for pass defense and run defense.
- Aborted snaps and incomplete backwards lateral passes are only penalized on offense, not rewarded on defense.
- Adjustments for playing from behind or with a lead in the fourth quarter are different for offense and defense, as are adjustments for the final two minutes of the first half when the offense is not near field-goal range.
- Offense gets a slight penalty and defense gets a slight bonus for games indoors.
After using DVOA for a few months, we came across a strange phenomenon: well-regarded players, particularly those known for their durability, had DVOA ratings that came out around average. The reason is that DVOA, by virtue of being a percentage or rate statistic, doesn’t take into account the cumulative value of having a player producing at a league-average level over the course of an above-average number of plays. By definition, an average level of performance is better than that provided by half of the league and the ability to maintain that level of performance while carrying a heavy workload is very valuable indeed. In addition, a player who is involved in a high number of plays can draw the defense’s attention away from other parts of the offense, and, if that player is a running back, he can take time off the clock with repeated runs.
Let’s say you have a running back who carries the ball 300 times in a season. What would happen if you were to remove this player from his team’s offense? What would happen to those 300 plays? Those plays don’t disappear with the player, though some might be lost to the defense because of the associated loss of first downs. Rather, those plays would have to be distributed among the remaining players in the offense, with the bulk of them being given to a replacement running back. This is where we arrive at the concept of replacement level, borrowed from our partners at Baseball Prospectus. When a player is removed from an offense, he is usually not replaced by a player of similar ability. Nearly every starting player in the NFL is a starter because he is better than the alternative. Those 300 plays will typically be given to a significantly worse player, someone who is the backup because he doesn’t have as much experience and/or talent. A player’s true value can then be measured by the level of performance he provides above that replacement-level baseline, totaled over all of his run or pass attempts.
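The replacement-level idea also explains the durability puzzle above: a player who is exactly average still beats replacement on every play, so his surplus grows with his workload. A sketch in success points (the translation into yards is not shown), assuming a per-play replacement expectation is available; the 0.50 and 0.42 values and the function name are hypothetical:

```python
def value_over_replacement(play_values: list[float],
                           replacement_values: list[float]) -> float:
    """Total success value above what a replacement-level player would produce
    in the same situations (one replacement estimate per play)."""
    if len(play_values) != len(replacement_values):
        raise ValueError("need one replacement estimate per play")
    return sum(v - r for v, r in zip(play_values, replacement_values))


# Hypothetical numbers: every carry is exactly league-average (0.50 success points),
# while a replacement back would average 0.42 in the same situations.
workhorse = value_over_replacement([0.50] * 300, [0.42] * 300)
part_timer = value_over_replacement([0.50] * 30, [0.42] * 30)
print(round(workhorse, 1), round(part_timer, 1))
```

Same per-play quality, ten times the workload, ten times the value over replacement.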
Of course, the real replacement player is different for each team in the NFL. In 2011, the second-string running back in Washington (Roy Helu) had a higher DVOA than the original starter (Tim Hightower), and the third-string running back (Evan Royster) had a higher DVOA than either of them. Sometimes a player like Ryan Grant or Danny Woodhead will be cut by one team and turn into a star for another. On other teams, the drop from the starter to the backup can be even greater than the general drop to replacement level. The 2011 Indianapolis Colts will now be the hallmark example of this until the end of time. The choice to start an inferior player or to employ a sub-replacement level backup, however, falls to the team, not the starter being evaluated. Thus, we generalize replacement level for the league as a whole, as the ultimate goal is to evaluate players independent of the quality of their teammates.
Our estimates of replacement level were re-done during the 2008 season and are computed differently for each position. For quarterbacks, we analyzed situations where two or more quarterbacks had played meaningful snaps for a team in the same season, then compared the overall DVOA of the original starters to the overall DVOA of the replacements. We did not include situations where the backup was actually a top prospect waiting his turn on the bench, since a first-round pick is by no means a "replacement-level" player.
At other positions, there is no easy way to separate players into "starters" and "replacements," since unlike at quarterback, being the starter doesn't make you the only guy who gets in the game. Instead, we used a simpler method, ranking players at each position in each season by attempts. The players who made up the final 10 percent of passes or runs were split out as "replacement players" and then compared to the players making up the other 90 percent of plays at that position. This took care of the fact that not every non-starter at running back or wide receiver is a freely available talent. (Think of Jonathan Stewart or Randall Cobb, for example.)
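The ranking-by-attempts split can be sketched as follows. How FO handles a player who straddles the 90 percent cutoff is not stated; this sketch keeps him with the regulars. The function name, player names, and attempt counts are hypothetical:

```python
def split_replacement(attempts: dict[str, int], share: float = 0.10):
    """Split players at one position in one season into (regulars, replacements).

    Players are ranked by attempts, most first; those whose plays make up the
    final `share` of all attempts are labeled replacement players. A player who
    straddles the cutoff is kept with the regulars (an assumption).
    """
    total = sum(attempts.values())
    cutoff = (1.0 - share) * total
    regulars, replacements = [], []
    running = 0
    for player, n in sorted(attempts.items(), key=lambda kv: kv[1], reverse=True):
        if running < cutoff:
            regulars.append(player)
        else:
            replacements.append(player)
        running += n
    return regulars, replacements


attempts = {"Back A": 300, "Back B": 250, "Back C": 200, "Back D": 150,
            "Back E": 60, "Back F": 25, "Back G": 15}
regulars, replacements = split_replacement(attempts)
print("regulars:", regulars)
print("replacements:", replacements)
```

Here the last three backs account for exactly 100 of 1,000 carries, the final 10 percent.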
As noted earlier, the challenge of any new stat is to present it on a scale that’s meaningful to those attempting to use it. Saying that Tony Romo's passes were worth 131 success value points over replacement in 2011 means very little without context to tell us whether 131 is a good total or a bad one. Therefore, we translate these success values into a number called "Defense-adjusted Yards Above Replacement," or DYAR. By that measure, Romo was fourth among quarterbacks in 2011 with 1,344 passing DYAR. It is our estimate that a generic replacement-level quarterback, throwing in the same situations as Romo, would have been worth 1,344 fewer yards. Note that this doesn’t mean the replacement-level quarterback would have gained exactly 1,344 fewer yards. First downs, touchdowns, and turnovers all have an estimated yardage value in this system, so what we are saying is that a generic replacement-level quarterback would have fewer yards and touchdowns (and more turnovers) that would total up to be equivalent to the value of 1,344 yards.
(Note: Prior to the 2008 season, DYAR was translated in terms of points rather than yardage, and old articles will refer to these stats as "DPAR" instead.)
HOW CAN A 16-GAME SEASON BE SIGNIFICANT?
Football statistics can't be analyzed in the same way baseball statistics are. After all, there are only 16 games in a season. Baseball has over ten times more, and even the NBA and NHL offer over five times more. The more games, the more events to analyze, and the more events to analyze, the more statistical significance.
That is true, but the trick is to consider each play in an NFL game as a separate event. For example, Drew Brees played only 16 games in 2011, but in those 16 games he had 678 passing plays (including sacks) and 16 rushing plays (including scrambles), for a total of roughly 700 events. Adrian Gonzalez in 2011 played in 159 games and had 715 plate appearances. For the most part, a quarterback who plays a full season will have almost the same number of plays as a baseball hitter who plays in most of his team's games.
A running back will have fewer plays than a quarterback, and wide receivers and tight ends will have even fewer. But there should still be enough plays with most starting running backs and receivers to allow for analysis with some significance. As an example, Maurice Jones-Drew ran the ball 343 times in 2011, and was the target of 63 passes (including incompletes), for a total of 406 plays. In general, a starting running back will have 375-450 plays over 16 games. Receivers are involved in fewer plays, and therefore their stats are likely less reliable. In general, starting wide receivers have 75-150 pass targets over a full season.
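One generic way to see why play counts matter is the binomial standard error of a success rate, which shrinks with the square root of the number of plays. This is a textbook statistic, not something FO publishes, and it treats plays as independent trials; the 45 percent success rate is hypothetical, while the play counts roughly follow the examples above:

```python
import math


def success_rate_std_error(success_rate: float, plays: int) -> float:
    """Binomial standard error of a success rate measured over `plays` events."""
    return math.sqrt(success_rate * (1.0 - success_rate) / plays)


# Play counts roughly matching the examples above; the 45% rate is hypothetical.
for label, n in [("full-season QB", 700), ("starting RB", 406), ("WR, 100 targets", 100)]:
    print(f"{label:>16}: +/-{success_rate_std_error(0.45, n):.3f}")
```

Under these assumptions the uncertainty for a receiver with 100 targets (about ±5 percentage points) is about 2.6 times (the square root of 7) that of a quarterback with 700 plays (about ±1.9 points).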
ISSUES WITH DVOA/DYAR
DVOA is limited by what’s included in the official NFL play-by-play or tracked by the Football Outsiders game charting project. Because we need to have the entire play-by-play of a season in order to compute DVOA and DYAR, these metrics are not yet ready to compare players of today to players throughout the league’s history. As of this writing, we have processed 21 seasons, 1991 through 2011, and we add seasons at a rate of roughly two per year (the most recent season, plus one season back into history).
Football is a game in which nearly every action requires the work of two or more teammates -- in fact, usually 11 teammates all working in unison. Unfortunately, when it comes to individual player ratings, we are still far from the point at which we can determine the value of a player independent from the performance of his teammates. That means that when we say, “In 2011, Matt Forte had a DVOA of -5.6%,” what we are really saying is, “In 2011, Matt Forte, playing in Mike Martz’s offensive system with the Chicago offensive line blocking for him and Jay Cutler or Caleb Hanie selling the fake when necessary, had a DVOA of -5.6%.”
With fewer situations to measure, the numbers spread out a bit more, so you'll see more extreme DVOA ratings for part-time players and for measurements of teams in more specific situations (for example, passing on third downs). The charts listing players in order of DVOA have cut-offs for number of attempts, because players with just a handful of plays end up with absurd VOA and DVOA numbers. (In 2009, for instance, Tarvaris Jackson had a 75.7% passing DVOA in 21 plays.)
Passing statistics include sacks as well as fumbles on aborted snaps. Receiving statistics include all passes intended for the receiver in question, including those that are incomplete or intercepted. At some point, we hope to be able to determine just how much impact different receivers have on completions vs. incompletions, but various regression analyses make it clear that both quarterback and receiver have an impact on whether a pass is complete or not. The word “passes” refers to both complete and incomplete pass attempts.
Unless we say otherwise, all references to third down also include the handful of rushing and passing plays that take place on fourth down (primarily fourth-and-1).
DVOA FOR SPECIAL TEAMS
The problem with a system based on measuring both yardage and yardage towards a first down, of course, is what to do with plays that don't have the possibility of a first down. Special teams are an important part of football and we needed a way to add that performance to the team DVOA ranking. Our special teams metric includes five separate measurements: field goals (and extra points), net punting, punt returns, net kickoffs, and kick returns.
The foundation of most of these special teams ratings is the concept that each yard line has a different value based on how the likelihood of scoring changes with better field position. In Hidden Game, the authors suggested that the value of field position for the offense existed on a straight line with your own goal line being worth -2 points, the 50-yard line 2 points, and the opposing goal line 6 points. (-2 points isn't just the value of a safety; it also reflects the fact that when you are backed up in your own zone, you are likely going to see your drive stall, and you'll need to punt and give the ball to the other team in good field position. Thus, the defense is more likely to score next.) We use a more refined set of values based on our research, but the idea is the same.
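The straight-line Hidden Game version is easy to write down: it rises by 0.08 points per yard (8 points over 100 yards). FO's own, more refined values are not given here, so this sketch uses only the Hidden Game line; the function name is hypothetical:

```python
def hidden_game_field_value(yards_from_own_goal: float) -> float:
    """Point value of possession on the Hidden Game straight line:
    -2 at your own goal line, +2 at midfield, +6 at the opponent's goal line."""
    if not 0 <= yards_from_own_goal <= 100:
        raise ValueError("field position must be between 0 and 100 yards")
    return -2.0 + 0.08 * yards_from_own_goal


for spot in (0, 20, 50, 80, 100):
    print(spot, round(hidden_game_field_value(spot), 2))
```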
Our special teams ratings compare each kick or punt to the league average based on the point value of field position at the spot of each kick, catch, and return. We've determined a league average for how far a kick goes based on the yard line where the kick occurs (almost always the 35-yard line for kickoffs under the current rules, variable for punts) and a league average for how far a return goes based on both the yard line where the ball is caught and the distance that it traveled in the air.
The kicking or punting team is rated based on net points compared to average, taking into account both the kick and the return if there is one. Because the average return is always positive, punts that are not returnable (touchbacks, out of bounds, fair catches, and punts downed by the coverage unit) will rate higher than punts of the same distance which are returnable. (This is also true of touchbacks on kickoffs.) There are also separate individual ratings for kickers and punters that are based only on distance and whether the kick is returnable, otherwise assuming an average return in order to judge the kicker separate from the coverage.
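Putting a field-position curve to work, a kick's net value for the kicking team can be sketched as the difference between where an average kick and return would have left the opponent and where this one did. This sketch uses the Hidden Game straight line as a stand-in for FO's refined values, and the league-average takeover spot is a hypothetical input; function names are illustrative:

```python
def field_value(yards_from_own_goal: float) -> float:
    """Hidden Game straight-line value of possession (-2 at own goal, +6 at opponent's)."""
    return -2.0 + 0.08 * yards_from_own_goal


def kicking_team_points_vs_average(actual_takeover: float, average_takeover: float) -> float:
    """Points saved by the kicking team relative to a league-average kick and return.

    Both arguments are where the receiving team starts its drive, in yards from
    the receiving team's own goal line. A lower takeover spot is better for the
    kicking team, so the result is positive when the unit beats average.
    """
    return field_value(average_takeover) - field_value(actual_takeover)


# Hypothetical punt: an average punt and return from this spot would leave the
# receiving team at its own 30; this one (say, a fair catch) leaves it at its own 22.
print(round(kicking_team_points_vs_average(22, 30), 2))
```

Eight yards of field position on this straight line are worth 0.64 points to the kicking team.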
For the return team, the rating is based on how many points the return is worth compared to average, based on the location of the catch and the distance the ball traveled in the air. Return teams are not judged on the distance of kicks, nor are they judged on kicks that cannot be returned. As explained below, blocked kicks are so rare as to be statistically insignificant as predictors for future performance and are thus ignored.
Field goal kicking is measured differently. Measuring kickers by field goal percentage is a bit absurd, as it assumes that all field goals are of equal difficulty. In our metric, each field goal is compared to the average number of points scored on all field goal attempts from that distance over the past 15 years. The value of a field goal increases as distance from the goal line increases.
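The comparison is to expected points rather than to a make-or-miss percentage. A sketch, assuming expected points equal three times a make rate for the distance; the make rates below are invented placeholders, not league data, and the 10-yard bucketing and function names are our simplification:

```python
# Placeholder make rates by distance bucket (yards). These are invented for
# illustration; a real baseline would use league attempts over the past 15 years.
HYPOTHETICAL_MAKE_RATE = {20: 0.98, 30: 0.92, 40: 0.80, 50: 0.60}


def expected_fg_points(distance: int) -> float:
    """Expected points from an attempt: 3 x the make rate for its 10-yard bucket."""
    bucket = min(50, max(20, 10 * (distance // 10)))
    return 3.0 * HYPOTHETICAL_MAKE_RATE[bucket]


def fg_points_over_average(distance: int, made: bool) -> float:
    """Points scored on the attempt minus the average for that distance."""
    return (3.0 if made else 0.0) - expected_fg_points(distance)


print(round(fg_points_over_average(52, made=True), 2))   # long make: credit
print(round(fg_points_over_average(25, made=False), 2))  # short miss: big debit
```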
Kickoffs, punts, and field goals are then adjusted based on weather and altitude. It will surprise no one to learn that it is easier to kick the ball in Denver or a dome than it is to kick the ball in Buffalo in December. Because we do not yet have enough data to tailor our adjustments specifically to each stadium, each one is assigned to one of four categories: Cold, Warm, Dome, and Denver. There is also an additional adjustment dropping the value of field goals in Florida (because the warm temperatures allow the ball to carry better) and raising the value of punts in San Francisco (because of those infamous winds).
The baselines for special teams are adjusted in each year for rule changes such as the introduction of the special-teams-use-only “k-ball” in 1999 as well as the move of the kickoff line from the 35 to the 30 in 1994 and then back to the 35 in 2011. Baselines have also been adjusted each year to make up for the gradual improvement of kickers over the last two decades.
Once we’ve totaled how many points above or below average can be attributed to special teams, we translate those points into DVOA so the ratings can be added to offense and defense to get total team DVOA. As a final step, we then normalize special teams DVOA to reflect the league environment in a given year. It should be noted, however, that we don’t yet have a method to perfectly normalize each year of special teams, so the league average for special teams deviates from 0.0% in certain years, although not by more than 0.5%.
There are three aspects of special teams that have an impact on wins and losses, but don’t show up in the standard special teams rating because a team has little or no influence on them. The first is the length of kickoffs by the opposing team, with an asterisk. Obviously, there are no defenders standing on the 35-yard line, ready to block a kickoff after the whistle blows. However, over the past few years, some teams have deliberately kicked short in order to avoid certain top return men, such as Devin Hester and Josh Cribbs. The special teams formula now includes adjustments to give teams extra credit for field position on kick returns if kickers are deliberately trying to avoid a return.
The other two items that special teams have little control over are field goals against your team and punt distance against your team. Research shows no indication that teams can influence the accuracy or strength of field-goal kickers and punters, except for blocks. As mentioned above, although blocked field goals and punts are definitely skillful plays, they are so rare that they have no correlation to how well teams have played in the past or will play in the future. They are therefore included here as if they were any other missed field goal or botched punt, giving the defense no additional credit for its efforts. The value of these three elements is listed separately as “Hidden” value.
Special teams ratings also do not include two-point conversions or onside kick attempts, both of which, like blocks, are so infrequent as to be statistically insignificant in judging future performance.
ADJUSTED LINE YARDS EXPLAINED
One exception to the use of DVOA/DYAR, and the use of "play success" instead of raw yardage, i