SABRmetrics invades the NFL?
It's opening weekend of the 2015 NFL season, so it's as good a time as any to talk about a statistical model that lets a defensive coordinator predict whether the opposing offense is going to call a run or a pass, with about 90% accuracy in its best games and 75% on average. This could change football as much as SABRmetrics has changed baseball.
Of course, offensive coordinators can also use the model to understand their own teams' tendencies and call plays that go against them.
The one coach this probably wouldn't have affected was Vince Lombardi. During the Packers' run in the sixties, he called a limited number of plays. It was going to be student body right or student body left, usually to the right. In practice, Lombardi ran plays until they were thoroughly ingrained in his players. Every other defensive coordinator, coach and player knew what the Packers were going to call on any given play. Bart Starr would take the snap, Jerry Kramer and Fuzzy Thurston would pull and lead the halfback, first Paul Hornung, then Donny Anderson, around the right end. Stop us if you can. Other teams couldn't. Five NFL Championships in the decade. Second place twice. Everyone knew what the Pack was going to do. It didn't matter. Kramer and Thurston just ran over opposing players.
Well, times change. Are we entering the day when statistics control the NFL game? Here's the story.
Statistical model predicts with high accuracy play-calling tendency of NFL teams
If a defensive coordinator of a National Football League team could predict with high accuracy whether his team's opponent will call a pass or run play during a game, he would become a rock star in the league and soon be a head coach candidate. A new statistical model that predicts the play-calling tendency of NFL teams with high accuracy has been unveiled.
Burton and Dickey's model, which correctly called run and pass plays at a high rate when tested on play-by-play data from actual NFL games, could be used by casual fans and even NFL defensive coordinators during real games to predict an opponent's next play.
"A valuable skill for NFL coaches is to be able to anticipate whether the opposing team will call a pass or run play. If the offensive play type can be predicted--say a pass--the defensive coordinator can call a blitz or coverage play to gain an advantage," explained Burton during his presentation.
Burton and Dickey used 2000 through 2014 NFL play-by-play data from ArmChair Analysis to conduct an initial analysis of the probability of a pass in an NFL game. This analysis revealed that pass probability in NFL games has risen by more than 2 percentage points, from 54.4% during the 2000 season to 56.7% during the 2014 season. Armed with this information, they determined the model should be developed using data from the 2011-2014 seasons.
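To make the size of that shift concrete, here is a small sketch of the arithmetic, using only the two pass rates quoted above. The variable names are ours, not the researchers'.

```python
# Pass probabilities quoted in the article, in percent.
pass_rate_2000 = 54.4  # 2000 season
pass_rate_2014 = 56.7  # 2014 season

# Absolute change, in percentage points.
point_change = pass_rate_2014 - pass_rate_2000

# Relative change, as a percent of the 2000 rate.
relative_change = point_change / pass_rate_2000 * 100

print(f"{point_change:.1f} percentage points")
print(f"{relative_change:.1f}% relative increase")
```

The first figure is the "more than 2 percentage points" in the article; the second expresses the same change relative to where the league started in 2000.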
Next, they had to decide which factors most influence an offensive team's play selection. These include yards to go, the play down (first, second, third or fourth), time remaining, point differential, offensive points, defensive points, interaction between yards to go and down, cumulative number of fumbles, cumulative number of interceptions, field position, timeouts remaining for the offense, timeouts remaining for the defense and yards gained on the previous plays. They considered many other variables, such as play lag (what had occurred on the previous play) and current weather conditions (precipitation/wind speed), but found they did not have a significant effect on play-calling.
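As a rough illustration of how a pre-snap situation might be turned into model inputs, here is a minimal sketch covering a subset of the variables listed above. The class, field names and units are hypothetical; the article does not describe the ArmChair Analysis column names or how the researchers encoded each variable.

```python
from dataclasses import dataclass


@dataclass
class GameState:
    """Pre-snap situation. Field names are illustrative, not the dataset's."""
    down: int                 # 1 through 4
    yards_to_go: int
    seconds_remaining: int    # assumed unit for "time remaining"
    offense_points: int
    defense_points: int
    offense_timeouts: int
    defense_timeouts: int


def build_features(state: GameState) -> dict:
    """Return a feature dictionary, including the derived variables."""
    return {
        "down": state.down,
        "yards_to_go": state.yards_to_go,
        "seconds_remaining": state.seconds_remaining,
        "offense_points": state.offense_points,
        "defense_points": state.defense_points,
        # Point differential from the offense's point of view (an assumption).
        "point_diff": state.offense_points - state.defense_points,
        # Interaction between yards to go and down, taken here as a product.
        "down_x_yards_to_go": state.down * state.yards_to_go,
        "offense_timeouts": state.offense_timeouts,
        "defense_timeouts": state.defense_timeouts,
    }


# Third-and-8, offense trailing 10-17 with seven minutes left in the quarter.
example = build_features(GameState(3, 8, 420, 10, 17, 2, 3))
print(example["point_diff"], example["down_x_yards_to_go"])
```

The interaction term lets a model treat "8 yards to go" differently on third down than on first down, which a plain sum of the two variables cannot do.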
Burton and Dickey then developed logistic regression and random forest models using the ArmChair Analysis play-by-play data to predict future play types. While building the logistic regression model, they determined that separate models needed to be created for each quarter of a game because the behavior of the selected variables changes by quarter. For example, if a team is losing in the fourth quarter, it is much more likely to throw a pass, while the winning team is less likely to call a pass play. Conversely, in the first quarter, point differential has no benefit in predicting play type.
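For readers unfamiliar with logistic regression, the sketch below shows how such a model turns a game situation into a pass probability. The coefficients are made up purely for illustration; the article does not publish the fitted values, and the 0.5 cutoff for calling a play a "pass" is also our assumption.

```python
import math


def pass_probability(features: dict, weights: dict, intercept: float) -> float:
    """Logistic model: squash a weighted sum of features into (0, 1)."""
    z = intercept + sum(weights[name] * features[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


def predict_play(probability: float, threshold: float = 0.5) -> str:
    """Label a play as 'pass' or 'run' using an assumed cutoff."""
    return "pass" if probability >= threshold else "run"


# HYPOTHETICAL fourth-quarter coefficients, not the researchers' fitted values.
# A negative weight on point_diff means a trailing offense passes more often.
weights = {"point_diff": -0.08, "yards_to_go": 0.10}
intercept = -0.3

trailing = {"point_diff": -10, "yards_to_go": 10}  # offense down by 10
leading = {"point_diff": 10, "yards_to_go": 10}    # offense up by 10

p_trailing = pass_probability(trailing, weights, intercept)
p_leading = pass_probability(leading, weights, intercept)
print(predict_play(p_trailing), predict_play(p_leading))
```

With these placeholder numbers, the same down-and-distance produces a pass call for the trailing team and a run call for the leading team, which is the fourth-quarter behavior the researchers describe.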
Each quarter has its own quirks that are not picked up if modeled together. As a result, six unique logistic regression models were created--one each for the first quarter, second quarter, third quarter, fourth quarter winning, fourth quarter losing and fourth quarter tied.
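One way to picture the six-model setup is as a lookup that sends each play to the model for its game situation. This is only a sketch: the key names are hypothetical, point differential is assumed to be measured from the offense's side, and overtime is left out because the article does not say how it was handled.

```python
def model_key(quarter: int, point_diff: int) -> str:
    """Pick which of the six models applies to a play.

    point_diff is offense points minus defense points (an assumption).
    """
    if quarter in (1, 2, 3):
        return f"q{quarter}"
    if quarter == 4:
        if point_diff > 0:
            return "q4_winning"
        if point_diff < 0:
            return "q4_losing"
        return "q4_tied"
    # The article does not describe overtime, so it is not handled here.
    raise ValueError(f"unsupported quarter: {quarter}")


# Every regulation situation maps onto exactly six distinct models.
keys = {model_key(q, d) for q in (1, 2, 3, 4) for d in (-7, 0, 7)}
print(sorted(keys))
```

Each key would then point at its own fitted logistic regression, so that point differential can matter a great deal in the fourth-quarter models and hardly at all in the first-quarter one.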
To test their model, Burton and Dickey randomly selected 20 games from completed NFL seasons. The model's best result was correctly predicting 91.6% of plays in a 2014 game between the Jacksonville Jaguars and Dallas Cowboys, and the average prediction accuracy over all 20 games was 75%.
The following is a list of the five games with the highest prediction accuracy rates from the 20 tested. (Note: Only pass or run plays are included; punts and field goal attempts are not included.)
2014 Dallas Cowboys at Jacksonville Jaguars
- Total # of Plays: 119
- Total # of Plays Correctly Predicted: 109
- Total # of Plays Incorrectly Predicted: 10
- Percent of Plays Correctly Predicted: 91.6%
- Total # of Plays: 148
- Total # of Plays Correctly Predicted: 134
- Total # of Plays Incorrectly Predicted: 14
- Percent of Plays Correctly Predicted: 90.5%
- Total # of Plays: 128
- Total # of Plays Correctly Predicted: 109
- Total # of Plays Incorrectly Predicted: 19
- Percent of Plays Correctly Predicted: 85.16%
- Total # of Plays: 121
- Total # of Plays Correctly Predicted: 96
- Total # of Plays Incorrectly Predicted: 25
- Percent of Plays Correctly Predicted: 79.33%
- Total # of Plays: 107
- Total # of Plays Correctly Predicted: 84
- Total # of Plays Incorrectly Predicted: 23
- Percent of Plays Correctly Predicted: 78.50%
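The percentages above can be rechecked from the raw counts. The sketch below recomputes each game's accuracy and also pools the five games together. Keep in mind these are the five best games, so the pooled figure is not comparable to the 75% average over all 20.

```python
# (total plays, correctly predicted, incorrectly predicted), in listed order.
games = [
    (119, 109, 10),
    (148, 134, 14),
    (128, 109, 19),
    (121, 96, 25),
    (107, 84, 23),
]


def accuracy(total: int, correct: int) -> float:
    """Percent of plays predicted correctly."""
    return correct / total * 100


# Sanity check: correct plus incorrect should equal the total for each game.
for total, correct, incorrect in games:
    assert correct + incorrect == total

per_game = [accuracy(total, correct) for total, correct, _ in games]

# Pooled accuracy weights every play equally across the five games.
total_plays = sum(total for total, _, _ in games)
total_correct = sum(correct for _, correct, _ in games)
pooled = accuracy(total_plays, total_correct)

for pct in per_game:
    print(f"{pct:.2f}%")
print(f"pooled: {pooled:.1f}%")
```

The recomputed values agree with the list to within rounding; the fourth game comes out a hair above the 79.33% shown, which suggests that figure was truncated rather than rounded.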
Story Source: Materials provided by American Statistical Association. "Statistical model predicts with high accuracy play-calling tendency of NFL teams." ScienceDaily, 12 August 2015.