(Re)Introducing NFL Plus/Minus, a Superior Player Valuation Metric
Assigning point values to player contributions through cluster-based modeling of team efficiency differentials
Player valuation is one of the hardest puzzles to solve in sports. At our disposal we have player-level on-field stats, PFF player grading and other tape-based evaluation. We also have advanced metrics like expected points added (EPA), which translates what happens on the field into a points-based denomination, but EPA is a play-level number, and it is difficult to divide the credit among the players involved.
Quarterbacks, due to their uniquely high importance, can more confidently be assigned the team-level value on the plays in which they're involved, but even then there are many contextual factors that need to be removed to better isolate their impact, which I attempted to do with my Adjusted Quarterback Efficiency (AQE) metric. For other players on the field, it becomes even more difficult.
In this analysis, I'm going to focus on how to use advanced stats and player participation to value non-quarterbacks, specifically the value wide receivers bring when they're running routes. The exercise below can be replicated and applied to all positions and all facets of play (run blocking, pass blocking, pass rush, run defense, coverage, etc.).
I titled this piece as a reintroduction of NFL Plus/Minus because it's an improved and updated version of one of my most impactful pieces of research at PFF. I've rebuilt the analysis on the robust advanced stats and participation data available through nflreadR, which has advanced stats going back to 1999 and participation data from 2016.
Plus/minus data has been used (or more often misused) in sports analysis for as long as analysts have been able to separate player participation on a game- or play-level basis. The problem with plus/minus data and on/off splits is that it's difficult to pinpoint the effect of one player on the field, especially with small samples. We intuitively know that the field-stretching capabilities of a receiver are valuable, but that value doesn't show up in box scores or even PFF grading when the receiver isn't targeted.
Looking at the overall team efficiency effects when players are on and off the field helps tease out the more subtle aspects of value, but can also be extremely noisy for any single player. The question is how to strip as much of the noise as possible from plus/minus data in a sport with only 17 games per season and no regular rotations to build the "off" sample for star players. Joe Thomas famously played over 10,000 straight offensive snaps for the Browns, so good luck calculating meaningful plus/minus numbers for him.
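As a minimal sketch of the raw on/off split described above, the Python below computes a plus/minus as mean expected points added (EPA) per play with a receiver on the field minus mean EPA per play with him off it. The play records, the EPA values and the `on_field` flag are invented for illustration; they are not real data and this is not necessarily how the metric is built.

```python
from statistics import mean

# Hypothetical dropbacks for one team: EPA on the play and whether the
# receiver of interest was on the field. All values are made up.
plays = [
    {"epa": 0.6, "on_field": True},
    {"epa": -0.2, "on_field": True},
    {"epa": 0.4, "on_field": True},
    {"epa": 0.0, "on_field": True},
    {"epa": -0.3, "on_field": False},
    {"epa": 0.1, "on_field": False},
]

def raw_plus_minus(plays):
    """Mean EPA per play with the player on the field minus the mean with him off."""
    on = [p["epa"] for p in plays if p["on_field"]]
    off = [p["epa"] for p in plays if not p["on_field"]]
    if not on or not off:
        # A player who never leaves the field has no "off" sample at all.
        raise ValueError("need at least one play in each split")
    return mean(on) - mean(off)

split = raw_plus_minus(plays)
print(round(split, 3))
```

With only a handful of plays in the "off" bucket, one big gain or turnover swings the result, which is the noise problem in miniature. An every-snap player like the Joe Thomas example has an empty "off" list, so the function raises instead of returning a number.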
My solution for reducing the noise in a single plus/minus split is growing the sample. While we can’t grow one player’s sample, we can find that player’s closest counterparts and add their numbers to the sample. If one player provides a few hundred snaps on and off the field a season, finding 10 similar players will provide a few thousand. The higher you can reasonably build the sample, the more you can minimize noise and boost signal.
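A rough way to see why a bigger sample helps: if per-play EPA has standard deviation sigma, the standard error of an on/off difference in means is sigma * sqrt(1/n_on + 1/n_off). The sketch below plugs in placeholder numbers (a sigma of 1.4 and 300 snaps in each split are assumptions, not measured values) and treats plays as independent with equal variance, which real plays are not.

```python
from math import sqrt

def split_standard_error(sigma, n_on, n_off):
    """Standard error of (mean on - mean off), assuming independent plays with equal variance."""
    return sigma * sqrt(1 / n_on + 1 / n_off)

SIGMA = 1.4             # assumed per-play EPA standard deviation (placeholder)
N_ON, N_OFF = 300, 300  # assumed on/off snaps for one receiver season (placeholder)

one_player = split_standard_error(SIGMA, N_ON, N_OFF)
ten_players = split_standard_error(SIGMA, 10 * N_ON, 10 * N_OFF)

print(f"one player:  {one_player:.3f} EPA/play")
print(f"ten players: {ten_players:.3f} EPA/play")
print(f"noise reduction: {one_player / ten_players:.2f}x")
```

Under these assumptions a tenfold sample cuts the standard error by the square root of 10, a little more than a factor of three. The catch is that the pooled number estimates the group's average effect, so it is only useful if the pooled players really are similar.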
In this analysis, I walk through how to build similar groups of receivers by statistical similarity and then use the larger sample of the group to calculate more meaningful estimates for the value of its constituents. This lays the foundation to replicate the process further, producing estimates for the value of each receiver.
When applied to all positions, NFL Plus/Minus will give us a better tool for valuations of all sorts, from free agency to trades to NFL draft projections.
PLAYER CLUSTERING
For this analysis, I’m using every season since 2016, and I'm only looking at receivers who ran at least 50 routes in a season.
For each receiver season, I calculated a number of efficiency and volume statistics and settled on seven that best differentiate receiver types: routes per game, slot rate, yards per route run, touchdowns as a percentage of routes, first-down percentage, deep rate (percentage of 20+ yard targets) and yards per reception. I translated these seven features into principal components to minimize multicollinearity and make for easier visualization. The technique I used to form groups of similar receiver seasons is called k-means clustering. With this clustering technique, you choose the number of clusters, or groups, to form.
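To make the principal-components step concrete, here is a small sketch that z-scores a few features and finds the first principal axis. Everything in it is an assumption for illustration: the numbers are invented, only three of the seven features are used, and power iteration in plain Python stands in for whatever PCA routine was actually used.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical receiver seasons: (routes per game, yards per route run, slot rate).
seasons = [
    (34.0, 2.4, 0.20),
    (30.0, 2.0, 0.25),
    (22.0, 1.3, 0.70),
    (18.0, 1.1, 0.65),
    (12.0, 0.9, 0.30),
    (27.0, 1.8, 0.55),
]

def standardize(rows):
    """Z-score each column so features on different scales count equally."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def covariance(rows):
    """Population covariance matrix of rows that are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]

def leading_component(cov, iters=500):
    """First principal axis via power iteration on the covariance matrix."""
    v = [1.0] * len(cov)
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

z = standardize(seasons)
cov = covariance(z)
pc1 = leading_component(cov)
# PC1 coordinate for each season: projection of its z-scores onto the axis.
scores = [sum(a * b for a, b in zip(row, pc1)) for row in z]
print([round(x, 2) for x in pc1])
print([round(s, 2) for s in scores])
```

Because routes per game and yards per route run move together in this toy data, they load on the first component with the same sign. That is the multicollinearity the principal components are meant to absorb before clustering.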
Here, I'll walk through an example of the clustering process. In this example, I chose to divide the roughly 1,500 receiver seasons into only five clusters to keep the illustration simple.
Every cluster is represented by a different color, and for each cluster I highlighted the 2022 seasons of two players who were assigned to it. It's within these five clusters that the individual numbers for each receiver plus/minus are aggregated to determine the overall cluster plus/minuses. For the remainder of this analysis, I will refer to the five clusters by the last names of the representative players, rather than cluster number.
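For readers who have not seen k-means before, the sketch below runs the basic version (Lloyd's algorithm) on a toy set of points. The coordinates are invented stand-ins for receiver seasons plotted on two principal components, it uses three clusters instead of five, and the evenly spaced starting centroids are a simplification; real implementations typically use randomized starts with several restarts.

```python
from math import dist
from statistics import mean

def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    # Simplified deterministic start: k points spaced evenly through the list.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(mean(axis) for axis in zip(*members))
    return labels, centroids

# Hypothetical receiver seasons as (PC1, PC2) coordinates.
points = [
    (-2.1, 0.3), (-1.8, -0.2), (-2.4, 0.1),
    (0.9, 1.6), (1.2, 1.9), (0.7, 1.4),
    (1.1, -1.5), (1.4, -1.8), (0.8, -1.3),
]
labels, centroids = kmeans(points, k=3)
print(labels)  # cluster index assigned to each point, in input order
```

The only input you choose is `k`, the number of clusters. The algorithm then decides which seasons belong together based purely on distance in feature space.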
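One simple way to aggregate within a cluster is to pool every member's on-field and off-field plays and take the difference in EPA per play, which weights each receiver by his snap counts. The sketch below assumes that play-weighted approach; the weighting is a guess, not a detail given in this piece, and the players, clusters and totals are made up.

```python
from collections import defaultdict

# Hypothetical receiver-season totals: cluster label, plays and summed EPA with
# the receiver on the field, and the same for plays with him off the field.
seasons = [
    {"player": "A", "cluster": 0, "on_plays": 500, "on_epa": 60.0, "off_plays": 60, "off_epa": 1.2},
    {"player": "B", "cluster": 0, "on_plays": 450, "on_epa": 40.5, "off_plays": 90, "off_epa": 3.6},
    {"player": "C", "cluster": 1, "on_plays": 200, "on_epa": 8.0, "off_plays": 300, "off_epa": 15.0},
    {"player": "D", "cluster": 1, "on_plays": 250, "on_epa": 5.5, "off_plays": 280, "off_epa": 8.2},
]

def cluster_plus_minus(seasons):
    """Pool on/off plays within each cluster, then difference the EPA-per-play rates."""
    totals = defaultdict(lambda: {"on_plays": 0, "on_epa": 0.0, "off_plays": 0, "off_epa": 0.0})
    for s in seasons:
        t = totals[s["cluster"]]
        for key in t:
            t[key] += s[key]
    return {
        c: t["on_epa"] / t["on_plays"] - t["off_epa"] / t["off_plays"]
        for c, t in totals.items()
    }

result = cluster_plus_minus(seasons)
for cluster, pm in sorted(result.items()):
    print(cluster, round(pm, 3))
```

Pooling totals rather than averaging each player's own split keeps a receiver with a tiny "off" sample from dominating the cluster number.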
The dashed arrows extending out from the center of the plot show the directional force of the different features, and the length of each arrow indicates the importance of that feature to the cluster assignment. The volume of routes and route-based efficiency for yards, first downs and touchdowns all pull in the same direction to the upper-left. Touchdowns are the least impactful feature, which makes sense as they are the noisiest stat of the group. Higher slot rate is directionally indicated by player seasons at the top of the plot, with yards per reception and deep rate mostly being opposing features. It makes sense that slot receivers are less likely to go deep and to post high yards per reception, and vice versa for receivers who mostly line up out wide.
A better view of how the players in each cluster differ can be seen using the spider charts below. Starting at the top, the features are in clockwise rotation: routes per game, yards per route run, first-down rate, touchdown rate, yards per reception, deep rate and slot rate. Remember, these are the average numbers for all the receivers in the cluster, not only the players each cluster is named after.