This offseason I decided to use my programming and statistics skills to build my own analytics model, wholly independent of any other publicly available body of work. I’ve bounced a lot of ideas off established analytics content creators (Peter Howard) and believe that I’ve produced something unique and of value here.
Everywhere you look, you will see a new metric or a new argument about why a specific trait in a prospect proves one analyst’s point or another’s. I think this selective cherry-picking of traits to prove a point, rather than looking at the full picture, is one of the worst things you will find in the football community.
I’m sure I will get similar responses about how I’m doing the exact same thing by creating yet another set of metrics, so I will set a few guidelines for my study to help keep it objective:
- I do not watch film on players at all. I leave that to the people who actually know what they are doing. I will bounce ideas off qualified individuals to see if my research lines up with their tape. If it doesn’t, it won’t change my analytics, but it may lead me to look into another iteration to improve the connection.
- I will not double count stats. Anything I use will only be counted a single time. When several metrics measure the same type of trait, I use the one with the best correlation to player success in my sample.
  - EX: You could use height and weight for a prospect in the model, or you could use BMI. If you use all 3 at the same time, your model will be overweighted on the player’s size, detracting from other areas of the prospect.
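The "no double counting" rule can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names (`pearson_r`, `bmi`, `best_metric`) are hypothetical, the prospect numbers are made up, and the selection rule (keep the candidate with the highest squared correlation to the success measure) is one plausible reading of "best correlation", not necessarily the exact procedure used for the model.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def bmi(height_in, weight_lb):
    """Body mass index from height in inches and weight in pounds."""
    return 703.0 * weight_lb / height_in ** 2

def best_metric(candidates, outcome):
    """Name of the candidate metric with the highest r^2 against the outcome."""
    return max(candidates, key=lambda name: pearson_r(candidates[name], outcome) ** 2)

# Made-up prospects for illustration only, NOT real data.
heights = [70, 72, 73, 75, 71]           # inches
weights = [185, 200, 215, 220, 190]      # pounds
ppg = [5.1, 7.9, 9.4, 8.8, 6.0]          # success measure (fantasy points per game)

# Three overlapping "size" metrics; only one of them goes into the model.
size_metrics = {
    "height": heights,
    "weight": weights,
    "bmi": [bmi(h, w) for h, w in zip(heights, weights)],
}
kept = best_metric(size_metrics, ppg)
print("size metric kept:", kept)
```

The point of the design is that the group competes internally first, so size only gets one vote in the final model no matter how many ways it can be measured.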
Going back to my earlier statement, I used my programming skills to build a new ID-based database of every college prospect back to the year 2000. For years 2000 to 2015, I’ve included every prospect who managed to register at least 1 fantasy point at the QB, RB, WR or TE position at some point in his career. For 2016 to 2019, I have an unfiltered set of prospects that can be adapted as players become relevant.
Each unique player ID is mapped as you can see above. Every year for the prospect can be viewed on an individual basis and analyzed against the player’s age at that time, assuming it is known. This structure allows you to analyze stats based on:
- Best year during college
- Final year during college
- Average year over the course of college
- Stats scaled to a per-game basis
- Best of all, it allows you to couple prospect traits with age to factor in early production or downplay late production
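A minimal sketch of how per-season records keyed by player ID could support those views. The `Season` fields, the `player_0001` ID, the stat lines, and the `earliest_age_reaching` helper are all hypothetical and made up for illustration; the real database has far more columns and may be organized differently.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Season:
    year: int
    age: Optional[float]   # None when the player's age is unknown
    games: int
    rec_yards: int

def best_year(seasons):
    """Season with the most receiving yards."""
    return max(seasons, key=lambda s: s.rec_yards)

def final_year(seasons):
    """Most recent college season."""
    return max(seasons, key=lambda s: s.year)

def average_year(seasons):
    """Mean receiving yards per season."""
    return sum(s.rec_yards for s in seasons) / len(seasons)

def per_game(season):
    """Receiving yards scaled to a per-game basis."""
    return season.rec_yards / season.games if season.games else 0.0

def earliest_age_reaching(seasons, threshold):
    """Youngest known age at which a season reached the threshold, else None."""
    ages = [s.age for s in seasons if s.age is not None and s.rec_yards >= threshold]
    return min(ages) if ages else None

# Made-up career for a hypothetical player ID, NOT real data.
careers = {
    "player_0001": [
        Season(2016, 18.9, 12, 410),
        Season(2017, 19.9, 13, 980),
        Season(2018, 20.9, 11, 870),
    ],
}
seasons = careers["player_0001"]
print(best_year(seasons).year, final_year(seasons).year, round(average_year(seasons), 1))
```

Keeping one row per player per year, rather than one row per career, is what makes the age-adjusted views possible: the same 980-yard season can be weighted differently depending on the age at which it happened.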
Short answer… Everything I could get my hands on.
- Draft Capital (Round and overall pick)
- Combine data including drills and measurables
- College and conference strength of schedule based on the Sports-Reference Simple Rating System
- Team Data:
  - Games played and games won each relevant year
  - Total counting stats
- Player Data:
  - Rushing, Receiving, and Passing counting stats
  - Efficiency stats such as yards per attempt
  - Return yards
- Finally, and most importantly, derived or advanced stats… As I said, I wanted to see what is real and what was created to help win an argument. I’ve tried to look at everything publicly available.
Currently my database has 400 columns and 3600 rows of compiled and consolidated prospect data mapped uniformly.
If you have ever listened to the Late Round Podcast with JJ Zachariason, you’ve probably heard him say “Now this is how I defined player success and it’s very likely that you and I define success in a different way”. I’m going to steal a page out of his book and assume the same thing.
After much debate, I decided to use the Pro-Football-Reference Player Season Finder to create a list of every single player from 2000 to 2018 who played the first 3 or 4 seasons of his career. I pulled the resulting average fantasy football points per game (0.5 PPR) for each player over their respective careers.
This is meant to tie into the value-based drafting approach that most fantasy football players follow: we want to draft rookies that hit early and hit hard. Draft for early value and trade for the players you missed on later.
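For anyone who wants to reproduce the success measure, here is a rough sketch of an average points-per-game calculation at 0.5 PPR. The scoring weights below are a common default that I am assuming for illustration (the exact settings behind the Pro-Football-Reference numbers may differ), the stat keys and `career_ppg` helper are hypothetical, and the two seasons are made up.

```python
# Assumed 0.5 PPR scoring weights (points per unit of each stat).
SCORING = {
    "pass_yds": 1 / 25,
    "pass_td": 4,
    "interceptions": -2,
    "rush_yds": 1 / 10,
    "rush_td": 6,
    "receptions": 0.5,     # the "0.5 PPR" part
    "rec_yds": 1 / 10,
    "rec_td": 6,
    "fumbles_lost": -2,
}

def fantasy_points(stat_line):
    """Total fantasy points for one season's stat line (unknown keys are ignored)."""
    return sum(SCORING[key] * value for key, value in stat_line.items() if key in SCORING)

def career_ppg(seasons):
    """Total fantasy points divided by total games across the given seasons."""
    total_points = sum(fantasy_points(s["stats"]) for s in seasons)
    total_games = sum(s["games"] for s in seasons)
    return total_points / total_games if total_games else 0.0

# Made-up seasons for a hypothetical WR, NOT real data.
seasons = [
    {"games": 16, "stats": {"receptions": 60, "rec_yds": 800, "rec_td": 5}},
    {"games": 14, "stats": {"receptions": 70, "rec_yds": 1000, "rec_td": 8}},
]
print(round(career_ppg(seasons), 2))
```

Dividing total points by total games (rather than averaging each season's per-game number) keeps a short, injury-shortened season from carrying the same weight as a full one. That is a design choice of this sketch, not a statement about how the source data was built.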
There are several types of prospect models floating around the community. You will see threshold-based approaches that say over the past X years, on average, the elite players have looked like Y. They try to find players who meet certain thresholds that allow them to fit the mold of the elite tier.
The other main type of model you will see is a simple regression model where someone will choose a trait as the x variable and some sort of opportunity or success indicator as the Y variable. You can chart the data and see what type of linear correlation exists.
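A simple regression of this kind is just an ordinary least squares line through the trait and success values, plus an R^2 that says how much of the variation the line explains. Here is a small self-contained sketch; `fit_line` is a hypothetical helper name and the data points are made up purely to show the mechanics.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (unexplained variation / total variation)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Made-up trait (x) and success (y) values, NOT real data.
trait = [0.10, 0.18, 0.22, 0.25, 0.31, 0.35]
success = [4.0, 3.1, 7.5, 5.2, 9.0, 6.4]
slope, intercept, r_squared = fit_line(trait, success)
print(f"Y = {slope:.3f}x + {intercept:.3f}, R^2 = {r_squared:.3f}")
```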
This model is most nearly a multiple regression model, but not quite. It’s my bastardized version to fit the needs of the situation. If you are a nerd, you can read about multiple regression here: http://www.stat.yale.edu/Courses/1997-98/101/linmult.htm
At the core of my model you look at each player on a positional basis. You take each trait such as the player’s weight and correlate it to our success indicator on the Y axis.
In the example above, which happens to be percentage of team total yardage for WRs, the line of best fit is Y = 18.787x + 1.5052 with an R^2 of .0928. In layman’s terms, the best I can come up with is this: if a player has 25% of his team’s total yardage (x = 0.25), then he will score roughly 6.2 points per game over his rookie contract in 0.5 PPR leagues.
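To make the arithmetic for the 25% example explicit, here is the quoted line of best fit as a tiny function. It assumes x is the share of team yardage expressed as a fraction (0.25, not 25), which is the only reading that gives a sensible points-per-game number; `predicted_ppg` is a hypothetical name.

```python
# Coefficients quoted in the text for WR percentage of team total yardage.
SLOPE = 18.787
INTERCEPT = 1.5052

def predicted_ppg(team_yardage_share):
    """Predicted 0.5 PPR points per game; share is a fraction such as 0.25."""
    return SLOPE * team_yardage_share + INTERCEPT

example = predicted_ppg(0.25)   # 18.787 * 0.25 + 1.5052 = 6.20195
print(round(example, 2))
```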
Seems a little bit too simple, right? Well, that’s because it is. That particular trait is only 9.28% accurate: on its own, it explains just 9.28% of the variation in NFL player success. Not great, right?
So here is where my model comes in. If we use multiple traits at once, we can create a multiple regression model that looks at the prospect’s entire portfolio. This brings up the accuracy substantially! I am still in the process of wrapping up all of my models following a database restructure, but as of today each of my positional models is hitting between 33% and 48% accuracy. You can see an overview of my initial model results here: https://imgur.com/gallery/BrdfxXq
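For readers who want to see the mechanics, below is a sketch of a textbook multiple regression fit (ordinary least squares via the normal equations) with its R^2. To be clear, this is the standard technique and not my modified version; `solve` and `fit_multiple` are hypothetical helper names, and the two traits and all values are made up for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple(rows, ys):
    """Least squares fit of y = b0 + b1*x1 + b2*x2 + ...

    rows holds one tuple of trait values per player.
    Returns (coefficients, r_squared), intercept first.
    """
    design = [[1.0] + list(row) for row in rows]     # leading 1.0 is the intercept term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    coefs = solve(xtx, xty)
    preds = [sum(c * v for c, v in zip(coefs, d)) for d in design]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return coefs, 1 - ss_res / ss_tot

# Made-up prospects: (share of team yardage, weight in lb), NOT real data.
traits = [(0.18, 195), (0.25, 210), (0.31, 202), (0.12, 188), (0.28, 221), (0.22, 199)]
ppg = [4.8, 7.1, 8.3, 3.9, 8.0, 5.5]
coefs, r_squared = fit_multiple(traits, ppg)
print("coefficients:", [round(c, 4) for c in coefs], "R^2:", round(r_squared, 3))
```

On the data used to fit it, adding a trait can never lower R^2, which is exactly why it matters not to feed the model several versions of the same trait.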
It’s important to note that if you are not sufficiently impressed by 33% to 48%, you should know that the hit rate for NFL GMs drafting in the top 5 picks is historically 10.3% (Article on this)…
I will be going over each position and posting my results. I’ll go over all of the traits, explaining why some are nearly useless and why others are ones we should really look into for prospects.