Last year the NCAA held a meeting with some of the best analytics minds in the college basketball business, and after decades of relying on data like the RPI the Selection Committee finally decided to tweak its terrific team sheets to add some advanced metrics to the mix. One of those beautiful minds belongs to Kevin Pauga, Michigan State’s Assistant Athletic Director for Administration and founder of analytics website www.kpisports.net. HoopsHD’s Jon Teitel got to chat with Kevin about the KPI, advanced metrics, and the evolution of the Selection Committee.
What exactly is the KPI, and how does it calculate the value of each game throughout the entire season? KPI is a results-based metric that ranks team resumes by assigning a value to each game played. The best win possible is worth about +1.0, the worst loss about -1.0, and a virtual tie at 0.0. Adjustments are made to each game’s value based on location of the game, opponent quality, and percentage of total points scored. Game values are averaged for a team’s KPI ranking (meaning each game counts the same, unlike the current RPI).
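The actual KPI formula is not public, but the description above can be sketched as a toy model. In the sketch below, the specific weights, the location bonuses, and the `opponent_quality` scale are all invented for illustration; only the overall shape (best win ≈ +1.0, worst loss ≈ -1.0, adjustments for location and scoring share, equal-weighted average) comes from the interview.

```python
# Toy sketch of a results-based, KPI-style metric.
# The real KPI formula is proprietary; every weight and adjustment
# below is an illustrative assumption, not Pauga's actual method.

def game_value(won, opponent_quality, location, points_for, points_against):
    """Return a value in roughly [-1.0, +1.0] for a single game.

    opponent_quality: 0.0 (worst team) to 1.0 (best team) — a hypothetical rating.
    location: 'home', 'neutral', or 'away'.
    """
    # Base value: beating the best team ~ +1.0; losing to the worst team ~ -1.0.
    base = opponent_quality if won else -(1.0 - opponent_quality)

    # Location adjustment: road wins count extra, road losses are forgiven a bit
    # (assumed weights).
    bonus = {'home': 0.0, 'neutral': 0.05, 'away': 0.1}[location]
    base += bonus if won else -bonus

    # Scoring-margin adjustment via percentage of total points scored,
    # kept deliberately small.
    share = points_for / (points_for + points_against)
    base += (share - 0.5) * 0.2

    return max(-1.0, min(1.0, base))


def kpi_ranking_value(games):
    """Average the game values — every game counts equally, unlike the RPI."""
    return sum(game_value(**g) for g in games) / len(games)
```

A team's ranking value is then just the mean over its schedule, which is the property Pauga highlights: no single game is structurally weighted more than another.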
During the past year you and others from the world of analytics have attended meetings with the Selection Committee: how did the meetings go, and what changes have they instituted so far? The discussions have produced positive dialogue. More than anything, it has reinforced how complicated it can be to find a simple solution that is easy for everyone (coaches, fans, etc.) to understand.
The committee currently uses many different criteria to set the field every March: which of the old data points do you like and which ones do you think require further revision?
While criteria and analytics are important, there are subjective criteria that allow committee members to further study why a team may be ranked where it is in a given match-up. Context is important. The committee is working to evolve beyond the RPI, which has been used as a sorting mechanism for many years.
This season the committee has implemented a new 4-tier system that redefines “quality wins” to place more emphasis on road wins: are you happy with the new cut-off points, and why are they better than the previous ones? The quadrant system is not perfect (no system is) but it allows the committee to visualize the difference between road/home/neutral site wins and losses. The cut-off points were determined based on historical data and have rewarded teams for key road wins throughout their season.
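For concreteness, the cutoffs the committee announced when the quadrant system debuted for the 2017-18 season (with opponent rank then based on the RPI) can be expressed as a small lookup. The cutoff numbers below are the publicly reported ones; if they have since been revised, the table would need updating.

```python
# Quadrant classification per the cutoffs announced for the 2017-18 season.
# Opponent rank at the time was the opponent's RPI rank.

CUTOFFS = {
    # location: (max rank for Quadrant 1, for Quadrant 2, for Quadrant 3)
    'home':    (30, 75, 160),
    'neutral': (50, 100, 200),
    'away':    (75, 135, 240),
}

def quadrant(opponent_rank, location):
    """Return 1-4 for a game against an opponent with the given rank."""
    q1_max, q2_max, q3_max = CUTOFFS[location]
    if opponent_rank <= q1_max:
        return 1
    if opponent_rank <= q2_max:
        return 2
    if opponent_rank <= q3_max:
        return 3
    return 4
```

Note how the same opponent can produce different quadrants by venue: beating the No. 40 team is a Quadrant 2 win at home but a Quadrant 1 win on the road, which is exactly the road-win emphasis Pauga describes.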
If a team wants to make the NCAA tourney are they better off scheduling decent teams who they think they can beat, or great teams who they can only hope to upset, or a nice mix of both, or other? I like to say that you should typically schedule the best teams you think you have a realistic chance of beating. By definition, if you do not think that your team is good enough to beat an NCAA Tournament-quality team, then you likely do not think your team is postseason-worthy and are scheduling differently for other reasons. There is always room for bold risks, and depending on your conference affiliation you may have quality games already built into your schedule. 30+ games provide a lot of opportunities to take risks.
How much importance do you place on margin of victory (MOV), and do you think that a team should be rewarded for running up the score for 40 minutes rather than giving their bench players a chance for some quality playing time? Margin of victory does provide context, but any circumstance where a team is rewarded for running up a score late in a game is counterproductive to the spirit of sportsmanship. Predictive metrics show that scoring margin leads to more accurate power rankings of team quality. I include a derivative of MOV in KPI that works to mitigate these very points and depreciates based on current criteria. A 1-point road win at Team A vs. a 20-point road win at that same Team A are different…but not dramatically.
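Pauga's "different…but not dramatically" depreciation can be illustrated with a diminishing-returns curve. The log shape and the 0.1 cap below are assumptions chosen for illustration, not his actual derivative of MOV: each additional point of margin is worth less than the last, so blowing a team out past garbage time buys almost nothing.

```python
import math

# Illustrative diminishing-returns margin adjustment (NOT the actual KPI
# formula): the bonus grows logarithmically with margin and is capped,
# so a 20-point win is only modestly better than a 1-point win.

def margin_adjustment(margin, cap=0.1):
    """Map a point margin to a small bonus/penalty in [-cap, +cap]."""
    sign = 1 if margin >= 0 else -1
    # Normalize on a log scale so early points matter most;
    # 25+ points of margin saturates at the cap.
    scaled = math.log1p(abs(margin)) / math.log1p(25)
    return sign * cap * min(1.0, scaled)
```

Relative to a game value on a roughly ±1.0 scale, the gap between a 1-point and a 20-point road win under this sketch is only a few hundredths of a point, matching the spirit of the answer above.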
Where does the human element fit into the whole equation, and why is it impossible for a computer to replicate it? The human element provides the art and context of the process. How do you measure why a result happened? An injury? Another factor that influenced an outcome? Computers could determine a field, but the human element is critical in correcting any outliers that may exist.
If I want to predict who is going to win the title, am I better off looking at the quality of a team’s wins, or its power ranking, or something else? If you’re looking at games moving forward, you are better served to look at predictive rankings. Beyond that, it is important to contextualize style-of-play tendencies that may make for a good or bad match-up for a certain team.
What kind of outliers do you take note of, and how do you place them into the correct context? The team that is “supposed to” win a game emerges victorious just under 80% of the time, so what a team does with the other 20% of its schedule often dictates the success of its season. Oftentimes, outliers or upsets are easy to identify. Remember though: your biggest outliers are often some of your highest-quality wins and losses, and they make for the difference between the results-based and predictive-based metrics.
Assuming the committee incorporates all of the helpful information that is out there, how do you expect the selection process to change in the years ahead? I think it is too soon to know how the committee will evolve. The committee continues to improve year after year as more data is available to them. It is important that changes not be made quickly, but be made accurately so they can withstand the test of time.