Building a Bayesian Model: Part 2
By Matthew Buchalter, PlusEV Analytics
In Part 1, we built a very simple model for NBA shooting percentage defense that didn’t work very well. The underlying hypothesis was that all team-to-team variance in opponent shooting percentages is completely random. While overall we can say that hypothesis is busted, our back-test did suggest that it might be partially true, especially early in the season when a team’s observed percentages are based on smaller sample sizes. In Part 2, we’re going to take those ideas and run with them, by building something called a “ballast” model.
In physics, a ballast is a heavy object that anchors something down so that it doesn’t float away. Our ballast will serve exactly that function; it will control the amount by which our observed values are able to “float” away from the overall averages.
To select our ballast, we need to answer the following question:
“At what sample size do we think the observed results contain a 50-50 mix of predictive signal and random noise?”
I’m going to select a ballast of 1000 attempts for each of 3 pointers, 2 pointers and free throws. The overall averages get applied to the ballast as follows:
3 pointers (league average 35.5%) = 355 made in 1000 attempts.
2 pointers (league average 52.0%) = 520 made in 1000 attempts.
Free throws (league average 76.6%) = 766 made in 1000 attempts.
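The way our ballast works is that it gets mixed in with the observed data: the ballast makes are added to the observed makes, and the ballast attempts are added to the observed attempts. Using the December 4th Miami/Orlando game as an example: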
Miami projected 3P% allowed: (234 observed + 355 ballast) / (658 observed + 1000 ballast) = 35.5%
Miami projected 2P% allowed: (639 observed + 520 ballast) / (1299 observed + 1000 ballast) = 50.4%
Miami projected FT% allowed: (431 observed + 766 ballast) / (567 observed + 1000 ballast) = 76.4%
Orlando projected 3P% allowed: (255 observed + 355 ballast) / (705 observed + 1000 ballast) = 35.8%
Orlando projected 2P% allowed: (664 observed + 520 ballast) / (1290 observed + 1000 ballast) = 51.7%
Orlando projected FT% allowed: (394 observed + 766 ballast) / (508 observed + 1000 ballast) = 76.9%
This is our “level 2” model.
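For readers who want to reproduce the numbers above, here is a minimal Python sketch of the ballast calculation. The league averages, ballast size and observed totals are the ones quoted in this post; the names ballast_projection, LEAGUE_AVG and observed are just labels I made up for illustration.

```python
# Ballast size and league averages as quoted in the text above.
BALLAST_ATTEMPTS = 1000
LEAGUE_AVG = {"3P": 0.355, "2P": 0.520, "FT": 0.766}


def ballast_projection(made, attempts, league_avg, ballast=BALLAST_ATTEMPTS):
    """Blend observed makes/attempts with `ballast` attempts at the league average."""
    return (made + league_avg * ballast) / (attempts + ballast)


# Observed (made, attempts) allowed, from the Miami/Orlando example.
observed = {
    "Miami": {"3P": (234, 658), "2P": (639, 1299), "FT": (431, 567)},
    "Orlando": {"3P": (255, 705), "2P": (664, 1290), "FT": (394, 508)},
}

# Projected percentage allowed for every team and shot type.
projections = {
    team: {
        shot: ballast_projection(made, attempts, LEAGUE_AVG[shot])
        for shot, (made, attempts) in shots.items()
    }
    for team, shots in observed.items()
}

for team, shots in projections.items():
    for shot, pct in shots.items():
        print(f"{team} projected {shot}% allowed: {pct:.1%}")
```

Changing BALLAST_ATTEMPTS is the quickest way to see how sensitive the projections are to the ballast size.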
A few useful properties of a ballast model:
- It will always give a projected result that is between the observed result and the overall average;
- The ballast has the most impact when the observed sample is small and the least impact when the observed sample is large;
- The larger the ballast, the closer the model gets to the simple “overall average” model we built in Part 1. The model from Part 1 can be thought of as a ballast model with ballast size = infinity.
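These properties are easier to see if the ballast projection is rewritten as a weighted average: the observed percentage gets weight attempts / (attempts + ballast) and the league average gets the rest. The sketch below is one way to check that, assuming the 1000-attempt ballast from above; signal_weight and weighted_projection are illustrative names, and the 40.0% observed figure is a made-up input.

```python
def signal_weight(attempts, ballast=1000):
    """Share of the projection that comes from the observed data."""
    return attempts / (attempts + ballast)


def weighted_projection(observed_pct, attempts, league_avg, ballast=1000):
    """Ballast projection written as a weighted average of observed and league rates."""
    w = signal_weight(attempts, ballast)
    return w * observed_pct + (1 - w) * league_avg


league_avg = 0.355
observed_pct = 0.400  # hypothetical observed 3P% allowed, for illustration only

for attempts in (100, 1000, 10000):
    w = signal_weight(attempts)
    proj = weighted_projection(observed_pct, attempts, league_avg)
    print(f"{attempts:>6} attempts: weight on observed = {w:.2f}, projection = {proj:.3f}")
```

When the observed attempts equal the ballast, the weight is exactly one half, which is the "50-50 mix" from the question we used to pick the ballast. As the ballast grows without bound, the weight on the observed data shrinks toward zero and we are back to the Part 1 model.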
We can calculate adjusted PPG allowed using the same method from Part 1:
Miami: (3 * 658 * 35.5% + 2 * 1299 * 50.4% + 1 * 567 * 76.4%) / (22 games) = 110.4 adjusted PPG allowed.
Orlando: (3 * 705 * 35.8% + 2 * 1290 * 51.7% + 1 * 508 * 76.9%) / (23 games) = 107.5 adjusted PPG allowed.
The level 1 and level 2 models therefore give us different sets of “adjusted PPG allowed”. So which one is better?
The simplest way to compare the two models is to look at the squared error between the projected and actual number of made shots. For Miami/Orlando:
Of course, one game tells us practically nothing, so we need to calculate the squared errors for the entire season:
So the level 2 model does look better from this perspective.
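Here is a rough Python sketch of that squared-error comparison for a single game. I am assuming the projected makes are the projected percentage times the attempts taken in that game, and that the errors are summed across the three shot types. The box score in the sketch is invented purely for illustration and is not the actual Miami/Orlando result.

```python
def squared_error(projected_pct, attempts, actual_made):
    """Squared gap between projected and actual made shots for one shot type."""
    projected_made = projected_pct * attempts
    return (projected_made - actual_made) ** 2


def total_squared_error(projected_pcts, game):
    """Sum the squared errors over every shot type in one game."""
    return sum(
        squared_error(projected_pcts[shot], attempts, made)
        for shot, (made, attempts) in game.items()
    )


# Level 1 uses the league averages; level 2 uses Miami's ballast projections.
level_1 = {"3P": 0.355, "2P": 0.520, "FT": 0.766}
level_2 = {"3P": 0.355, "2P": 0.504, "FT": 0.764}

# Hypothetical opponent box score as (made, attempts); NOT real game data.
game = {"3P": (12, 35), "2P": (28, 50), "FT": (15, 20)}

print("Level 1 squared error:", round(total_squared_error(level_1, game), 2))
print("Level 2 squared error:", round(total_squared_error(level_2, game), 2))
```

To score a whole season, the same function would be applied to every game and the results added up for each model.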
Now let’s back-test the level 2 model:
Getting a little better… Notice how the bet volumes drop off considerably as the year progresses. That’s because the ballast has less impact as the observed sample grows, so the adjusted average rarely moves far enough from the observed average to exceed the 1.0 point threshold.
But wait, what basis did I have for selecting 1000 attempts as the ballast? None, I just made it up. But statisticians aren’t in the business of randomly making stuff up, right? At least not this blatantly? (Please don’t answer that)
Copyright in the contents of this blog is owned by Plus EV Sports Analytics Inc. and all related rights are reserved thereto.