Building a Bayesian Model: Part 3

By Matthew Buchalter, PlusEV Analytics

In Part 2, we introduced the concept of a “ballast model” and applied it to our NBA defensive shooting percentage data. Just one problem though…we selected our ballast values judgmentally (read: made them up randomly). Now, we’re going to approach the selection with a bit more rigor.

There were 1230 games played in the 2018-19 season, and two teams in each game. That gives us 2460 pairs of model-projected makes and actual makes for each of 3 pointers, 2 pointers and free throws. If we adjust our three ballasts (one for each basket type), the projections will recalculate…obviously the actuals will stay fixed. We can look at the same squared errors that we discussed in Part 2 to decide whether our adjustment has made the model better or worse.
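To make the objective concrete, here is a minimal Python sketch of the quantity being judged: the total squared error between projected and actual makes for one shot type. It assumes (our assumption, not spelled out in this part) that each game's projection uses only the team's season-to-date makes and attempts allowed before that game, and that projected makes are the game's actual attempts times the ballasted percentage. `GameRow`, `projected_pct` and `sum_squared_error` are hypothetical names, and the rows are made-up numbers, not 2018-19 data.

```python
from dataclasses import dataclass


@dataclass
class GameRow:
    """One team-game for one shot type (hypothetical layout, for illustration)."""
    prior_makes: int     # makes allowed season-to-date, before this game
    prior_attempts: int  # attempts allowed season-to-date, before this game
    game_attempts: int   # attempts allowed in this game
    game_makes: int      # actual makes allowed in this game


def projected_pct(makes, attempts, ballast_makes, ballast_attempts):
    """Ballasted percentage: (observed makes + ballast makes) / (observed attempts + ballast attempts)."""
    return (makes + ballast_makes) / (attempts + ballast_attempts)


def sum_squared_error(rows, ballast_makes, ballast_attempts):
    """Total squared error between projected makes and actual makes over all rows."""
    total = 0.0
    for r in rows:
        pct = projected_pct(r.prior_makes, r.prior_attempts, ballast_makes, ballast_attempts)
        projected_makes = pct * r.game_attempts  # assumption: scale by this game's attempts
        total += (projected_makes - r.game_makes) ** 2
    return total


# Made-up rows, NOT real 2018-19 data.
rows = [
    GameRow(prior_makes=30, prior_attempts=80, game_attempts=35, game_makes=12),
    GameRow(prior_makes=45, prior_attempts=130, game_attempts=30, game_makes=11),
    GameRow(prior_makes=0, prior_attempts=0, game_attempts=28, game_makes=9),
]

# Changing the ballast changes the projections; the actual makes never move.
for ballast_attempts in (100, 1000, 10000):
    ballast_makes = 0.355 * ballast_attempts  # ballast held at a 35.5% rate (assumption)
    print(ballast_attempts, round(sum_squared_error(rows, ballast_makes, ballast_attempts), 3))
```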

We could make random adjustments all day long until we reach a point where the model is as good as it can possibly get. Luckily, Excel has a built-in Solver function that does this for us. Solver ships with Excel, but it may not be activated by default. If you don’t see it, Google how to enable the Solver add-in.
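Solver is doing a numerical search for the ballast values with the smallest total squared error. As a rough stand-in (a swapped-in technique, not what Solver does internally), the Python sketch below runs a golden-section search over the ballast attempts for a single shot type on simulated data. It assumes the ballast makes are pinned at a fixed league-average rate (35.5% here), so only one number is searched; whether the post's Solver run also let the ballast makes move freely isn't stated here. The search runs on a log scale because plausible ballasts span tens to tens of thousands of attempts. All function names and the simulated season are hypothetical.

```python
import math
import random


def simulate_season(seed=1, n_teams=30, n_games=82, league_rate=0.355, spread=0.015):
    """Made-up season: each team has a hidden 3P% allowed; one row per team-game."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_teams):
        true_rate = rng.gauss(league_rate, spread)
        made = att = 0
        for _ in range(n_games):
            game_att = rng.randint(25, 40)
            game_makes = sum(rng.random() < true_rate for _ in range(game_att))
            rows.append((made, att, game_att, game_makes))  # prior totals exclude this game
            made += game_makes
            att += game_att
    return rows


def sse(rows, rate, ballast_attempts):
    """Total squared error of projected vs actual makes; ballast makes = rate * ballast attempts."""
    total = 0.0
    for prior_makes, prior_att, game_att, game_makes in rows:
        pct = (prior_makes + rate * ballast_attempts) / (prior_att + ballast_attempts)
        total += (pct * game_att - game_makes) ** 2
    return total


def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


rows = simulate_season()
# Search log10(ballast attempts) between 1 and 5, i.e. 10 to 100,000 attempts.
best_log = golden_section_min(lambda t: sse(rows, 0.355, 10 ** t), 1.0, 5.0, tol=1e-4)
print(f"ballast attempts with the smallest squared error: about {10 ** best_log:.0f}")
```

Golden-section search only guarantees the minimum of a unimodal curve; for a three-ballast fit you would search each shot type separately (the errors for different shot types don't interact) or use a general-purpose optimizer, which is closer in spirit to what Solver offers.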

We use Solver to find the values of the three ballasts that minimize the overall squared error:

3 pointers: 6676 attempts (2370 makes)

2 pointers: 1069 attempts (556 makes)

Free throws: 16303 attempts (12493 makes)

Recall that the smaller the ballast, the more responsive your projections will be to the team’s observed season-to-date averages, and the larger the ballast, the more your projections will be anchored to the overall league-wide averages. With that orientation in mind, these results have a nice common-sense appeal: free throws get by far the largest ballast and 2 pointers the smallest. All free throws are from exactly the same distance with exactly the same defense (none). With 3 pointers you can have corner 3s, center 3s, wide open looks, hand in your face…so there’s a little bit of variation where the defensive team could make a difference. And then you have 2 pointers, which include everything from slam dunks to contested midrange jumpers, so it makes sense that that’s where you’d see the highest ratio of signal to noise in the defensive performance.
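The ballast formula can be rewritten as a weighted average of the team's own rate and the ballast rate, which makes the "responsive vs anchored" reading precise. With x observed makes on n observed attempts and a ballast of B_m makes on B_a attempts, (x + B_m) / (n + B_a) = Z * (x / n) + (1 - Z) * (B_m / B_a), where Z = n / (n + B_a). The team's own data gets exactly half the weight once n equals the ballast attempts. The short Python sketch below (function names are hypothetical) computes Z for Miami's season-to-date attempts allowed under the three fitted ballasts.

```python
def weight_on_observed(observed_attempts, ballast_attempts):
    """Weight Z = n / (n + ballast attempts) given to the team's own observed rate."""
    return observed_attempts / (observed_attempts + ballast_attempts)


def blended(makes, attempts, ballast_makes, ballast_attempts):
    """Same value as (makes + ballast_makes) / (attempts + ballast_attempts), written as a weighted average."""
    z = weight_on_observed(attempts, ballast_attempts)
    return z * (makes / attempts) + (1 - z) * (ballast_makes / ballast_attempts)


BALLAST_ATTEMPTS = {"3P": 6676, "2P": 1069, "FT": 16303}
MIAMI_ATTEMPTS = {"3P": 658, "2P": 1299, "FT": 567}  # season-to-date attempts allowed, from the post

for shot, n in MIAMI_ATTEMPTS.items():
    print(f"{shot}: {weight_on_observed(n, BALLAST_ATTEMPTS[shot]):.1%} weight on Miami's own data")
```

For Miami this works out to about 9% of the weight on its own 3P data, about 55% for 2 pointers and about 3% for free throws, the same ordering as the ballasts themselves.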

So there is our Level 3 model. Applied to Miami/Orlando:

Miami projected 3P% allowed: (234 observed + 2370 ballast) / (658 observed + 6676 ballast) = 35.5%

Miami projected 2P% allowed: (639 observed + 556 ballast) / (1299 observed + 1069 ballast) = 50.4%

Miami projected FT% allowed: (431 observed + 12493 ballast) / (567 observed + 16303 ballast) = 76.6%

Orlando projected 3P% allowed: (255 observed + 2370 ballast) / (705 observed + 6676 ballast) = 35.6%

Orlando projected 2P% allowed: (664 observed + 556 ballast) / (1290 observed + 1069 ballast) = 51.7%

Orlando projected FT% allowed: (394 observed + 12493 ballast) / (508 observed + 16303 ballast) = 76.7%
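The six projections above are the same arithmetic repeated, so here is a small Python sketch that recomputes them from the quoted observed totals and ballasts. The dictionary layout and the `project` helper are hypothetical; the numbers are the ones given in the post.

```python
BALLAST = {  # (makes, attempts) from the Solver fit quoted above
    "3P": (2370, 6676),
    "2P": (556, 1069),
    "FT": (12493, 16303),
}
OBSERVED = {  # season-to-date (makes allowed, attempts allowed), as quoted above
    "Miami": {"3P": (234, 658), "2P": (639, 1299), "FT": (431, 567)},
    "Orlando": {"3P": (255, 705), "2P": (664, 1290), "FT": (394, 508)},
}


def project(observed, ballast):
    """(observed makes + ballast makes) / (observed attempts + ballast attempts)."""
    (makes, attempts), (b_makes, b_attempts) = observed, ballast
    return (makes + b_makes) / (attempts + b_attempts)


projections = {
    (team, shot): project(obs, BALLAST[shot])
    for team, shots in OBSERVED.items()
    for shot, obs in shots.items()
}

for (team, shot), pct in projections.items():
    print(f"{team} projected {shot}% allowed: {pct:.1%}")
```

Run as written, this reproduces the quoted figures except Miami's 2P%, which comes out at 50.5% (1195 / 2368 is about 0.5046) rather than 50.4%; the last-digit difference may come from unrounded ballasts behind the whole numbers shown.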

Squared errors:

Back test results:

This is the point where we start to get a bit excited…sure, it’s only +11.8 units and +1.3% over the entire season, but we have a winning model! Using our “frequentist test” from Part 1, the p-value is 0.043, nicely inside the 0.05 threshold that “scientists” like to use all the time. We’re laughing! But before we start loading up our accounts with ammo, let’s do one final test: we’ll apply our Level 3 model to the 2019-20 season as an independent validation. By “independent” I mean that the data used to fit the model is totally separate from the data used to test the model.

Uh oh.

So what happened here? It could be one or both of two things. The model could be “overfit” to the 2018-19 data, meaning that it fits that data well but performs worse when applied to “out-of-sample” data. And/or, the good results in 2018-19 could have been partly good luck, and/or the bad results in 2019-20 could have been partly bad luck. The p-value of 0.043 is low, but it’s not zero! So, on we go.
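The fit-then-validate idea behind the 2019-20 check can be sketched in Python on simulated data: choose the ballast using one season only, freeze it, and score it on a different season. This is a hedged illustration, not the post's actual procedure. It scores squared error rather than betting results, searches a coarse grid of ballast attempts for one shot type with the ballast rate held at an assumed 35.5%, and both "seasons" are randomly generated (the seeds are arbitrary labels). Function and variable names are hypothetical.

```python
import random


def simulate_season(seed, n_teams=30, n_games=82, league_rate=0.355, spread=0.015):
    """Made-up season: each team has a hidden 3P% allowed; one row per team-game."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_teams):
        true_rate = rng.gauss(league_rate, spread)
        made = att = 0
        for _ in range(n_games):
            game_att = rng.randint(25, 40)
            game_makes = sum(rng.random() < true_rate for _ in range(game_att))
            rows.append((made, att, game_att, game_makes))  # prior totals exclude this game
            made += game_makes
            att += game_att
    return rows


def mse(rows, rate, ballast_attempts):
    """Mean squared error of projected vs actual makes; ballast makes = rate * ballast attempts."""
    total = 0.0
    for prior_makes, prior_att, game_att, game_makes in rows:
        pct = (prior_makes + rate * ballast_attempts) / (prior_att + ballast_attempts)
        total += (pct * game_att - game_makes) ** 2
    return total / len(rows)


GRID = [250 * k for k in range(1, 41)]  # candidate ballast attempts: 250, 500, ..., 10000
RATE = 0.355  # ballast rate held fixed (assumption)

train_rows = simulate_season(seed=2018)
test_rows = simulate_season(seed=2019)

# Fit on the training season only, then freeze the choice.
fitted = min(GRID, key=lambda b: mse(train_rows, RATE, b))
# Hindsight optimum on the test season, for comparison only.
best_on_test = min(GRID, key=lambda b: mse(test_rows, RATE, b))

print("fitted ballast (train):", fitted)
print("train MSE:", round(mse(train_rows, RATE, fitted), 3))
print("test MSE, frozen ballast:", round(mse(test_rows, RATE, fitted), 3))
print("test MSE, hindsight ballast:", round(mse(test_rows, RATE, best_on_test), 3))
```

By construction, the frozen ballast can never score better on the test season than the hindsight choice from the same grid, which is one reason in-sample results tend to look better than what you get going forward.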

“But,” you say with an impatient look in your eyes, “the title of this whole deal is ‘Building a Bayesian Model’. You’re wasting my time with all this ballast crap. When do we build the Bayesian model?”

Well, my dear reader, here’s the twist:

We’ve been building a Bayesian model all along.
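One standard way to make that claim concrete (an assumption about where this is heading, not something this part spells out) is to treat the ballast as a Beta prior on a team's true make rate, with the ballast makes as alpha and the ballast misses as beta. After observing x makes in n attempts, the Beta-Binomial posterior mean is (alpha + x) / (alpha + beta + n), which is exactly the ballast formula. The Python sketch below checks the equality with exact fractions for Miami's 3P% allowed; the function names are hypothetical.

```python
from fractions import Fraction


def ballast_projection(makes, attempts, ballast_makes, ballast_attempts):
    """The ballast formula used above, as an exact fraction."""
    return Fraction(makes + ballast_makes, attempts + ballast_attempts)


def beta_binomial_posterior_mean(makes, attempts, alpha, beta):
    """Posterior mean of a Beta(alpha, beta) prior after `makes` successes in `attempts` trials."""
    post_alpha = alpha + makes
    post_beta = beta + (attempts - makes)
    return Fraction(post_alpha, post_alpha + post_beta)


# Miami 3P% allowed, reading the 3P ballast as a Beta prior (assumption):
alpha, beta = 2370, 6676 - 2370  # ballast makes, ballast misses
print(ballast_projection(234, 658, 2370, 6676) == beta_binomial_posterior_mean(234, 658, alpha, beta))
```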

Click here for Part 4!

Copyright in the contents of this blog is owned by Plus EV Sports Analytics Inc. and all related rights are reserved thereto.
