My First Model

By Matthew Buchalter, PlusEV Analytics

Hi everyone! I’m going to use this “blog” to write some articles instead of publishing through a third party.

For my inaugural post, I thought I’d revisit the very first sports betting model I’ve ever built.

Sports betting has been legal in Canada since the 1990s…well, sort of. You can always count on government bureaucrats to misunderstand how the real world works. And thus we have this amazingly ridiculous law that says a single-game wager is illegal, but as soon as you make a parlay, every element of skill is magically removed, your parlay is technically a lottery ticket, and lotteries, of course, are A-OK. And so, the Canadian sports lottery system was born.

As with several US state-sponsored systems in the news today (looking at you, Oregon Fail), conditions for players are generally…uh…less than favourable. Government monopolies are priced like government monopolies. Also, ticket retailers (convenience stores, gas stations, lottery kiosks in malls, etc.) earn a hefty commission for selling tickets and another commission for redeeming winners. The result? A game that’s 50-50 is priced at 1.70 decimal odds on each side, which is -143/-143 in American odds. Compound that vig over a 3-leg parlay (0.5^3 × 1.70^3 ≈ 0.614 returned per dollar staked), and you’re looking at -39% EV to the player, on par with lotteries and scratch-offs but without their jackpot potential. It’s so bad that Wikipedia says “most experts agree that the odds offered on Sport Select are such that even the sharpest punter would have no hope of making a profit in the long term.” (Spoiler: most experts are wrong.)
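The arithmetic behind those numbers is easy to check. Here is a small Python sketch (the function names are just for illustration) that converts decimal odds to American odds and computes the EV of a parlay whose legs are independent 50-50 propositions, each priced at 1.70.

```python
def american_from_decimal(decimal_odds: float) -> float:
    """Convert decimal odds to American odds."""
    if decimal_odds >= 2.0:
        return (decimal_odds - 1.0) * 100.0   # underdog: profit on a $100 bet
    return -100.0 / (decimal_odds - 1.0)      # favourite: stake needed to win $100


def parlay_ev(win_prob: float, decimal_odds: float, legs: int) -> float:
    """EV per $1 on a parlay of independent legs with identical probability and odds."""
    hit_prob = win_prob ** legs    # every leg must win
    payout = decimal_odds ** legs  # decimal odds multiply across legs
    return hit_prob * payout - 1.0


print(round(american_from_decimal(1.70)))  # -143
print(round(parlay_ev(0.5, 1.70, 3), 3))   # -0.386, i.e. about -39%
```

Raising `legs` shows how quickly the vig compounds: every extra leg multiplies the expected return by another 0.5 × 1.70 = 0.85.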

As a teenager, I would play occasionally just for fun. Sometimes, if I asked my dad nicely, he would take me to the store for a pack of Fruittella and a $5 parlay on the way downtown to Maple Leaf Gardens to watch the Leafs play. I’d spend most of the game staring at the out-of-town scoreboard!

Fast forward to the spring of 2007. Ottawa vs Pittsburgh in round 1 of the NHL playoffs. Those carefree times when I was single and could sit on the couch watching sports all day. Let’s put a bet down, just for fun.

Ontario Lottery had a “player matchup” prop: 2 points for a goal, 1 for an assist, in a 3-way market (visiting (V) player wins / home (H) player wins / tie). You were allowed to include the game and the associated player prop in the same parlay. Now, it’s a well-known fact that combining positively correlated outcomes in a parlay increases its EV, because the parlay odds are priced as if the legs were independent. I figured it wouldn’t be enough to overcome the -39% starting point, but it would at least bite into it enough that I didn’t feel like a total square.
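To see why correlation helps, here is a rough Python sketch for two yes/no legs. It uses the standard identity for indicator variables, P(A and B) = P(A)P(B) + ρ·sqrt(P(A)(1 − P(A))P(B)(1 − P(B))), where ρ is the correlation between the legs. The 50-50 legs priced at 1.70 and the correlation values are hypothetical, chosen only to illustrate the effect; the real player matchup prop was a 3-way market.

```python
from math import sqrt


def joint_prob(p_a: float, p_b: float, rho: float) -> float:
    """P(A and B) for two yes/no outcomes with correlation rho."""
    # For indicator variables, Cov = P(A and B) - P(A)P(B) and
    # rho = Cov / sqrt(P(A)(1 - P(A)) P(B)(1 - P(B))); solve for P(A and B).
    return p_a * p_b + rho * sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))


def two_leg_ev(p_a: float, p_b: float, rho: float, odds_a: float, odds_b: float) -> float:
    """EV per $1 when the book pays the product of the legs' decimal odds."""
    return joint_prob(p_a, p_b, rho) * odds_a * odds_b - 1.0


# Hypothetical 50-50 legs priced at 1.70 each, with increasing correlation.
for rho in (0.0, 0.2, 0.4):
    print(f"rho={rho}: EV={two_leg_ev(0.5, 0.5, rho, 1.70, 1.70):+.3f}")
```

With those made-up inputs the joint probability has to reach 1/1.70² ≈ 0.346 to break even, which takes a correlation of about 0.38. A strong enough correlation can therefore flip a parlay from negative to positive EV even at lottery prices.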

So I went to the convenience store and used the little pencil to fill out my ticket. (Yes, there was a time, pre-COVID, when sharing pencils was no big deal.) $20 bet, 4-leg parlay: Pittsburgh over Ottawa, Crosby over Alfredsson, and another team + player combo on a different game that I’ve since forgotten. It probably had a potential payout of around $150. I gave the slip to the cashier and he ran it through his lottery terminal. No ticket printed – my slip had been rejected. The convenience store guy, who didn’t speak English, turned the terminal screen to face me. On it were three words that changed my life:

LIABILITY LIMIT EXCEEDED

So here we have the government of Ontario, an economic entity comparable in size to a medium-sized US state, rejecting a $20 bet with -39% EV because they can’t stomach the liability. By the time I got home from the store, I realized exactly what was going on. The combination of 2 teams + 2 correlated players had hit a system-defined maximum amount of action because professional bettors were hammering it. I had accidentally stumbled into +EV. That same day, I began building my first sports betting model.

Objective: Find the joint probability of Pittsburgh beating Ottawa AND (2 x Crosby goals + Crosby assists) > (2 x Alfredsson goals + Alfredsson assists).

I had finished my university degree and my actuarial exams by then, so I had acquired some tools to work with. The one I chose for this job was the Poisson distribution. Two qualities of Poisson that make it especially useful for this application:

  • As a 1-parameter distribution, it’s the simplest to work with. If you know the mean of a Poisson variable, you can calculate everything there is to know about it.
  • It can be split (statisticians call this “thinning”). If cars cross an intersection at a Poisson rate of 100 per hour and each car is independently red with probability 20%, then red cars cross the intersection at a Poisson rate of 20 per hour, non-red cars cross at a Poisson rate of 80 per hour, and the two counts are independent of each other.
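The car example can be checked numerically. This Python sketch computes, exactly rather than by simulation, the probability of seeing k red cars when the total is Poisson(100) and each car is red with probability 0.2, and compares it with a Poisson(20) probability. The function names and the 400-car truncation of the sum are illustrative choices.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for X ~ Poisson(mean), computed in log space to avoid overflow."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def thinned_pmf(k: int, mean: float, share: float, n_max: int = 400) -> float:
    """P(k red cars) when the total is Poisson(mean) and each car is red with prob `share`.

    Sums P(n cars) * P(k of those n are red) over n; n_max truncates a negligible tail.
    """
    return sum(
        poisson_pmf(n, mean) * math.comb(n, k) * share**k * (1 - share) ** (n - k)
        for n in range(k, n_max + 1)
    )


for k in (10, 20, 30):
    print(k, thinned_pmf(k, 100.0, 0.20), poisson_pmf(k, 20.0))
```

The two printed columns agree to floating-point precision, which is the thinning property in action.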

One teensy problem – Poisson assumes that the thing being counted (in this case, goals) occurs at a constant average rate across the time period being considered (in this case, a full hockey game). In reality, this is not the case – more goals are scored near the end of a game when empty-net situations and/or overtime occur. But life is full of trade-offs, and the small loss of accuracy is worth the large gain in simplicity. If our bet were on “3rd period total goals” or “will there be overtime?”, we wouldn’t be able to get away with this.

So, let’s model Pittsburgh goals and Ottawa goals as two independent Poisson variables. This is another simplification: they’re not totally independent. If Ottawa is leading 3-1 with 2 minutes left, Ottawa is much more likely than Pittsburgh to score the next goal due to the empty-net situation. Once again, it doesn’t make enough of a difference for us to worry about.

Next, we can subdivide each of them so that we end up with six independent Poisson variables:

G1: Crosby goals

A1: Crosby assists

N1: Pittsburgh goals where Crosby neither scores nor assists

G2: Alfredsson goals

A2: Alfredsson assists

N2: Ottawa goals where Alfredsson neither scores nor assists

Each of those six Poisson distributions has its own parameter (its mean) that we need to solve for, so we’re looking for six values to complete the puzzle. Let’s use lowercase letters to denote each mean: E(G1) = g1, E(A1) = a1, etc. Standard practice is to use Greek letters for this, but screw that, I’m not that pretentious.

If you remember your high school algebra, you’ll recall that pinning down a unique solution for six unknowns generally requires a system of six independent simultaneous equations. So we need six statements that we can make about g1, a1, n1, g2, a2 and n2. Let’s get to it!

Equation #1 (Win probability): P(G1+A1+N1 > G2+A2+N2) + 0.5 * P(G1+A1+N1 = G2+A2+N2) = Pittsburgh’s implied win probability, taken from live Pinnacle odds, where (G1+A1+N1) is Poisson distributed with mean (g1+a1+n1) and (G2+A2+N2) is Poisson distributed with mean (g2+a2+n2). The 0.5 weight on a tie effectively treats overtime as a coin flip.

Equation #2 (Total): P(G1+A1+N1+G2+A2+N2 > 5.5) = the implied probability of over 5.5, taken from live Pinnacle odds, where (G1+A1+N1+G2+A2+N2) is Poisson distributed with mean (g1+a1+n1+g2+a2+n2).
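In code, the left-hand sides of Equations #1 and #2 need nothing more than a Poisson probability function. A minimal Python sketch follows; the 20-goal cap in `win_prob` is an assumption that keeps the double sum finite (the chance of a team scoring more than 20 is negligible), and the example means are made up.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for X ~ Poisson(mean), with mean > 0."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def win_prob(mu_team1: float, mu_team2: float, max_goals: int = 20) -> float:
    """Left side of Equation #1: P(team 1 outscores team 2) + 0.5 * P(tie).

    Goals are treated as independent Poissons; scores above max_goals are ignored.
    """
    p1 = [poisson_pmf(k, mu_team1) for k in range(max_goals + 1)]
    p2 = [poisson_pmf(k, mu_team2) for k in range(max_goals + 1)]
    total = 0.0
    for i, a in enumerate(p1):
        for j, b in enumerate(p2):
            if i > j:
                total += a * b
            elif i == j:
                total += 0.5 * a * b
    return total


def over_prob(mu_total: float, line: float = 5.5) -> float:
    """Left side of Equation #2: P(combined goals > line), combined ~ Poisson(mu_total)."""
    return 1.0 - sum(poisson_pmf(k, mu_total) for k in range(math.floor(line) + 1))


# Made-up means: Pittsburgh = g1+a1+n1, Ottawa = g2+a2+n2.
pit_mean, ott_mean = 3.1, 2.8
print(win_prob(pit_mean, ott_mean), over_prob(pit_mean + ott_mean))
```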

Equation #3 (Crosby goal rate): g1 / (g1+a1+n1) = (Crosby goals per game) / (Pittsburgh goals per game), calculated using current year + prior year where current year gets 2x weight.

Equation #4 (Crosby assist rate): a1 / (g1+a1+n1) = (Crosby assists per game) / (Pittsburgh goals per game), calculated using current year + prior year where current year gets 2x weight.

Equation #5 (Alfredsson goal rate): g2 / (g2+a2+n2) = (Alfredsson goals per game) / (Ottawa goals per game), calculated using current year + prior year where current year gets 2x weight.

Equation #6 (Alfredsson assist rate): a2 / (g2+a2+n2) = (Alfredsson assists per game) / (Ottawa goals per game), calculated using current year + prior year where current year gets 2x weight.
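The right-hand sides of Equations #3 to #6 come from season stats. Here is one hedged reading of “current year gets 2x weight” as a Python sketch: it blends two seasons’ per-game rates 2:1 (weighting raw totals instead would give a slightly different answer), and all the stats in the example are made-up round numbers, not real Crosby or Alfredsson figures.

```python
def weighted_per_game(current_total: float, current_games: int,
                      prior_total: float, prior_games: int,
                      current_weight: float = 2.0) -> float:
    """Blend two seasons' per-game rates, current season weighted `current_weight` to 1.

    Assumption: the weighting is applied to per-game rates, not to raw totals.
    """
    current_rate = current_total / current_games
    prior_rate = prior_total / prior_games
    return (current_weight * current_rate + prior_rate) / (current_weight + 1.0)


def share_of_team_goals(player_per_game: float, team_goals_per_game: float) -> float:
    """Right side of Equations #3-#6: player goals (or assists) per team goal."""
    return player_per_game / team_goals_per_game


# Made-up numbers: 30 goals in 80 games this year, 40 in 80 last year;
# the team scored 240 in 80 games this year and 260 in 80 last year.
player_goals = weighted_per_game(30, 80, 40, 80)
team_goals = weighted_per_game(240, 80, 260, 80)
print(share_of_team_goals(player_goals, team_goals))  # about 0.135
```

Because (g1+a1+n1) is Pittsburgh’s expected goals, each of these ratios simply fixes a player’s goals or assists as a share of his team’s mean, e.g. g1 = share × (g1+a1+n1).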

So now we have a complicated web of six equations that need to be solved simultaneously…which brings us to my favourite software tool in the entire world, Excel Solver. Put each of your six variables in a cell (make up any values to start) and use those cells to calculate the left and right sides of each of the six equations. Define the “error” for each equation as ABS(left side - right side) and calculate the sum of the six errors. Then use our trusty Solver to find the values of g1, a1, n1, g2, a2 and n2 that minimize the sum of the errors. Voilà, equations solved!
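Solver isn’t the only way to crack this system. Equations #3 to #6 say that each player’s goals and assists are fixed shares of his team’s mean, and Equation #2 depends only on the combined mean of both teams, so the problem collapses to two one-dimensional searches. The Python sketch below swaps Solver’s error minimization for plain bisection based on that observation; it is not the spreadsheet method itself. The input probabilities and player shares in the example are hypothetical, and the win and over probabilities are assumed to be already de-vigged.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def win_prob(mu1: float, mu2: float, max_goals: int = 20) -> float:
    """Equation #1 left side: P(team 1 wins in regulation) + 0.5 * P(tie)."""
    p1 = [poisson_pmf(k, mu1) for k in range(max_goals + 1)]
    p2 = [poisson_pmf(k, mu2) for k in range(max_goals + 1)]
    return sum(a * b * (1.0 if i > j else 0.5 if i == j else 0.0)
               for i, a in enumerate(p1) for j, b in enumerate(p2))


def over_prob(mu_total: float, line: float = 5.5) -> float:
    """Equation #2 left side: P(combined goals > line)."""
    return 1.0 - sum(poisson_pmf(k, mu_total) for k in range(math.floor(line) + 1))


def bisect(f, target: float, lo: float, hi: float, iters: int = 100) -> float:
    """Find x in [lo, hi] with f(x) = target, assuming f is increasing."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve_six(p_win, p_over, g1_share, a1_share, g2_share, a2_share):
    """Return (g1, a1, n1, g2, a2, n2) satisfying all six equations."""
    # Equation #2 depends only on the combined mean, so solve it first.
    total = bisect(over_prob, p_over, 0.1, 20.0)
    # Equation #1: find team 1's fraction of that total matching p_win.
    frac = bisect(lambda f: win_prob(f * total, (1 - f) * total), p_win, 0.01, 0.99)
    mu1, mu2 = frac * total, (1 - frac) * total
    # Equations #3-#6: split each team mean using the player shares.
    g1, a1 = g1_share * mu1, a1_share * mu1
    g2, a2 = g2_share * mu2, a2_share * mu2
    return g1, a1, mu1 - g1 - a1, g2, a2, mu2 - g2 - a2


print(solve_six(0.55, 0.45, 0.17, 0.30, 0.14, 0.22))
```

Bisection works here because both searches are monotonic: a higher combined mean always raises the over probability, and shifting more of the total to Pittsburgh always raises its win probability.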

Using these six values, we can now make probability statements about any outcome or combination of outcomes. There are several ways to do this. My favourite is to use a VBA script (Excel 4 life, suck it haters) to cycle through each possible value, 0 through 9, of each of G1, A1, N1, G2, A2 and N2. That’s 1,000,000 combinations, easily done within a few seconds even on a 2007 computer.

For each combination, you can calculate:

  • Probability of the combination occurring = P(G1) * P(A1) * P(N1) * P(G2) * P(A2) * P(N2), where each term is the Poisson probability of that variable taking its value in this combination. Call this value A.
  • Does Pittsburgh Win? 1 if G1+A1+N1 > G2+A2+N2, 0.5 if G1+A1+N1 = G2+A2+N2, 0 otherwise. Call this value B.
  • Does Crosby beat Alfredsson? 1 if 2*G1+A1 > 2*G2+A2, 0 otherwise. Call this value C.

Take A*B*C for each combination and sum it over all 1,000,000 combinations, and there’s your probability. Multiply that probability by the parlay’s decimal odds and subtract 1 to get your EV. If it’s positive, fire away!
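For readers who don’t speak VBA, here is a rough Python translation of that loop. It mirrors the A, B and C values described above; the six means and the 2.40 decimal odds in the example are invented for illustration, and capping each variable at 9 ignores the small probability of 10 or more. Skipping combinations where B or C is zero gives the same sum, just faster.

```python
import itertools
import math


def poisson_pmf(k: int, mean: float) -> float:
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def parlay_prob(g1, a1, n1, g2, a2, n2, max_count=9):
    """Sum A*B*C over every combination of 0..max_count for all six variables."""
    values = range(max_count + 1)
    pmfs = [[poisson_pmf(k, m) for k in values] for m in (g1, a1, n1, g2, a2, n2)]
    total = 0.0
    for G1, A1, N1, G2, A2, N2 in itertools.product(values, repeat=6):
        # C: Crosby's points (2 per goal, 1 per assist) beat Alfredsson's.
        if 2 * G1 + A1 <= 2 * G2 + A2:
            continue
        pit, ott = G1 + A1 + N1, G2 + A2 + N2
        # B: Pittsburgh win = 1, tie = 0.5, loss = 0.
        b = 1.0 if pit > ott else 0.5 if pit == ott else 0.0
        if b == 0.0:
            continue
        # A: probability of this exact combination (independent Poissons).
        a = (pmfs[0][G1] * pmfs[1][A1] * pmfs[2][N1]
             * pmfs[3][G2] * pmfs[4][A2] * pmfs[5][N2])
        total += a * b
    return total


def expected_value(prob: float, decimal_odds: float) -> float:
    """EV per $1 staked: probability times decimal odds, minus the stake."""
    return prob * decimal_odds - 1.0


# Invented means for Crosby goals/assists/other Pittsburgh goals, then Ottawa's.
p = parlay_prob(0.45, 0.80, 1.85, 0.40, 0.55, 1.85)
print(p, expected_value(p, 2.40))
```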

I built this over 3 days and had an absolute blast doing it. For the next 4 or 5 years, until the player points prop was retired, there would be several +20% EV plays each day and the occasional +30% or +40% one too. Sometimes on the side + player, sometimes on the side -1.5 + player + over, and sometimes on the total under + player tie. Those were the days…it was a race to see which group of professionals could grab the best combos before the liability limits were triggered.

And that, my friends, is the origin story of Plus EV Analytics.

Copyright in the contents of this blog is owned by Plus EV Sports Analytics Inc. and all related rights are reserved thereto.
