# Sabermetrics, but for Pool

### 1. Introduction

I spend a lot of time playing pool during the school year. Unfortunately, I haven’t had access to pool tables for a while, so I haven’t been able to play. Since I can’t actually play pool, I decided instead to work on a problem involving it that occurred to me while playing a game earlier this year.

Some “team” sports are not really team sports in the sense that what matters is not the total strength of your team, but the strength of your best player. A good example of this is chess: when two people play together, the stronger player can just overrule the weaker player, and so a pair of chess players together is roughly as strong as the stronger player (this isn’t a perfect model, admittedly – for example, two grandmasters against a single grandmaster is probably in favor of the two, but this is definitely true when the abilities of the three players are well separated).

Pool is not like this at all, though. When you play with another person, you alternate taking shots, so playing with someone weaker than you can be a substantial handicap (and similarly when you play with someone much stronger than you). Playing with another person, then, amounts to taking some mix of your two abilities. In this post, I try to quantify this mix, and ultimately answer the following question:

Question. Consider three players, ${A, B, C}$, with their skills quantified as ${s_A, s_B, s_C}$. What kind of condition can be given on ${s_C}$ (in terms of ${s_A, s_B}$) to guarantee that ${C}$ has an advantage over ${A}$ and ${B}$ combined?

### 2. Set-Up

Before we can do anything else, we need to develop a reasonable statistical model of how pool works. The key assumption underlying the entire discussion that follows is that there is a fixed probability that a player makes any given shot. This fixed probability will correspond to the quantified skill ${s_A}$ from before. This probability should be weighted by how often each kind of shot comes up: for example, never making a shot that only shows up ${10\%}$ of the time should count less towards this probability than never making a shot that shows up ${50\%}$ of the time. Put another way, this is meant to be an empirical probability: you could estimate it by playing some large number of games, and measuring the proportion of shots you attempted that you made.

Some comments on this assumption: it’s pretty unrealistic. In particular, the game becomes much harder as it progresses, because there are fewer balls available to pocket. Furthermore, it’s not always in your best interest to pocket a ball. These problems can maybe be addressed by introducing some kind of decay factor or something, but this analysis is involved enough with this assumption that I’m wary of the complications that arise by relaxing it.

With this set up in mind, let ${X_i}$ be the random variable that is the number of balls a player pockets on turn ${i}$ (we assume that each of the ${X_i}$ is independent of the others). The number of balls a player pockets in the first ${n}$ turns then, is

$\displaystyle S_n=X_1+\cdots+X_n,$

and the player will win the game after ${N}$ turns, where

$\displaystyle N=\min\{n\mid S_n\geq 8\}$

i.e. once you pocket the seven object balls, and then the eight ball. The way to determine the winner of the match is to compare the values of ${N}$ for the player and her opponent, and whoever has the smaller value wins (with equality favoring the player who goes first).

This can also be imagined in the following way: the two players are playing on separate tables, each with eight balls, and counting the number of turns it takes them to clear the table. Whoever does it in fewer turns wins. Underlying this set-up is the assumption of independence between the shots of the two players, which is another unrealistic but unavoidable assumption.
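The two-table picture is easy to simulate. Here's a minimal Monte Carlo sketch, assuming the fixed-probability model above. The function names, the number of games, and the seed are illustrative choices, not anything canonical.

```python
import random

def turns_to_clear(p, rng, balls=8):
    """Simulate N: the number of turns needed to pocket `balls` balls.

    Assumes 0 < p <= 1; with p = 0 the table is never cleared.
    """
    pocketed, turns = 0, 0
    while pocketed < balls:
        turns += 1
        # One turn: keep shooting until a miss or until the table is clear.
        while pocketed < balls and rng.random() < p:
            pocketed += 1
    return turns

def first_player_win_rate(p_first, p_second, games=20_000, seed=0):
    """Estimate how often the first player wins (ties go to them)."""
    rng = random.Random(seed)
    wins = sum(
        turns_to_clear(p_first, rng) <= turns_to_clear(p_second, rng)
        for _ in range(games)
    )
    return wins / games

print(first_player_win_rate(0.5, 0.5))
```

Stopping a turn as soon as the table is clear doesn't change ${N}$, and it keeps the loop finite when ${p=1}$. With two equal players the estimate should come out somewhat above one half, since ties go to whoever shoots first.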

With this set up in mind, here’s the executive summary of how we’ll attempt to answer our question:

1. Understand the distributions of ${X_i, S_n, N}$ for a single player
2. Understand the distribution of ${X_i, S_n, N}$ for a two-player team
3. Use the results of (1) and (2) to give an answer

### 3. One Player

Suppose we have a single player with a fixed probability ${p}$ of making a shot. Then, each ${X_i}$ is distributed geometrically, so

$\displaystyle \mathop{\mathbb P}(X_i=k)=p^k(1-p),\quad k=0,1,\cdots.$

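The next thing to do is look at the distribution of ${S_n}$. Suppose we had ${S_n=k}$. This would mean that we had attempted ${n+k}$ shots, of which ${n}$ had failed and ${k}$ had succeeded. The final shot must be a miss (it ends turn ${n}$), and the ${k}$ successes can fall anywhere among the remaining ${n+k-1}$ shots. This argument gives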

$\displaystyle \mathop{\mathbb P}(S_n=k)=\dbinom{n+k-1}{k}(1-p)^np^k.$
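As a quick numerical check of this formula, here's a small sketch. The name `pmf_S` and the values of ${n}$ and ${p}$ are just illustrative.

```python
from math import comb

def pmf_S(n, k, p):
    """P(S_n = k): k makes and n misses, with the last shot a miss."""
    return comb(n + k - 1, k) * (1 - p) ** n * p ** k

p = 0.5
# n = 1 should reduce to the geometric distribution of a single turn.
assert all(abs(pmf_S(1, k, p) - p ** k * (1 - p)) < 1e-12 for k in range(20))
# The probabilities should sum to 1 (the truncated tail is negligible here).
print(sum(pmf_S(5, k, p) for k in range(400)))
```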

Finally, we look at ${N}$:

$\displaystyle \begin{aligned} \mathop{\mathbb P}(N=n)&=\mathop{\mathbb P}(S_{n-1}<8, S_n\geq 8)\\ &=\mathop{\mathbb P}(S_{n-1}<8, X_n>7-S_{n-1})\\ &=\sum_{j=0}^7 \mathop{\mathbb P}(S_{n-1}=j, X_n>7-j)\\ &=\sum_{j=0}^7 \mathop{\mathbb P}(S_{n-1}=j)\mathop{\mathbb P}(X_n>7-j)\\ &=\sum_{j=0}^7\dbinom{n+j-2}{j}(1-p)^{n-1}p^j\sum_{i=8-j}^{\infty} p^i(1-p)\\ &=p^8(1-p)^{n-1}\sum_{j=0}^7\dbinom{n+j-2}{j}. \end{aligned}$

This isn’t pretty, but it’s tractable. Shown below are the distributions for ${p=0.25, 0.5, 0.75}$, to give some sense of what we’re looking at.

[Slideshow: distributions of ${N}$ for ${p=0.25, 0.5, 0.75}$]

As a sanity check, these have the correct qualitative behavior: higher values of ${p}$ correspond to fewer shots needed to finish.
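For anyone who wants to reproduce these distributions, here's a minimal sketch that evaluates the formula for ${\mathop{\mathbb P}(N=n)}$ directly. The function name `pmf_N` and the cutoff of 400 turns are illustrative choices. The ${n=1}$ case is handled separately because the only way to finish on the first turn is to make all eight shots.

```python
from math import comb

def pmf_N(n, p, balls=8):
    """P(N = n) for a single player, from the sum derived above."""
    if n == 1:
        return p ** balls  # run the table on the first turn
    coeff = sum(comb(n + j - 2, j) for j in range(balls))
    return p ** balls * (1 - p) ** (n - 1) * coeff

for p in (0.25, 0.5, 0.75):
    dist = {n: pmf_N(n, p) for n in range(1, 400)}
    total = sum(dist.values())
    mean = sum(n * w for n, w in dist.items())
    mode = max(dist, key=dist.get)
    print(f"p={p}: total={total:.6f}, mean turns={mean:.2f}, most likely N={mode}")
```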

### 4. Two Players

Next, we look at a two-player team. Let their fixed probabilities be ${p}$ and ${q}$, where the player with probability ${p}$ is the first player, and the player with probability ${q}$ is the second player. Then, on the first player's ${i^{\text{th}}}$ turn and the second player's ${i^{\text{th}}}$ turn, the numbers of balls they pocket are given by random variables ${X_{2i-1}, Y_{2i}}$, where

$\displaystyle \mathop{\mathbb P}(X_{2i-1}=k)=p^k(1-p)\quad\text{and}\quad \mathop{\mathbb P}(Y_{2i}=k)=q^k(1-q),\quad k=0,1,\cdots.$

(We use two different letters for convenience.)

For the ${S_n}$, we need to distinguish based on parity: we have

$\displaystyle S_{2n-1}=X_1+Y_2+\cdots+Y_{2n-2}+X_{2n-1}\quad\text{and}\quad S_{2n}=X_1+Y_2+\cdots+X_{2n-1}+Y_{2n}.$

Computing the distributions of these is a bit trickier this time, but we use a similar method.

Suppose we had ${S_{2n-1}=k}$. For this to happen, we must have attempted ${2n+k-1}$ shots, of which ${2n-1}$ are failures and ${k}$ are successes. Of the failures, ${n}$ came from the first player, and ${n-1}$ came from the second player. Furthermore, suppose that ${i}$ of the successes came from the first player, and ${k-i}$ came from the second player. We construct a valid sequence of successes and failures by considering the two players separately, and then intercalating the outcomes of their turns.

From the first player, we have ${n+i}$ shots. The last shot must be a failure, and there are ${\binom{n+i-1}{n-1}}$ ways to arrange the remaining ones. Similarly for the second player's shots, we find that there are ${\binom{n+k-i-2}{n-2}}$ arrangements. Then, for each pair of valid arrangements, we go along the ${X_i}$ till we have a failure; then go along the ${Y_i}$ till we have a failure, and so on, to construct a valid sequence of shots that would give ${S_{2n-1}=k}$. The probability of any such sequence is ${p^iq^{k-i}(1-p)^n(1-q)^{n-1}}$.

Using this argument, and allowing ${i}$ to vary, gives

$\displaystyle \mathop{\mathbb P}(S_{2n-1}=k)=\sum_{i=0}^k \dbinom{n+i-1}{n-1}\dbinom{n+k-i-2}{n-2}p^iq^{k-i}(1-p)^{n}(1-q)^{n-1}.$

Using an analogous argument for the even case, we find

$\displaystyle \mathop{\mathbb P}(S_{2n}=k)=\sum_{i=0}^k \dbinom{n+i-1}{n-1}\dbinom{n+k-i-1}{n-1}p^iq^{k-i}(1-p)^{n}(1-q)^{n}.$

As another sanity check, looking at ${S_2}$ does in fact give the convolution of ${X_1}$ and ${Y_2}$, so everything’s good.

Remark. When we plug ${p=q}$ into our formula, we recover the results of the single player section. This gives the identity

$\displaystyle \sum_{i=0}^k \dbinom{i+n-1}{n-1}\dbinom{n+k-i-2}{n-2}=\dbinom{2n+k-2}{2n-2}.$
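The identity is easy to spot-check by brute force. Here's a small sketch (the helper names `lhs` and `rhs` are made up for this check). It only covers ${n\geq 2}$, since for ${n=1}$ the lower index ${n-2}$ is negative.

```python
from math import comb

def lhs(n, k):
    """Left-hand side: sum over how many of the k makes belong to player one."""
    return sum(
        comb(i + n - 1, n - 1) * comb(n + k - i - 2, n - 2)
        for i in range(k + 1)
    )

def rhs(n, k):
    """Right-hand side: the single-player count for 2n - 1 turns."""
    return comb(2 * n + k - 2, 2 * n - 2)

assert all(lhs(n, k) == rhs(n, k) for n in range(2, 10) for k in range(0, 15))
print("identity holds on the tested range")
```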

Final note before moving on: the formula for ${S_{2n-1}}$ fails in the case of ${n=1}$, because our counting argument breaks down (the second player has not yet taken a turn). In that case though, we just have ${S_1=X_1}$, whose distribution we already know.

Next, we turn to ${N}$. Distinguishing cases based on parity and repeating the arguments of the previous section gives

$\displaystyle \mathop{\mathbb P}(N=2n-1)=\sum_{j=0}^7 \mathop{\mathbb P}(S_{2n-2}=j)p^{8-j}\quad\text{and}\quad \mathop{\mathbb P}(N=2n)=\sum_{j=0}^7 \mathop{\mathbb P}(S_{2n-1}=j)q^{8-j},$

and we could plug in the distribution of ${S_n}$ to get a single expression, but that would take too much space.

At any rate, these expressions are no problem for a computer, and some sample distributions are shown below.

[Slideshow: sample distributions of ${N}$ for two-player teams]

These graphs also show the correct qualitative behavior: a team with one player much stronger than the other should be much more likely to finish on turns where the stronger player is up, and taking ${p=q=0.5}$ recovers the distribution from the previous section.
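One way to get these numbers without expanding the binomial sums is to track the distribution of balls pocketed turn by turn. The sketch below does that. It's a small dynamic program rather than a transcription of the formulas above, and the names `team_pmf_N` and `solo_pmf_N` and the 200-turn cutoff are illustrative.

```python
from math import comb

def team_pmf_N(p, q, max_turns=200, balls=8):
    """Return [P(N = 1), ..., P(N = max_turns)]; skill p shoots on odd turns."""
    alive = [1.0] + [0.0] * (balls - 1)  # alive[j] = P(S_{t-1} = j), j < balls
    pmf = []
    for t in range(1, max_turns + 1):
        s = p if t % 2 == 1 else q
        # Finish this turn: make all of the remaining balls - j shots.
        pmf.append(sum(alive[j] * s ** (balls - j) for j in range(balls)))
        nxt = [0.0] * balls
        for j in range(balls):
            for m in range(balls - j):  # make m shots, then miss
                nxt[j + m] += alive[j] * s ** m * (1 - s)
        alive = nxt
    return pmf

def solo_pmf_N(n, p, balls=8):
    """Single-player P(N = n) from the previous section."""
    if n == 1:
        return p ** balls
    coeff = sum(comb(n + j - 2, j) for j in range(balls))
    return p ** balls * (1 - p) ** (n - 1) * coeff

same = team_pmf_N(0.5, 0.5)
assert all(abs(same[n - 1] - solo_pmf_N(n, 0.5)) < 1e-12 for n in range(1, 60))

lopsided = team_pmf_N(0.9, 0.3)
print(f"P(game ends on the stronger player's turn) = {sum(lopsided[0::2]):.3f}")
```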

### 5. Two on One Matches

We can now study a two on one game of pool, and see when the single player has the advantage. Consider three players with skills ${p,q,r}$, where the first two are playing together against the third. Let ${N_{p\oplus q}}$ be the number of turns taken for the first two to clear their balls (we use ${\oplus}$ to make it clear that we are not just adding the probabilities in the subscript) and let ${N_r}$ be the number of turns taken for the third to clear her balls. For the game to be in the single player’s favor, we need ${\mathop{\mathbb E}(N_{p\oplus q})\geq \mathop{\mathbb E}(N_r)}$.

Unfortunately, those expectations have no closed form, so this is where we start fudging and approximating.

To get a feel for things, we plot the value of ${\mathop{\mathbb E}(N_r)}$ for different values of ${r}$.

This looks vaguely exponential, so let’s do ${\log \mathop{\mathbb E}(N_r)}$ against ${r}$.

The logarithmic plot looks suitably linear, and running OLS gives

$\displaystyle \widehat{\log \mathop{\mathbb E}(N_r)}=-4.68r+4.69,$

with ${R^2=0.95}$. Of course, based on the graph, we know that this approximation is much worse for ${r<0.1}$, so if you’re terrible at pool, you should take this with a larger grain of salt (and maybe not play against two people at once).
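Here's a rough sketch of how a fit like this can be reproduced with nothing but the standard library. The grid of ${r}$ values is an assumption (the post doesn't record which values were used), so the coefficients it prints will differ somewhat from the ones quoted above. The helper names are illustrative too.

```python
from math import comb, log

def expected_turns(r, balls=8, max_turns=2000):
    """E(N_r), summing n * P(N = n) with the single-player formula."""
    total = r ** balls  # the n = 1 term
    for n in range(2, max_turns + 1):
        coeff = sum(comb(n + j - 2, j) for j in range(balls))
        total += n * r ** balls * (1 - r) ** (n - 1) * coeff
    return total

def ols_line(xs, ys):
    """Least-squares slope, intercept and R^2 for y ~ slope * x + intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy ** 2 / (sxx * syy)

rs = [i / 20 for i in range(1, 20)]  # assumed grid: 0.05, 0.10, ..., 0.95
slope, intercept, r2 = ols_line(rs, [log(expected_turns(r)) for r in rs])
print(f"log E(N_r) ~ {slope:.2f} r + {intercept:.2f}  (R^2 = {r2:.2f})")
```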

It makes sense to try the same thing for ${N_{p\oplus q}}$. Of course, things are three-dimensional now, so patterns are a bit harder to spot. For that reason, we go straight to plotting ${\log \mathop{\mathbb E}(N_{p\oplus q})}$ for varying ${(p,q)}$:

This has the same linear (or planar, I suppose) look, and running OLS again gives

$\displaystyle \widehat{\log \mathop{\mathbb E}(N_{p\oplus q})}=-2.24p-1.9q+4.07,$

with ${R^2=0.88}$. Not quite as good, but still a reasonable approximation (with the same caveat of failure at the edges).
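The planar fit can be sketched the same way. Everything here is an assumption layered on the model: the grid of ${(p,q)}$ values, the helper names, and the choice to compute the team expectation with a turn-by-turn dynamic program. Expect coefficients in the same ballpark as the ones above, not identical.

```python
from math import log

def team_expected_turns(p, q, balls=8, tol=1e-12, max_turns=10_000):
    """E(N) for a team alternating skills p (odd turns) and q (even turns)."""
    alive = [1.0] + [0.0] * (balls - 1)  # alive[j] = P(j balls down, game still on)
    expected = 0.0
    for t in range(1, max_turns + 1):
        s = p if t % 2 == 1 else q
        expected += t * sum(alive[j] * s ** (balls - j) for j in range(balls))
        nxt = [0.0] * balls
        for j in range(balls):
            for m in range(balls - j):  # make m shots, then miss
                nxt[j + m] += alive[j] * s ** m * (1 - s)
        alive = nxt
        if sum(alive) < tol:  # remaining probability mass is negligible
            break
    return expected

def fit_plane(pts):
    """OLS for y ~ a*p + b*q + c over points (p, q, y).

    Valid on a full product grid, where p and q are uncorrelated, so each
    slope can be computed as in a one-variable regression.
    """
    n = len(pts)
    p_bar = sum(p for p, _, _ in pts) / n
    q_bar = sum(q for _, q, _ in pts) / n
    y_bar = sum(y for _, _, y in pts) / n
    a = (sum((p - p_bar) * (y - y_bar) for p, _, y in pts)
         / sum((p - p_bar) ** 2 for p, _, _ in pts))
    b = (sum((q - q_bar) * (y - y_bar) for _, q, y in pts)
         / sum((q - q_bar) ** 2 for _, q, _ in pts))
    return a, b, y_bar - a * p_bar - b * q_bar

grid = [i / 10 for i in range(1, 10)]  # assumed grid: 0.1, ..., 0.9 for p and q
points = [(p, q, log(team_expected_turns(p, q))) for p in grid for q in grid]
a, b, c = fit_plane(points)
print(f"log E(N) ~ {a:.2f} p + {b:.2f} q + {c:.2f}")
```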

Now, if we replace the condition of

$\displaystyle \mathop{\mathbb E}(N_r)\leq \mathop{\mathbb E}(N_{p\oplus q})\quad\text{with}\quad \widehat{\log \mathop{\mathbb E}(N_r)}\leq \widehat{\log\mathop{\mathbb E}(N_{p\oplus q})},$

we have

$\displaystyle -4.68r+4.69\leq -2.24p-1.9q+4.07\implies r\geq 0.5p+0.4q+0.1,$

where we’ve rounded stuff off in that final result.
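For the record, here is the arithmetic behind that rounding, using the fitted coefficients quoted above. Dividing through by the coefficient of ${r}$, which is negative, is what flips the inequality. The variable names are just labels for this sketch.

```python
# Fitted lines: log E(N_r) ~ a_r * r + c_r and log E(N_team) ~ a_p * p + a_q * q + c_t
a_r, c_r = -4.68, 4.69
a_p, a_q, c_t = -2.24, -1.90, 4.07

# a_r * r + c_r <= a_p * p + a_q * q + c_t; divide by a_r < 0, so <= becomes >=.
w_p = a_p / a_r
w_q = a_q / a_r
w_0 = (c_t - c_r) / a_r

print(f"r >= {w_p:.3f} p + {w_q:.3f} q + {w_0:.3f}")
```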

So this computation gives the following answer to our question:

Answer: Take a weighted average of skills that is ${50\%}$ of the first player, ${40\%}$ of the second player, and a correction factor of ${10\%}$ of a perfect player (i.e. one who never misses a shot). If you make a higher percentage of your shots than this, you should expect to beat them. $\Box$

To wrap up, let’s try an example.

Example. Suppose you are facing two players, one of whom makes ${90\%}$ of her shots and another who makes ${60\%}$ of her shots. Since

$\displaystyle 0.5\cdot 0.9+0.4\cdot 0.6+0.1\approx 0.8,$

you should expect to beat this pair if you can make ${80\%}$ of your shots.
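If you'd like to test the rule of thumb against the model itself, here's a sketch that computes the single player's chance of finishing in fewer turns than the team. Note that this is a different criterion from comparing expected turn counts, so it needn't agree exactly with the threshold. The function names and the 400-turn cutoff are illustrative assumptions.

```python
def pmf_N(skills, max_turns=400, balls=8):
    """Return pmf with pmf[t] = P(N = t + 1); shooters cycle through `skills`."""
    alive = [1.0] + [0.0] * (balls - 1)  # alive[j] = P(j balls down, not done)
    pmf = []
    for t in range(max_turns):
        s = skills[t % len(skills)]
        pmf.append(sum(alive[j] * s ** (balls - j) for j in range(balls)))
        nxt = [0.0] * balls
        for j in range(balls):
            for m in range(balls - j):  # make m shots, then miss
                nxt[j + m] += alive[j] * s ** m * (1 - s)
        alive = nxt
    return pmf

def solo_win_probs(r, p, q):
    """Return (P(N_r < N_team), P(N_r <= N_team)), assuming independent tables."""
    solo, team = pmf_N([r]), pmf_N([p, q])
    strictly, ties = 0.0, 0.0
    team_tail = 1.0  # becomes P(N_team > t + 1) inside the loop
    for t in range(len(solo)):
        team_tail -= team[t]
        strictly += solo[t] * team_tail
        ties += solo[t] * team[t]
    return strictly, strictly + ties

p, q = 0.9, 0.6
threshold = 0.5 * p + 0.4 * q + 0.1  # rule of thumb from the answer above
if_ties_lose, if_ties_win = solo_win_probs(0.8, p, q)
print(f"threshold = {threshold:.2f}")
print(f"solo player at 0.8 wins with probability {if_ties_lose:.3f} to {if_ties_win:.3f}")
```

The two numbers bracket the answer depending on who gets the benefit of a tie, which in the model above is whoever shoots first.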