Beyond Pythagorean Expectation: How Run Distributions Affect Win Percentage
The Pythagorean expectation formula, originally developed by Bill James, provides a reasonably good estimate of the win percentage of a baseball team using the number of runs scored and runs allowed by that team. Improvements on the formula, such as the Pythagenport by Davenport and Woolner, and the Pythagenpat by Smyth and Patriot, allow for variation of the Pythagorean exponent and give a very good estimate for win percentages over a wide range of run environments. This article looks at possible improvements on the Pythagorean formula and its variants that take into account the shapes of the run distributions in addition to the run environment. There is a clear pattern that teams which have a higher slugging percentage score runs more consistently, i.e., their run distributions have a smaller standard deviation, and they tend to win more games than their Pythagorean expectation. This article also examines how metrics that use runs for evaluating teams and players might be adjusted to account for these effects.
Derivations of the Pythagorean expectation formula have been made under certain assumptions. Hein Hundal showed that if run distributions are independent log-normal distributions, then the Pythagorean exponent can be approximated by a particular function of the standard deviation and mean of a typical run distribution. Steven Miller showed that if the run distributions are Weibull distributions, then win percentages are given exactly by the Pythagorean formula, where the Pythagorean exponent is a parameter used to fit the run distributions to data. In both cases the Pythagorean exponent inferred from run distribution data is very close to the empirical value.
In the two derivations described above, the shape of the run distribution is determined for a given average runs per game (RPG), i.e., they are single-parameter distributions. Consequently, a team with a higher RPG is always predicted to win more games. Actual run distributions depend on many variables, such as the rates for walks, singles, doubles, triples and home runs. A team with the same runs scored and runs allowed will not necessarily have a win percentage exactly equal to .500 if the shapes of those run distributions are different. This paper reports on an investigation of these effects and discusses their potential consequences:
- How the shapes of run distributions affect win percentage will be examined using four different sources for the distributions: (i) actual runs scored data, (ii) a log-normal distribution, which has two parameters that can be taken as the mean and standard deviation of the distribution, (iii) a toy model where a given team hits only one type of base hit (single, double, or home run), and (iv) a Markov chain model where a team can have a mixed batting profile. In each case the more consistent a team is in scoring runs (i.e., the narrower the run distribution, or the smaller the standard deviation), the better its win percentage, for fixed RPG.
- Modifications of the Pythagorean formula are proposed that fit all of these data sets significantly better than the standard forms (including the Pythagenpat version).
- Using the results from above, the implications for evaluating players and building a team will be discussed.
First, run distributions from 1999-2008 were used to find win percentages in head-to-head match-ups (435 pairings of the 30 teams each year, for a total of 4350 pairings). The calculation is as follows. If team i has probability Pi(n) of scoring n runs in a game, then its win/loss ratio against team j with probabilities Pj(m) is
W/L = [Sum_n Pi(n) (Sum_(m<n) Pj(m))] / [Sum_n Pi(n) (Sum_(m>n) Pj(m))],
where it is assumed that ties are decided with the same ratio. For pairs of teams with approximately the same RPG (within 1%), the win percentage varies from .500 by as much as .035, compared to the maximum deviation of about .005 predicted by the Pythagorean formula for two teams with RPG within 1%.
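As a minimal sketch of the head-to-head calculation, the double sum can be written directly in Python. The function names and the two toy distributions are hypothetical illustrations, not data from the study.

```python
def win_loss_ratio(p_i, p_j):
    """Return W/L for team i against team j.

    p_i[n] and p_j[m] are the probabilities of scoring n and m runs.
    Tied scores are left out of both sums, which is equivalent to
    deciding ties with the same W/L ratio as the non-tied games.
    """
    wins = 0.0
    losses = 0.0
    for n, prob_i in enumerate(p_i):
        for m, prob_j in enumerate(p_j):
            if n > m:
                wins += prob_i * prob_j
            elif n < m:
                losses += prob_i * prob_j
    return wins / losses


def win_pct(p_i, p_j):
    """Convert the W/L ratio into a win percentage."""
    ratio = win_loss_ratio(p_i, p_j)
    return ratio / (1.0 + ratio)


# Hypothetical distributions, both averaging 2.0 runs per game.
steady = [0.0, 0.0, 1.0]                   # always scores exactly 2
streaky = [0.6, 0.0, 0.0, 0.0, 0.0, 0.4]   # scores 0 or 5

steady_pct = win_pct(steady, streaky)
```

In this deliberately extreme example the steady team wins whenever the streaky team is shut out, so its win percentage is .600 even though both teams average the same number of runs.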
The next step is to show that much of the deviation is due to the different shapes of the distributions. Using the standard deviation of each team's run distribution as a measure of its shape, a strong correlation (with correlation coefficient 0.92) was found between the win percentage and the ratio of the standard deviations of the two teams. The win percentage varies inversely with the standard deviation ratio, which means that a more consistent team (i.e., one with the smaller standard deviation) tends to do better than the Pythagorean expectation, while an inconsistent team (larger standard deviation) does worse.
There is a similar spread in win percentages and correlation with standard deviation ratios for teams with different RPG. Dividing the pairs of teams into groups with approximately the same RPG ratio shows that the teams with the smallest (largest) standard deviation within a group tend to win the most (least) games. The degree of correlation between the win percentage and standard deviation ratio goes down as the RPG ratio deviates from one, but generally remains above 0.80.
There is also a mild dependence on the run environment (defined as the total RPG for both teams). A good fit to the data (with root mean square (RMS) error .0046 in win percentage) for the win/loss ratio (for Team 1 playing Team 2) is
W1/L1 = (RPG1/RPG2)^a (SD2/SD1)^b,
where RPGi and SDi are the RPG and standard deviation for team i, and
a = 1.313 (RPG1 + RPG2)^.214, b = 1.020 (RPG1 + RPG2)^(-.356).
This can be compared with an RMS error of 0.0109 when using the Pythagenpat formula, W1/L1 = (RPG1/RPG2)^a with a = (RPG1 + RPG2)^.287, on the same data set.
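The two formulas can be compared side by side in a short sketch. The coefficients are the fitted values quoted above; the function names and the example inputs (two teams at 5.0 RPG with standard deviations 2.7 and 3.0) are assumptions chosen only for illustration.

```python
def pythagenpat_wl(rpg1, rpg2):
    """W/L ratio from the Pythagenpat formula with exponent (RPG1 + RPG2)^.287."""
    a = (rpg1 + rpg2) ** 0.287
    return (rpg1 / rpg2) ** a


def modified_wl(rpg1, rpg2, sd1, sd2):
    """W/L ratio from the modified formula with a standard deviation term."""
    total = rpg1 + rpg2
    a = 1.313 * total ** 0.214
    b = 1.020 * total ** (-0.356)
    return (rpg1 / rpg2) ** a * (sd2 / sd1) ** b


def to_win_pct(wl):
    """Convert a W/L ratio into a win percentage."""
    return wl / (1.0 + wl)


# Hypothetical match-up: equal RPG, Team 1 is the more consistent scorer.
base_pct = to_win_pct(pythagenpat_wl(5.0, 5.0))
adjusted_pct = to_win_pct(modified_wl(5.0, 5.0, 2.7, 3.0))
```

With equal RPG the Pythagenpat estimate is exactly .500, while the modified formula credits the team with the smaller standard deviation with a win percentage slightly above .500.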
The question now arises as to whether the differences in standard deviation are entirely due to random fluctuations (which are presumably not reproducible), or are due partly to the different shapes of the intrinsic run distributions of the teams. This was studied in three ways.
First, it was assumed that run distributions are log-normal. Log-normal distributions have two parameters, which can be taken as the mean and standard deviation. Allowing RPG to vary randomly between 4.0 and 6.0 and standard deviations to randomly vary between 2.0 and 3.0, a sample of 5000 pairs of teams was generated. Then for each team a run distribution for a 162-game season was randomly generated, to simulate the 1999-2008 data set. Then win percentages in the head-to-head match-ups were calculated as before.
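A rough sketch of this simulation step is given below. The conversion from mean and standard deviation to the usual log-normal parameters is standard. Rounding each draw to the nearest whole number of runs is an assumption, since the text does not say how continuous values were mapped to run totals, and the function names and seed are hypothetical.

```python
import math
import random


def lognormal_params(mean, sd):
    """Return (mu, sigma) of the underlying normal for a given mean and SD."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def simulate_season(mean, sd, games=162, rng=None):
    """Draw one season of run totals (rounding to integers is an assumption)."""
    rng = rng or random.Random()
    mu, sigma = lognormal_params(mean, sd)
    return [round(rng.lognormvariate(mu, sigma)) for _ in range(games)]


def empirical_distribution(scores):
    """Turn a list of game scores into probabilities P(n) for n = 0, 1, 2, ..."""
    counts = [0] * (max(scores) + 1)
    for s in scores:
        counts[s] += 1
    return [c / len(scores) for c in counts]


rng = random.Random(2010)           # arbitrary seed for repeatability
team_rpg = rng.uniform(4.0, 6.0)    # RPG drawn from the range in the text
team_sd = rng.uniform(2.0, 3.0)     # SD drawn from the range in the text
season = simulate_season(team_rpg, team_sd, rng=rng)
dist = empirical_distribution(season)
```

Repeating this for both teams in each of the 5000 pairs yields noisy season-level distributions that can be fed into the same head-to-head calculation used for the actual data.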
As with the actual data, a strong correlation was found between the win percentage and standard deviation ratio for teams with approximately the same RPG ratio. However, although they are qualitatively similar, the win percentage did not have exactly the same relationship to RPG ratio and standard deviation ratio. This is not surprising since the intrinsic run distributions are not necessarily log-normal, but it does show that different shapes of the intrinsic run distribution can affect win percentage in a way similar to that seen in the actual data.
Next, if it is assumed that a given team has only one type of base hit (single with runners advancing one base, single with runners advancing two bases, double, or home run), then the runs per inning (RPI) distributions are relatively simple functions of the batting average (AVG). From these, RPG (actually, runs-per-27-outs) distributions were easily determined and win percentages for head-to-head match-ups were calculated. Ties were decided in extra innings using the RPI distributions. Since the RPI distributions are different from RPG distributions, the probability of winning in extra innings is not necessarily the same as the overall probability of winning the game.
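For the home-run-only team the construction is simple enough to sketch. It assumes every plate appearance is either a home run (probability equal to AVG) or an out, so the runs in an inning are the number of home runs before the third out, and it assumes innings are independent so that the runs-per-27-outs distribution is the nine-fold convolution of the RPI distribution. The function names and the truncation at 30 runs per inning are illustrative choices.

```python
from math import comb, sqrt


def hr_team_rpi(p_hr, max_runs=30):
    """P(k runs in an inning): k home runs before the third out."""
    q = 1.0 - p_hr
    return [comb(k + 2, 2) * p_hr ** k * q ** 3 for k in range(max_runs + 1)]


def convolve(a, b):
    """Distribution of the sum of two independent run totals."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def rpg_from_rpi(rpi, innings=9):
    """Runs-per-27-outs distribution from a runs-per-inning distribution."""
    dist = [1.0]
    for _ in range(innings):
        dist = convolve(dist, rpi)
    return dist


def mean_sd(dist):
    """Mean and standard deviation of a discrete run distribution."""
    mean = sum(n * p for n, p in enumerate(dist))
    var = sum((n - mean) ** 2 * p for n, p in enumerate(dist))
    return mean, sqrt(var)


# Mean runs per inning is 3p/(1-p); setting it to 5/9 gives p = 5/32.
p_hr = 5.0 / 32.0
rpg_dist = rpg_from_rpi(hr_team_rpi(p_hr))
rpg_mean, rpg_sd = mean_sd(rpg_dist)
```

The same convolution applies to the other single-hit-type teams once their RPI distributions are written down; only the per-inning scoring rule changes.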
The results for teams with RPG = 5.0 are displayed in the following table, which shows the win percentages for each possible match-up, as well as the standard deviation of a team’s run distribution and the prediction using the empirical formula from above.
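[Table not recovered. Columns: team type, standard deviation of the team's run distribution, actual win pct. vs. each opponent, predicted win pct. vs. each opponent.]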
Clearly the team with the more consistent offense (lower standard deviation) wins more, even though the RPG are identical. The additional wins per 162-game season can be as much as eight in the most extreme case (the HR team versus the 1B team). Also, the empirical formula derived from actual data gives fairly good predictions for these win percentages.
The extra-innings win percentages differ even more from the Pythagorean expectation. The typical excess win percentage in extra innings is 2.35 to 2.50 times the overall excess win percentage. For example, the HR team beat the 1B team 62% of the time in extra innings, primarily because the HR team scores in 40% of innings while the 1B team scores in only 24% of innings (although when the 1B team does score, it tends to score more runs).
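The per-inning scoring frequencies can be checked with a short calculation. The sketch assumes that every plate appearance is either a hit (with probability AVG) or an out, that the HR team scores one run per hit, and that the 1B team (runners advancing one base per single) scores max(0, hits - 3) runs in an inning. The function names and the bisection solver are illustrative choices.

```python
from math import comb


def hits_pmf(k, p):
    """P(exactly k hits before the third out) when each PA is a hit w.p. p."""
    return comb(k + 2, 2) * p ** k * (1.0 - p) ** 3


def hr_team_score_prob(p):
    """HR team scores unless all three batters make outs."""
    return 1.0 - (1.0 - p) ** 3


def singles_team_rpi_mean(p):
    """Mean of max(0, hits - 3): the first three singles only load the bases."""
    mean_hits = 3.0 * p / (1.0 - p)
    return mean_hits - 3.0 + sum((3 - k) * hits_pmf(k, p) for k in range(3))


def singles_team_score_prob(p):
    """1B team scores only with four or more hits in the inning."""
    return 1.0 - sum(hits_pmf(k, p) for k in range(4))


def solve_avg(rpi_mean_fn, target, lo=0.01, hi=0.99):
    """Bisection for the AVG that gives the target mean runs per inning."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if rpi_mean_fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


target_rpi = 5.0 / 9.0                      # RPG = 5.0 over nine innings
p_hr = target_rpi / (3.0 + target_rpi)      # solves 3p/(1-p) = 5/9
p_1b = solve_avg(singles_team_rpi_mean, target_rpi)

hr_scores = hr_team_score_prob(p_hr)
singles_scores = singles_team_score_prob(p_1b)
```

Under these assumptions the two probabilities come out close to the 40% and 24% quoted above, which illustrates why the HR team has the advantage in a one-inning contest despite identical RPG.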
Finally, to make a more realistic model it was assumed that a team has a given set of probabilities for getting a walk, single, double, triple or home run per plate appearance. Then using a Markov chain analysis the RPI distribution was found, and then the RPG distribution. Runners were given MLB-average probabilities of advancing an extra base on a hit (e.g., going from first to third on a single) or advancing one base on an out (e.g., scoring from third on an out).
To reduce the size of the (five-dimensional) parameter space, the triples rate was set at 0.5% (about the MLB average in 2009) and the doubles rate was set to 31% of the singles rate (also about the MLB average). Triples were frozen because they are relatively rare, and the doubles-to-singles ratio was fixed because it has the smallest variation of all of the possible hit ratios, at least for team totals. Then the batting profile is uniquely determined by the walk, single and home run rates, which can be converted to the traditional slash stats, AVG/OBP/SLG, or vice versa.
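The conversion from per-plate-appearance rates to slash stats can be sketched as follows. It assumes that every plate appearance ends in a walk, a hit, or an out that counts as an at-bat (hit-by-pitch and sacrifices are ignored), which the text does not state explicitly. The function name and the example rates are hypothetical.

```python
def slash_stats(walk_rate, single_rate, hr_rate,
                triple_rate=0.005, doubles_per_single=0.31):
    """Return (AVG, OBP, SLG) from per-plate-appearance event rates."""
    double_rate = doubles_per_single * single_rate
    hit_rate = single_rate + double_rate + triple_rate + hr_rate
    total_bases = (single_rate + 2.0 * double_rate
                   + 3.0 * triple_rate + 4.0 * hr_rate)
    at_bat_rate = 1.0 - walk_rate        # walks are not at-bats
    avg = hit_rate / at_bat_rate
    obp = walk_rate + hit_rate           # per plate appearance
    slg = total_bases / at_bat_rate
    return avg, obp, slg


# Hypothetical batting profile: 8.5% walks, 15.5% singles, 2.7% home runs.
avg, obp, slg = slash_stats(0.085, 0.155, 0.027)
```

Because the mapping is one-to-one once the triples rate and doubles-to-singles ratio are fixed, it can also be inverted to recover the walk, single and home run rates from a target AVG/OBP/SLG line.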
A random sample of 100 teams was generated assuming the following ranges: .220 to .300 for AVG, .050 to .100 for (OBP – AVG), and .080 to .170 for (SLG – AVG). These ranges cover most of the values seen for teams since 1900, although the actual extremes are about .200 and .320 for AVG and about .050 and .200 for (SLG – AVG). The differences (OBP – AVG) and (SLG – AVG) were used, rather than OBP and SLG directly, since they gave more realistic slash stats.
From these 100 teams, the win percentage for each of the possible 4950 pairs of teams was determined using the RPG and RPI distributions. Since the Markov chain is an exact calculation, there is no noise in these distributions. A least-squares fit was then made using the modified Pythagorean W/L formula, with best-fit parameters
a = 1.251 (RPG1 + RPG2)^.216, b = 1.328 (RPG1 + RPG2)^(-.430).
The RMS error in the win percentage was 0.00077, compared to 0.00234 for the Pythagenpat formula with a = (RPG1 + RPG2)^b, where b is allowed to vary. The parameters derived from the Markov chain data set provide a decent fit to the actual data from 1999-2008 (RMS error .0062, only .0016 above the best fit). The parameters that fit the actual data do not do nearly as well on the Markov chain data set (RMS error .00778, ten times the best fit). This suggests that the Markov chain-derived parameters are better, presumably because they come from a data set based on the intrinsic batting profiles, without any noise.
There is a strong anti-correlation between standard deviation and SLG in this data set, so SLG can be used in place of standard deviation as a measure of consistency:
W1/L1 = (RPG1/RPG2)^a (SLG1/SLG2)^b,
where
a = 0.723 (RPG1 + RPG2)^.373 and b = 0.977 (RPG1 + RPG2)^(-.947)
are the best fit parameters to the Markov chain data set, with RMS error .00068 (slightly better than when the standard deviation is used). No other parameters were found that provided a better fit than RPG and SLG.
This formula may now be used to determine how much a higher SLG is worth given the same RPG. From the derivative of the W/L formula it is easy to show that the extra SLG (for fixed RPG) needed to give the same increase in win percentage as one additional run is (Delta SLG) = (a SLG)/(b R), where R is the number of runs in a season. For the current run environment, this gives (Delta SLG) = 0.008, so a player who has a SLG .072 higher than another player (so that the team SLG is .008 higher) should be rated one run better than he would be otherwise, compared to the other player. Similarly, a team SLG that is 0.080 higher for the same RPG would be worth the equivalent of ten more runs, or about one win per season for a typical team in the current run environment.
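The arithmetic behind the 0.008 figure can be sketched as follows. The inputs for the "current run environment" (4.6 RPG per team, a team SLG of .418, and a 162-game season) are assumed values chosen to resemble a recent MLB season; they are not given in the text, and the variable names are hypothetical.

```python
# Assumed run environment (illustrative values, not taken from the text).
rpg_per_team = 4.6
team_slg = 0.418
games = 162

total_rpg = 2.0 * rpg_per_team            # RPG1 + RPG2 for two average teams
season_runs = rpg_per_team * games        # R in the formula

# Best-fit exponents for the RPG/SLG form of the modified formula.
a = 0.723 * total_rpg ** 0.373
b = 0.977 * total_rpg ** (-0.947)

# Setting a * (1 / R) equal to b * (dSLG / SLG) gives the SLG gain
# that matches the effect of one extra run at fixed RPG.
delta_slg_per_run = a * team_slg / (b * season_runs)

# Nine lineup spots share the team SLG, so one player must differ by
# roughly nine times as much to move the team figure by that amount.
delta_slg_per_player = 9.0 * delta_slg_per_run
```

With these inputs the team-level figure comes out just under .008 and the single-player figure close to .07, in line with the values quoted above.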
By varying base running (e.g., a runner taking an extra base on a hit), a similar analysis can be done. For example, if a team increases its probability of taking the extra base by .075 (e.g., going from first to third on a single 32.5% of the time instead of 25% of the time) without changing its RPG, it would be worth the equivalent of about one extra run per season. This is a much smaller effect than having a different batting profile, but it is consistent with the earlier findings that the 1B+ team won more games than the 1B team, even though they had the same RPG.
The consequences for building a team follow directly from these results. If two teams have the same value in runs (using whatever metric you prefer), and one team has a SLG that is .080 higher, then that team should expect to win about one more game a season, even though it has the same RPG as the other team. Therefore one should choose players with higher SLG if run values are equal, and in some cases a higher SLG even if their value in runs is slightly less.
What is good for the offense must be bad for the defense: for the same expected RPG allowed, the pitcher with the lower SLG allowed will help his team win more games. If one set of pitchers had a SLG allowed .080 less than another set of pitchers for the same RPG allowed (presumably because they allow more walks and singles but not as many extra-base hits), the first set of pitchers would be worth about one extra win per season.
Although these effects are not large for a single player, a concerted effort by a team to fill its roster with offensive (defensive) players that have a higher (lower) SLG (SLG allowed) for the same overall run value could be worth as much as two additional wins per season.
This article is adapted from a paper submitted to the 2010 MIT Sloan Sports Analytics Conference.