## Beyond Pythagorean Expectation: How Run Distributions Affect Win Percentage

#### by Kerry Whisnant

The Pythagorean expectation formula, originally developed by Bill James, provides a reasonably good estimate of the win percentage of a baseball team using the number of runs scored and runs allowed by that team. Improvements on the formula, such as the Pythagenport by Davenport and Woolner, and the Pythagenpat by Smyth and Patriot, allow for variation of the Pythagorean exponent and give a very good estimate for win percentages over a wide range of run environments. This article looks at possible improvements on the Pythagorean formula and its variants that take into account the shapes of the run distributions in addition to the run environment. There is a clear pattern that teams which have a higher slugging percentage score runs more consistently, i.e., their run distributions have a smaller standard deviation, and they tend to win more games than their Pythagorean expectation. This article also examines how metrics that use runs for evaluating teams and players might be adjusted to account for these effects.

Derivations of the Pythagorean expectation formula have been made under certain assumptions. Hein Hundal showed that if run distributions are independent log-normal distributions, then the Pythagorean exponent can be approximated by a particular function of the standard deviation and mean of a typical run distribution. Steven Miller showed that if the run distributions are Weibull distributions, then win percentages are given exactly by the Pythagorean formula, where the Pythagorean exponent is a parameter used to fit the run distributions to data. In both cases the Pythagorean exponent inferred from run distribution data is very close to the empirical value.

In the two derivations described above, the shape of the run distribution is determined for a given average runs per game (RPG), i.e., they are single-parameter distributions. Consequently, a team with a higher RPG is always predicted to win more games. Actual run distributions depend on many variables, such as the rates for walks, singles, doubles, triples and home runs. A team with the same runs scored and runs allowed will not necessarily have a win percentage exactly equal to .500 if the shapes of those run distributions are different. This paper reports on an investigation of these effects and discusses their potential consequences:

- How the shapes of run distributions affect win percentage will be examined using four different sources for the distributions: (i) actual runs scored data, (ii) a log-normal distribution, which has two parameters that can be taken as the mean and standard deviation of the distribution, (iii) a toy model where a given team hits only one type of base hit (single, double, or home run), and (iv) a Markov chain model where a team can have a mixed batting profile. In each case the more consistent a team is in scoring runs (i.e., the narrower the run distribution, or the smaller the standard deviation), the better its win percentage, for fixed RPG.

- Modifications of the Pythagorean formula are proposed that fit all of these data sets significantly better than the standard forms (including the Pythagenpat version).

- Using the results from above, the implications for evaluating players and building a team will be discussed.

First, run distributions from 1999-2008 were used to find win percentages in head-to-head match-ups (435 combinations each year for a total of 4350 pairings of teams). This is done as follows. If team *i* has probability *Pi(n)* of scoring *n* runs in a game, then its win/loss ratio against team *j* with probabilities *Pj(n)* is

*W/L = [Sum_n Pi(n) Sum_(m<n) Pj(m)] / [Sum_n Pi(n) Sum_(m>n) Pj(m)]*,

where it is assumed that ties are decided with the same ratio. For pairs of teams with approximately the same RPG (within 1%), the win percentage varies from .500 by as much as .035, compared to the maximum deviation of about .005 predicted by the Pythagorean formula for two teams with RPG within 1%.
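The head-to-head calculation can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the study; the function names `win_loss_ratio` and `win_pct`, and the list representation in which index *n* holds the probability of scoring *n* runs, are assumptions made here.

```python
def win_loss_ratio(p_i, p_j):
    """W/L ratio of team i against team j.

    p_i[n] and p_j[m] are the probabilities of scoring n and m runs in a
    game (hypothetical list representation). Tied scores are left out of
    both sums, which is equivalent to deciding ties in the same ratio as
    the non-tied games.
    """
    wins = sum(pi * pj
               for n, pi in enumerate(p_i)
               for m, pj in enumerate(p_j) if n > m)
    losses = sum(pi * pj
                 for n, pi in enumerate(p_i)
                 for m, pj in enumerate(p_j) if n < m)
    return wins / losses


def win_pct(p_i, p_j):
    """Convert the W/L ratio into a win percentage W / (W + L)."""
    ratio = win_loss_ratio(p_i, p_j)
    return ratio / (1.0 + ratio)


# Made-up distributions: team i always scores 3 runs,
# team j scores 1, 3 or 5 runs with probabilities .50, .25, .25.
team_i = [0.0, 0.0, 0.0, 1.0]
team_j = [0.0, 0.5, 0.0, 0.25, 0.0, 0.25]
print(win_pct(team_i, team_j))
```

In the made-up example both teams average 3.0 runs and the tied games are discarded, yet team i wins twice as often as it loses because its run distribution is narrower.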

The next step is to show that much of the deviation is due to the different shapes of the distributions. Using the standard deviation from the mean as a measure of distribution shape, a strong correlation (with correlation coefficient 0.92) was found between the win percentage and the ratio of the standard deviations of the two teams. The win percentage varies inversely with the standard deviation ratio, which means that a more consistent team (i.e., one with the smaller standard deviation) tends to do better than the Pythagorean expectation, while an inconsistent team (larger standard deviation) does worse.

There is a similar spread in win percentages and correlation with standard deviation ratios for teams with different RPG. Dividing the pairs of teams into groups with approximately the same RPG *ratio* shows that the teams with the smallest (largest) standard deviation within a group tend to win the most (least) games. The degree of correlation between the win percentage and standard deviation ratio goes down as the RPG ratio deviates from one, but generally remains above 0.80.

There is also a mild dependence on the run environment (defined as the total RPG for both teams). A good fit to the data (with root mean square (RMS) error .0046 in win percentage) for the win/loss ratio (for Team 1 playing Team 2) is

*W1/L1 = (RPG1/RPG2)^a (SD2/SD1)^b,*

where *RPGi* and *SDi* are the RPG and standard deviation for team *i*, and

*a = 1.313 (RPG1 + RPG2)^.214, b = 1.020 (RPG1 + RPG2)^(-.356).*

This can be compared with an RMS error of .0109 when using the Pythagenpat formula, *W1/L1 = (RPG1/RPG2)^a* with *a = (RPG1 + RPG2)^.287*, on the same data set.
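As a rough sketch, the two formulas can be written as Python functions using the fitted constants quoted above. The function names are hypothetical, and the example inputs (two teams at 5.0 RPG with standard deviations 2.43 and 3.79) are the values listed for the HR and 1B teams in the toy-model table later in the article.

```python
def modified_pythagorean_wl(rpg1, sd1, rpg2, sd2):
    """W1/L1 from the RPG ratio and the standard deviation ratio."""
    total = rpg1 + rpg2                  # run environment
    a = 1.313 * total ** 0.214
    b = 1.020 * total ** (-0.356)
    return (rpg1 / rpg2) ** a * (sd2 / sd1) ** b


def pythagenpat_wl(rpg1, rpg2):
    """W1/L1 from the Pythagenpat form, which ignores distribution shape."""
    a = (rpg1 + rpg2) ** 0.287
    return (rpg1 / rpg2) ** a


def to_win_pct(wl):
    """Convert a win/loss ratio into a win percentage."""
    return wl / (1.0 + wl)


# Same RPG, different consistency: only the modified form moves off .500.
print(to_win_pct(modified_pythagorean_wl(5.0, 2.43, 5.0, 3.79)))
print(to_win_pct(pythagenpat_wl(5.0, 5.0)))
```

With equal RPG the first factor is exactly one, so the whole prediction comes from the standard deviation ratio; the result is about .550 for the more consistent team.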

The question now arises as to whether the differences in standard deviation are entirely due to random fluctuations (which are presumably not reproducible), or are due partly to the different shapes of the intrinsic run distributions of the teams. This was studied in three ways.

First, it was assumed that run distributions are log-normal. Log-normal distributions have two parameters, which can be taken as the mean and standard deviation. Allowing RPG to vary randomly between 4.0 and 6.0 and standard deviations to randomly vary between 2.0 and 3.0, a sample of 5000 pairs of teams was generated. Then for each team a run distribution for a 162-game season was randomly generated, to simulate the 1999-2008 data set. Then win percentages in the head-to-head match-ups were calculated as before.
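The sampling step might look like the sketch below, which uses only the Python standard library. The conversion from mean and standard deviation to the usual log-normal parameters is standard; rounding each draw to a whole number of runs, the seed, and the function names are assumptions made here, since the article does not say how the continuous distribution was turned into run totals.

```python
import math
import random


def lognormal_params(mean, sd):
    """Return (mu, sigma) of the log-normal with the given mean and SD."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def simulate_season(mean, sd, games=162, rng=None):
    """Draw one season of per-game run totals.

    Rounding to the nearest integer is an assumption of this sketch; it
    shifts the sample mean and SD slightly away from the inputs.
    """
    rng = rng or random.Random()
    mu, sigma = lognormal_params(mean, sd)
    return [int(round(rng.lognormvariate(mu, sigma))) for _ in range(games)]


rng = random.Random(2009)        # arbitrary seed, for repeatability only
rpg = rng.uniform(4.0, 6.0)      # intrinsic RPG, as in the text
sd = rng.uniform(2.0, 3.0)       # intrinsic standard deviation
season = simulate_season(rpg, sd, rng=rng)
```

Tallying `season` into a run distribution gives the noisy, observed counterpart of the intrinsic distribution, which can then be fed into the head-to-head calculation.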

As with the actual data, a strong correlation was found between the win percentage and standard deviation ratio for teams with approximately the same RPG ratio. However, although they are qualitatively similar, the win percentage did not have exactly the same relationship to RPG ratio and standard deviation ratio. This is not surprising since the intrinsic run distributions are not necessarily log-normal, but it does show that different shapes of the *intrinsic* run distribution can affect win percentage in a way similar to that seen in the actual data.

Next, if it is assumed that a given team has only one type of base hit (single with runners advancing one base, single with runners advancing two bases, double, or home run), then the runs per inning (RPI) distributions are relatively simple functions of the batting average (AVG). From these, RPG (actually, runs-per-27-outs) distributions were easily determined and win percentages for head-to-head match-ups were calculated. Ties were decided in extra innings using the RPI distributions. Since the RPI distributions are different from RPG distributions, the probability of winning in extra innings is not necessarily the same as the overall probability of winning the game.

The results for teams with RPG = 5.0 are displayed in the following table, which shows the win percentages for each possible match-up, as well as the standard deviation of a team’s run distribution and the prediction using the empirical formula from above.

| Team | AVG | St.Dev. | Actual vs. HR | Actual vs. 2B | Actual vs. 1B+ | Actual vs. 1B | Predicted vs. HR | Predicted vs. 2B | Predicted vs. 1B+ | Predicted vs. 1B |
|---|---|---|---|---|---|---|---|---|---|---|
| HR | .156 | 2.43 | .500 | .522 | .537 | .549 | .500 | .524 | .539 | .550 |
| 2B | .284 | 3.02 | .478 | .500 | .516 | .529 | .476 | .500 | .515 | .525 |
| 1B+ | .373 | 3.45 | .463 | .484 | .500 | .513 | .461 | .485 | .500 | .511 |
| 1B | .441 | 3.79 | .451 | .471 | .487 | .500 | .450 | .475 | .489 | .500 |

Clearly the team with the more consistent offense (lower standard deviation) wins more, *even though the RPG are identical*. The additional wins per 162-game season can be as much as eight in the most extreme case (the HR team versus the 1B team). Also, the empirical formula derived from actual data gives fairly good predictions for these win percentages.

The extra-innings win percentages are even more different from the Pythagorean expectation. The typical excess win percentage in extra innings is 2.35 to 2.50 times the overall excess win percentage. For example, the HR team beat the 1B team 62% of the time in extra innings, primarily due to the fact that the HR team scores in 40% of innings while the 1B team scores in only 24% of innings (although when they do score, they tend to score more runs).
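Parts of the toy model can be reproduced with a short sketch, under one reading of the setup that is assumed here: every plate appearance is either a hit (with probability AVG) or an out, so the number of hits before the third out follows a negative binomial distribution, and each team type strands a fixed number of runners per inning (0 for HR, 1 for 2B, 2 for 1B+, 3 for 1B). The function names and the `stranded` parameter are hypothetical.

```python
from math import comb, sqrt


def inning_run_pmf(avg, stranded, max_hits=80):
    """Runs-per-inning distribution for a one-hit-type team.

    P(k hits before the 3rd out) = C(k + 2, 2) * avg**k * (1 - avg)**3.
    Runs scored are max(0, k - stranded). The sum is truncated at
    max_hits, where the remaining tail is negligible for realistic AVG.
    """
    q = 1.0 - avg
    pmf = {}
    for k in range(max_hits + 1):
        p_k = comb(k + 2, 2) * avg ** k * q ** 3
        runs = max(0, k - stranded)
        pmf[runs] = pmf.get(runs, 0.0) + p_k
    return pmf


def game_mean_sd(pmf, innings=9):
    """Mean and SD of runs over independent innings (27 outs)."""
    mean = sum(r * p for r, p in pmf.items())
    var = sum(r * r * p for r, p in pmf.items()) - mean ** 2
    return innings * mean, sqrt(innings * var)


hr_team = inning_run_pmf(0.156, stranded=0)
one_b_team = inning_run_pmf(0.441, stranded=3)
print(game_mean_sd(hr_team))       # RPG and SD for the HR team
print(1.0 - hr_team[0])            # chance the HR team scores in an inning
print(1.0 - one_b_team[0])         # chance the 1B team scores in an inning
```

Under these assumptions the HR team comes out at about 5.0 RPG with a standard deviation near 2.43, and the two teams score in roughly 40% and 24% of innings, in line with the figures quoted above. The standard deviations for the other team types come out close to, but not always identical with, the table values, so the original model may differ in some detail.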

Finally, to make a more realistic model it was assumed that a team has a given set of probabilities for getting a walk, single, double, triple or home run per plate appearance. Then using a Markov chain analysis the RPI distribution was found, and then the RPG distribution. Runners were given MLB-average probabilities of advancing an extra base on a hit (e.g., going from first to third on a single) or advancing one base on an out (e.g., scoring from third on an out).

To reduce the size of the (five-dimensional) parameter space, the triples rate was set at 0.5% (about the MLB average in 2009) and the doubles rate was set to 31% of the singles rate (also about the MLB average). Triples were frozen because they are relatively rare, and the doubles-to-singles ratio was fixed because it has the smallest variation of all of the possible hit ratios, at least for team totals. Then the batting profile is uniquely determined by the walk, single and home run rates, which can be converted to the traditional slash stats, AVG/OBP/SLG, or vice versa.
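A sketch of the conversion from per-plate-appearance rates to slash stats is given below. It assumes that the triples rate is per plate appearance and that a plate appearance is either an at-bat or a walk (no hit-by-pitch, sacrifices, or errors); the article does not spell out either point. The function name and the sample rates are hypothetical.

```python
def slash_line(walk_rate, single_rate, hr_rate,
               triple_rate=0.005, double_ratio=0.31):
    """Return (AVG, OBP, SLG) from per-PA event rates.

    Doubles are tied to singles through double_ratio, and triples are
    frozen at triple_rate, following the constraints described above.
    Assumes PA = AB + BB, so at-bats per PA = 1 - walk_rate.
    """
    double_rate = double_ratio * single_rate
    hit_rate = single_rate + double_rate + triple_rate + hr_rate
    total_bases = (single_rate + 2 * double_rate
                   + 3 * triple_rate + 4 * hr_rate)
    ab_rate = 1.0 - walk_rate
    avg = hit_rate / ab_rate
    obp = hit_rate + walk_rate
    slg = total_bases / ab_rate
    return avg, obp, slg


# Illustrative rates only: 9% walks, 15.5% singles, 2.8% home runs.
avg, obp, slg = slash_line(0.09, 0.155, 0.028)
print(round(avg, 3), round(obp, 3), round(slg, 3))
```

Because three free rates map onto three slash stats, the relation can be inverted to recover the walk, single and home run rates from a chosen AVG/OBP/SLG line.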

A random sample of 100 teams was generated assuming the following ranges: .220 to .300 for AVG, 0.050 to 0.100 for (OBP – AVG), and 0.080 to 0.170 for (SLG – AVG). These ranges cover most of the values seen for teams since 1900 (although the extremes for AVG are actually about .200 and .320, and .050 and .200 for (SLG – AVG)). The differences (OBP – AVG) and (SLG – AVG) were used, rather than OBP and SLG directly, since they gave more realistic slash stats.

From these 100 teams, the win percentage for each of the possible 4950 pairs of teams was determined using the RPG and RPI distributions. Since the Markov chain is an exact calculation, there is no noise in these distributions. Then a least-squares fit is made using the modified Pythagorean W/L formula, with best fit parameters

*a = 1.251 (RPG1 + RPG2)^.216, b = 1.328 (RPG1 + RPG2)^(-.430).*

The RMS error in the win percentage was .00077, compared to .00234 for the Pythagenpat formula with *a = (RPG1 + RPG2)^b*, where *b* is allowed to vary. The parameters derived from the Markov chain data set provide a decent fit to the actual data from 1999-2008 (RMS error .0062, only .0016 above the best fit), while the parameters that fit the actual data do not do nearly as well on the Markov chain data set (RMS error .00778, ten times the best fit). This suggests that the Markov chain-derived parameters are better, presumably because they come from a data set based on the intrinsic batting profiles, without any noise.

There is a strong anti-correlation between standard deviation and SLG in this data set, so instead of using standard deviation as a measure of consistency, SLG can also be used:

*W1/L1 = (RPG1/RPG2)^a (SLG1/SLG2)^b*

where

*a = 0.723 (RPG1 + RPG2)^.373* and *b = 0.977 (RPG1 + RPG2)^(-.947)*

are the best fit parameters to the Markov chain data set, with RMS error .00068 (slightly better than when the standard deviation is used). No other parameters were found that provided a better fit than RPG and SLG.

This formula may now be used to determine how much a higher SLG is worth *given the same RPG*. From the derivative of the *W/L* formula it is easy to show that the extra SLG (for fixed RPG) needed to give the same increase in win percentage as one additional run is (*Delta SLG*) = (*a SLG*)/(*b R*), where *R* is the number of runs in a season. For the current run environment, this gives (*Delta SLG*) = 0.008, so a player who has a SLG .072 higher than another player (so that the team SLG is .008 higher) should be rated one run better than they would be otherwise, compared to the other player. Similarly, a team SLG that is .080 higher for the same RPG would be worth the equivalent of ten more runs, or about one win per season for a typical team in the current run environment.
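The arithmetic can be checked with a short sketch. Taking logarithms of the *W/L* formula, a change of one run shifts ln(*W/L*) by *a/R* and a change in SLG shifts it by *b (Delta SLG)/SLG*; setting the two equal gives the expression above. The inputs used here (4.6 RPG for both teams, a team SLG of .418, and a 162-game season) are assumed values meant to approximate the 2009 run environment and are not taken from the article; the function names are hypothetical.

```python
def slg_exponents(rpg1, rpg2):
    """Exponents a and b of the SLG-based W/L formula."""
    total = rpg1 + rpg2
    a = 0.723 * total ** 0.373
    b = 0.977 * total ** (-0.947)
    return a, b


def slg_per_run(rpg, slg, games=162, opp_rpg=None):
    """Extra team SLG (at fixed RPG) worth the same as one extra run."""
    opp = rpg if opp_rpg is None else opp_rpg
    a, b = slg_exponents(rpg, opp)
    runs = rpg * games               # R, runs scored in a season
    return a * slg / (b * runs)


delta = slg_per_run(4.6, 0.418)      # assumed league-average inputs
print(delta)                         # SLG equivalent of one run
print(0.080 / delta)                 # runs equivalent of .080 of SLG
```

With these inputs the result is close to .008 of SLG per run, so a .080 edge in team SLG corresponds to roughly ten runs, consistent with the figures in the text.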

By varying base running (e.g., a runner taking an extra base on a hit), a similar analysis can be done. For example, if a team increases its probability of taking the extra base by .075 (e.g., going from first to third on a single 32.5% of the time instead of 25% of the time) without changing its RPG, it would be worth the equivalent of about one extra run per season. This is a much smaller effect than having a different batting profile, but it is consistent with the earlier findings that the 1B+ team won more games than the 1B team, even though they had the same RPG.

The consequences for building a team follow directly from these results. If two teams have the same value in runs (using whatever metric you prefer), and one team has a SLG that is .080 higher, then that team should expect to win about one more game a season, *even though it has the same RPG as the other team*. Therefore one should choose players with higher SLG if run values are equal, and in some cases a higher SLG even if their value in runs is slightly less.

What is good for the offense must be bad for the defense – for the same expected RPG allowed, the pitcher with the *lower* SLG allowed will help its team win more games. If one set of pitchers had a SLG allowed .080 less than another set of pitchers for the same RPG allowed (presumably because they allow more walks and singles but not as many extra base hits), the first set of pitchers would be worth about one extra win per season.

Although these effects are not large for a single player, a concerted effort by a team to fill its roster with offensive (defensive) players that have a higher (lower) SLG (SLG allowed) for the same overall run value could be worth as much as two additional wins per season.

*This article is adapted from a paper submitted to the 2010 MIT Sloan Sports Analytics Conference.*

Tags: consistency, Pythagorean expectation, RPG, standard deviation

December 13th, 2009 at 9:21 am

“If two teams have the same value in runs (using whatever metric you prefer), and one team has a SLG that is .080 higher, then that team should expect to win about one more game a season, even though it has the same RPG as the other team. Therefore one should choose players with higher SLG if run values are equal, and in some cases a higher SLG even if their value in runs is slightly less.

Although these effects are not large for a single player, a concerted effort…could be worth as much as two additional wins per season.”

Kerry, shoulda titled this article “How To Get Two Extra Wins From The Same Number Of Runs.” The key sentences are very near to the end of the article.

Other than that, good analysis! I’d write more but my brain hurts.

December 13th, 2009 at 9:35 am

Bill,

Hah, I wrote the article as an academic paper, hence the dry tone. I changed the excerpt along the lines of your suggestion, but I like the title as is.

Sorry your brain hurts

December 13th, 2009 at 12:33 pm

I think I’ll wait for the movie..:)

December 13th, 2009 at 1:25 pm

Never before have my eyes glazed over BEFORE I started reading an article.

The sad thing is as I write that, I’m perfectly aware that it’s more of a comment on me than you.

Hey Chuck… I’ll buy the popcorn.

December 13th, 2009 at 10:11 pm

Kerry, will read this when time permits: can you check my query/study proposal to you in the comments for the 12-8 HOF article?

December 14th, 2009 at 6:20 am

Kerry, interesting article. Sometimes when I see so many numbers and terms, my eyes do a little glazing over, but it was neat to see this. Basically if two teams have the same RPG, the team with the higher SLG% will probably win more.

December 14th, 2009 at 8:21 am

Kerry, very interesting article.

There has been some previous work in the area of which I’m not sure if you’re aware. I thought these might help you flesh out your ideas.

The Weibull distribution has been found to model both run distribution and the Pythagorean formula very well. Here’s the theoretical paper by Steven Miller:

http://arxiv.org/PS_cache/math/pdf/0509/0509698v4.pdfI did a lot of work with the Weibull distribution and its relationship the Pythagorean record.

http://www.hardballtimes.com/main/article/feast-or-famine-first-draft/

http://www.hardballtimes.com/main/article/avoiding-the-famine/

http://www.hardballtimes.com/main/article/consistency-is-key/

http://www.hardballtimes.com/main/article/consistency-is-key-part-two/

http://www.hardballtimes.com/main/article/consistency-is-inconsistent/

Keith Woolner looked at this a while ago, too:

http://www.baseballprospectus.com/article.php?articleid=472

I hope you find these articles helpful in going forward with your study.

December 14th, 2009 at 8:53 am

I tried to post a comment with some links, but maybe it was marked as spam. Kerry, I just wanted to let you know about some previous work done in this area. I have a bunch of articles that you will probably find useful. I’ve posted my comment at the Baseball Think Factory, which has a discussion thread about this article. Please email me if you want to talk further. My gmail username is sbaxamusa.

December 14th, 2009 at 9:06 am

Lee, yes, and a .080 higher SLG is worth about one win per season. And if your pitchers have a .080 lower SLG allowed, that’s worth a win per season also (each comparison is being made to some other team with the same RPG and RPG allowed).

December 14th, 2009 at 9:20 am

Mike, I replied to your HoF question in that thread.

December 14th, 2009 at 8:08 pm

salb918, your first (and perhaps most important) link seems not to work.

December 14th, 2009 at 8:25 pm

Bill, Sal accidentally included an ‘I’ at the end, I think. Try:

http://arxiv.org/PS_cache/math/pdf/0509/0509698v4.pdf

This worked for me.

December 14th, 2009 at 8:29 pm

I should mention that the Miller paper is the one that ‘proved’ the Pythagorean formula using Weibull distributions that I mentioned (very cool BTW).

December 14th, 2009 at 8:46 pm

Kerry, thanks, the new link works.

“Proved,” though? Not quite: the proof assumes that “the runs scored and allowed are statistically independent,” which disregards Park Factor and, probably more important, “Meteorology Factor.” All teams score more runs in Wrigley, Coors and Fenway; all teams score fewer runs in PETCO; and all teams certainly score more runs on dry, hot summer days than they score in early April or late September sleet or freezing rain.

December 14th, 2009 at 9:37 pm

Hah, well if you want to quibble (which is fine BTW), Weibull distributions are continuous, whereas run distributions are discrete. But given the assumptions, it’s a very nice proof.

And the fact that it comes up with a value for the Pythagorean exponent that is close to the actual value suggests that the assumptions aren’t far off the mark.

In physics we make simplifying assumptions all the time, and as long as it’s a good approximation they can lead to very useful results. OTOH, there’s a joke about physicists who make assumptions that are too simple, such that they no longer describe the actual situation:

A mechanic, an engineer and a physicist are asked by a farmer to design a chicken-plucking machine. They all go off to think, and the physicist returns almost immediately. The farmer is amazed that the physicist has solved the problem so fast and asks for the design. “First,” says the physicist, “you assume a spherical chicken.”

There are many versions of this joke; this is the one taught me in E&M class as an undergrad.

December 16th, 2009 at 2:24 am

I hated math in school, just write me a very condensed summary Kerry.

December 16th, 2009 at 11:22 am

LOL, I asked Adam if I should post a DC-friendly version, but he wanted the whole thing.

Bottom line: More consistent teams (narrower run distribution) tend to win more games for the same RPG. Teams with higher SLG tend to have a narrower run distribution. Given two teams with the same RPG, a team with a SLG .080 higher will on average win one more game a season. If their pitching/defense has the same RPG allowed but a SLG allowed .080 lower, that would add another game.

December 16th, 2009 at 9:45 pm

Gotcha

March 1st, 2010 at 12:47 pm

Kerry, having read some of the feedback on this article, it seems like the big issue that people want more information on is the correlation between SLG and standard deviation of run distribution. You use only a single sentence to establish this relationship: “There is a strong anti-correlation between standard deviation and SLG in this data set, so instead of using standard deviation as a measure of consistency, SLG can also be used.” Perhaps you could expand a bit more on that relationship–a graph would go a long way.

March 1st, 2010 at 12:52 pm

Re: anti-correlation between SLG and standard deviation of run distribution

A graph showing the correlation in the simulation would be great. If there were a way to show the correlation in real MLB data, that might be even better, tho if you just mapped SLG vs standard deviation of runs scored, you’d probably drown out the relationship because higher SLG will result in more runs scored, in turn inflating the standard deviation. You’d have to somehow compensate for the relationship between SLG and runs scored.
