Reconstituted Record

This is a fairly stat-heavy post, so please feel free to ignore it if that’s not your bag. If you don’t want to learn about stats, but still want to learn, this is a really cool site. This defies description.

I trust that most people have realized at this point that a team’s W-L record doesn’t tell the whole story. A great example is this year’s Colts: they scored fewer points than they allowed yet finished with an 11-5 record. It’s part of baseball (and life) that things sometimes don’t turn out the way the statistics say they should (and in fact, it makes things much more interesting that the better team doesn’t always win, or that the Orioles are allowed to make the playoffs once a decade).

There are many attempts to synthesize a more accurate rating system for a team. You’ve likely heard of Pythagorean W-L, in which a team’s runs scored, raised to some exponent, are divided by that same quantity plus the team’s runs allowed raised to the same exponent.

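$$\text{Pythagorean W\%} = \frac{RS^{x}}{RS^{x} + RA^{x}}$$

where RS is runs scored, RA is runs allowed, and x is the exponent.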

This is generally a pretty good way to get closer to a team’s true W-L record. Clay Davenport has done some good work on this subject in the past. However, my problem with the stat is that it just isn’t intuitive. It makes sense from an esoteric standpoint, but there don’t seem to be baseball reasons why 1.73 is a better exponent to use than 1.78, other than that it fits the data better.
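To see how much (or how little) the exponent matters, here’s a minimal Python sketch of the Pythagorean formula. The function names and the 700/650 run totals are hypothetical, chosen only for illustration; 1.73 and 1.78 are the two exponents mentioned above.

```python
def pythagorean_wpct(runs_scored: float, runs_allowed: float, exponent: float) -> float:
    """Pythagorean winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagorean_wins(runs_scored: float, runs_allowed: float,
                     exponent: float, games: int = 162) -> float:
    """Scale the Pythagorean winning percentage up to a full season."""
    return games * pythagorean_wpct(runs_scored, runs_allowed, exponent)


if __name__ == "__main__":
    # Hypothetical season run totals, for illustration only.
    rs, ra = 700, 650
    for x in (1.73, 1.78):
        print(f"exponent {x}: {pythagorean_wins(rs, ra, x):.1f} wins")
```

With a run ratio this close to even, the two exponents differ by only a fraction of a win, which is part of why the choice between them feels arbitrary.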

There are also regression models one can use. For instance, the formula for a team’s winning percentage last year was .500057 + .098455*RDPG (Run Difference Per Game), using data through 07/26 of last year (I gave up once the Cubs became so awful, but suffice it to say the result does not change much). The adjusted R-squared for this formula was .732398. That is about as accurate a formula as I’ve found (I do the regressions in Excel), but it’s actually even more esoteric.
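To make the regression concrete, here’s a minimal Python sketch that plugs a run differential into the fitted line quoted above. The function name and the +81 run differential are hypothetical; the coefficients are the ones from the fit and would change with a different season or sample.

```python
# Coefficients from the regression quoted above (last season, through 07/26).
INTERCEPT = 0.500057
SLOPE = 0.098455


def regression_wpct(run_diff: float, games: int) -> float:
    """Expected winning percentage from run difference per game (RDPG)."""
    rdpg = run_diff / games
    return INTERCEPT + SLOPE * rdpg


if __name__ == "__main__":
    # Hypothetical team: +81 run differential over 162 games (RDPG = +0.5).
    wpct = regression_wpct(81, 162)
    print(f"{wpct:.4f} win%, about {wpct * 162:.1f} wins")
```

Because the intercept is so close to .500, a team with a zero run differential comes out at almost exactly .500, which is what you’d hope for.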

Eventually, I decided to come at it from a different angle. I asked myself: what would happen if I compared the runs the Cubs scored in each game with the runs they allowed in every game?

First, an obvious caveat. What happens in game 52 depends, more or less, on what has already happened in game 52. While a bunt or sac fly is clearly not optimal in every case, it is more useful in some situations than in others, so it may not be fair to compare the 4 runs scored in game 52 with the 5 runs allowed in game 107 (those runs may have been reached in very different ways).

That being said, I think it does a pretty good job of telling you what a team’s true talent level is. Essentially, you record the number of times a team scored and allowed each number of runs. You then compare the two columns to see how often each run total would have won or lost. Once you have a winning percentage for each run total, based on how many times it exceeded the runs allowed (ties count as a win 50% of the time), you calculate expected wins by weighting each winning percentage by the number of times the team actually scored that many runs.
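Written out as a formula (the symbols are my own shorthand, not standard notation), with $S_k$ the number of games in which the team scored $k$ runs, $A_j$ the number of games in which it allowed $j$ runs, and $N$ the number of games played:

$$
\text{Win\%}_k = \frac{1}{N}\Big(\sum_{j<k} A_j + \tfrac{1}{2}A_k\Big),
\qquad
\text{EffWins} = \sum_k S_k \,\text{Win\%}_k
$$

The $\tfrac{1}{2}A_k$ term is the 50% rule for ties.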

For example:

2012 Cubs
Runs  Scored  Allowed  Win%   EffWins
0     16      9        0.028  0.4
1     24      19       0.114  2.7
2     16      20       0.235  3.8
3     25      23       0.367  9.2
4     25      16       0.488  12.2
5     24      17       0.590  14.1
6     8       8        0.667  5.3
7     8       18       0.747  6.0
8     7       10       0.833  5.8
9     2       9        0.892  1.8
10    1       7        0.941  0.9
11    1       3        0.972  1.0
12    3       1        0.985  3.0
13    1       0        0.988  1.0
14    1       0        0.988  1.0
15    0       1        0.991  0.0
16    0       0        0.994  0.0
17    0       1        0.997  0.0
Total 162     162             68.2

The Cubs scored 0 runs a staggering 16 times. They could only have won a game in which their opponent also scored 0: that happened 9 times, so they are credited with 4.5 of those games. 4.5/162 is a winning percentage of 2.8% (5.6% of the time, the opponent scores 0 runs at the “end of regulation,” and you win 50% of those games if you also score 0), and 2.8% × 16 (the number of times this scenario occurred) is ~0.4 Effective Wins.

When the Cubs scored 1 run, they usually didn’t win, but they would have any time they allowed 0 (and, again, half of the times they allowed 1). That works out to (9 + 19/2)/162 = 0.114, so the 24 times they scored 1 run resulted in around 2.7 Effective Wins.

When you’ve solved that for every run total the Cubs reached in 2012, you add up all the Effective Wins and get a reconstituted record. In this case, the Cubs were 68.2-93.8. Against their actual 61 wins, that means they were unlucky last year, to the tune of 7.2 wins.
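To check the arithmetic, here’s a minimal Python sketch that rebuilds the 68.2 figure from the two count columns in the 2012 Cubs table. The function names are hypothetical, and `Counter` is just a convenient way to hold {runs: games} pairs (a missing run total counts as 0). It assumes the scored and allowed columns cover the same set of games.

```python
from collections import Counter


def expected_win_pct(k: int, allowed: Counter, games: int) -> float:
    """Share of all games in which scoring k runs would have won (ties count as half)."""
    below = sum(count for runs, count in allowed.items() if runs < k)
    return (below + 0.5 * allowed[k]) / games


def reconstituted_wins(scored: Counter, allowed: Counter) -> float:
    """Weight each run total's expected win% by how often the team scored it."""
    games = sum(scored.values())  # assumes both columns cover the same games
    return sum(count * expected_win_pct(k, allowed, games)
               for k, count in scored.items())


# 2012 Cubs, from the table above: {runs: number of games}.
CUBS_SCORED = Counter({0: 16, 1: 24, 2: 16, 3: 25, 4: 25, 5: 24, 6: 8, 7: 8,
                       8: 7, 9: 2, 10: 1, 11: 1, 12: 3, 13: 1, 14: 1})
CUBS_ALLOWED = Counter({0: 9, 1: 19, 2: 20, 3: 23, 4: 16, 5: 17, 6: 8, 7: 18,
                        8: 10, 9: 9, 10: 7, 11: 3, 12: 1, 15: 1, 17: 1})
CUBS_ACTUAL_WINS = 61

if __name__ == "__main__":
    wins = reconstituted_wins(CUBS_SCORED, CUBS_ALLOWED)
    games = sum(CUBS_SCORED.values())
    print(f"Reconstituted record: {wins:.1f}-{games - wins:.1f}")
    print(f"Luck: {wins - CUBS_ACTUAL_WINS:+.1f} wins")
```

Run as-is, it prints a 68.2-93.8 reconstituted record and +7.2 wins of bad luck, matching the table.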

This is where that caveat comes in. It’s difficult to attribute all of those 7.2 wins to luck, because in close games, bullpens and other circumstances carry much more leverage than they otherwise would. In those cases, it might be disingenuous to assume each team wins half of its ties, or even that it wins games in which it is 1 run ahead or behind at the usual rate. Perhaps a team loses (or wins) as many as 55% of those close games (this seems EXTREMELY unlikely, but not unthinkable). To gauge how much that matters, you can use a similar method to count how many of these close games occur:

2012 Cubs
Runs  Scored  Allowed  Win%   EffWins  Close%  EffClose
0     16      9        0.028  0.4      0.2     2.8
1     24      19       0.114  2.7      0.3     7.1
2     16      20       0.235  3.8      0.4     6.1
3     25      23       0.367  9.2      0.4     9.1
4     25      16       0.488  12.2     0.3     8.6
5     24      17       0.590  14.1     0.3     6.1
6     8       8        0.667  5.3      0.3     2.1
7     8       18       0.747  6.0      0.2     1.8
8     7       10       0.833  5.8      0.2     1.6
9     2       9        0.892  1.8      0.2     0.3
10    1       7        0.941  0.9      0.1     0.1
11    1       3        0.972  1.0      0.1     0.1
12    3       1        0.985  3.0      0.0     0.1
13    1       0        0.988  1.0      0.0     0.0
14    1       0        0.988  1.0      0.0     0.0
15    0       1        0.991  0.0      0.0     0.0
16    0       0        0.994  0.0      0.0     0.0
17    0       1        0.997  0.0      1.0     0.0
Total 162     162             68.2     3.9     45.9

As you can see, the Cubs played 45.9 effective games in which the run differential was between -1 and +1. Instead of assuming a team wins “50%” of those games (which this model does not actually assume; it isn’t centered on any particular win percentage for games decided by exactly 1 run in either direction), you could theoretically argue for any rate from 45% to 55%. That would give you a “margin of error” of around 2.3 wins (45.9 × 0.05). This won’t always be the case (and there isn’t necessarily a good reason to believe the 45-55% range), but it’s good enough for this exercise.
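The close-game count can be sketched the same way. This minimal Python version (function names hypothetical) pairs each run total scored with the runs-allowed distribution, counts the pairings within one run of each other (ties included), and then applies the assumed 45-55% band, i.e. a swing of 0.05 either side of .500.

```python
from collections import Counter


def effective_close_games(scored: Counter, allowed: Counter, margin: int = 1) -> float:
    """Expected number of games decided by `margin` runs or fewer (ties included),
    pairing each run total scored against the full runs-allowed distribution."""
    games = sum(scored.values())
    total = 0.0
    for k, count in scored.items():
        # Counter returns 0 for run totals that never happened (including -1).
        close = sum(allowed[j] for j in range(k - margin, k + margin + 1))
        total += count * close / games
    return total


def luck_margin(close_games: float, swing: float = 0.05) -> float:
    """Wins gained or lost if close games went 50% +/- swing."""
    return close_games * swing


# 2012 Cubs, from the table above: {runs: number of games}.
CUBS_SCORED = Counter({0: 16, 1: 24, 2: 16, 3: 25, 4: 25, 5: 24, 6: 8, 7: 8,
                       8: 7, 9: 2, 10: 1, 11: 1, 12: 3, 13: 1, 14: 1})
CUBS_ALLOWED = Counter({0: 9, 1: 19, 2: 20, 3: 23, 4: 16, 5: 17, 6: 8, 7: 18,
                        8: 10, 9: 9, 10: 7, 11: 3, 12: 1, 15: 1, 17: 1})

if __name__ == "__main__":
    close = effective_close_games(CUBS_SCORED, CUBS_ALLOWED)
    print(f"Effective close games: {close:.1f}")
    print(f"Margin of error: +/- {luck_margin(close):.1f} wins")
```

Widening `margin` to 2 would be one way to test how sensitive the margin of error is to the definition of a “close” game.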

Bringing it all back to the Cubs, I’d say they “deserved” to win 68.2 games, with a +/- of 2.3 wins to account for bullpen considerations and some nebulous clutchiness. For reference, the Cubs had a Pythagenpat (a further refinement of Pythagorean W-L) record of 65.2-96.8 and an actual record of 61-101. I’m not certain whether my system is more or less accurate than the more refined methods out there, but it DOES seem intuitive to me, and it is almost certainly better than raw W-L.
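For comparison, here’s a minimal Python sketch of Pythagenpat, in which the exponent isn’t fixed but depends on the run environment. It assumes the commonly cited form x = ((RS + RA) / G)^0.287 (some versions use 0.28); the function name is hypothetical, and since the Cubs’ season run totals aren’t given in this post, the example uses made-up numbers rather than reproducing the 65.2-96.8 figure.

```python
def pythagenpat_wins(runs_scored: float, runs_allowed: float,
                     games: int = 162, z: float = 0.287) -> float:
    """Pythagorean wins with exponent ((RS + RA) / G) ** z (assumed Pythagenpat form)."""
    exponent = ((runs_scored + runs_allowed) / games) ** z
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


if __name__ == "__main__":
    # Hypothetical run totals, for illustration only.
    wins = pythagenpat_wins(650, 750)
    print(f"{wins:.1f}-{162 - wins:.1f}")
```

Tying the exponent to runs per game answers part of the “why 1.73 rather than 1.78” complaint, though the 0.287 itself is still an empirical fit.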

This is a rating system that seems pretty new to me, so it’s not well-tested or even necessarily well-thought out. If you’ve heard of this before, please let me know, along with any critiques of the system you might have. Hopefully this has been entertaining.

Update: Here is the full list for all 30 teams (Recon = reconstituted wins, Wins = actual wins, Diff = Recon minus Wins).

Team  Recon  Wins  Diff
CHC   68.2   61     7.2
HOU   61.2   55     6.2
BOS   73.7   69     4.7
KCR   76.5   72     4.5
NYM   77.8   74     3.8
COL   66.9   64     2.9
STL   90.8   88     2.8
DET   90.7   88     2.7
TBR   92.5   90     2.5
TOR   75.0   73     2.0
ARI   82.7   81     1.7
CHW   86.7   85     1.7
MIA   70.6   69     1.6
MIL   84.5   83     1.5
NYY   95.3   95     0.3
SEA   74.8   75    -0.2
LAD   85.6   86    -0.4
PIT   78.5   79    -0.5
LAA   88.2   89    -0.8
PHI   80.1   81    -0.9
SDP   74.6   76    -1.4
TEX   90.0   93    -3.0
WSN   94.6   98    -3.4
MIN   61.7   66    -4.3
ATL   89.6   94    -4.4
CLE   63.4   68    -4.6
SFG   88.7   94    -5.3
CIN   91.6   97    -5.4
OAK   88.2   94    -5.8
BAL   84.3   93    -8.7