On early season Pythagorean Win Expectancy and the Chicago Cubs

Through the month of April the Cubs have allowed 31 more runs than they’ve scored. That differential ranks 15th in the NL, where only Houston (-34) has been worse. In the American League the White Sox (-37) and Twins (-57) have been worse. Even the Pirates (-24) have been better. The differential between runs scored and runs allowed is important.

Runs scored and runs allowed determine who wins and loses. Teams that have a good differential are almost always the better teams. Having the second-worst differential in the NL, even this early in the season, isn’t a sign of good things to come.

Using runs scored and allowed we can calculate an expected winning percentage. Bill James called it the Pythagorean expectation because the formula he used reminded him of the Pythagorean Theorem. It’s since been altered as better predictors were found. The formula James used was this: Win% = RS^2 / (RS^2 + RA^2).

It was later discovered that if you use an exponent around 1.8 you get better results. There’s yet another method called PythagenPat, which was developed by David Smyth and Patriot. That method uses an exponent based on the run scoring environment of each team. The formula for PythagenPat is this: Win% = RS^x / (RS^x + RA^x), where x = RPG^0.287. RPG = runs per game (runs scored plus runs allowed, divided by the number of games).
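As a rough sketch of the three variants described above, the Python below computes the fixed-exponent formula (2 for James’s original, or something near 1.8) and PythagenPat. The function names and the sample run totals are illustrative assumptions, and 1.83 is used only as one example of an exponent “around 1.8.”

```python
def pythagorean(rs, ra, exponent=2.0):
    """Expected winning percentage with a fixed exponent."""
    return rs ** exponent / (rs ** exponent + ra ** exponent)


def pythagenpat(rs, ra, games):
    """Expected winning percentage with an exponent tied to the run environment."""
    rpg = (rs + ra) / games  # runs per game, both teams combined
    x = rpg ** 0.287
    return rs ** x / (rs ** x + ra ** x)


if __name__ == "__main__":
    # Hypothetical team: 110 runs scored, 100 allowed, 25 games.
    print(round(pythagorean(110, 100), 3))        # exponent 2
    print(round(pythagorean(110, 100, 1.83), 3))  # exponent near 1.8
    print(round(pythagenpat(110, 100, 25), 3))    # environment-based exponent
```

A smaller exponent pulls the estimate toward .500 for the same run totals, which is why the choice of exponent matters most for teams with lopsided differentials.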

Early in a season these predictors aren’t going to be that accurate. Imagine a 12-12 team that has scored 120 runs and allowed 120 runs. In the 25th game they get beat 16-2. Before that game their Pythag was .500 (same number of runs scored and allowed). After that one game PythagenPat drops to about .447.

However, that doesn’t mean we can’t learn something from it. Obviously after 25 or so games we’ve learned something about the team and the teams they’ve played. Not enough to draw conclusions about the strength of the teams, but we’ve learned something.

I looked back at all the teams from 2001 through 2010. I calculated their Pythag% through April and then for the rest of the season. There are 300 teams over these 10 seasons. If we look at the difference in Pythag between April and May through the end of the season, we can get an idea of how much a team improved or worsened.

In 2006 the Minnesota Twins got off to a horrible start in April. After the calendar turned to May the Twins were unbelievably good. They had the largest positive differential between April Pythag and rest-of-season Pythag at .321. Only four other teams were higher than .194: 2001 A’s (.289), 2004 Giants (.231), 2004 Expos (.225), 2002 Braves (.206). On the other end there were also five teams who worsened by similar amounts: 2005 Marlins (-.278), 2001 Red Sox (-.262), 2009 Pirates (-.252), 2008 Diamondbacks (-.223), and the 2010 Rays (-.208).

That doesn’t tell us a whole lot, though. It just tells us what we already knew: teams regress toward the mean. We’d expect the teams with the most improvement to be teams with an April PythagenPat under .500, and vice versa. The five teams at the top had an average April PythagenPat of .314 and improved to .595 the rest of the way. They were obviously very good teams that got off to horrible starts. The bottom five averaged .714 in April and .469 the rest of the way, which made them slightly worse than average after a great start.

The 36 teams that improved the most were all under .500. The 37th was .500 exactly. The 50 teams that had the largest negative differential were all over .500 in April. You know how they say you’re never as good as you are at your best or as bad as you are at your worst? Well, that’s what this is. Those who did exceptionally well in April were naturally going to get worse and the same is true with the teams who performed the worst in April.

The numbers tell us what we expected to find. We’re more interested in using the data to see how teams similar to the Cubs performed the rest of the season. The Cubs scored 106 runs and allowed 137. The Cubs’ PythagenPat is .381. That’s about a 62 win team over 162 games. Obviously the Cubs aren’t that bad. We know teams regress toward the mean and we expect the Cubs will too. The Cubs’ performance in April is equal to what the 2001 White Sox and 2002 Phillies did. Among all 300 teams, those White Sox and Phillies teams ranked 259th and 260th in April PythagenPat. The Cubs’ current PythagenPat ranks in the bottom 13.3% of what all teams have done in April from 2001-2010.
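The Cubs’ figures can be checked with a few lines of Python. This is a minimal sketch: the 26 games played is an assumption taken from the 12-14 record cited later in the article, and the variable names are illustrative.

```python
rs, ra, games = 106, 137, 26  # games assumes the 12-14 record

rpg = (rs + ra) / games                   # runs per game, both teams combined
x = rpg ** 0.287                          # PythagenPat exponent
win_pct = rs ** x / (rs ** x + ra ** x)

wins_162 = win_pct * 162                  # pace over a full season
share_below = (300 - 260) / 300           # teams ranked below 260th of 300

print(f"{win_pct:.3f}")      # 0.381
print(f"{wins_162:.0f}")     # 62
print(f"{share_below:.1%}")  # 13.3%
```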

If we break the 300 teams into six groups of 50, ranked by April PythagenPat, we get an idea of how much the teams in each range improved or worsened. The top 50 teams had an average PythagenPat of .656 in April and .530 after. It’s .573 and .524 for the next 50, and on down to the final 50, which is .344 in April and .438 after. That’s the group the Cubs would fit into. Those teams improved by .094 the rest of the way.

The Cubs have been better than the average team in this group. The 50 teams in the group just above had a PythagenPat of .426 in April and .504 after (an improvement of .078). The worse you were in April, the more you’ll likely improve the rest of the way, and vice versa.

The 2011 Cubs are somewhere in between the final group and the one before it. A typical team in a similar situation over the last decade has improved by about .085. We can’t just conclude from this that the Cubs will have a PythagenPat of about .465 the rest of the way. There’s a great deal of uncertainty involved. Also, the teams at the bottom were, on average, significantly worse teams than those above them when the season began. This is evident from how well some of the teams in the group performed the rest of the season.
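One way to put a number on “somewhere in between” is to interpolate between the two group averages. The sketch below assumes a straight line between the bottom group (.344 in April, +.094 after) and the group above it (.426 in April, +.078 after). Linear interpolation is an assumption made here for illustration, not necessarily how the figure above was reached, and the function name is hypothetical.

```python
def interpolate_improvement(april_pct, low=(0.344, 0.094), high=(0.426, 0.078)):
    """Linearly interpolate rest-of-season improvement between two group averages.

    Each group is given as (average April PythagenPat, average improvement).
    """
    (x0, y0), (x1, y1) = low, high
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (april_pct - x0)


cubs_april = 0.381
gain = interpolate_improvement(cubs_april)

print(f"{gain:.3f}")               # 0.087
print(f"{cubs_april + gain:.3f}")  # 0.468
```

That lands within a few points of the rough figures above, which is as much precision as two group averages can support.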

The Cubs sit at 12-14 right now and have some ground to make up already. The three teams ahead of the Cubs were all projected to win around 85 games. The Reds and Brewers are each about a game or two behind that pace while the Cardinals are ahead of that by a few games. Let’s say the Cubs have to win 88 games to reach the playoffs. The Cubs will have to play .559 baseball the rest of the season to get there. That’s roughly 91 wins over a full season. In order for the Cardinals to get to 88 wins they only need to play .533 baseball the rest of the way.
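The required pace is simple arithmetic, sketched here in Python. The function name is illustrative, and the 88-win target is the assumption stated above.

```python
def required_win_pct(wins, losses, target_wins, season_games=162):
    """Winning percentage needed over the remaining games to reach target_wins."""
    remaining = season_games - wins - losses
    return (target_wins - wins) / remaining


cubs = required_win_pct(12, 14, 88)  # 76 more wins in 136 games

print(f"{cubs:.3f}")        # 0.559
print(f"{cubs * 162:.0f}")  # 91 (full-season equivalent)
```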

There were a few teams around where the Cubs would rank in PythagenPat who did significantly improve. The 2006 Twins went from .305 to .626. The 2004 Giants went from .351 to .582. The 2005 Phillies went from .375 to .579. There were several other teams too. In fact, among the 90 worst teams in April Pythag, 16 of them went on to have a Pythag the rest of the season that was .559 or higher, and 20 were .550 or higher. That’s a surprisingly high percentage (about 18% to 22%).

I wouldn’t count on it, but enough teams have done it that it wouldn’t be the least bit surprising. It’s a safe bet that nearly every one of those teams was better than the 2011 Cubs when the season began though. I can’t be sure without spending more time on this. Since we’re only talking about a month it just seems a waste of time to spend any more time on it than I already did.