Cubs Game Scores #1 (April 2013)

At the beginning of the season, I wrote a post with a new take on Game Score: the DiPS Game Score, found here.

Tom Tango, a consultant for the Cubs, did similar work along these lines. He showed up in the comments (I still can't believe how cool that is, by the way) and pointed me toward his previous work, which can be found here. He expressed interest in some further analysis; with that in mind, I'm going to review each Cubs start this year and score every game with all six systems (Tango's four, plus Bill James' original and my modified DiPS).

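First, we'll look at the top 3 and bottom 3 games for each score. In each list, the best starts come first (#1 is the best), followed by the worst, counted down to #1 (the worst).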

Bill James

#1 Jeff Samardzija, 4/1/2013 (8 IP, 2 H, 0 R, 1 BB, 9 SO, 0 HR) – 86

#2 Scott Feldman, 5/1/2013 (9 IP, 3 H, 2 R, 1 BB, 12 SO, 2 HR) – 84

#3 Carlos Villanueva, 4/12/2013 (8 IP, 3 H, 0 R, 1 BB, 3 SO, 0 HR) – 74

#3 Scott Feldman, 4/5/2013 (4.2 IP, 5 H, 4 R, 4 BB, 1 SO, 1 HR) – 35

#2 Scott Feldman, 4/11/2013 (4.1 IP, 7 H, 6 R, 3 BB, 3 SO, 0 HR) – 33

#1 Edwin Jackson, 4/30/2013 (4.2 IP, 11 H, 8 R, 2 BB, 6 SO, 1 HR) – 14
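Bill James' formula isn't restated here, but it is well documented: start at 50; add 1 per out recorded, 2 per completed inning after the 4th, and 1 per strikeout; subtract 2 per hit, 4 per earned run, 2 per unearned run, and 1 per walk. Below is a minimal Python sketch. Two caveats: the lines above list total runs (R) rather than earned runs, so the earned/unearned split is an input you have to supply (the `uer` parameter is a name chosen just for this sketch), and innings use box-score notation, where "4.2" means 4⅔ innings, not 4.2.

```python
def ip_to_outs(ip: str) -> int:
    """Convert box-score innings ("4.2" = 4 2/3 innings) to outs recorded."""
    whole, _, thirds = ip.partition(".")
    return int(whole) * 3 + int(thirds or 0)


def bill_james_game_score(ip: str, h: int, er: int, bb: int, so: int, uer: int = 0) -> int:
    """Bill James' original Game Score. uer (unearned runs) defaults to 0, an assumption."""
    outs = ip_to_outs(ip)
    completed_innings = outs // 3
    score = 50
    score += outs                               # +1 per out recorded
    score += 2 * max(0, completed_innings - 4)  # +2 per completed inning after the 4th
    score += so                                 # +1 per strikeout
    score -= 2 * h                              # -2 per hit
    score -= 4 * er                             # -4 per earned run
    score -= 2 * uer                            # -2 per unearned run
    score -= bb                                 # -1 per walk
    return score


# Samardzija, 4/1/2013: 8 IP, 2 H, 0 R, 1 BB, 9 SO
print(bill_james_game_score("8", h=2, er=0, bb=1, so=9))  # prints 86
```

Treating every run as earned reproduces several of the scores above (Feldman's 5/1 start gives 84, Jackson's 4/30 start gives 14), but not all of them, since some starts included unearned runs.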

Pure Runs

#1 Jeff Samardzija, 4/1/2013 (8 IP, 2 H, 0 R, 1 BB, 9 SO, 0 HR) – 91

#2 Carlos Villanueva, 4/12/2013 (8 IP, 3 H, 0 R, 1 BB, 3 SO, 0 HR) – 87

#3 Travis Wood, 4/4/2013 (6 IP, 1 H, 0 R, 2 BB, 4 SO, 0 HR) – 78

#3 Scott Feldman, 5/1/2013 (9 IP, 3 H, 2 R, 1 BB, 12 SO, 2 HR) – 78

#3 Edwin Jackson, 4/14/2013 (5.1 IP, 5 H, 5 R, 4 BB, 9 SO, 0 HR) – 24

#2 Scott Feldman, 4/11/2013 (4.1 IP, 7 H, 6 R, 3 BB, 3 SO, 0 HR) – 8

#1 Edwin Jackson, 4/30/2013 (4.2 IP, 11 H, 8 R, 2 BB, 6 SO, 1 HR) – -10

SO/BB

#1 Scott Feldman, 5/1/2013 (9 IP, 3 H, 2 R, 1 BB, 12 SO, 2 HR) – 77

#2 Jeff Samardzija, 4/7/2013 (5.2 IP, 4 H, 4 R, 4 BB, 13 SO, 0 HR) – 69

#3 Jeff Samardzija, 4/1/2013 (8 IP, 2 H, 0 R, 1 BB, 9 SO, 0 HR) – 67

#3 Edwin Jackson, 4/25/2013 (6 IP, 5 H, 3 R, 4 BB, 4 SO, 0 HR) – 42

#2 Scott Feldman, 4/11/2013 (4.1 IP, 7 H, 6 R, 3 BB, 3 SO, 0 HR) – 42

#1 Scott Feldman, 4/5/2013 (4.2 IP, 5 H, 4 R, 4 BB, 1 SO, 1 HR) – 33

FIP (Tom Tango)

#1 Jeff Samardzija, 4/1/2013 (8 IP, 2 H, 0 R, 1 BB, 9 SO, 0 HR) – 75

#2 Jeff Samardzija, 4/7/2013 (5.2 IP, 4 H, 4 R, 4 BB, 13 SO, 0 HR) – 68

#3 Jeff Samardzija, 4/13/2013 (6 IP, 7 H, 2 R, 1 BB, 5 SO, 0 HR) – 62

#3 Jeff Samardzija, 4/19/2013 (7 IP, 6 H, 5 R, 1 BB, 4 SO, 2 HR) – 37

#2 Travis Wood, 4/27/2013 (6 IP, 3 H, 2 R, 1 BB, 5 SO, 2 HR) – 36

#1 Scott Feldman, 4/5/2013 (4.2 IP, 5 H, 4 R, 4 BB, 1 SO, 1 HR) – 29

DiPS (Myles Handley)

#1 Jeff Samardzija, 4/1/2013 (8 IP, 2 H, 0 R, 1 BB, 9 SO, 0 HR) – 86

#2 Scott Feldman, 5/1/2013 (9 IP, 3 H, 2 R, 1 BB, 12 SO, 2 HR) – 77

#3 Jeff Samardzija, 4/7/2013 (5.2 IP, 4 H, 4 R, 4 BB, 13 SO, 0 HR) – 72

#2 Scott Feldman, 4/11/2013 (4.1 IP, 7 H, 6 R, 3 BB, 3 SO, 0 HR) – 38

#2 Travis Wood, 4/27/2013 (6 IP, 3 H, 2 R, 1 BB, 5 SO, 2 HR) – 38

#1 Scott Feldman, 4/5/2013 (4.2 IP, 5 H, 4 R, 4 BB, 1 SO, 1 HR) – 18

Linear Weights

#1 Jeff Samardzija, 4/1/2013 (8 IP, 2 H, 0 R, 1 BB, 9 SO, 0 HR) – 94

#2 Carlos Villanueva, 4/12/2013 (8 IP, 3 H, 0 R, 1 BB, 3 SO, 0 HR) – 84

#3 Scott Feldman, 5/1/2013 (9 IP, 3 H, 2 R, 1 BB, 12 SO, 2 HR) – 82

#3 Scott Feldman, 4/5/2013 (4.2 IP, 5 H, 4 R, 4 BB, 1 SO, 1 HR) – 34

#2 Scott Feldman, 4/11/2013 (4.1 IP, 7 H, 6 R, 3 BB, 3 SO, 0 HR) – 32

#1 Edwin Jackson, 4/30/2013 (4.2 IP, 11 H, 8 R, 2 BB, 6 SO, 1 HR) – 10

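I thought it would be illustrative to first show the consensus outlier performances. To do that, I gave each system a "vote" for its 3 best and 3 worst starts (3 points for #1, 2 for #2, 1 for #3; tied starts each get the points for their shared rank). The numbers in the second set of parentheses are each system's vote, in the order BJ, Runs, SO/BB, FIP, DiPS, LW. These were the results: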

Best

#1 Jeff Samardzija, 4/1/2013 (16 pts) (3-3-1-3-3-3)

#2 Scott Feldman, 5/1/2013 (9 pts) (2-1-3-0-2-1)

#3 Carlos Villanueva, 4/12/2013 (5 pts) (1-2-0-0-0-2)

#3 Jeff Samardzija, 4/7/2013 (5 pts) (0-0-2-2-1-0)

Worst

#1 Scott Feldman, 4/5/2013 (11 pts) (1-0-3-3-3-1)

#2 Scott Feldman, 4/11/2013 (10 pts) (2-2-2-0-2-2)

#3 Edwin Jackson, 4/30/2013 (9 pts) (3-3-0-0-0-3)
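The 3-2-1 vote is easy to mechanize. A sketch, assuming each system's list is stored as (rank, start) pairs exactly as printed in the per-system lists, with tied starts sharing a rank and both receiving that rank's points (which is how the Runs tie at #3 between Wood 4/4 and Feldman 5/1 is counted). Systems are kept in the post's order so each breakdown prints as BJ-Runs-SO/BB-FIP-DiPS-LW.

```python
from collections import defaultdict

SYSTEMS = ["BJ", "Runs", "SO/BB", "FIP", "DiPS", "LW"]

# Top-3 and bottom-3 lists transcribed from the per-system rankings: (rank, start)
best = {
    "BJ":    [(1, "Samardzija 4/1"), (2, "Feldman 5/1"), (3, "Villanueva 4/12")],
    "Runs":  [(1, "Samardzija 4/1"), (2, "Villanueva 4/12"), (3, "Wood 4/4"), (3, "Feldman 5/1")],
    "SO/BB": [(1, "Feldman 5/1"), (2, "Samardzija 4/7"), (3, "Samardzija 4/1")],
    "FIP":   [(1, "Samardzija 4/1"), (2, "Samardzija 4/7"), (3, "Samardzija 4/13")],
    "DiPS":  [(1, "Samardzija 4/1"), (2, "Feldman 5/1"), (3, "Samardzija 4/7")],
    "LW":    [(1, "Samardzija 4/1"), (2, "Villanueva 4/12"), (3, "Feldman 5/1")],
}
worst = {
    "BJ":    [(1, "Jackson 4/30"), (2, "Feldman 4/11"), (3, "Feldman 4/5")],
    "Runs":  [(1, "Jackson 4/30"), (2, "Feldman 4/11"), (3, "Jackson 4/14")],
    "SO/BB": [(1, "Feldman 4/5"), (2, "Feldman 4/11"), (3, "Jackson 4/25")],
    "FIP":   [(1, "Feldman 4/5"), (2, "Wood 4/27"), (3, "Samardzija 4/19")],
    "DiPS":  [(1, "Feldman 4/5"), (2, "Feldman 4/11"), (2, "Wood 4/27")],
    "LW":    [(1, "Jackson 4/30"), (2, "Feldman 4/11"), (3, "Feldman 4/5")],
}


def tally(ranked: dict[str, list[tuple[int, str]]]) -> dict[str, list[int]]:
    """Points each start gets from each system (3 for #1, 2 for #2, 1 for #3), in SYSTEMS order."""
    votes: dict[str, list[int]] = defaultdict(lambda: [0] * len(SYSTEMS))
    for i, system in enumerate(SYSTEMS):
        for rank, start in ranked[system]:
            votes[start][i] = 4 - rank
    return dict(votes)


def report(ranked: dict[str, list[tuple[int, str]]]) -> None:
    """Print starts from most to fewest total points, with the per-system breakdown."""
    for start, pts in sorted(tally(ranked).items(), key=lambda kv: -sum(kv[1])):
        print(f"{start}: {sum(pts)} pts ({'-'.join(map(str, pts))})")


print("Best")
report(best)
print("Worst")
report(worst)
```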

Additionally, I looked at the games with the lowest standard deviation among their six game scores. These are something like the platonic ideal of a Game Score, because (for good or ill) all six systems come to a rough consensus on them.

#1 Edwin Jackson, 4/25/2013 (6 IP, 5 H, 3 R, 3 ER, 4 BB, 4 SO, 0 HR), 3.8 point standard deviation (Largest Deviation: SO/BB)

#2 Jeff Samardzija, 4/13/2013 (6 IP, 7 H, 2 R, 2 ER, 1 BB, 5 SO, 0 HR), 4.2 point standard deviation (Largest Deviation: LW)

#3 Edwin Jackson, 4/3/2013 (5 IP, 3 H, 2 R, 2 ER, 1 BB, 5 SO, 0 HR), 4.2 point standard deviation (Largest Deviation: LW)

Interestingly enough, LW "overvalues" this start and "undervalues" the second one.

These all belong in the family of "quality starts." Their average game scores are 48.7, 57.2, and 57.1, respectively. These are perhaps not as illustrative as you'd like, so here are the "bad" and "good" starts with the lowest standard deviations so far:

Scott Feldman, 4/5/2013 (4.2 IP, 5 H, 4 R, 4 BB, 1 SO, 1 HR), 6.3 point standard deviation (Largest Deviation: DiPS)

Scott Feldman, 5/1/2013 (9 IP, 3 H, 2 R, 1 BB, 12 SO, 2 HR), 9.4 point standard deviation (Largest Deviation: FIP)
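The per-game spread figures above can be checked with Python's `statistics` module. A sketch that computes the mean, the sample (n - 1) standard deviation, and the system farthest from the mean for one start. Working from the rounded scores in the game log table, it reproduces the 3.8 and the SO/BB outlier for Jackson's 4/25 start, but the mean comes out 48.5 rather than 48.7, and other games can differ by a tenth or two; the post's figures were presumably computed from unrounded scores.

```python
import statistics

SYSTEMS = ["BJ", "Runs", "SO/BB", "FIP", "DiPS", "LW"]


def spread(scores: list[float]) -> tuple[float, float, str]:
    """Mean, sample standard deviation, and the system whose score is farthest from the mean."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample (n - 1) standard deviation
    farthest = max(range(len(scores)), key=lambda i: abs(scores[i] - mean))
    return mean, sd, SYSTEMS[farthest]


# Jackson, 4/25/2013, six scores taken from the game log table
mean, sd, outlier = spread([50, 48, 42, 51, 47, 53])
print(f"mean {mean:.1f}, sd {sd:.1f}, largest deviation: {outlier}")
# prints: mean 48.5, sd 3.8, largest deviation: SO/BB
```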

Something very important to consider when comparing one Game Score metric to another is that their variances, well, vary. FIP by nature has a very low variance, as does SO/BB. On the other hand, both LW and Runs have much higher than normal variance. To get a more accurate picture of how these systems have varied historically (and I'd like to thank Berselius for reminding me that this is important to do), I took the first 300 games pitched in 2012 and computed the scores for each of them.

BJ: 16.5 standard deviation, 51.6 average

Runs: 25.9 standard deviation, 48.9 average

SO/BB: 9.5 standard deviation, 50.9 average

FIP: 15.7 standard deviation, 48.4 average

DiPS: 21.3 standard deviation, 46.3 average

LW: 21.1 standard deviation, 48.2 average

The upshot is that a 65 in FIP is much more impressive than a 65 in Runs (to wit: of the 27 starts the Cubs have made this year, 9 have a 65 or greater in Runs, and just 2 in FIP). This makes a direct comparison like the one above quite a bit more difficult. To really do so, you'd likely have to re-express each game score as standard deviations from the mean and compare on that basis. That's both beyond the scope of this analysis and perhaps obfuscating as well.
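The conversion set aside in the paragraph above is short: subtract a system's mean and divide by its standard deviation. A sketch that uses the 2012 figures listed above as the baselines (treating those 300 starts as representative is an assumption). It puts a 65 in FIP at roughly 1.06 standard deviations above average, versus about 0.62 for a 65 in Runs.

```python
# 2012 baselines from the first 300 starts: (mean, standard deviation), as listed above
BASELINES = {
    "BJ": (51.6, 16.5),
    "Runs": (48.9, 25.9),
    "SO/BB": (50.9, 9.5),
    "FIP": (48.4, 15.7),
    "DiPS": (46.3, 21.3),
    "LW": (48.2, 21.1),
}


def z_score(system: str, score: float) -> float:
    """How many standard deviations a score sits above that system's 2012 mean."""
    mean, sd = BASELINES[system]
    return (score - mean) / sd


for system in BASELINES:
    print(f"65 in {system}: {z_score(system, 65):+.2f} SD")
```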

Going back to the first bit (the 3 best and worst Cubs starts so far by score aggregate), it's fairly interesting to see how many of those consensus starts each game score placed in its own top 3 (or bottom 3):

BJ: 3 of 4 best, 3 of 3 worst

Runs: 3 of 4 best, 2 of 3 worst

SO/BB: 3 of 4 best, 2 of 3 worst

FIP: 2 of 4 best, 1 of 3 worst

DiPS: 3 of 4 best, 2 of 3 worst

LW: 3 of 4 best, 3 of 3 worst

FIP was the worst at placing its extremes into the consensus, and LW and BJ were both perfect at it. I'm not sure this is entirely relevant, but it's kind of fun.

It's difficult to say which of these are better or worse. I'm partial to my own system (its standard deviation means the majority of starts should fall between about 25 and 67, which is enough to clearly show which start is "better" without such a huge spread that merely good starts get branded as stellar) and to FIP (for the same reason, but Tom's is even better because of its near-perfect standard deviation), but they all have merits. My system and Tom's are really not so different (and his is perhaps rooted in a stronger statistical background than mine, for what that's worth): we give slightly different weights to SO, BB, and HR, and I give a slight incentive to accumulate outs (he has 40 + 2.5 per inning; I use 15 + 6 per inning). Runs is clearly the most volatile (and, in my opinion, least illustrative) score available; a good way to show the discrepancy is Edwin Jackson's April 30th start: -10 in Runs, 54 in SO/BB. IMO, the best standard deviation is Bill James': I'd like a standard deviation of 15 centered around 50 if possible.
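The out-accumulation difference is easy to quantify. Taking only the innings components exactly as stated (40 + 2.5 per inning for Tango's FIP score, 15 + 6 per inning for DiPS) and ignoring every other term, the two lines cross at 25 / 3.5 ≈ 7.1 innings: below that, Tango's baseline gives more credit; beyond it, DiPS does. In this sketch, `ip` is a true number of innings (6⅔ innings is 6.667), not box-score notation.

```python
def tango_innings_credit(ip: float) -> float:
    """Innings component only, as described in the post: 40 + 2.5 per inning."""
    return 40 + 2.5 * ip


def dips_innings_credit(ip: float) -> float:
    """Innings component only, as described in the post: 15 + 6 per inning."""
    return 15 + 6 * ip


# Solve 40 + 2.5x = 15 + 6x for the crossover point.
crossover = (40 - 15) / (6 - 2.5)
print(f"crossover at {crossover:.2f} innings")

for ip in (5, 6, 7, 8, 9):
    print(ip, tango_innings_credit(ip), dips_innings_credit(ip))
```

At 6 innings the baselines are 55 versus 51; at 9 innings they are 62.5 versus 69, which is the "slight incentive" for going deep.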

If we were to synthesize these scores into a cohesive group, we would unfortunately have to start by adjusting each score for its variance. To do so, I weighted each score so that the ones with higher variance were weighted lower, and vice versa.

BJ: 16.8%
Runs: 10.6%
SO/BB: 28.9%
FIP: 17.6%
DiPS: 13.0%
LW: 13.1%

If you wanted to "ballpark it," you could go (15/10/30/20/10/15).
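The exact weighting rule isn't spelled out, but making each weight proportional to the reciprocal of the system's 2012 standard deviation, then normalizing so the weights sum to 1, reproduces the percentages above to within about 0.1 percentage point. Treat this sketch as a reconstruction from the numbers, not necessarily the method used; the small differences are presumably from unrounded standard deviations.

```python
# 2012 standard deviations from the first 300 starts, as listed above
SD_2012 = {"BJ": 16.5, "Runs": 25.9, "SO/BB": 9.5, "FIP": 15.7, "DiPS": 21.3, "LW": 21.1}


def inverse_sd_weights(sds: dict[str, float]) -> dict[str, float]:
    """Weight each system by 1/SD, normalized so the weights sum to 1."""
    inverse = {name: 1 / sd for name, sd in sds.items()}
    total = sum(inverse.values())
    return {name: inv / total for name, inv in inverse.items()}


for name, w in inverse_sd_weights(SD_2012).items():
    print(f"{name}: {w:.1%}")
```

Weighting by 1/SD rather than 1/variance is a milder correction: a system with twice the spread gets half the weight, not a quarter.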

This change in weight puts each score on the same playing field. Here are the top 3 and bottom 3 starts under this weighting (the "dumb" average is the plain, unweighted mean of the six scores):

#1 Jeff Samardzija, 4/1/2013: 80.2 SuperScore (#1 consensus score and "dumb" average of 83.3)

#2 Scott Feldman, 5/1/2013: 75.3 SuperScore (#2 consensus score and "dumb" average of 75.7)

#3 Carlos Villanueva, 4/12/2013: 65.9 SuperScore (#3 consensus score and "dumb" average of 69.8)

#1 Scott Feldman, 4/5/2013: 30.4 SuperScore (#1 consensus score and "dumb" average of 29.8)

#2 Edwin Jackson, 4/30/2013: 31.4 SuperScore (#3 consensus score and "dumb" average of 25.6)

#3 Scott Feldman, 4/11/2013: 36.0 SuperScore (#2 consensus score and "dumb" average of 33.5)

For reference, Kerry Wood's 20-strikeout game would have a SuperScore of 107.2. That's pretty cool.

The start closest to 50 is Travis Wood's on 4/27/2013: 6 IP, 3 H, 2 R, 1 BB, 5 SO, 2 HR.
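The SuperScore is a weighted average of the six scores, and the "dumb" average is the plain mean. A sketch using the weights listed above; it divides by the weights of the systems actually present, so it also works if you drop a system. From the rounded table scores it gives 80.2 for Samardzija's 4/1 start and 65.9 for Villanueva's 4/12 start, but other figures can be off by a few tenths (the 4/1 dumb average comes out 83.2, not 83.3), presumably because the post worked from unrounded scores.

```python
WEIGHTS = {"BJ": 0.168, "Runs": 0.106, "SO/BB": 0.289, "FIP": 0.176, "DiPS": 0.130, "LW": 0.131}


def superscore(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average, normalized by the weights of the systems present in `scores`."""
    total_weight = sum(weights[s] for s in scores)
    return sum(weights[s] * v for s, v in scores.items()) / total_weight


def dumb_average(scores: dict[str, float]) -> float:
    """Unweighted mean of the game scores."""
    return sum(scores.values()) / len(scores)


# Samardzija, 4/1/2013, from the game log table
samardzija_0401 = {"BJ": 86, "Runs": 91, "SO/BB": 67, "FIP": 75, "DiPS": 86, "LW": 94}
print(round(superscore(samardzija_0401), 1), round(dumb_average(samardzija_0401), 1))
```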

A way to know that we're likely on the right track is to consider what standard deviation really means. In a normal distribution, about 68.2% of data points fall within one standard deviation of the mean. Here, the standard deviation of our new score is 11.3 (and the average of 53.1 suggests that the Cubs' starting pitching has been slightly better than average thus far). We would therefore expect about 18 of the 27 starts to fall within the range (41.8, 64.4); so far, 21 do. At the extremes, we'd expect 0 or 1 start more than 2 standard deviations above the mean of 53.1, and 0 or 1 more than 2 below it; we've got one on the high end and one on the low end. I imagine these will hold as the sample size grows.
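The expected counts come from the standard normal CDF: about 68.3% of a normal distribution lies within one standard deviation of the mean, and about 2.3% lies beyond two standard deviations on each side. A sketch, assuming (as the paragraph does) that SuperScores are roughly normal; the 2-SD cutoffs it prints (30.5 and 75.7) are the thresholds for the "one on each end" check.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


n_starts = 27
mean, sd = 53.1, 11.3  # SuperScore mean and SD for the 27 Cubs starts, from the post

within_1sd = n_starts * (normal_cdf(1) - normal_cdf(-1))
beyond_2sd_each_tail = n_starts * (1 - normal_cdf(2))

print(f"1-SD band: ({mean - sd:.1f}, {mean + sd:.1f}), expect ~{within_1sd:.1f} starts")
print(f"2-SD cutoffs: {mean - 2 * sd:.1f} and {mean + 2 * sd:.1f}, "
      f"expect ~{beyond_2sd_each_tail:.2f} starts per tail")
```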

This, of course, does little to adjust for the "accuracy" of any of the 6 game scores. Really, that is difficult to analyze meaningfully. I don't like Bill James' version or Runs because they use things that are out of the pitcher's control, but you might like them because they use the currency of games (the run, not the unearned run), whereas FIP/DiPS/LW use runs only tangentially and SO/BB not at all. I could run tests on interconnectivity and things like that, or regress each game score against wins to check for a correlation (and I might, someday), but honestly that stuff is somewhat beyond the scope of what I feel comfortable doing (and let's face it: this post is already extremely long).

To summarize (and to partly answer Tom's question):

The "beauty" of the different game scores is that no one person can probably say which one is best. The other cool thing is how relatively easy they are to compute: doing this adjustment probably increases your work (over just using BJ) by an order of magnitude. However, we can start that discussion by recognizing that the systems measure games differently enough to need a weighting adjustment anyway. Those weights are above; here are the weights if you remove my personal game score from consideration:

BJ: 19.3%
Runs: 12.2%
SO/BB: 33.3%
FIP: 20.2%
LW: 15.0%

Which reduces pretty nicely to the generalization of (20/15/30/20/15).

If you want to really stay in the spirit of Tom Tango's original piece, you'd also omit BJ from this. In that case, you've got these 4 coefficients:

Runs: 15.1%
SO/BB: 41.2%
FIP: 25.0%
LW: 18.6%

Which reduces to (15/40/25/20).
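Each reduced set is just the six-way weights with the dropped systems removed and the rest rescaled to sum to 1 (for BJ once DiPS is gone, 16.8 / 87.0 ≈ 19.3%). A short sketch that does this from the published six-way percentages; it reproduces both reduced sets to within about 0.1 percentage point.

```python
# Six-system weights as published above
WEIGHTS = {"BJ": 0.168, "Runs": 0.106, "SO/BB": 0.289, "FIP": 0.176, "DiPS": 0.130, "LW": 0.131}


def drop_and_renormalize(weights: dict[str, float], drop: set[str]) -> dict[str, float]:
    """Remove the named systems and rescale the remaining weights to sum to 1."""
    kept = {name: w for name, w in weights.items() if name not in drop}
    total = sum(kept.values())
    return {name: w / total for name, w in kept.items()}


for dropped in ({"DiPS"}, {"DiPS", "BJ"}):
    reduced = drop_and_renormalize(WEIGHTS, dropped)
    print(sorted(dropped), {name: f"{w:.1%}" for name, w in reduced.items()})
```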

The next step, if you're adventurous, is to rescale the new number to adjust its standard deviation (currently 11.3; optimally, whatever you want it to be). Remember that the ultimate goal of Game Score is to simulate a winning percentage.
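One simple way to do that is a linear rescale: convert to standard deviations from the mean, then map onto whatever center and spread you want. A sketch assuming a target of 50 ± 15 (the Bill James-like spread called ideal earlier); the mean and SD plugged in are the 27-start Cubs figures, so a larger sample would shift them.

```python
def rescale(score: float, mean: float, sd: float,
            target_mean: float = 50.0, target_sd: float = 15.0) -> float:
    """Linearly map a score so the distribution has target_mean and target_sd."""
    return target_mean + target_sd * (score - mean) / sd


# SuperScore figures from the post: mean 53.1, SD 11.3 over 27 Cubs starts
for label, s in [("Samardzija 4/1", 80.2), ("Feldman 4/5", 30.4)]:
    print(label, round(rescale(s, mean=53.1, sd=11.3), 1))
```

A linear map keeps every start's rank order and only stretches the scale. Using the Cubs' own mean recenters the Cubs at 50, which erases the "slightly better than average" signal; a league-wide baseline would preserve it.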

When I get a good chunk of spare time (with a 4-week-old at home, not the easiest thing to find), I'd really like to expand this to all games pitched this year. I built my file with this in mind, so it actually won't be too big of a pain.

Date       Result  App   Dec   GSc  Runs  SO/BB  FIP  DiPS   LW
4/1/2013   W 3-1   GS-8  W      86    91     67   75    86   94
4/3/2013   L 0-3   GS-5  L      57    52     54   60    56   64
4/4/2013   W 3-2   GS-6  W      72    78     48   57    55   79
4/5/2013   L 1-4   GS-5  L      35    30     33   29    18   34
4/6/2013   L 5-6   GS-7         62    73     55   50    53   52
4/7/2013   L 1-5   GS-6  L      54    36     69   68    72   56
4/8/2013   L 4-7   GS-6  L      37    28     45   52    49   35
4/9/2013   W 6-3   GS-7         52    51     52   59    59   49
4/11/2013  L 6-7   GS-5  L      33     8     42   48    38   32
4/12/2013  W 4-3   GS-8         74    87     49   61    64   84
4/13/2013  L 2-3   GS-6  L      54    58     54   62    62   52
4/14/2013  L 7-10  GS-6         45    24     57   59    58   48
4/16/2013  L 2-4   GS-8  L      60    67     43   55    56   68
4/18/2013  W 6-2   GS-7  W      66    65     58   41    47   60
4/19/2013  L 4-5   GS-7  L      50    35     52   37    41   50
4/20/2013  L 1-5   GS-6  L      55    28     51   47    47   59
4/21/2013  L 2-4   GS-5  L      54    32     51   43    39   50
4/22/2013  L 4-5   GS-7         60    61     52   47    48   59
4/23/2013  W 4-2   GS-9         73    73     55   54    63   81
4/24/2013  L 0-1   GS-6  L      59    68     57   49    51   38
4/25/2013  W 4-3   GS-6         50    48     42   51    47   53
4/26/2013  W 4-2   GS-7  W      52    63     43   42    41   47
4/27/2013  W 3-2   GS-6  W      62    58     54   36    38   56
4/28/2013  L 4-6   GS-6  L      53    38     57   49    51   53
4/29/2013  W 5-3   GS-5         51    42     54   57    53   50
4/30/2013  L 7-13  GS-5  L      14   -10     54   45    41   10
5/1/2013   W 6-2   CG 9  W      84    78     77   58    77   82
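For anyone who wants to reproduce the counts and spreads, the score columns of this table load easily. A sketch that transcribes just the date and the six score columns (GSc is Bill James' score), then prints each system's mean and sample standard deviation over these 27 starts, plus how many starts scored 65 or better; the Runs and FIP counts match the 9 and 2 cited earlier.

```python
import statistics

SYSTEMS = ["GSc", "Runs", "SO/BB", "FIP", "DiPS", "LW"]

# Date and six score columns transcribed from the game log table
GAME_LOG = """
4/1 86 91 67 75 86 94
4/3 57 52 54 60 56 64
4/4 72 78 48 57 55 79
4/5 35 30 33 29 18 34
4/6 62 73 55 50 53 52
4/7 54 36 69 68 72 56
4/8 37 28 45 52 49 35
4/9 52 51 52 59 59 49
4/11 33 8 42 48 38 32
4/12 74 87 49 61 64 84
4/13 54 58 54 62 62 52
4/14 45 24 57 59 58 48
4/16 60 67 43 55 56 68
4/18 66 65 58 41 47 60
4/19 50 35 52 37 41 50
4/20 55 28 51 47 47 59
4/21 54 32 51 43 39 50
4/22 60 61 52 47 48 59
4/23 73 73 55 54 63 81
4/24 59 68 57 49 51 38
4/25 50 48 42 51 47 53
4/26 52 63 43 42 41 47
4/27 62 58 54 36 38 56
4/28 53 38 57 49 51 53
4/29 51 42 54 57 53 50
4/30 14 -10 54 45 41 10
5/1 84 78 77 58 77 82
"""


def load(text: str) -> dict[str, list[int]]:
    """Parse 'date s1 ... s6' rows into {date: [six scores]}."""
    games = {}
    for line in text.strip().splitlines():
        date, *scores = line.split()
        games[date] = [int(s) for s in scores]
    return games


games = load(GAME_LOG)
for i, system in enumerate(SYSTEMS):
    column = [scores[i] for scores in games.values()]
    at_least_65 = sum(1 for s in column if s >= 65)
    print(f"{system}: mean {statistics.mean(column):.1f}, "
          f"sd {statistics.stdev(column):.1f}, starts >= 65: {at_least_65}")
```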