### Wednesday, December 21, 2005

## The Power of OPS

Last year Depo went after players with more power; this year Mr. Ned is chasing guys with higher batting averages and less power. Many people use OPS (on-base plus slugging) as a quick approximation of hitting skill. However, in Moneyball, Depo was quoted as saying OBP was four times as important as slugging. So what really leads to scoring runs? Or, put in statistical jargon, what stat or combination of stats has the highest correlation with runs scored?

I thought I'd take a look into these questions with a focus on the Dodgers. So I ran some regressions on Dodger runs scored per year compared to various stats in the same year since 1962 (i.e., since the club moved into Dodger Stadium). Actually, I compared runs scored per game; otherwise the number of games played would have the highest correlation. Here are the results:

| stat | r^2 |
| --- | --- |
| BA | 0.7192 |
| OBP | 0.8226 |
| SLG | 0.8169 |
| OPS | 0.8868 |

In other words, variation in the Dodgers' team batting average explains about 72% of the variation in runs scored by the Dodgers, OBP or SLG about 82%, and OPS about 89%. OPS is clearly the best of these stats, but what of Depo's idea of four times OBP plus slugging? Does this only apply to the A's? And what of other combinations of OBP and SLG? I found that on-base percentage is more important than slugging for the Dodgers, but not quite as much as Depo found (for the A's?). Depo's 4*OBP+SLG has an r^2 of 0.8959, but the highest correlation I found was 2.4*OBP+SLG with an r^2 of 0.9017.
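For anyone who wants to try this at home, here is a minimal sketch of how a search like the one above could be set up: compute r^2 between w*OBP+SLG and runs per game for a range of weights w, and keep the best one. The function names and the 0.1 step size are my own assumptions, and the sample numbers are made up purely for illustration; they are not the Dodger data behind the table.

```python
from statistics import mean

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences.

    Assumes neither sequence is constant (otherwise this divides by zero).
    """
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def best_obp_weight(obp, slg, runs_per_game, weights):
    """Return (w, r^2) for the candidate w that maximizes r^2 of w*OBP + SLG."""
    best = None
    for w in weights:
        combo = [w * o + s for o, s in zip(obp, slg)]
        r2 = r_squared(combo, runs_per_game)
        if best is None or r2 > best[1]:
            best = (w, r2)
    return best

# Made-up team seasons for illustration only; NOT actual Dodger numbers.
obp = [0.310, 0.325, 0.340, 0.318, 0.332]
slg = [0.370, 0.400, 0.420, 0.385, 0.390]
rpg = [3.9, 4.4, 4.9, 4.1, 4.4]

weights = [i / 10 for i in range(0, 101)]  # 0.0, 0.1, ..., 10.0
w, r2 = best_obp_weight(obp, slg, rpg, weights)
print(f"best weight {w:.1f}, r^2 {r2:.4f}")
```

Setting the weight to 1.0 gives plain OPS, so the same function also answers how much a tuned weight gains over the simple sum.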

So why did Depo go after power hitters? Was it just that they were underpriced, and Moneyball had driven up the price of OBP? Take a look at the numbers above. While they show the importance of OBP, it's also clear that SLG is only slightly less important.

Or was it something else? Has the relative importance of slugging increased since the days of Koufax and Wills, at least for the Dodgers? To figure out if things had changed over time, I looked at the value of different combinations of OBP and SLG in predicting runs scored for each decade (note that Dodger Stadium opened in 1962, so for the sixties I used the years 1962 through 1970; other decades have the full ten years except the current one). Here are the combinations of OBP and SLG with the highest correlations to Dodger runs scored per decade, the corresponding r squared values, and for comparison, the r squared values for OPS over the same decade:

| decade | formula | r^2 | r^2 for OPS | delta |
| --- | --- | --- | --- | --- |
| 60s | 5.4*OBP+SLG | 0.9658 | 0.9525 | 0.0133 |
| 70s | 4.4*OBP+SLG | 0.9321 | 0.8904 | 0.0417 |
| 80s | 1.0*OBP+SLG | 0.8502 | 0.8502 | 0.0000 |
| 90s | 0.8*OBP+SLG | 0.9312 | 0.9307 | 0.0005 |
| 00s | 0.6*OBP+SLG | 0.9983 | 0.9943 | 0.0040 |
| DS era | 2.4*OBP+SLG | 0.9017 | 0.8868 | 0.0149 |

So slugging clearly has become more important over time. There are many theories why this is the case, such as expansion watering down the quality of pitching, new parks like Coors, steroids, etc., but why is an entirely different topic and I choose to punt. The important fact here is that for the Dodgers in recent years, getting on base holds no great edge over power. Further, the gain in predictive value of other combinations of OBP and SLG over the simple 1 and 1 addition represented by OPS is minimal (except in the 70s).

What about the more involved metrics like runs created (RC), equivalent average (EqA), etc.? Well, Baseball Reference doesn't have the component stats (such as HBP and SF) necessary to calculate these metrics, and I was too lazy to look them up elsewhere. I did find that adding stolen bases and caught stealing to the mix would allow you to squeeze out a little more predictive power. For the Dodger Stadium era, the maximizing formula is 1.8*OBP+SLG+(1.9*SB-CS)/(AB+BB). In correlation to runs scored, it yields an r squared of 0.9294 -- about 4 percentage points more of the variation in runs scored explained than by OPS alone. However, as shown in the table below, the value and predictive ability of SB and CS has decreased over time.

| decade | formula | r^2 | r^2 for OPS | delta |
| --- | --- | --- | --- | --- |
| 60s | 1.9*OBP+SLG+(1.9*SB-CS)/(AB+BB) | 0.9907 | 0.9525 | 0.0382 |
| 70s | 4.1*OBP+SLG+(1.3*SB-CS)/(AB+BB) | 0.9332 | 0.8904 | 0.0428 |
| 80s | 1.3*OBP+SLG+(0.8*SB-CS)/(AB+BB) | 0.8679 | 0.8502 | 0.0177 |
| 90s | 1.0*OBP+SLG+(0.4*SB-CS)/(AB+BB) | 0.9397 | 0.9307 | 0.0090 |
| 00s | 0.4*OBP+SLG+(-0.4*SB-CS)/(AB+BB) | 0.9996 | 0.9943 | 0.0053 |
| DS era | 1.8*OBP+SLG+(1.9*SB-CS)/(AB+BB) | 0.9294 | 0.8868 | 0.0426 |
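The formulas in this table have two free weights, one on OBP and one on SB. A rough sketch of how such a pair could be found is a brute-force search over a grid of both weights, scoring each pair by r^2 against runs per game. The helper names, dictionary keys, and grid ranges below are assumptions of mine, and the seasons are invented for illustration only; they are not real Dodger seasons.

```python
from itertools import product
from statistics import mean

def r_squared(xs, ys):
    """Squared Pearson correlation; assumes neither sequence is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def composite(season, a, b):
    """a*OBP + SLG + (b*SB - CS)/(AB + BB) for one team season."""
    running = (b * season["sb"] - season["cs"]) / (season["ab"] + season["bb"])
    return a * season["obp"] + season["slg"] + running

def fit_weights(seasons, runs_per_game, a_grid, b_grid):
    """Return (a, b, r^2) for the grid pair with the highest r^2."""
    best = None
    for a, b in product(a_grid, b_grid):
        values = [composite(s, a, b) for s in seasons]
        r2 = r_squared(values, runs_per_game)
        if best is None or r2 > best[2]:
            best = (a, b, r2)
    return best

# Invented team seasons for illustration only; NOT actual Dodger numbers.
seasons = [
    {"obp": 0.310, "slg": 0.370, "sb": 150, "cs": 50, "ab": 5400, "bb": 450},
    {"obp": 0.325, "slg": 0.400, "sb": 60, "cs": 40, "ab": 5500, "bb": 520},
    {"obp": 0.340, "slg": 0.420, "sb": 110, "cs": 30, "ab": 5450, "bb": 600},
    {"obp": 0.318, "slg": 0.385, "sb": 80, "cs": 60, "ab": 5520, "bb": 480},
    {"obp": 0.332, "slg": 0.390, "sb": 130, "cs": 45, "ab": 5480, "bb": 560},
]
rpg = [4.0, 4.3, 4.9, 4.1, 4.5]

a_grid = [i / 10 for i in range(0, 61)]    # OBP weight: 0.0 .. 6.0
b_grid = [i / 10 for i in range(-10, 31)]  # SB weight: -1.0 .. 3.0
a, b, r2 = fit_weights(seasons, rpg, a_grid, b_grid)
print(f"a={a:.1f}, b={b:.1f}, r^2={r2:.4f}")
```

With only a handful of seasons per decade and two tunable weights, a search like this can fit the sample very closely, which is worth keeping in mind when reading r^2 values near 1.0 for short spans.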

Hence, for purposes of comparing historical teams and players, employing formulas that incorporate SB and CS stats would be helpful. On the other hand, for evaluating current players or projecting runs scored for current Dodger teams, SB and CS add very little value. Further, OBP and SLG are typically more consistent for a given player from one year to the next than the number of SB and CS -- so OPS may be predicted with higher accuracy than SB. Given its ease of use and fairly accurate predictive ability, it's hard to beat OPS.