Wednesday, December 21, 2005


The Power of OPS

Last year Depo went after players with more power; this year Mr. Ned is chasing guys with higher batting averages and less power. Many people use OPS (on-base plus slugging) as a quick approximation of hitting skill. However, in Moneyball, Depo was quoted as saying OBP was four times as important as slugging. So what really leads to scoring runs? Or, put in statistical jargon, what stat or combination of stats has the highest correlation to runs scored?

I thought I'd take a look into these questions with a focus on the Dodgers. So I ran some regressions on Dodger runs scored per year compared to various stats in the same year since 1962 (i.e., since the club moved into Dodger Stadium). Actually I compared runs scored per game, otherwise the number of games would have the highest correlation. Here are the results:

stat   r^2
BA     0.7192
OBP    0.8226
SLG    0.8169
OPS    0.8868
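
For anyone who wants to reproduce r^2 values like the ones in the table above, here is a minimal sketch of the calculation for one stat against runs per game. It assumes a plain one-variable linear regression, where r^2 is simply the squared Pearson correlation. The function name and the toy numbers are my own placeholders, not actual Dodger data.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Toy numbers, invented only to exercise the function (NOT real team data).
toy_obp = [0.310, 0.325, 0.330, 0.318, 0.340]
toy_runs_per_game = [4.0, 4.4, 4.5, 4.3, 4.9]
print(round(r_squared(toy_obp, toy_runs_per_game), 4))
```

Feed it one season-by-season list of a stat and one of runs per game, and the result is the share of variation in runs that the stat "explains."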

In other words, variation in the Dodgers' team batting average explains about 72% of the variation in runs scored by the Dodgers, OBP or SLG about 82%, and OPS about 89%. OPS is clearly the best of these stats, but what of Depo's idea of four times OBP plus slugging? Does this only apply to the A's? And what of other combinations of OBP and SLG? I found that on base percentage is more important than slugging for the Dodgers, but not quite as much as Depo found (for the A's?). Depo's 4*OBP+SLG has an r^2 of 0.8959, but the highest correlation I found was 2.4*OBP+SLG with an r^2 of 0.9017.
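
A search for the best multiplier on OBP can be sketched as a simple grid search: try each candidate weight w, build w*OBP+SLG for every season, and keep the w with the highest r^2 against runs per game. This is only a sketch under assumptions; the 0.1 step size, the 0-to-6 range, and all function and variable names are hypothetical, and the data is synthetic. Note that correlation ignores overall scale, so only the ratio of the OBP weight to the SLG weight matters, which is why SLG can stay fixed at 1.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def best_obp_weight(obp, slg, runs_per_game, weights):
    """Return (w, r^2) for the w that maximizes r^2 of w*OBP + SLG vs. runs."""
    best = None
    for w in weights:
        combo = [w * o + s for o, s in zip(obp, slg)]
        score = r_squared(combo, runs_per_game)
        if best is None or score > best[1]:
            best = (w, score)
    return best


# Candidate weights 0.0, 0.1, ..., 6.0 (range and step are assumptions).
WEIGHTS = [i / 10 for i in range(61)]

# Synthetic check (NOT real data): runs are built as exactly 3*OBP + SLG,
# so the search should land on a weight of 3.0.
toy_obp = [0.30, 0.32, 0.34, 0.31, 0.33]
toy_slg = [0.40, 0.38, 0.42, 0.45, 0.39]
toy_runs = [3 * o + s for o, s in zip(toy_obp, toy_slg)]
print(best_obp_weight(toy_obp, toy_slg, toy_runs, WEIGHTS))
```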

So why did Depo go after power hitters? Was it just that they were underpriced, and Moneyball had driven up the price of OBP? Take a look at the numbers above. While they show the importance of OBP, it's also clear that SLG is only slightly less important.

Or was it something else? Has the relative importance of slugging increased since the days of Koufax and Wills, at least for the Dodgers? To figure out if things had changed over time, I looked at the value of different combinations of OBP and SLG in predicting runs scored for each decade (note that Dodger Stadium opened in 1962, so for the sixties I used the years 1962 through 1970; other decades have the full ten years except the current one). Here are the combinations of OBP and SLG with the highest correlations to Dodger runs scored per decade, the corresponding r squared values, and for comparison, the r squared values for OPS over the same decade:

decade   formula        r^2      r^2 for OPS   delta
60s      5.4*OBP+SLG    0.9658   0.9525        0.0133
70s      4.4*OBP+SLG    0.9321   0.8904        0.0417
80s      1.0*OBP+SLG    0.8502   0.8502        0.0000
90s      0.8*OBP+SLG    0.9312   0.9307        0.0005
00s      0.6*OBP+SLG    0.9983   0.9943        0.0040

DS era   2.4*OBP+SLG    0.9017   0.8868        0.0149
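
To run the same search decade by decade, the seasons first have to be bucketed. Here is one possible sketch. It assumes each decade runs from a year ending in 1 to a year ending in 0 (1971-1980, 1981-1990, and so on), which is my reading of "1962 through 1970" for the sixties rather than something stated outright. The function names are hypothetical and the values in the usage lines are placeholders.

```python
def decade_label(year):
    """Map a season to a decade bucket such as '60s' or '00s'.

    Assumption: decades run from ...1 to ...0, inferred from the
    '1962 through 1970' range used for the sixties.
    """
    if year < 1962:
        raise ValueError("the analysis starts with the 1962 season")
    start = (year - 1) // 10 * 10  # 1962..1970 -> 1960, 1971..1980 -> 1970
    return f"{start % 100:02d}s"


def group_by_decade(seasons):
    """seasons: iterable of (year, value) pairs -> {decade label: [values]}."""
    groups = {}
    for year, value in seasons:
        groups.setdefault(decade_label(year), []).append(value)
    return groups


# Placeholder values only; each value would be that season's stat line.
print(group_by_decade([(1962, "a"), (1970, "b"), (1971, "c"), (2005, "d")]))
```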

So slugging clearly has become more important over time. There are many theories as to why, such as expansion watering down the quality of pitching, new parks like Coors, steroids, etc., but the why is an entirely different topic and I choose to punt. The important fact here is that for the Dodgers in recent years, getting on base holds no great edge over power. Further, the gain in predictive value of other combinations of OBP and SLG over the simple 1 and 1 addition represented by OPS is minimal (except in the 70s).

What about the more involved metrics like runs created (RC), equivalent average (EqA), etc.? Well, Baseball Reference doesn't have the component stats (such as HBP and SF) necessary to calculate these metrics, and I was too lazy to look them up elsewhere. I did find that adding stolen bases and caught stealing to the mix would allow you to squeeze out a little more predictive power. For the Dodger Stadium era, the maximizing formula is 1.8*OBP+SLG+(1.9*SB-CS)/(AB+BB). In correlation to runs scored, it yields an r squared of 0.9294 -- about 0.04 higher than OPS alone (0.8868). However, as shown in the chart below, the value and predictive ability of SB and CS have decreased over time.

decade   formula                                  r^2      r^2 for OPS   delta
60s      1.9*OBP+SLG+(1.9*SB-CS)/(AB+BB)          0.9907   0.9525        0.0382
70s      4.1*OBP+SLG+(1.3*SB-CS)/(AB+BB)          0.9332   0.8904        0.0428
80s      1.3*OBP+SLG+(0.8*SB-CS)/(AB+BB)          0.8679   0.8502        0.0177
90s      1.0*OBP+SLG+(0.4*SB-CS)/(AB+BB)          0.9397   0.9307        0.0090
00s      0.4*OBP+SLG+(-0.4*SB-CS)/(AB+BB)         0.9996   0.9943        0.0053

DS era   1.8*OBP+SLG+(1.9*SB-CS)/(AB+BB)          0.9294   0.8868        0.0426
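
Written as code, the formulas in the table above all share one shape, with only the two weights changing. The sketch below uses the Dodger Stadium era weights (1.8 and 1.9) as defaults. The function name and parameter names are my own, and the inputs in the usage lines are round, made-up numbers, not a real team season.

```python
def extended_ops(obp, slg, sb, cs, ab, bb, obp_weight=1.8, sb_weight=1.9):
    """Compute obp_weight*OBP + SLG + (sb_weight*SB - CS) / (AB + BB).

    obp and slg are rates; sb, cs, ab, and bb are season totals.
    Defaults are the Dodger Stadium era weights quoted in the table.
    """
    opportunities = ab + bb
    if opportunities <= 0:
        raise ValueError("AB + BB must be positive")
    return obp_weight * obp + slg + (sb_weight * sb - cs) / opportunities


# Made-up inputs (NOT a real season), just to show the arithmetic:
# 1.8*0.300 + 0.400 + (1.9*100 - 50)/5500
value = extended_ops(obp=0.300, slg=0.400, sb=100, cs=50, ab=5000, bb=500)
print(round(value, 4))
```

Passing a different decade's weights through obp_weight and sb_weight gives the other rows. The running term is divided by AB+BB so that it lands on roughly the same per-plate-appearance scale as OBP and SLG.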

Hence, for purposes of comparing historical teams and players, employing formulas that incorporate SB and CS stats would be helpful. On the other hand, for evaluating current players or projecting runs scored for current Dodger teams, SB and CS add very little value. Further, OBP and SLG are typically more consistent for a given player from one year to the next than number of SB and CS -- so OPS may be predicted with higher accuracy than SB. Given its ease of use and fairly accurate predictive ability, it's hard to beat OPS.

Good stuff.

The one error I noticed is that I'm pretty sure DePo said that it was 3*OBP in Moneyball.
May well be 3 instead of 4. It's been a while since I read the book and I've seen posters claim things all over the place.

FYI, in the DS era 3*OBP+SLG correlates to runs scored at an r^2 value of .9005, compared to .9017 for 2.4 and .8959 for 4 -- so they are all really close.
according to the numbers, the value of the successful SB over time has decreased while the value of SLG has increased. managerial cause and effect or vice versa?
Smog -- that's the big question. To look into that you'd have to look beyond just Dodger numbers and compare performance of teams with different styles of play. But you'd need a huge sample size because a team with good sluggers will succeed even if running is generally the preferred strategy, and vice versa with a team full of rabbits if power really controlled. Add to that the changing park effects, expansion, and potential game altering effects of steroids, and I think it would be really hard to reach statistically significant results on the why question.