Looking at Statcast’s leaderboard of the hardest-hitting players in baseball makes it all seem so easy: simply acquire the guys who smash the ball the hardest, and then profit!
It looks almost foolproof: of this group of 15 hitters, only Kyle Schwarber and Giancarlo Stanton put up below-average WAR figures, and most of the league’s obvious best hitters (Judge, Acuña, Ohtani, Seager, Alvarez, Soto) are prominently featured.
Digging into the Data
After that sneak peek at the top of the list, I was convinced that plotting avg_exit_velocity against WAR would show a strong, positive relationship: how hard a player hits would dictate how valuable that player is, all else equal.
However, after building the scatterplot below, a surprising trend emerges: the relationship between avg_exit_velocity and WAR is nowhere near as strong as expected! Instead of dots lined up along a slope, it’s more of an upward-tilting cloud, with most of the best players in the top-right quadrant but plenty of exceptions to the rule. Quantitatively, this works out to an R² of only 0.12 and an R of 0.34, which indicates a fairly weak positive linear relationship.
After spending some time thinking about what might be driving this trend, I arrived at a few plausible explanations:
Ceteris Paribus Assumption of Linear Regression
“All else equal” is awfully convenient for running regressions on two numbers, but in the case of baseball players, the “all else equal” assumption doesn’t make a lot of sense…
- Variable importance of power by position: many positions don’t necessarily require a player to be a slugger to be a valuable contributor to the team; think catcher, second base, and shortstop. Lumping these players in with corner outfielders makes for an uneven comparison, as a corner outfielder usually needs to hit the ball hard to create value for their team, while a shortstop (like Andrés Giménez) may create huge value with the glove, so softer contact can be a natural trade-off
- Contact vs Power: while hitting the ball very hard is probably the single biggest key to being a productive hitter, it does seem possible to be an asset largely on the strength of contact skills. I find Jeff McNeil and Jose Altuve to be two great examples of this: players who have won batting titles (and, in Altuve’s case, World Series titles), routinely hitting for high averages and accruing strong WAR totals despite lacking great power. Conversely, there are also players who absolutely smash the baseball but are not particularly good baseball players, in the great tradition of sluggers like Adam Dunn, Dave Kingman, and Chris Davis. In the 2023 season, Joc Pederson, Kyle Schwarber, and Giancarlo Stanton fell into this bucket, with low WAR production driven largely by high strikeout rates and low contact rates
Survivorship Bias
Most players who don’t hit the ball hard don’t stick in the majors long enough to become qualified batters… and because WAR is a counting measure, having a meaningful and similar amount of playing time across the cohort is critical. Hence, the bottom-left quadrant of our chart is probably “light” relative to the real-life population of players of that quality, and I expect that including those hitters would create a tighter fit in our regression of avg_exit_velocity against WAR.
A Better Way?
Ultimately, the concerns above have made me realize that a better way to conduct this analysis might be to analyze each position group in tranches (to account for differences in defense and athleticism), switch the player value metric from WAR to something like wRC+ so that it is a rate statistic instead of a counting measure, and finally, remove the qualified batter cutoff so that our pool of batters is larger and less affected by survivorship bias.
However, I thought this visualization and analysis was still pretty interesting to read as-is, so I figured I would simply stop worrying about it and hit publish!