How Likely is it that Someone will Hit .400 in a Shortened MLB Season?

Over Memorial Day weekend, I read a great article in The Athletic musing on what might happen if Major League Baseball decides to play a half season of 82 games. One section in particular caught my eye: “What if somebody hits .400?” The author does a great job of delving into what that might mean for baseball and whether it would be feasible. The article also links to an amazing writeup from STATS, which digs even deeper into the numbers, with lots of historical streaks for comparison.

As I kept thinking about the question, I realized that this might be something I could take a crack at myself! If we could estimate the probability that a given batter hits .400 over a full 162-game season, we could compare it to the probability of hitting .400 over a shortened 82-game season. Because a smaller sample produces more variable batting averages, it makes intuitive sense that a shortened season makes an extreme outcome, such as batting .400 for the season, more likely.

Approach

Simulate an At-Bat

To model the variance of outcomes across a host of seasons, I ran a large Monte Carlo simulation for each player. The simulation starts by simulating a single at-bat for a given player. It “rolls” a random number between 0 and 1, and if that number is no greater than the player’s batting average, they “get a hit” - if it is greater than the batting average, no hit for you.

gets_hit <- function(avg){
  return(avg >= runif(1))
}

# See whether or not a .307 hitter gets a hit in a single AB
gets_hit(0.307)

To determine the input averages in a straightforward way, I took all batting title qualified hitters from 2019, and attributed to them the greater of their 2019 batting average and their combined batting average from 2016 to 2019, the results of which can be seen below:

Historical Batting Averages
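As a rough sketch of the "greater of the two averages" rule described above: the snippet below uses an invented two-player data frame (the player names, hits and at-bats are placeholders, not real 2016-2019 data) and assumes the combined average is total hits divided by total at-bats. The original data preparation code is not shown, so the column names and steps here are assumptions.

```r
# Hypothetical input: one row per player-season. All numbers are invented.
seasons <- data.frame(
  player  = c("Player A", "Player A", "Player B", "Player B"),
  year    = c(2018, 2019, 2018, 2019),
  hits    = c(180, 150, 140, 175),
  at_bats = c(560, 540, 520, 550)
)

# Combined average (assumption): total hits / total at-bats,
# rather than the mean of the yearly averages
totals <- aggregate(cbind(hits, at_bats) ~ player, data = seasons, FUN = sum)
totals$combined_avg <- totals$hits / totals$at_bats

# Most recent season's average for each player
latest <- seasons[seasons$year == 2019, ]
latest$avg_2019 <- latest$hits / latest$at_bats

inputs <- merge(totals[, c("player", "combined_avg")],
                latest[, c("player", "avg_2019")], by = "player")

# Simulation input: the greater of the two averages, taken row by row
inputs$input_avg <- pmax(inputs$avg_2019, inputs$combined_avg)
inputs
```

Using pmax() rather than max() matters here, since max() would return a single number for the whole column instead of one value per player.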

Simulate a Season

Now armed with a function to determine the success of a single at-bat, I created another function to simulate a season, returning the batter’s simulated average, given their historical average and number of at-bats. For this model, I’m using 550 at-bats as a proxy for a full season, and 275 at-bats for a half season, which seems reasonable based on recent data and past research.

# Simulate one season of independent at-bats and return the batting average
bats_season <- function(at_bats, avg){
  return(sum(replicate(at_bats, gets_hit(avg))) / at_bats)
}

# Mike Trout's input average under the rule above is 0.307
bats_season(275, 0.307)
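Since each simulated at-bat is an independent hit-or-no-hit trial with the same probability, the number of hits in a season follows a binomial distribution. The sketch below is an equivalent shortcut under that assumption; bats_season_binom is a hypothetical name, not part of the original code, and the seed is arbitrary.

```r
# Hypothetical alternative to bats_season(): a single binomial draw
# stands in for at_bats separate uniform draws.
bats_season_binom <- function(at_bats, avg) {
  rbinom(1, size = at_bats, prob = avg) / at_bats
}

set.seed(42)  # arbitrary seed, only so the result can be reproduced
bats_season_binom(275, 0.307)
```

This should behave the same as the loop-based version in distribution, while running much faster when many seasons are simulated.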

Simulate a Ton of Seasons

Finally, now that we can simulate a single season, why not simulate a bunch of them? The function below returns a dataframe of the player’s name and a listing of simulated batting averages, for however many seasons we’d like.

library(tibble)  # enframe()
library(dplyr)   # %>% and mutate()

sim_seasons <- function(player, seasons, at_bats, avg){
  sim <- replicate(seasons, bats_season(at_bats, avg))
  return(enframe(sim) %>% mutate(name = player))
}

# Simulate ten thousand half seasons for Mike Trout
sim_seasons("Mike Trout", 10000, 275, 0.307)

One important thing to note with this approach is that the simulations will, in the long run, reproduce the inputs they are fed! For example, given an input average of .307 for Mike Trout, we would expect his simulated average across many seasons to also come out at about .307.
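A quick way to see this is to average a large batch of simulated half seasons. The sketch below draws the seasons with rbinom() as a stand-in for the at-bat-by-at-bat simulation (an assumption made for speed); the variable names and seed are illustrative.

```r
set.seed(307)  # arbitrary seed for reproducibility

avg     <- 0.307   # input average
at_bats <- 275     # half season
seasons <- 10000

# One simulated batting average per season
sim_avgs <- rbinom(seasons, size = at_bats, prob = avg) / at_bats

mean(sim_avgs)   # long-run simulated average, expected to sit close to avg
range(sim_avgs)  # individual seasons can stray well away from it
```

The mean stays pinned to the input, so all of the interesting information is in the spread of the individual seasons.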

That’s all okay though - what we are really looking to see is how much more extreme the simulated values of Mike Trout’s batting average are when computed over an 82-game window, versus a 162-game window.
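As a cross-check on the simulation (not the method used for the results in this post), the same comparison can be computed exactly from the binomial distribution. The helper p_400 below is a hypothetical name; it relies on .400 being 2/5, so 110 hits are needed in 275 at-bats and 220 in 550.

```r
avg <- 0.307

# Exact probability of batting .400 or better over at_bats at-bats,
# assuming independent at-bats with a constant hit probability.
p_400 <- function(at_bats, avg) {
  hits_needed <- ceiling(at_bats * 2 / 5)
  # P(hits >= hits_needed) = P(hits > hits_needed - 1)
  pbinom(hits_needed - 1, size = at_bats, prob = avg, lower.tail = FALSE)
}

p_half <- p_400(275, avg)
p_full <- p_400(550, avg)
c(half = p_half, full = p_full)

# Standard deviation of a season's batting average under the same model
sqrt(avg * (1 - avg) / c(half = 275, full = 550))
```

Halving the at-bats widens the standard deviation by a factor of sqrt(2), which is what pushes more of the distribution out toward .400.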

Results

Full Season Simulation of Batting Average

Half Season Simulation of Batting Average

Half Season Simulation - Batting 400

The Players

With the exciting news that it is indeed possible to hit .400, albeit not very likely, over the course of a half season, I wanted to check out the numbers and see who exactly is predicted to top that mark, and at what rate they might be expected to do it.

Sure enough, the rank-order of this list precisely matches our input data of batting averages, led by Tim Anderson, who hit .335 in the 2019 season. It is heavily driven by the top six batters, who have somewhere between a 0.32% and 1.43% chance of hitting .400 in a given shortened season, according to our simulations.

Likelihood of Batting 400 by Player

Given that the performance of each player is more or less independent, we are free to combine the likelihoods of individual players reaching .400 to get the likelihood that at least one player across baseball reaches .400. After doing so, we end up with a 6.02% chance that a batter will hit .400 in a shortened season!
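One standard way to combine independent probabilities like this is to multiply each player's chance of falling short and subtract the product from one. The sketch below assumes that is the calculation involved; the three probabilities are placeholders, not the simulated values behind the figure above.

```r
# Hypothetical per-player chances of hitting .400 (placeholders only)
p_player <- c(0.010, 0.005, 0.002)

# P(at least one player hits .400) = 1 - P(every player falls short),
# assuming players' seasons are independent
p_any <- 1 - prod(1 - p_player)
p_any

# For small probabilities this is just under the simple sum
sum(p_player)
```

Applied to the full list of qualified hitters, the same calculation would give the league-wide figure.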

It wouldn’t be quite the same as Ted Williams’s 1941 season, when he hit .406 and became the last player to reach .400, but it would be unbelievably fun to watch. Let’s hope we get baseball back soon to see if it happens!