Regression Towards the Mean Still Happened in 2020

Written by: Justin Choi (@justinochoi)

Follow us on Twitter! @Prospects365

There’s so much debate in baseball circles right now—even as I type this—about how much the shortened 2020 season mattered. So much so, in fact, that I absolutely cannot wait for Spring Training. I’m sick of dwelling on ambiguous results. Maybe you are, too. 

But there is one thing worth going over as we bide our time: the oft-used term ‘regression towards the mean,’ or ‘regression to the mean,’ or just plain ole ‘regression.’ The idea is simple: In a new season, over-performers and under-performers from the previous year will drift back to their true talent level, keeping our expectations in check. That late-August call-up who posted a 160 wRC+ until season’s end? Better not be the fool who overpays for him in your fantasy league the following preseason. It’s a tradition unlike any other. 
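To see why this happens even when nobody's talent changes, here is a toy simulation. It is not based on real data: it assumes every hitter has a fixed true-talent wRC+ (mean 100, standard deviation 10) and that each season's observed wRC+ adds independent luck (standard deviation 20). Those numbers, and the function name simulate_regression, are made up for illustration.

```python
import random


def simulate_regression(n_players=2000, talent_sd=10.0, noise_sd=20.0, seed=2020):
    """Toy model (assumed, not real data): observed wRC+ = fixed talent + seasonal luck."""
    rng = random.Random(seed)
    talent = [rng.gauss(100.0, talent_sd) for _ in range(n_players)]
    # Each season draws fresh, independent luck around the same talent level
    year1 = [t + rng.gauss(0.0, noise_sd) for t in talent]
    year2 = [t + rng.gauss(0.0, noise_sd) for t in talent]

    # Select the year-1 over-performers (observed wRC+ above 110)
    hot = [i for i in range(n_players) if year1[i] > 110]
    mean_y1 = sum(year1[i] for i in hot) / len(hot)
    mean_y2 = sum(year2[i] for i in hot) / len(hot)
    return mean_y1, mean_y2


y1, y2 = simulate_regression()
print(f"year 1: {y1:.1f}  year 2: {y2:.1f}")
```

The hitters who topped 110 in year 1 were selected partly for good luck, and that luck does not carry over, so their year-2 average drifts back towards 100 without any change in ability.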

Analysts—both amateurs and professionals—are using regression towards the mean to navigate the 2020 season. The consensus, it seems, is that whatever breakouts or slumps occurred are irrelevant, and players will return to normal (i.e. regress) in 2021. You can throw the whole season away. Nothing mattered. 

The first part seems correct –– unless backed by a significant change, we should be skeptical of breakouts. The second part, maybe not so much. Because here’s an interesting question: What if those ‘slumps’ are, in fact, regressions towards players’ means? 

Or, here’s another way to put it: If we’re sure that regression will do its job in 2021, on what basis are we refuting its existence in 2020? 

So I conducted a simple experiment. To see if regression towards the mean occurred between 2019 and 2020, I first gathered a sample of hitters who had at least 100 plate appearances in 2019. Among those hitters, I kept the 352 who also recorded at least 40 plate appearances in 2020. 
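The filtering step amounts to an inner join on player with a threshold on each side. A minimal sketch, assuming plain dictionaries of PA totals keyed by player name; the function name build_sample and the sample data are hypothetical (the real numbers came from FanGraphs).

```python
def build_sample(pa_2019, pa_2020, min_pa_2019=100, min_pa_2020=40):
    """Keep hitters who cleared the plate-appearance bar in both seasons.

    pa_2019 and pa_2020 map a player name to that season's PA total;
    a player missing from pa_2020 is treated as having 0 PA.
    """
    return sorted(
        player
        for player, pa in pa_2019.items()
        if pa >= min_pa_2019 and pa_2020.get(player, 0) >= min_pa_2020
    )


# Made-up PA totals, for illustration only
pa_2019 = {"Hitter A": 600, "Hitter B": 120, "Hitter C": 95, "Hitter D": 450, "Hitter E": 300}
pa_2020 = {"Hitter A": 230, "Hitter B": 45, "Hitter C": 200, "Hitter E": 38}

print(build_sample(pa_2019, pa_2020))  # ['Hitter A', 'Hitter B']
```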

Why 40, not 100? Had I stuck with the 100 PA threshold in a 60-game season, there’d be a bit of selection bias: hitters who recorded 100 PA in both seasons had an average wRC+ of 105, or 5% higher than the league average. So I prorated instead. Divide 162 by 60 and you get 2.7; dividing 100 PA by that factor gives about 37, which is close enough to 40, a benchmark that’s searchable on FanGraphs. 
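The proration is a single ratio. A minimal sketch (the helper name prorate is hypothetical), applied to both the 100 PA hitter threshold and the 50 IP pitcher threshold used for pitchers:

```python
def prorate(threshold, games=60, full_season=162):
    """Scale a full-season playing-time threshold down to a shorter season."""
    return threshold * games / full_season


print(round(prorate(100), 1))  # 37.0 (PA)
print(round(prorate(50), 1))   # 18.5 (IP)
```

Both results get rounded up to a convenient benchmark in the article: 37 PA becomes 40, and 18.5 IP becomes 20.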

Next, I divided the 352 hitters into 3 groups: (1) those with a wRC+ less than 90, (2) those with a wRC+ between 90 and 110, and (3) those with a wRC+ greater than 110. These groups represent below-average hitters, roughly average hitters, and above-average hitters, respectively. If regression towards the mean occurred, we’d expect Group 1’s average wRC+ to have increased from 2019 to 2020. Group 3’s would have decreased. And Group 2 would experience minimal—if any—change: 
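The grouping and comparison can be sketched as follows. Players are bucketed by their 2019 (year-1) wRC+, and each bucket's average is computed in both seasons for the same players. The function names and the sample pairs are hypothetical, chosen only to show the mechanics.

```python
from statistics import mean


def wrc_group(wrc_plus):
    """Bucket a hitter by year-1 wRC+ (higher is better): <90, 90-110, >110."""
    if wrc_plus < 90:
        return "below average"
    if wrc_plus <= 110:
        return "average"
    return "above average"


def mean_change_by_group(pairs):
    """pairs: (year-1 wRC+, year-2 wRC+) per hitter.

    Groups hitters on year-1 wRC+ only, then returns each group's
    average wRC+ in year 1 and in year 2.
    """
    groups = {}
    for y1, y2 in pairs:
        groups.setdefault(wrc_group(y1), []).append((y1, y2))
    return {
        name: (mean(a for a, _ in rows), mean(b for _, b in rows))
        for name, rows in groups.items()
    }


# Made-up numbers, for illustration only
pairs = [(70, 85), (80, 88), (100, 98), (105, 103), (130, 112), (150, 120)]
for name, (before, after) in mean_change_by_group(pairs).items():
    print(f"{name}: {before:.1f} -> {after:.1f}")
```

Grouping on year 1 and then following the same players into year 2 is what lets regression show up: the groups are defined by one season's results, not the next.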

As shown in the graph, below-average hitters in 2019 experienced a rebound, even in a shortened season. Average hitters had a bigger change than I initially expected, but of the three groups, they were the least volatile year-to-year. The above-average group (wRC+ over 110) experienced a crash of more than 19 wRC+ points, reflecting the struggles of stars like Christian Yelich and J.D. Martinez. That, perhaps, we can chalk up to sample size woes. 

Still, even if the picture isn’t crystal clear, you can see the effects of regression taking place. And it didn’t take many plate appearances for them to show up. But is this similar to what happened between 2018 and 2019, two normal seasons? I went through the same process outlined above, then graphed the results: 

The changes are less extreme when there are more games played, but overall, the shape of regression is the same for both pairs of seasons. 

What about pitchers? Deciding on an IP benchmark was difficult –– there isn’t an obvious one like 100 PA –– but I ultimately settled on 50 IP, which gave me a sample size similar to the one used for hitters. Prorated to a 60-game season, 50 IP becomes roughly 20. The metric of choice was ERA-, but here’s the confusing part: Unlike wRC+, a lower ERA- is better. Therefore, the below-average crowd had an ERA- of 110 or higher, not 90 or lower. Don’t forget! 
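Because the ERA- scale runs the other way, the bucketing logic flips. A minimal sketch with a hypothetical function name: the article gives the below-average cutoff as 110 or higher; counting 90 or lower as above average is an assumption that mirrors it.

```python
def era_minus_group(era_minus):
    """Bucket a pitcher by year-1 ERA- (lower is better, 100 = league average).

    Cutoffs: 110 or higher is below average (per the article);
    90 or lower as above average is an assumed mirror of that cutoff.
    """
    if era_minus >= 110:
        return "below average"
    if era_minus <= 90:
        return "above average"
    return "average"


print(era_minus_group(125))  # below average
print(era_minus_group(100))  # average
print(era_minus_group(80))   # above average
```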

So here’s 2019 to 2020: 

Followed by 2018 to 2019:

A few interesting things to note here. First, both graphs are nearly identical, despite having different IP thresholds. Does that mean regression towards the mean occurs more quickly and reliably for pitchers? Maybe –– after all, strikeout rate (K%) stabilizes faster than almost any other metric, and it’s also indicative of a pitcher’s skill. 

Second, the effects of regression towards the mean appear more pronounced. For example, below-average pitchers seem to experience a bigger rebound in performance than below-average hitters: in both pairs of seasons, they improved by 20%, a mark hitters never reached. Also, average pitchers are amazingly consistent. I’m reminded of someone like Mike Leake, whose numbers are middling every season but who always gives you exactly what you expect, which is great. Perennial 2 WAR pitchers don’t grow on trees. 

About that bigger rebound –– it is perhaps explained by ERA being a volatile stat, even after adjusting for league and park environments in the form of ERA-. So I took a look at FIP-, which is just FIP adjusted for the same factors: 
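For reference, FIP is built only from home runs, walks, hit batters, and strikeouts, which is why it swings less than ERA. The sketch below uses the standard FIP formula; the league constant (3.10 here) changes every season, so the value used is a placeholder, as is the pitcher's stat line and the league FIP of 4.00. The simplified FIP- skips the park adjustment that FanGraphs’ version includes, so treat it as an approximation.

```python
def fip(hr, bb, hbp, k, ip, constant):
    """Fielding Independent Pitching: (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.

    ip must be true innings (e.g. 180 1/3 innings = 180.333, not the
    box-score notation 180.1).
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def fip_minus_no_park(pitcher_fip, league_fip):
    """Simplified FIP-: 100 * FIP / league FIP, with NO park adjustment."""
    return 100 * pitcher_fip / league_fip


# Made-up season line and placeholder constants, for illustration only
pitcher_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=180, constant=3.10)
print(f"FIP: {pitcher_fip:.2f}")
print(f"FIP- (no park adj.): {fip_minus_no_park(pitcher_fip, league_fip=4.00):.1f}")
```

As with ERA-, a FIP- below 100 is better than league average.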

As predicted, the changes are less drastic for all groups, since FIP leaves out what happens on balls in play, but again, the overall trends are still there. More importantly, all of our findings point to the same conclusion: regression towards the mean happened in 2020, much like it happened in 2019, much like it happened for countless years before, in silence, until we noticed. 

One minor caveat, though: The sample sizes for pitchers were much smaller than those for hitters, which might have skewed the results. In the 2018 to 2019 sample, 116 below-average hitters had at least 100 PA in both seasons, while just 48 below-average pitchers cleared 50 IP in both. It’s an oversight on my part: I set the threshold to 50 IP because it gave me a similar initial sample, but after filtering out players, the return was much smaller. This could be because pitchers are more prone to injuries (I don’t know whether that’s the case), or because they are judged solely on their ability to prevent runs (ERA-, FIP-), unlike hitters, who are often kept around for their defensive capabilities despite mediocre offensive output (wRC+). 
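To put the caveat in rough numbers: if the year-to-year changes in both groups had the same spread and were independent across players, the uncertainty in a group's average would scale with 1/sqrt(n). The standard deviation of 20 below is an arbitrary placeholder; the ratio between the two groups does not depend on it.

```python
import math


def standard_error(sd, n):
    """Standard error of a group mean: spread divided by sqrt(sample size)."""
    return sd / math.sqrt(n)


# Same hypothetical spread for both groups, so only the sample size differs
sd = 20.0
se_hitters = standard_error(sd, 116)   # below-average hitters, 2018 to 2019
se_pitchers = standard_error(sd, 48)   # below-average pitchers, 2018 to 2019
ratio = se_pitchers / se_hitters
print(f"hitters: {se_hitters:.2f}  pitchers: {se_pitchers:.2f}  ratio: {ratio:.2f}")
```

Under those assumptions, the pitcher group's average is roughly 1.55 times noisier than the hitter group's, which is one reason its bigger rebound deserves some skepticism.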

Regardless, there’s more than enough evidence to show that regression towards the mean was real, even in a shortened season. This admittedly doesn’t clear up the fog, because it’s impossible to pinpoint who was and wasn’t affected by it. But there is a clear takeaway: The shortened season mattered less than a full one, but it still mattered. Cody Bellinger should exceed his sprint-season mark of 114 wRC+ in 2021, but within that disappointing outcome lies a movement towards his true talent level. That may seem obvious to you. It’s just nice to have the numbers confirm it. 

Follow P365 MLB Analyst Justin Choi on Twitter! @justinochoi


All statistics courtesy of FanGraphs 

Featured image courtesy of photographer Joe Robbins and Getty Images
