Written by: Cory Ott (@cory5ott)
Follow us on Twitter! @Prospects365
Primer
If you have not yet discovered this vital resource, it is officially time to flip over the rock and evolve the way you conduct player analyses by adding the Statcast Search tool to your repertoire. The tool is published and maintained by Baseball Savant, and it is the most comprehensive source I know of for acquiring MLB data tailored to a custom analysis of a chosen pool of players. The creators of BaseballSavant.com deserve an enormous amount of gratitude from every one of their users for providing data points for just about every variable that can surface during a baseball game. You can customize your search query to include any combination of the variables available within the database.
For every active MLB player, there are datasets for every game, every pitch, every outcome, and even every expected outcome. In this first installment of my new Statcast Search Series, I take a statistical plunge into a few facets of Swing-Miss%, using the Statcast Search plot generator to highlight outlying qualified pitchers who may be over- or underperforming their Swing-Miss%. With a click of the mouse, interactive name labels can be toggled on each plot created with this tool, which makes for easier visual analysis than staring at numbers all day.
Swing-Miss% vs. xBA
This section delves into the linear relationship between a pitcher’s Swing-Miss% and the expected batting average (xBA) they allow. Knowing that an obvious relationship between these two variables exists on the field, I decided to dig deeper into how strongly they are correlated. As a disclaimer, actual Batting Average (BA) was not used as the dependent variable in this analysis because its correlation with Swing-Miss% is weaker. This can be observed in the chart below: the r² value for Swing-Miss% and BA is 0.41, compared to a noticeably stronger r² = 0.61 for xBA.
The next plot below shows the correlation between Swing-Miss% and xBA for all qualified pitchers in 2019 (Min. 1000 Pitches). Right off the bat, there are some interesting observations to be made about pitchers that you may be either high or low on for the 2020 season. Note that the opacity of each point represents each pitcher’s actual BA, which can be contrasted with their xBA on the y-axis. Throughout this report, I am going to let the charts tell the majority of the story, while simply guiding you through how to interpret the data in front of you. This will be an unconventional article that focuses on sabermetric research, as my ultimate goal is for each of you to learn more about the process of player evaluation rather than to provide my own evaluations for you.
As can be seen near the top of the chart above, the goodness of fit for this relationship is a fairly strong r² = 0.61, which can be roughly translated into a hypothesis of sorts:
The data shows that approximately 61% of the variance in xBA across qualified pitchers can be explained by Swing-Miss%, that is, by their present ability to force hitters to swing and miss. Note that r² measures how tightly the points cluster around the trend line; it is not a probability. Because the trend slopes downward, a pitcher who raises their Swing-Miss% from one year to the next would generally be expected to see a lower xBA, though nothing here guarantees it for any individual pitcher. Correlations of this strength are fairly notable among baseball metrics.
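For readers who want to reproduce a goodness-of-fit value like this one, here is a minimal Python sketch of how r² can be computed for a simple straight-line fit. The function name and the sample numbers are hypothetical and made up for illustration; they are not the 2019 Statcast data, and this is not necessarily how Baseball Savant computes the value shown on its charts.

```python
def r_squared(xs, ys):
    """Return (slope, intercept, r2) for a least-squares line fit of ys on xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-variable linear fit, r2 is the squared Pearson correlation
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


# Hypothetical (Swing-Miss%, xBA) pairs, NOT real pitcher data
swing_miss_pct = [8.5, 10.2, 11.8, 13.1, 15.4, 16.9]
xba = [0.272, 0.265, 0.251, 0.248, 0.229, 0.221]

slope, intercept, r2 = r_squared(swing_miss_pct, xba)
print(f"slope={slope:.4f}, intercept={intercept:.3f}, r2={r2:.2f}")
```

A negative slope means xBA falls as Swing-Miss% rises, and r² says how much of the spread in xBA the line accounts for.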
This information led to my search for pitchers who may be over- or underperforming their Swing-Miss% rates from season to season, since limiting opposing hitters’ BA will likely lower a pitcher’s ERA and WHIP. That is important for anyone who participates in a Roto-style fantasy league, where projecting increases or decreases in these two categories can be imperative to drafting potential breakout candidates in the later rounds. This type of analysis is made easier by learning to read these helpful charts.
Sticky Data
One way to check whether a relationship between two datasets holds up from year to year, commonly referred to as measuring the “stickiness” of the data, is to compare goodness of fit (r²) values from past seasons and see whether the relationship remains stable over time. In this case, it absolutely does! The figure below shows an aggregated view of Swing-Miss% vs. xBA charts for 2017-2019.
An even stronger correlation existed for this data in 2018, yielding a goodness of fit value of 0.65. Just to reinforce that this relationship has held true through recent seasons, I calculated the 3-year (2017-2019) average r² value to be 0.63, which strengthens support for the hypothesis that a pitcher could expect a decrease in xBA next season if they were to improve their ability to induce swings-and-misses. This is common baseball knowledge, but I wanted to statistically validate just how strong the relationship truly is.
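As a quick arithmetic check on the stickiness claim, the sketch below takes the two season values quoted in this article (0.65 for 2018 and 0.61 for 2019) and the quoted three-year average of 0.63, then backs out what the 2017 value would have to be. This assumes the average is a simple mean of rounded season values, so the implied 2017 figure is only an approximation and not a number reported in the article.

```python
# r2 values quoted in the article
reported_r2 = {2018: 0.65, 2019: 0.61}
three_year_mean = 0.63  # quoted 2017-2019 average

# Assumption: the average is a simple mean of three rounded season values,
# so the missing season is (3 * mean) minus the two known seasons.
implied_2017 = round(3 * three_year_mean - sum(reported_r2.values()), 2)

all_seasons = dict(reported_r2)
all_seasons[2017] = implied_2017  # back-computed, not reported

# A small spread between seasons is what "sticky" looks like numerically
spread = round(max(all_seasons.values()) - min(all_seasons.values()), 2)

print(f"implied 2017 r2 (approx.): {implied_2017}")
print(f"spread across seasons (approx.): {spread}")
```

Under those assumptions the three seasons sit within a few hundredths of each other, which is consistent with the relationship holding steady from year to year.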
Overperformers – xBA Regression Candidates
Below you will find a derived version of the chart from above, which highlights a group of pitchers who appear to be over-performing their xBA relative to their Swing-Miss% and actual BA. This particular group yielded a lower xBA than their lower swing-miss rates would suggest, while also allowing a proportionally lower actual BA. These pitchers could experience negative regression if their swing-miss rates do not improve in 2020. I highlight just a few that I think may regress this season.
Underperformers – xBA Reducer Candidates
Similar to the previous section, where I highlighted a series of pitchers who appear to be over-performing in terms of their Swing-Miss% and xBA, this chart highlights a group of qualified pitchers who may have been underperforming in regard to these specific metrics. This group of pitchers possesses a higher Swing-Miss% and has yielded a higher BA than their concurrent xBA, indicating that a potential improvement in performance could be on the horizon. On the chart above, I highlighted a few pitchers who could lower their xBA given an increase in Swing-Miss%. You may take from this what you will!
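One way to turn this kind of chart reading into numbers is to measure how far each pitcher sits from the trend line. The Python sketch below is only an illustration of that idea: the pitchers were picked by eye from the charts in this article, and the function names, the 0.010 threshold, and the sample rows are all hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_vs_trend(rows, threshold=0.010):
    """rows: (name, swing_miss_pct, xba, ba) tuples.

    Returns {name: (label, residual, ba_minus_xba)} where residual is the
    pitcher's xBA minus the xBA predicted from their Swing-Miss%.
    """
    slope, intercept = fit_line([r[1] for r in rows], [r[2] for r in rows])
    result = {}
    for name, sm, xba, ba in rows:
        residual = xba - (slope * sm + intercept)
        if residual <= -threshold:
            label = "below trend"   # xBA lower than Swing-Miss% suggests
        elif residual >= threshold:
            label = "above trend"   # xBA higher than Swing-Miss% suggests
        else:
            label = "near trend"
        result[name] = (label, round(residual, 3), round(ba - xba, 3))
    return result


# Hypothetical rows, NOT real pitcher data
sample = [
    ("Pitcher A", 10.0, 0.260, 0.262),
    ("Pitcher B", 12.0, 0.250, 0.247),
    ("Pitcher C", 14.0, 0.240, 0.244),
    ("Pitcher D", 16.0, 0.230, 0.228),
    ("Pitcher E", 13.0, 0.215, 0.205),
]
for name, info in flag_vs_trend(sample).items():
    print(name, info)
```

In this framing, a pitcher well below the trend line resembles the regression candidates, while a pitcher whose BA sits above their xBA (a positive gap in the last column) resembles the reducer candidates.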
Swing-Miss% & Whiffs
There seems to be an almost non-existent correlation between total Whiffs and xBA, r² = 0.06 to be exact, but there are still intriguing observations to be made regarding the proportion of total Whiffs to the number of swings-and-misses that a pitcher may receive throughout a given season. The lack of correlation between Whiffs and xBA is what led this article to focus on Swing-Miss% and xBA. Useful information can still be taken away from this chart, as I have highlighted pitchers whose performance may be suffering due in large part to a low Swing-Miss%.
As a refresher, “Whiff%” can be defined as the number of swings-and-misses divided by total swings, whereas Swing-Miss% uses the total number of pitches as its denominator. In a sense, total Whiffs, or Whiff%, can become an even more valuable metric for assessing a pitcher’s true ability to get swings-and-misses when a batter actually does swing the bat. Let’s take a look at a couple of groups of pitchers who may or may not be making the most of the swings-and-misses they induce.
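The difference between the two denominators is easiest to see with numbers. This short Python sketch follows the definitions given above; the function names and the season totals are hypothetical.

```python
def whiff_pct(swings_and_misses, swings):
    """Swings-and-misses as a percentage of total swings."""
    if swings <= 0:
        raise ValueError("swings must be positive")
    return 100.0 * swings_and_misses / swings


def swing_miss_pct(swings_and_misses, pitches):
    """Swings-and-misses as a percentage of total pitches thrown."""
    if pitches <= 0:
        raise ValueError("pitches must be positive")
    return 100.0 * swings_and_misses / pitches


# Hypothetical season totals, NOT a real pitcher
pitches, swings, misses = 3000, 1400, 420

print(f"Whiff%:      {whiff_pct(misses, swings):.1f}")
print(f"Swing-Miss%: {swing_miss_pct(misses, pitches):.1f}")
```

Because swings can never exceed pitches, the same number of misses always produces a Whiff% at least as large as the Swing-Miss%.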
Partial Metrics: Z-Swing-Miss% vs. O-Swing-Miss%
Which is More Significant?
Upon initial investigation, Z-Swing-Miss% actually appears to have a stronger correlation to xBA than O-Swing-Miss% does. By breaking Swing-Miss% down into its partial metrics, we can see whether one has a greater impact on a pitcher’s resulting xBA. Therefore, when assessing pitchers’ xBA from year to year, one could potentially find more value in pitchers who force more swings-and-misses on pitches thrown inside the zone than in those who get most of their swings-and-misses on pitches thrown outside the zone. This can be observed in the figure below, which shows the greater correlation between Z-Swing-Miss% and xBA. Though the goodness of fit is only r² = 0.35, what matters here is how it compares with the weaker value for O-Swing-Miss%.
Leaders in Z-Swing-Miss% (2019)
Top qualified SPs at forcing in-zone swings-and-misses (Z-Swing-Miss%):
1. Cole ~ 28.4%
2. Verlander ~ 26.7%
3. Giolito ~ 26.3% 👀
4. Scherzer ~ 26.0%
5. Castillo ~ 25%
6. deGrom ~ 24.2%
7. Odorizzi ~ 23.7% 👀
8. Ray ~ 23.3% 👀
9. Lynn/Maeda ~ 22.6% 👀
10. Boyd ~ 22.5%
Source: tweet by Cory Ott (@cory5ott), February 29, 2020, #FantasyBaseball, https://t.co/N1nm29Gs4k
Just as Nick Pollack from PitcherList always says, “Aces gonna ace.” Gerrit Cole inevitably sits atop both the Z-Swing-Miss% and O-Swing-Miss% leaderboards. Imagine that. Luis Castillo and his magical changeup also appear within the Top 5 of both lists, supporting the notion that he has become elite at forcing swings-and-misses on all pitches thrown. This is further confirmation that Castillo has been evolving into the true Ace that we all wanted him to be. Pitchers should strive to find themselves at the top of each Swing-Miss% list. On the flip side, there are some intriguing names on this list who could see an uptick in performance in 2020, a few being Giolito, Odorizzi, Ray, Maeda, and Boyd. If these pitchers can improve their Z-Swing-Miss% throughout this upcoming season, look for their xBA to be reduced even further.
Leaders in O-Swing-Miss% (2019)
Top qualified SPs at forcing out-of-zone swings-and-misses (O-Swing-Miss%):
1. Cole ~ 56.9%
2. Bieber ~ 55.1% 👀
3. Corbin ~ 53.4%
4. Marquez ~ 53.0% 👀
5. Castillo ~ 52.3%
6. Morton ~ 52.2%
7. Bauer ~ 52%
8. Gray ~ 51.4%
9. Boyd ~ 50.8%
10. Darvish/Giolito ~ 50.7% 👀
Source: tweet by Cory Ott (@cory5ott), February 29, 2020, #FantasyBaseball
No surprise, Gerrit Cole leads the pack yet again. Despite the weak correlation between O-Swing-Miss% and xBA, this list did spark a few initial thoughts. Lucas Giolito appearing on both lists is confirmation to me that he has optimized his pitch mix to match his strengths. The biggest standout here is surely Shane Bieber. It is no surprise to me that Bieber didn’t find himself within the Top 10 for Z-Swing-Miss%, but the fact that he sits at #2 for obtaining swings-and-misses on pitches thrown outside of the zone should be a very encouraging sign for pro-Bieber individuals, myself being one of them. I could go down this rabbit hole all day, but to keep it short, if Bieber can lower his Zone% and improve the command of his below-par fastball in and around the shadow zone, then we may see him take strides toward truly becoming elite. Yes, I said it. Many will read that last statement, refer back to his batted ball profile, then proceed to laugh in my face, but I believe that Bieber is young enough to make the necessary adjustments with his command, while also being cradled within a premier system for developing pitchers.
There are a couple of other names to note, as German Marquez, Sonny Gray and Trevor Bauer also found themselves as part of the elite O-Swing-Miss% club. Marquez is an absolute unicorn with those crazy home-away splits, and the thought of him adding more swings-and-misses on pitches inside the zone to complement his O-Swing-Miss% is a very scary one. If it ever appears that he is near finding his way out of Coors for good, buy all of the shares that you can. It is also worth mentioning that Matt Boyd made the top 10 on both lists, showing that he does have the swing-and-miss stuff to potentially jump a pitching tier if his command and sequencing improve. Pitchers who made the top 10 on either list should be closely watched throughout the upcoming season.
It Takes Two to Tango
In summary, a pitcher’s Swing-Miss% could potentially be more indicative of their resulting xBA than other metrics. A higher BA allowed is likely to be followed by a higher ERA, a common category that we can all get better at projecting. When analyzing players that you want to draft, take a deeper dive into the ratio between their in-zone and out-of-zone swing-and-miss rates. This could help push difficult draft decisions in the right direction. Research into other metrics that may also explain a meaningful share of the variance in a pitcher’s xBA is next on the docket, as is a prospective metric based on subtracting a pitcher’s O-Swing-Miss% from their Z-Swing-Miss%. More on that in the near future!
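To give a feel for what such a differential might look like, here is a small Python sketch that applies the subtraction described above (Z-Swing-Miss% minus O-Swing-Miss%) to the four pitchers who appear on both 2019 leaderboards in this article. The input numbers are the approximate values from those lists; the variable names and the ranking step are illustrative assumptions, not a finished metric.

```python
# Approximate 2019 values taken from the two leaderboards in this article:
# name -> (Z-Swing-Miss%, O-Swing-Miss%)
both_lists = {
    "Cole": (28.4, 56.9),
    "Giolito": (26.3, 50.7),
    "Castillo": (25.0, 52.3),
    "Boyd": (22.5, 50.8),
}

# Differential as described in the text: Z-Swing-Miss% minus O-Swing-Miss%
differential = {
    name: round(z - o, 1) for name, (z, o) in both_lists.items()
}

# Rank from the smallest in-zone/out-of-zone gap to the largest
ranked = sorted(differential.items(), key=lambda item: item[1], reverse=True)

for name, diff in ranked:
    print(f"{name}: {diff:+.1f}")
```

Every value comes out negative because these pitchers miss far more bats outside the zone than inside it, so a differential closer to zero points to a more balanced profile.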
The entire analysis for this article was conducted using data obtained via the Statcast Search tool on BaseballSavant.com. If there is one analytics tool that I recommend you learn to operate, it is this one. All in all, the best pitchers in the league possess a great-to-elite Swing-Miss%, and further research into how this rate is split between the O-Swing-Miss% and Z-Swing-Miss% partial metrics may boost your ability to analyze pitchers accurately. To be elite, one must be capable of getting just as many swings-and-misses on pitches thrown inside the zone as outside the zone, which sparked my curiosity about developing a differential metric to track through the years. Note that this research is preliminary, and more supporting material is to come as part of my Statcast Search Series.
Thank you very much for reading!
Follow new P365 staff writer Cory Ott on Twitter! @cory5ott
Note: All statistical datasets were extracted from the Statcast Search Tool on BaseballSavant.com. All included images, visuals, and graphics were extracted from BaseballSavant.com. Data for the scatter plot portraying Z-Swing-Miss% / O-Swing-Miss% vs. xBA were extracted from BaseballSavant.com and plotted in RStudio. Ask for the code if you want computational confirmation. All data tables were created by me, using data extracted from the above sources.