Written by: Estee Rivera (@esteerivera42)
Follow us on Twitter! @Prospects365
Intuition Becomes Reality
Now more than ever, this is a topic that needs to be brought into the limelight. If you have yet to read the first two installments of this series on labor dynamics, primarily exploitation, in Major League Baseball, then I highly recommend you take the time to read them here and here.
For a while, I tiptoed around the heart of this topic: race. It is not a mere coincidence that the exploitation we observe throughout baseball is of a minority class. That is why for the rest of this series, and my future work, I will approach the race issue in MLB head on, particularly with regard to Latinx players. In graduate school, I studied economic policy and history related to Latin America in general. The environment that Latino baseball players face is very similar to the Latin American experience in the United States.
Latinx culture in MLB has been the product of discrimination since the assimilation of Latin Americans into professional baseball. Latin Americans have a wide variance in physical features. In the early stages of MLB, light-skinned Latinos were able to break into the league before Jackie Robinson broke the black-white color barrier. Owners constructed their own racial categories as a strategy to keep African Americans out of the league. Light-skinned Latinos could easily pass as white. This was attractive to owners who were trying to appeal to their white fans. In economics, we call this customer discrimination.
In 1910, the Cincinnati Reds had two talented Cuban players, Rafael Almeida and Armando Marsans. Because of their self-proclaimed “pure Caucasian blood” they were great assets to the Reds. Of course, when they asked for a pay raise a few years into their careers, a team official said, “we did not hire a pair of brown-skinned islanders to pay them Honus Wagner-money.” As Michael Simon Johnson and Daisy Rosario noted when discussing the story of these two players in detail in a trip to the National Baseball Hall of Fame, Latinos were welcomed to the league primarily as non-black, rather than as fellow whites.
Fast forward to today. Discrimination and exploitation have adapted. Unlike in the 1910s, teams have found other ways to practice them and euphemisms to discuss them. Asset valuation and other modern analytical methods have led to an obsession with “surplus.” As I have highlighted before, the league does a phenomenal job of scraping up all the surplus value it can get out of Latinos. Through a combination of below-poverty-level minor league wages and tantalizing long-term contract offers early in players’ careers, extracting surplus has become standard practice in MLB. And it takes a lot to prove that these activities are exploitative against a certain group.
From a qualitative perspective, the presence of racial discrimination in MLB is indisputable. Talented Latino players like Ronald Acuña Jr. and Eloy Jiménez have been the face of the growing trend of players, many of them Latino, accepting long-term contracts in the early stages of their careers. Issues that I have previously addressed, like the criticism of players for bat flips, jewelry, hair and other expressions of individualism, all contribute in other ways to this reality.
From a quantitative perspective, the argument is cloudier, but plausible. When conducting econometric analysis on discrimination in the labor market, an analyst can test for the presence of a pay gap by controlling for skills, education, and other personal characteristics. But finding that Group X experiences a 20% wage penalty in comparison to Group Y is not, on its own, enough to conclude that discrimination accounts for that 20% difference in wages. We need a few things to be confident race plays a role. If race remains a statistically significant explanatory variable for wages after those controls, then we can be more certain that it does matter.
In theory, pro sports should be an “equal pay for equal play” market. If two players reach free agency at the same point in their careers and both have 20 career WAR up until that point, then we would generally expect them to have similar contract offers in free agency. In orthodox economic theory, a key assumption is that there is perfect competition in the free market. This means the market will pay the player what he deserves, conditional on the presence of perfect information and equal opportunities. This very basic economic assumption was the foundation of economic theory for decades, despite its idealistic, and obviously inaccurate, depiction of reality.
Baseball should be an environment conducive to perfect competition because we can accurately measure overall performance. Relative to general U.S. labor market studies, baseball has done a great job at measuring overall performance/value. There is no such thing as WAR for teachers, accountants, etc.
Trust me, I know it’s farfetched to assume perfect competition in MLB. The very presence of “small” and “large” market teams makes the whole idea a little ludicrous. And of course, there is the “teams know more about their own players than other teams” argument. Ya know, like when AJ Preller traded Colin Rea for Luis Castillo knowing very well Rea’s elbow was about to burst, only for the Marlins to realize they had been shorted (sorry Padres fans, but that one does not get old). Teams know their own players’ backgrounds, work ethic, character and medical history better than everyone else. The one exception is when teams were confused about the Astros calling about “random” prospects and players.
Information and competition are not perfect, but this does not change our expectation that equally performing players with similar characteristics should make similar salaries. I wrote about the finer details of the economics of discrimination in MLB in my M.S. thesis. If you want access to it, then DM me on Twitter. Otherwise, I will do a quick run through of how I tried to quantify whether Latinos are discriminated against in terms of wages in recent years in MLB.
When I conducted my research on all the prior work in the field, I was shocked to see that it has yet to be updated to include the advanced techniques of measuring performance that have gained popularity in recent years. We went from a batting average and RBI-obsessed culture to a wRC+, WAR and OPS culture. The tone of the academic research on discrimination up through 2010 suggested they had proved discrimination in wages had disappeared in MLB. It was frequent in the decades following Jackie Robinson but had been phased out when free agency was implemented. But innovation does not stop. New strategies lead to new outcomes.
But I recalled an amazing article that I read in 2014 by Matt Swartz at the Hardball Times (which I hope comes back ASAP). Swartz did an in-depth study of racial earnings differentials among several different player demographics between 2006 and 2013. This article was the inspiration for me in the last few years to conduct this type of research.
The main takeaway is that on a cost-per-WAR basis (in $MM), Latino players who signed extensions more than a year away from free agency were paid $4.9 million per WAR. When they signed within a year of free agency, they were paid $6.8 million per WAR. In contrast, white players were paid $7.9 million per WAR when signing more than a year away from free agency and $7.8 million per WAR when signing within a year of it. This discrepancy was enough for me to get pissed off and see what else lies beneath the surface of wage differentials between Latino and white players.
Thanks to Cot’s Baseball Contracts, I had no issues getting access to all players’ contract details at the beginning of the season. I chose the beginning of the season and not the end. I did not want incentives and bonuses included in the analysis because those are not typically the same across all contracts. To cater to all crowds, I included both traditional and advanced analytical statistics in the analysis. It is possible teams still value all those numbers in some form, so my work reflects that.
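To put those four figures side by side, here is a small sketch of the arithmetic. It only uses the cost-per-WAR numbers quoted above; the dictionary keys and the function name are my own labels, and this raw comparison does not control for anything (position, service time, performance mix), so treat it as a starting point rather than a finding.

```python
# Cost per WAR ($MM) as quoted above from Swartz's study.
# Key names ("over_a_year_out", "within_a_year") are illustrative labels only.
COST_PER_WAR = {
    ("Latino", "over_a_year_out"): 4.9,
    ("Latino", "within_a_year"): 6.8,
    ("white", "over_a_year_out"): 7.9,
    ("white", "within_a_year"): 7.8,
}


def relative_gap(timing):
    """Percent by which the Latino cost per WAR falls short of the white figure."""
    latino = COST_PER_WAR[("Latino", timing)]
    white = COST_PER_WAR[("white", timing)]
    return (1 - latino / white) * 100


for timing in ("over_a_year_out", "within_a_year"):
    print(f"{timing}: {relative_gap(timing):.1f}% lower cost per WAR")
```

By this rough calculation, the gap is about 38% for extensions signed more than a year out and about 13% for those signed within a year of free agency.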
I used a technique called quantile regression. Quantile regression lets us observe differences (wage differentials) between groups at different percentiles. We want to separate our analysis between low, middle and upper-tier earners, so that we know where discrimination exists, if anywhere. It comes down to this: I can tell whether a premium (or penalty) exists if I run a multivariable regression. But quantile regression will tell me how that coefficient (level of premium or penalty) varies across percentiles of earners. Are the Ronald Acuñas of the world experiencing discrimination? Or is it the Ronald Torreyeses?
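For readers who want to see the idea behind quantile regression, here is a minimal sketch in plain Python. It is not the model from my thesis: it covers only the simplest case, where we fit a single constant instead of a full set of performance variables, and the salary numbers are made up for illustration. Quantile regression minimizes the so-called check (or pinball) loss, and with only a constant the minimizer is just the sample quantile. The full version replaces the constant with a linear function of WAR, the race dummy and the other controls.

```python
def check_loss(residual, q):
    """Pinball loss: under-predictions weighted by q, over-predictions by 1 - q."""
    return q * residual if residual >= 0 else (q - 1) * residual


def best_constant(values, q):
    """Observed value that minimizes total check loss (intercept-only case)."""
    return min(
        sorted(values),
        key=lambda c: sum(check_loss(v - c, q) for v in values),
    )


# Hypothetical salaries in $MM, not real data.
salaries = [1, 2, 3, 4, 100]

print(best_constant(salaries, 0.25))  # lower-tier earner
print(best_constant(salaries, 0.50))  # median earner
print(best_constant(salaries, 0.90))  # top-tier earner
```

Changing `q` moves the fit from the bottom of the pay scale to the top, which is exactly why the technique is useful for asking where in the earnings distribution a penalty shows up.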
Quantile regression has been used before in analysis of MLB labor markets, but only using traditional statistics. Essentially, I wanted to combine the great work from Swartz with the previous work in academia. There is certainly a benefit to bridging the gap between academia and advanced analytical baseball research. The audiences are wholly different but could benefit from understanding each other’s work.
WAR is better than anything else we have right now. WAR is also not perfect. It’s irksome when members of the community use WAR as their “absolute truth” statistic. Every stat, even the best we have right now, needs context. No one stat tells the ENTIRE story. I understand that is the nature of WAR, but perfection means perfection in all of its inputs and I cannot get on board with all of WAR’s inputs being perfect. When we know that OAA is a little better than UZR, how can we say that hitter fWAR should be the hierarchical way we analyze players? On the pitcher’s side, when we have analysts finding new and interesting insights about pitch quality (from Ethan Moore), how can we say pitcher fWAR is the absolute truth? Decimal differences in WAR cannot possibly mean that one player is without a doubt better than another. But before I get too off base, let us go back to the topic at hand.
WAR is the main X variable in my analysis, but I also include several other statistics. Throughout the analysis, WAR is almost always a significant predictor of salary, the Y variable (this is good). The goal was to come up with a model that tells us whether discrimination exists, not whether WAR is a significant predictor of salary, because we don’t need a regression to do that.
The expectation is that when we compare players with similar salaries, they should also have similar performance. Quantile regression lets us identify if players experience discrimination when comparing them to other players, assuming they have similar levels of performance and other characteristics, and they are in the same relative quantile among their peers. A conditional quantile regression lets me control for performance and other characteristics like service time and position.
The results in the table below present the coefficients and standard errors of the Latino race dummy variable present in each model for position players. The baseline is always the white player throughout all models. This means I am comparing Latino players relative to white players. A negative coefficient indicates a wage penalty and a positive coefficient indicates a wage premium. I personally labeled each player’s race/ethnicity based on their name, country of origin and roster photo. In order to qualify for the sample, a player needs to have had at least 300 plate appearances in any season between 2017 and 2019. I’ll leave my analysis of pitchers for another time.
The player must additionally have had six years of service time or more. This breaks the sample up into players who already reached free agency, and players with over six years of service who signed extensions through some free agent years. The performance variables vary from model to model. I am not presenting the results for each variable in this article, but I have them if you want them and would love to share!
Source: Author’s Calculations. Coefficients are log-point differences, which are approximately equal to percentage changes.
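A quick note on reading log points, since the approximation gets looser as coefficients get larger. The exact conversion from a log-point coefficient to a percentage change is exp(coefficient) - 1. The sketch below shows it; the function name is mine and the example coefficient is a placeholder, not a number from my results.

```python
import math


def log_points_to_percent(coef):
    """Exact percentage change implied by a log-point coefficient."""
    return (math.exp(coef) - 1) * 100


# Placeholder coefficient for illustration, not an estimate from the table.
example_coef = -0.20
print(f"{example_coef} log points is about {log_points_to_percent(example_coef):.1f}%")
```

So a coefficient of -0.20 works out to roughly an 18% penalty rather than a full 20%, which is close enough for the "approximately equal" reading but worth keeping in mind.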
There is plenty of work that has been done on the linearity of wages in baseball. If you play better, you get a bigger payday. This is definitely true, but that scale varies between white and non-white players. When we focus on the 50th quantile and up, there is a surprising trend. The highest paid Latino players experience a wage penalty of roughly 20% in comparison to white players. There may not be significance across the board, but the consistency in the signs of the coefficients adds robustness to the results. To put it simply, Latinos in the 50th percentile of earners and up within their group have experienced a wage penalty relative to white players in the 50th percentile and above of theirs, when controlling for performance.
The model labeled Race*WAR is slightly different from the rest. I interacted the race variable with WAR. This interaction lets us see if Latino players have the same “returns to WAR” that white players have. There is pretty convincing evidence that suggests otherwise. In the 75th and 90th quantiles, Latino players have around 14% lower returns to WAR than their white counterparts. So yeah, the returns are still positive as WAR increases, but the slope of that scale is smaller for Latino players.
The definition of exploitation is “the action or fact of treating someone unfairly in order to benefit from their work.” In economics, the historical definition of exploitation is paying a worker a wage that is not in line with the value of their productivity. We have a pretty good idea of how valuable baseball players are and how much money they make. And now, we have statistics that show you whether there are significant differences in their salaries. The story makes sense, and data is not a means to an end; rather, it starts the conversation. We know what the data says, and we have the stories in international scouting, minor leagues, and major leagues that go directly in line with the data. Teams want to get the most out of their best, and most valuable, players.
This brings us back to what Swartz proposed. The question is, are players who signed extensions before they hit free agency being systematically underpaid for their performance? This is the heart of the analysis. I cannot make these conclusions concretely without more hard statistical evidence, but it makes sense as of now. To go further, I would need to break the sample into players who entered free agency versus players who did not, so that will be left up to you and my future research to decide.
I mean, the story makes too much sense, right? Sometimes the anecdotes add up and tell a bigger truth. For years, we have seen Latino players take deals early in their careers to lock up that big bag right away. At what point can we say this is the result of exploitation? Average and below-average performers do not earn early extensions. The treatment of Latino players from the minors to the majors has always been sketchy, like I’ve spelled out in my previous work, so if I told you that even the elite-performing players were underpaid, would you be that surprised?
The story tracks. I know it will take more to prove it true, but this is not the last you will hear from me. If you have read this entire series, then I really appreciate you staying tuned through the last few months. I thought this would be the final installment, but anything can happen in this league!
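If the interaction idea is new to you, here is a minimal sketch of what it does to the slope. The functional form (log salary on WAR, a Latino dummy, and their product) is the standard way to set up this kind of model, but the function names and every coefficient value below are hypothetical placeholders chosen for illustration. They are not my estimates.

```python
def predicted_log_salary(war, latino, b0, b_war, b_latino, b_inter):
    """Log salary from a model with a race dummy interacted with WAR."""
    return b0 + b_war * war + b_latino * latino + b_inter * latino * war


def return_to_war(latino, b_war, b_inter):
    """Slope of log salary with respect to WAR for each group."""
    return b_war + b_inter * latino


# Hypothetical coefficients for illustration only.
B_WAR, B_INTER = 0.20, -0.05

print(f"white return to WAR:  {return_to_war(0, B_WAR, B_INTER):.2f}")
print(f"Latino return to WAR: {return_to_war(1, B_WAR, B_INTER):.2f}")
```

With a negative interaction coefficient, both slopes stay positive but the Latino slope is flatter, so each additional win adds less to predicted salary.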
Follow P365 MLB Analyst Estee Rivera on Twitter! @esteerivera42
Featured image courtesy of photographer Patrick Gorski and USA Today Sports