Could it be that sluggers like Nelson Cruz are undervalued? (TheSportsPost) 
Introduction:
This study investigates the role
that certain statistics play in determining the salaries of major league
position players, with the overarching goal of determining whether some common
measures of hitter performance are being valued properly and if there has been
a change over time in how they influence salary. Michael Lewis’s 2003 book Moneyball revealed that for a long time,
teams were not valuing offensive performance properly, resulting in a market
failure. Smarter teams, like Billy
Beane’s Oakland A’s and Theo Epstein’s Boston Red Sox, were thus able to gain a
competitive edge by acquiring offensive talent cheaply on the open market. As
the median cost of purchasing a free agent continues to escalate and the levels
of run-scoring keep dwindling, it’s becoming increasingly important that teams
allocate their finite financial resources wisely in order to stay competitive
(Gaines, 2014). The Rays, A’s, and Pirates have shown that it is possible for small
market teams to compete with wealthier teams through efficient payroll
utilization, whereas large market teams like the Phillies, Mets, and Yankees
have underperformed lately because they overpaid for poor players. The recent
paucity of offensive production and available free agents makes accurate
assessments of hitting talent even more imperative, as there’s simply not
enough hitting to go around.
By
looking at the effects of several traditional and newer statistics on salary,
this study hopes to reveal market inefficiencies that could be taken advantage
of by a general manager. The hypothesis is that sabermetrics that came into
vogue more recently, such as Runs Created, wOBA, and OPS, will be undervalued
whereas traditional metrics like the Triple Crown stats (batting average, home
runs, and runs batted in) will be overvalued. Therefore, players who excel in
sabermetric categories but have less impressive traditional statistics will be
undervalued whereas players with strong traditional statistics but weak
peripherals may be overvalued.
First,
this paper will review background literature, drawing on research studies,
economic papers, and expert analysis of the common and effective
measures of hitting performance that were incorporated into the model. Then the
variables and statistics that make up the model will be presented, describing
what each one means and why it is important. After that the results of this
study will be presented based on the 10 most recent years of data (2004-2013)
from the Lahman baseball database, covering the post-Moneyball era as well as the post-steroid era, which saw more
stable offensive production following the implementation of league-wide PED
testing and penalties in 2004. These results will be compared to those from the
years 1985 through 2003 to observe the Moneyball
effect, seeing what changes, if any, have come about in how offensive
performance is valued. This analysis will help determine what inefficiencies
there are, if any, in the labor market for position players that a general
manager could exploit. This study will then identify which statistics are not
being valued properly and explain why, given their relationship with team
success, they should be valued differently. Lastly, this paper expects to
confirm the theory that aggregation is inappropriate when analyzing position
players because the structure of salary rewards varies significantly between
the four main groups of position players: infielders, outfielders, catchers,
and designated hitters (Hakes and Sauer, 2006). Therefore, it is likely that
the author will need to find disaggregate models for each type of position
player as well.
Moneyball: an important book and great movie 
Literature Review
For
a variety of reasons, baseball has become offense-starved in recent years. Runs
per game have decreased steadily since 2006, falling from 4.86 per game per
team that year to 4.07 in 2014, the lowest rate since 1969. In 2014 there
wasn’t a single team that averaged 4.86 runs per game. There are many factors responsible
for this downward trend. One is that hitters as a group are striking out more
than ever due to increased acceptance of strikeout-prone hitters, a better-regulated
(and lower) strike zone, and harder-throwing pitchers. In 2006, 16.8 percent of
plate appearances ended in a strikeout; in 2014 that figure reached 20.4
percent. That means batters are putting the ball in play less frequently, and
when they do, it is often into the teeth of a defensive shift, toward
better-positioned fielders[1].
Fewer balls in play translate to fewer hits, and fewer hits mean fewer home
runs. Unsurprisingly, power is way down; there were 1,200 fewer home runs hit
in 2014 than in 2006, a 22.3 percent decrease that coincides with stricter
performance-enhancing drug penalties (Baseball-Reference, 2014). Pitchers also
have the advantage of advanced scouting reports and video preparation to
exploit batters’ weaknesses. All of these developments have combined to
suffocate offense, resulting in the sport’s lowest scoring period since the
pitching-dominated 1960s (a time commonly referred to as the second Deadball
Era). Pitching and defense rule the game, meaning talented hitters are harder
to come by these days (Schaal, 2014).
Because
quality hitting is at such a premium, available bats on the free agent market
are in high demand. Thanks to baseball’s recent influx of TV money and
exploding revenues across the sport, teams are becoming richer than ever
before. More teams can afford to spend money on free agents, thereby increasing
the demand for them (Chen, 2012). In the 2013-2014 offseason alone, the 30 MLB
teams spent around $2.4 billion on roughly 140 free agents, over 92
percent of which was guaranteed money (Gaines, 2014). But because teams have
taken to locking up their young stars with below-market-rate contract
extensions (thereby preventing them from reaching the open market in their
primes), combined with aging curves reverting to normal in the testing era[2],
free agent talent is becoming increasingly scarce. With a smaller talent pool
to choose from and more clubs actively pursuing free agents, teams—especially
small market teams with tight budgets—need to be smarter than ever before about
identifying which players are worth adding to their payroll (Sawchik, 2013).
The
best way to do that is through statistical analysis, once an innovative idea
now commonplace throughout the sport in large part because of Michael Lewis’s
landmark book Moneyball, which
popularized sabermetrics by detailing how the Oakland A’s used them to build
juggernaut rosters on shoestring budgets (Hakes and Sauer, 2006). While there
are numerous available statistics to measure hitting talent, some are more
correlated with team success than others. For instance, studies have shown that
an efficient labor market would reward players for their power and on-base
skills, the ones most correlated with run production and, by extension, winning.
In their 2006 study, Jahn Hakes and Raymond Sauer found that looking at a
team’s on-base and slugging percentages together, compared to those of its
opponent, explains 88.5 percent of variation in winning percentage (Hakes and
Sauer, 2006). Similarly, a 2005 study conducted by Adam Houser observed strong
correlations between winning and OBP and SLG. The best statistic, then, would measure
both of these skills. Three that do are OPS (On-Base Plus Slugging), secondary
average (SecA), and wOBA (weighted On-Base Average), all relatively new
sabermetrics strongly related to team success (Baseball and
Sabermetrics, 2013). wOBA in particular has won favor with the sabermetric
community for its reputation as “a solid, context-neutral statistic that values
hitting properly” (Cameron, 2008). Because OPS, SecA, and wOBA combine power
and on-base skills, both of which are heavily correlated with winning, they
should drive salary and are thus present in this model.
Breaking
down each skill further, the beneficial effects of on-base percentage are
well-documented in Moneyball as well
as the Hakes and Sauer paper. As the number of baserunners increases, the run
expectancy for that inning increases as well because there are more scoring
opportunities. This model measures a hitter’s batting
eye using walk to strikeout ratio and expects it to be significant. Likewise,
hitting for power also fuels run-scoring, for three of the five events with the
highest run expectancy are extra-base hits (Tango, 2010). Intuitively this
makes sense, as a walk can only score a run when the bases are loaded and for a
single to score a run there almost always has to be a runner in scoring
position. A home run, on the other hand, is the most valuable outcome of a
plate appearance because it scores the batter and any runners on base. Extra
base hits are more likely to score baserunners regardless of which base they’re
on and put the batter in scoring position, improving the likelihood that he
scores as well. One would expect power (represented in this model by home runs
and Isolated Power) to be properly valued as sluggers have historically been
well-rewarded dating back to Babe Ruth, who in 1931 famously earned a higher
salary than President Hoover[3].
On the other hand, extra base hits that don’t go over the wall tend to be
underrated because they aren’t as exciting as home runs, so it’s possible that
slugging could be somewhat undervalued (Booth, 2013).
Another
catch-all statistic is Runs Created, invented in 1979 by Bill James, the “father
of sabermetrics.” Runs Created estimates the number of runs a hitter
contributes to his team. It has been revised and updated over the years, with
the “Technical” version serving as the most widely used iteration because it
accounts for all basic, easily available offensive statistics. Runs Created has
been shown to be an accurate measure of an individual’s offensive contribution
because when used with aggregated team totals, the formula closely approximates
(within five percent) how many runs the team actually scores (Appelman, 2008). According
to Beyond the Batting Average, “It
is useful to look at team RC because it is often a better reflection of a
team’s offensive ability and a better predictor of future performance than runs
scored” (Panas 39). It is also a key component in James’s popular Win Shares
formula and is used by Baseball
Prospectus to compute its Equivalent Average (EqA) sabermetric (SABR, 2014).
A recent study found that Runs Created predicts a team’s success most
effectively (Baseball and Sabermetrics, 2013). It is widely considered to be
one of the most accurate measures of offensive contributions.
These
statistics do a much better job reflecting a player’s performance than
traditional metrics, which individually fail to say much about a hitter’s skill
set. For example, the three statistics most commonly used to value hitters have
been the Triple Crown stats: batting average, home runs, and runs batted in,
none of which are strongly correlated with superior offensive statistics such
as OPS and Runs Created. Consequently, “none of the Triple Crown stats are that
accurate in telling a player's run producing ability” (Weber, 2014). Batting
average measures how often a player reaches base via a base hit but fails to
account for walks and hit-by-pitches. It also treats every hit equally, leading
John Thorn, the official historian of MLB, to criticize the statistic as a “venerable,
uncannily durable fraud” (Kenny, 2012). The aforementioned Houser study even
concluded that batting average was negatively correlated with wins. Home runs
measure power but ignore doubles and triples, and can also be heavily skewed by
a batter’s home park. Runs batted in, or RBI, supposedly measures a player’s
ability to drive in runs as well as hit for power to some extent (as hitters
with high home run totals tend to have high RBI figures), but has been exposed
as a team-dependent stat. Team-dependent stats are faulty because they rely too
much on outside factors such as luck or the performance of a player’s
teammates. When evaluating an individual player, the goal should be to isolate
him from his teammates as much as possible (which rate stats and RC do). It’s
impossible to do so with a statistic like RBI, which is heavily dependent on
the number of RBI opportunities a player has and how skilled the batters in
front of him are at reaching base. Runs scored is similarly team-dependent, for
a player needs the hitters behind him to drive him in. Recent baseball
literature has made it clear that these kinds of statistics, in addition to the
crude Triple Crown measurements, probably shouldn’t be the first ones a GM
looks at when deciding whether to pursue a player. However, since they are all
common back-of-the-baseball-card statistics, they have been included in this
model.
The
model also must account for different salary structures between the different
subsets of position players. A recent study by Matt Swartz of The Hardball Times found that players at bat-first
positions (outfielders, designated hitters, and first basemen) are paid more
per win above replacement than glove-first players. Teams have traditionally
valued good hitters over good fielders, partially because defensive statistics
are less reliable than offensive statistics, making fielding contributions more
difficult to judge (Swartz, 2014). Catchers in particular are penalized because
the wear and tear they endure typically limits their playing time and hinders
their offensive contributions. As a result, backstops have the lowest average
salary among position players[4]
and make less on the free agent market than all the other positions (Armstrong,
2014). On the other hand, designated hitters had the highest average salary in
2013[5]
because they hit well and are less likely to get hurt (Associated Press, 2013).
It’s also important to consider that statistics are not valued equally between
the groups. For instance, corner infielders and outfielders are expected to be
major sources of power and run production, whereas catchers and middle
infielders with power are exceptionally rare. Thus, it is likely that using an
aggregate model for analyzing salary determinants of position players is not
appropriate and will lead to inaccurate conclusions. This paper will implement
an F-test to see which model is more appropriate, but expects to find that aggregation
is inappropriate and that it will therefore be necessary to construct disaggregate models.
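As a rough illustration of the kind of F-test described above, the comparison between one pooled regression and separate regressions by position can be sketched as a Chow-style statistic. This is only a sketch under textbook assumptions (homoskedastic errors, the same regressors in every group), not the author's procedure; the function name `chow_f_statistic` and every number passed to it are hypothetical.

```python
def chow_f_statistic(ssr_pooled, ssr_groups, n_obs, n_params):
    """Chow-style F statistic for pooled vs. group-by-group regressions.

    ssr_pooled -- sum of squared residuals from the single aggregate model
    ssr_groups -- list of sums of squared residuals, one per position group
    n_obs      -- total number of observations across all groups
    n_params   -- number of estimated parameters in each regression
    """
    n_groups = len(ssr_groups)
    ssr_unrestricted = sum(ssr_groups)
    df_num = (n_groups - 1) * n_params      # restrictions imposed by pooling
    df_den = n_obs - n_groups * n_params    # residual degrees of freedom
    if df_den <= 0:
        raise ValueError("not enough observations for the separate models")
    numerator = (ssr_pooled - ssr_unrestricted) / df_num
    denominator = ssr_unrestricted / df_den
    return numerator / denominator


# Hypothetical residual sums of squares for four position groups.
f_stat = chow_f_statistic(
    ssr_pooled=1200.0,
    ssr_groups=[400.0, 300.0, 200.0, 100.0],
    n_obs=1056,
    n_params=14,
)
print(round(f_stat, 3))
```

A large statistic relative to the F critical value would favor the disaggregate models; a small one would suggest the aggregate model is adequate.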
Joe Carter was overrated because of his shiny Triple Crown numbers (SportsNet) 
Data Description:
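The basic model follows a semi-log function:

$$\ln(\text{salary}) = B_{0} + B_{1}\,\text{GSavg} + B_{2}\,\text{R} + B_{3}\,\text{HR} + B_{4}\,\text{RBI} + B_{5}\,\text{RC} + B_{6}\,\text{Avg} + B_{7}\,\text{OPS} + B_{8}\,\text{wOBA} + B_{9}\,\text{SecA} + B_{10}\,\text{BBKratio} + B_{11}\,\text{ISO} + B_{12}\,\text{YearsinMLB} + B_{13}\,\text{YearsinMLBsquared} + e$$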
All
of the variables in this model are baseball statistics and the data used to
calculate them came from the Lahman baseball database. Career statistics were
used because general managers typically pay attention to a player’s entire body
of work instead of just his most recent season, which could have been a fluke
year or cut short by injury. Rate stats were used in combination with raw
totals to account for playing time discrepancies and better compare across
positions, since up-the-middle players (particularly catchers) play more
physically demanding positions and typically suit up less than corner defenders.
Though
it’s possible general managers consider playoff performance when evaluating
hitters, this study focused solely on regular season statistics because not
every player gets the opportunity to appear in playoff games. The sample sizes
thus vary wildly from player to player and tend to be very small, sometimes
nonexistent. One cannot and should not draw meaningful conclusions about a
player’s true ability based on a handful of postseason games. Furthermore,
there is little to no evidence in literature supporting the notion that player
salary is dependent upon postseason success.
Lastly,
a mix of newer sabermetrics and traditional statistics was used to compare
which ones have a greater effect on salary. Each variable is described in
detail below:
Ln(salary):
The salary variable is the salary of the position player measured in dollars. The
natural log of salary was used to reduce the effect of extreme outliers on the
model.
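Because the dependent variable is the natural log of salary, each coefficient is read as an approximate proportional change in salary rather than a dollar amount. The sketch below converts a coefficient into an exact percentage effect; the helper name `percent_change_in_salary` is hypothetical, and the example value is the GSavg coefficient reported in Table 3.

```python
import math


def percent_change_in_salary(coefficient, delta=1.0):
    """Percent change in salary implied by a semi-log coefficient.

    In ln(salary) = ... + b*x, raising x by `delta` multiplies salary
    by exp(b * delta), so the percent change is (exp(b * delta) - 1) * 100.
    """
    return (math.exp(coefficient * delta) - 1.0) * 100.0


gsavg_coef = 0.0213333  # GSavg coefficient from Table 3 (aggregate model)
one_game = percent_change_in_salary(gsavg_coef)             # one more start per season
ten_games = percent_change_in_salary(gsavg_coef, delta=10)  # ten more starts per season
print(f"{one_game:.2f}% and {ten_games:.2f}%")
```

For small coefficients the result is close to 100 times the coefficient itself, which is why one extra start per season works out to roughly a 2 percent higher salary, holding the other variables fixed.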
GSavg:
The
player’s games started total divided by the number of seasons he’s played
measures the average number of games he starts per season. A player’s games
played total is a good measure of his durability and is essential to
determining his value because nobody is very valuable if he only plays half a
season. The best players are in the lineup every day and will typically play
140 to 160 games per year if healthy. While injuries are often beyond a
player’s control, he can maximize his durability by preparing for the season,
then staying in good shape and taking care of himself once the season is underway.
Players with healthy track records are also considered less risky than players
who are injury-prone and more likely to break down. Using games is a better
measure of durability than at-bats because at-bats are dependent on a player’s
batting order position.
R:
Runs
scored. Typically players who get on base a lot, hit for power, and run the
bases well score a lot of runs. This is largely a team-dependent statistic, as
the only way a player can drive himself in is with a home run, and also a
function of batting order position. Players at the top of the order tend to
score more runs than those at the bottom because they hit in front of the
team’s best run-producers and have more opportunities to score runs. However,
players can improve their likelihood of scoring by getting into scoring
position via extra base hits and with aggressive baserunning.
HR:
A
home run is an event where the batter comes around to score on a single play
without the benefit of an error by the defense. Most often this is done by
hitting the ball over one of the outfield fences in fair territory, but it’s
also possible to have an inside-the-park home run that does not clear the
fence, but rather remains in the field of play long enough for the batter to
circle the bases. Home runs reflect a player’s power, as the balls that get hit
the farthest tend to leave the yard.
RBI:
Runs
Batted In counts the number of runs a batter drives in, including
himself, via a hit, walk, hit by pitch, sacrifice fly, or groundout (excluding
double plays). This statistic has been criticized recently because it’s more a
measure of opportunity than anything else, as the players with the most RBI
opportunities typically have a high output.
RC:
Runs
Created is a sabermetric developed by Bill James that estimates the number of
runs a player contributed to his team by taking into account all basic
offensive statistics. It is a good tool for comparing players against each
other because it measures a player’s contribution independent of his teammates
(Panas 40). When these totals are added together for all the players on a team,
the total is usually within five percent of the team’s real runs scored total
(SABR, 2014). The formula for RC is displayed below:
Avg:
Batting
average is a simple rate statistic that divides a player’s hit total by his
total number of at-bats. It measures how often a player reaches base via a hit,
and is also a measure of speed to some extent as faster players are able to leg
out more infield hits than slower batters. It is also a good measure of a
player’s ability to make contact, as putting the ball in play more often
typically leads to more base hits and higher batting averages. The problem with
batting average is that it treats all hits equally, and does not differentiate
between singles and extra base hits. Thus, having a high batting average is not
very useful if it is “empty,” as in obtained mostly by singles. Furthermore,
batting average is susceptible to luck, for it is heavily tied to a player’s
Batting Average on Balls in Play (BABIP), which can fluctuate wildly from
season to season.
OPS:
On-Base Percentage Plus Slugging, the sum of a player’s on-base percentage and slugging
percentage, is widely viewed as one of the best hitting statistics because it
measures a hitter’s ability to get on base and hit for power, the two skills
most correlated with winning. However, OPS is imperfect because it undervalues
getting on base relative to hitting for extra bases and does not properly weight
each type of extra-base hit (FanGraphs, 2014). OPS is also closely correlated
with Runs Created, meaning it accurately measures offensive value (Weber, 2014).
Using OPS also precluded including OBP and SLG in the model because of
multicollinearity.
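As a quick illustration of how OPS is assembled from counting statistics, the sketch below uses the standard definitions of on-base percentage and slugging percentage. The function names and the stat line are hypothetical and are not drawn from the study's data.

```python
def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_percentage(singles, doubles, triples, hr, ab):
    """SLG = total bases / at-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


def ops(singles, doubles, triples, hr, bb, hbp, ab, sf):
    """OPS is simply the sum of OBP and SLG."""
    hits = singles + doubles + triples + hr
    obp = on_base_percentage(hits, bb, hbp, ab, sf)
    slg = slugging_percentage(singles, doubles, triples, hr, ab)
    return obp + slg


# Hypothetical stat line: 150 hits in 500 at-bats with 20 home runs.
example_ops = ops(singles=95, doubles=30, triples=5, hr=20,
                  bb=60, hbp=5, ab=500, sf=5)
print(round(example_ops, 3))
```

Because OPS is an exact sum of OBP and SLG, putting all three in the same regression would make them perfectly collinear, which is the reason given above for leaving OBP and SLG out.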
wOBA:
Weighted On-Base Average is a context-neutral sabermetric that measures a
hitter’s overall value by combining all the different aspects of hitting into
one statistic and weighting them with their actual run value. It can be used to
calculate how many runs above or below average a player was using these linear
weights. It is one of the best offensive statistics available for capturing a
player’s offensive contributions and is scaled to OBP, so .400 is great, .320
is average and anything below .300 is poor (FanGraphs, 2014). The formula used
to calculate wOBA is displayed below:
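$$\text{wOBA} = \frac{0.690\,(\text{BB} - \text{IBB}) + 0.722\,\text{HBP} + 0.888\,\text{1B} + 1.271\,\text{2B} + 1.616\,\text{3B} + 2.101\,\text{HR}}{\text{AB} + \text{BB} - \text{IBB} + \text{SF} + \text{HBP}}$$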
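The wOBA calculation can be sketched in a few lines using the linear weights quoted in this section. The function name `woba` and the stat line are hypothetical, and the weights are hard-coded from the text rather than looked up by season.

```python
# Linear weights as quoted in the text (unintentional walk, hit by pitch,
# single, double, triple, home run).
WEIGHTS = {"ubb": 0.690, "hbp": 0.722, "1b": 0.888,
           "2b": 1.271, "3b": 1.616, "hr": 2.101}


def woba(singles, doubles, triples, hr, bb, ibb, hbp, ab, sf):
    """Weighted on-base average using the weights given in the text."""
    numerator = (WEIGHTS["ubb"] * (bb - ibb)
                 + WEIGHTS["hbp"] * hbp
                 + WEIGHTS["1b"] * singles
                 + WEIGHTS["2b"] * doubles
                 + WEIGHTS["3b"] * triples
                 + WEIGHTS["hr"] * hr)
    denominator = ab + bb - ibb + sf + hbp
    return numerator / denominator


# Hypothetical stat line, not taken from the study's data.
example_woba = woba(singles=95, doubles=30, triples=5, hr=20,
                    bb=60, ibb=5, hbp=5, ab=500, sf=5)
print(round(example_woba, 3))
```

On the scale described above, this hypothetical hitter would fall between average (.320) and great (.400).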
SecA:
Secondary average is a sabermetric created by Bill James that divides the sum
of extra bases gained on hits (TB - H), walks, and stolen bases (minus times
caught stealing) by at-bats. It measures the number of bases a player gained
independent of batting average. By incorporating extra base hits, walks, and
stolen bases, it measures his power, discipline, and speed—the three most
important skills for a hitter. A player can thus have a low batting average but
high secondary average (and vice versa), so SecA helps identify players who are
productive offensive players despite poor batting averages. According to Scott
Gray, who worked with Bill James, "Secondary average is a much better
indicator of offensive ability than batting average" (Gray, 2006). The
formula used to calculate SecA is [BB + (TB - H) + (SB - CS)] / AB.
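To show how secondary average can diverge from batting average, here is a minimal sketch of the SecA formula, with TB - H as extra bases on hits and SB - CS as net stolen bases. The function names and both stat lines are hypothetical.

```python
def batting_average(h, ab):
    """AVG = hits / at-bats."""
    return h / ab


def secondary_average(bb, tb, h, sb, cs, ab):
    """SecA = [BB + (TB - H) + (SB - CS)] / AB."""
    return (bb + (tb - h) + (sb - cs)) / ab


# Hypothetical slugger: modest average, lots of walks and extra bases.
slugger_avg = batting_average(h=125, ab=500)
slugger_seca = secondary_average(bb=90, tb=260, h=125, sb=2, cs=1, ab=500)

# Hypothetical singles hitter: high average, few walks or extra bases.
contact_avg = batting_average(h=160, ab=500)
contact_seca = secondary_average(bb=25, tb=190, h=160, sb=5, cs=5, ab=500)

print(slugger_avg, round(slugger_seca, 3), contact_avg, round(contact_seca, 3))
```

The first hitter trails the second by 70 points of batting average but has a far higher secondary average, which is the kind of player the statistic is meant to surface.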
BBKratio:
The
ratio of a player’s walk total to his strikeout total is an easy way to measure
his strike zone knowledge. The best hitters walk more often than they strike
out, but any ratio close to 1:1 is exceptional.
ISO:
Isolated
power is the difference between slugging percentage and batting average. By
stripping out the singles from slugging percentage, ISO measures the number of
extra bases a player averages per at-bat and is thus a good gauge of his raw
power. Good power hitters have an ISO over .200 while the league average tends
to fall around .140 (FanGraphs, 2014).
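The definition of ISO as slugging percentage minus batting average can be checked against its equivalent extra-bases form, (2B + 2×3B + 3×HR) / AB. The sketch below computes it both ways for a hypothetical stat line; the function names are illustrative only.

```python
def iso_from_rates(slg, avg):
    """ISO as the difference between slugging percentage and batting average."""
    return slg - avg


def iso_from_counts(doubles, triples, hr, ab):
    """ISO as extra bases per at-bat: one for a double, two for a triple,
    three for a home run."""
    return (doubles + 2 * triples + 3 * hr) / ab


# Hypothetical stat line: 95 singles, 30 doubles, 5 triples, 20 HR in 500 AB.
ab = 500
hits = 95 + 30 + 5 + 20
total_bases = 95 + 2 * 30 + 3 * 5 + 4 * 20

iso_a = iso_from_rates(total_bases / ab, hits / ab)
iso_b = iso_from_counts(doubles=30, triples=5, hr=20, ab=ab)
print(round(iso_a, 3), round(iso_b, 3))
```

Both routes give the same value, here at the .200 threshold the text associates with good power hitters.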
YearsinMLB and YearsinMLBsq: These two variables
measure a player’s major league experience in years. The squared variable is
included because the relationship between
experience and salary is not linear. In fact, it usually takes the shape of an inverted U
because a player’s salary is suppressed early in his career when he has
little bargaining power, peaks when he reaches free agency, and declines toward
the end of his career when his durability and skills diminish. Including the squared
term lets the otherwise linear model capture this curvature.
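With a linear and a squared experience term, the implied salary peak sits where the derivative of the quadratic is zero, at -B_{12} / (2 × B_{13}). The sketch below applies that to the magnitudes reported for the aggregate 1985-2013 model in Table 3. It assumes the squared-term coefficient is negative, as the rise-then-decline story requires; that sign is an assumption here, and the function name is hypothetical.

```python
def experience_peak(linear_coef, squared_coef):
    """Years of experience at which b1*x + b2*x**2 reaches its maximum.

    Setting the derivative b1 + 2*b2*x to zero gives x = -b1 / (2*b2).
    A maximum only exists when the squared-term coefficient is negative.
    """
    if squared_coef >= 0:
        raise ValueError("a peak requires a negative squared-term coefficient")
    return -linear_coef / (2.0 * squared_coef)


# Magnitudes from Table 3; the negative sign on the squared term is assumed.
peak_years = experience_peak(linear_coef=0.2434398, squared_coef=-0.012538)
print(round(peak_years, 1))
```

Under that assumption the implied peak falls a little under ten years of service, a few seasons after a player first reaches free agency.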
Dummy Variables: The Lahman database provides data from the 1985
season through the 2013 season. The author compiled data using year dummies
from 1985 through 2003 for the pre-Moneyball
era and from 2004 through 2013 for the post-Moneyball
era in both the aggregate and disaggregate models, omitting the dummies for the 1985 and
2004 seasons to avoid perfect collinearity.
The
aggregate models included infielders, outfielders, catchers, and designated
hitters as dummy variables to account for any effect position may have had on
salary and avoid omitted variable bias. They were replaced by the clause “if IF==1”
for the disaggregate infielder model, “if OF==1” for the disaggregate
outfielder model, “if C==1” for the disaggregate catcher model, and “if DH==1”
for the disaggregate designated hitter model.
It is also important to note that pre-arbitration players are not included in the model because they are unable to negotiate their salaries, which are usually close to the league minimum regardless of their talent level. A player is typically arbitration-eligible after his third season, at which point he and his team can negotiate a salary or have one set by an arbitrator. It is generally believed that arbitration-eligible players are paid close to what they would make in free agency, for which a player becomes eligible after his sixth season. Therefore, in order to exclude the statistics of any player who had not been in the league for three years, all players with fewer than four years of major league experience were dropped from the model.
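The sample construction described in this section (dropping players with fewer than four years of experience, splitting at 2004, and estimating by position) might look like the following in outline. This is a sketch only: the record fields, the function name `build_subsamples`, and the example rows are all hypothetical and do not reflect the Lahman database's actual column names.

```python
from collections import defaultdict

MIN_YEARS_IN_MLB = 4         # players with fewer than four years are dropped
POST_MONEYBALL_START = 2004  # first season of the post-Moneyball sample


def build_subsamples(records):
    """Group eligible player-seasons by (era, position)."""
    groups = defaultdict(list)
    for rec in records:
        if rec["years_in_mlb"] < MIN_YEARS_IN_MLB:
            continue  # pre-arbitration players are excluded
        era = "post" if rec["season"] >= POST_MONEYBALL_START else "pre"
        groups[(era, rec["position"])].append(rec)
    return dict(groups)


# Hypothetical player-seasons, invented purely for illustration.
sample = [
    {"player": "A", "season": 1998, "position": "IF", "years_in_mlb": 6},
    {"player": "B", "season": 2010, "position": "OF", "years_in_mlb": 2},
    {"player": "C", "season": 2010, "position": "C", "years_in_mlb": 9},
    {"player": "D", "season": 2004, "position": "IF", "years_in_mlb": 4},
]
subsamples = build_subsamples(sample)
for key in sorted(subsamples):
    print(key, [rec["player"] for rec in subsamples[key]])
```

Each resulting group corresponds to one disaggregate regression; pooling the groups within an era gives the aggregate sample.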
Lastly, the data were treated as though players signed new deals every year.
Mark Teixeira has good secondary averages despite poor batting averages (CSN) 
Results & Analysis:
The aggregate and disaggregate results
are summarized in the following tables:
Table 1: Summary statistics (career)
| Variable | Observations | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| Lnsalary | 7,782 | 14.15927 | 1.224417 | 11.0021 | 17.31202 |
| GSavg | 10,243 | 68.80587 | 37.19332 | 0 | 155.125 |
| R | 10,243 | 378.3355 | 329.0393 | 0 | 2,227 |
| HR | 10,243 | 80.3167 | 92.94184 | 0 | 762 |
| RBI | 10,243 | 353.5923 | 321.9295 | 0 | 1,996 |
| RC | 10,243 | 396.869 | 364.3221 | .9371428 | 2,857.884 |
| Avg | 10,243 | .2625083 | .0260075 | .1025641 | .4444444 |
| OPS | 10,243 | .7330512 | .0914736 | .2779204 | 1.555556 |
| wOBA | 10,243 | .3424909 | .046488 | .1296341 | .8615556 |
| SecA | 10,243 | .248787 | .0725125 | 0 | 1.11111 |
| BBKratio | 10,243 | .587493 | .2821653 | 0 | 2.246528 |
| ISO | 10,243 | .14229 | .0518394 | 0 | .6666667 |
| YearsinMLB | 10,243 | 8.367666 | 3.807199 | 4 | 25 |
| YearsinMLBsq | 10,243 | 84.51118 | 79.49687 | 16 | 625 |

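Table 2: Variable Correlations for Aggregate Model 1985-2013

| Variable | lnsalary | GSavg | R | HR | RBI | RC | Avg | OPS | wOBA | SecA | BBKratio | ISO | YrsinMLB | YrsinMLBsq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lnsalary | 1.000 | | | | | | | | | | | | | |
| GSavg | 0.627 | 1.000 | | | | | | | | | | | | |
| R | 0.527 | 0.711 | 1.000 | | | | | | | | | | | |
| HR | 0.539 | 0.562 | 0.831 | 1.000 | | | | | | | | | | |
| RBI | 0.539 | 0.666 | 0.938 | 0.939 | 1.000 | | | | | | | | | |
| RC | 0.546 | 0.693 | 0.987 | 0.881 | 0.966 | 1.000 | | | | | | | | |
| Avg | 0.538 | 0.579 | 0.554 | 0.412 | 0.516 | 0.572 | 1.000 | | | | | | | |
| OPS | 0.648 | 0.547 | 0.587 | 0.692 | 0.644 | 0.645 | 0.763 | 1.000 | | | | | | |
| wOBA | 0.645 | 0.542 | 0.585 | 0.667 | 0.627 | 0.639 | 0.771 | 0.995 | 1.000 | | | | | |
| SecA | 0.503 | 0.382 | 0.485 | 0.660 | 0.532 | 0.538 | 0.358 | 0.843 | 0.844 | 1.000 | | | | |
| BBKratio | 0.141 | 0.323 | 0.413 | 0.192 | 0.316 | 0.427 | 0.415 | 0.319 | 0.360 | 0.274 | 1.000 | | | |
| ISO | 0.517 | 0.346 | 0.387 | 0.690 | 0.532 | 0.456 | 0.333 | 0.831 | 0.800 | 0.853 | 0.053 | 1.000 | | |
| YearsinMLB | 0.235 | 0.356 | 0.792 | 0.634 | 0.772 | 0.776 | 0.286 | 0.283 | 0.281 | 0.216 | 0.308 | 0.158 | 1.000 | |
| YearsinMLBsq | 0.180 | 0.330 | 0.792 | 0.638 | 0.771 | 0.780 | 0.267 | 0.269 | 0.267 | 0.212 | 0.303 | 0.152 | 0.974 | 1.000 |

* means significant at the 10% level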
Table 3: Regression estimates of ln(salary) aggregated 1985-2013

| Variable | Coefficient | T-stat |
|---|---|---|
| constant | 4.934356 | 8.73* |
| GSavg | .0213333 | 12.95* |
| R | .0009888 | 1.73* |
| HR | 1.75E-6 | 0.00 |
| RBI | .0001338 | 0.28 |
| RC | .0015535 | 2.41* |
| Avg | 27.66858 | 3.35* |
| OPS | 8.493492 | 0.98 |
| wOBA | 7.94368 | 0.56 |
| SecA | 4.430566 | 2.25* |
| BBKratio | .1154513 | 0.43 |
| ISO | 9.260805 | 2.23* |
| YearsinMLB | .2434398 | 3.34* |
| YearsinMLBsq | .012538 | 14.73* |
| IF dummy | .050107 | 0.84 |
| OF dummy | .0161421 | 0.38 |
| DH dummy | .0837987 | 1.21 |
| C dummy | .0081691 | 0.07 |
| observations | 7,782 (1,475 clusters) | |
| R^2 | 0.5277 | |

* means significant at the 10% level
Table 4: Regression estimates of ln(salary) aggregated 1985-2003

| Variable | Coefficient | T-stat |
|---|---|---|
| constant | 4.921585 | 11.14* |
| GSavg | .0214335 | 13.07* |
| R | .0010057 | 1.76* |
| HR | .0000282 | 0.03 |
| RBI | .0001129 | 0.23 |
| RC | .0015793 | 2.46* |
| Avg | 27.42153 | 3.40* |
| OPS | 8.233114 | 0.96 |
| wOBA | 7.5354 | 0.53 |
| SecA | 4.346409 | 2.23* |
| BBKratio | .1101023 | 0.42 |
| ISO | 9.27618 | 2.25* |
| YearsinMLB | .2373602 | 13.87* |
| YearsinMLBsq | .0125539 | 14.70* |
| IF dummy | .0064763 | 0.11 |
| OF dummy | .0178713 | 0.42 |
| DH dummy | .0799779 | 1.14 |
| C dummy | .0080076 | 0.07 |
| observations | 7,782 (1,475 clusters) | |
| R^2 | 0.5513 | |

* means significant at the 10% level
Table 5: Regression estimates of ln(salary) aggregated 2004-2013

| Variable | Coefficient | T-stat |
|---|---|---|
| constant | 6.100364 | 14.07* |
| GSavg | .0221016 | 12.52* |
| R | .0010588 | 1.77* |
| HR | .0008603 | 0.77 |
| RBI | .0003965 | 0.73 |
| RC | .0014942 | 2.25* |
| Avg | 16.09531 | 1.80* |
| OPS | 5.327046 | 0.56 |
| wOBA | 13.65412 | 0.89 |
| SecA | 1.696535 | 0.79 |
| BBKratio | .4374325 | 1.57 |
| ISO | 4.08551 | 0.91 |
| YearsinMLB | .2482729 | 14.22* |
| YearsinMLBsq | .0124575 | 14.79* |
| IF dummy | .0376426 | 0.65 |
| OF dummy | .0128129 | 0.29 |
| DH dummy | .1259855 | 1.79* |
| C dummy | .0771986 | 0.66 |
| observations | 7,782 (1,475 clusters) | |
| R^2 | 0.4929 | |

* means significant at the 10% level
Table 6a: Regression estimates of ln(salary) disaggregated 1985-2013

| Variable | Infielder coefficient | Infielder T-stat | Outfielder coefficient | Outfielder T-stat | Catcher coefficient | Catcher T-stat | Designated hitter coefficient | Designated hitter T-stat |
|---|---|---|---|---|---|---|---|---|
| constant | 4.464285 | 6.58* | 5.795562 | 5.76* | 5.186077 | 4.20* | 1.832539 | 0.36 |
| GSavg | .0247885 | 14.56* | .0257263 | 11.49* | .0399854 | 11.19* | .0286461 | 3.07* |
| R | .0006152 | 0.94 | .0018647 | 2.05* | .0019811 | 1.15 | .014782 | 4.06* |
| HR | .00222 | 1.80* | .0001721 | 0.11 | .000059 | 0.02 | .0082065 | 1.02 |
| RBI | .0003674 | 0.58 | .000785 | 1.18 | .0010553 | 0.48 | .0096215 | 2.72* |
| RC | .0002972 | 0.40 | .0031853 | 3.08* | .0008723 | 0.34 | .0102492 | 3.39* |
| Avg | 20.07381 | 1.96* | 40.17295 | 3.12* | 6.308781 | 0.35 | 25.45205 | 0.21 |
| OPS | 2.34936 | 0.23 | 22.42181 | 1.53 | 15.00767 | 0.96 | 236.9343 | 1.89* |
| wOBA | 7.365766 | 0.45 | 22.0816 | 0.91 | 10.95484 | 0.52 | 577.0981 | 2.91* |
| SecA | 3.467508 | 1.34 | 6.343863 | 2.19* | 3.444201 | 0.60 | 81.68854 | 2.29* |
| BBKratio | .098286 | 0.32 | .1357967 | 0.26 | .4142728 | 0.73 | 12.27805 | 4.98* |
| ISO | 3.763185 | 0.79 | 17.0216 | 2.32* | 1.657506 | 0.24 | 125.1409 | 2.84* |
| YearsinMLB | .1812792 | 2.06* | .3465578 | 2.93* | .0774218 | 0.47 | .1737398 | 0.23 |
| YearsinMLBsq | .0122314 | 10.09* | .013856 | 10.50* | .011607 | 5.56* | .0218334 | 5.02* |
| Observations | 4,677 (953 clusters) | | 2,768 (648 clusters) | | 1,213 (240 clusters) | | 337 (164 clusters) | |
| R^2 | 0.7281 | | 0.2443 | | 0.6784 | | 0.2665 | |

* means significant at the 10% level
Table 6b: Regression estimates of ln(salary) disaggregated 1985-2003

| Variable | Infielder coefficient | Infielder T-stat | Outfielder coefficient | Outfielder T-stat | Catcher coefficient | Catcher T-stat | Designated hitter coefficient | Designated hitter T-stat |
|---|---|---|---|---|---|---|---|---|
| constant | 4.859204 | 9.74* | 5.400055 | 6.47* | 6.102587 | 8.14* | 1.569499 | 0.27 |
| GSavg | .024556 | 13.94* | .0258618 | 11.57* | .0398371 | 11.35* | .0199707 | 2.80* |
| R | .0006158 | 0.94 | .001863 | 2.08* | .0019544 | 1.16 | .0117227 | 3.23* |
| HR | .0023265 | 1.88* | .0002239 | 0.14 | .0006741 | 0.23 | .0120247 | 2.05* |
| RBI | .0004287 | 0.67 | .0007821 | 1.18 | .0007236 | 0.34 | .0088861 | 2.54* |
| RC | .0002828 | 0.38 | .0032359 | 3.18* | .0009353 | 0.37 | .0089134 | 3.13* |
| Avg | 20.99365 | 2.06* | 40.69086 | 3.23* | 11.02501 | 0.65 | 42.3543 | 0.40 |
| OPS | 1.662307 | 0.16 | 23.00249 | 1.59 | 20.40205 | 1.36 | 299.7009 | 2.74* |
| wOBA | 6.957869 | 0.43 | 22.29363 | 0.92 | 18.11744 | 0.88 | 645.8685 | 3.35* |
| SecA | 3.62852 | 1.39 | 6.826121 | 2.41* | 4.568813 | 0.80 | 62.44421 | 1.88* |
| BBKratio | .1041646 | 0.34 | .1726638 | 0.33 | .3923753 | 0.69 | 12.72335 | 5.37* |
| ISO | 4.203222 | 0.88 | 16.936 | 2.32* | 3.250036 | 0.50 | 144.7962 | 3.80* |
| YearsinMLB | .2478163 | 10.21* | .2575067 | 9.40* | .229919 | 5.99* | .2241607 | 1.31 |
| YearsinMLBsq | .0122722 | 9.94* | .0139752 | 10.60* | .0118973 | 5.59* | .0195147 | 4.89* |
| Observations | 4,677 (953 clusters) | | 2,768 (649 clusters) | | 1,213 (240 clusters) | | 337 (164 clusters) | |
| R^2 | 0.5718 | | 0.5562 | | 0.5701 | | 0.0815 | |

* means significant at the 10% level
Table 6c: Regression estimates of ln(salary) disaggregated 2004-2013

| Variable | Infielder coefficient | Infielder T-stat | Outfielder coefficient | Outfielder T-stat | Catcher coefficient | Catcher T-stat | Designated hitter coefficient | Designated hitter T-stat |
|---|---|---|---|---|---|---|---|---|
| constant | 5.946243 | 12.15* | 6.641616 | 8.28* | 6.809876 | 9.58* | 2.989472 | 0.63 |
| GSavg | .0251963 | 13.72* | .0266983 | 12.02* | .0411728 | 11.46* | .0222447 | 3.02* |
| R | .0004793 | 0.68 | .0017887 | 1.85* | .0029403 | 1.69* | .0115076 | 3.49* |
| HR | .0016876 | 1.23 | .0011048 | 0.63 | .0002006 | 0.06 | .0015865 | 0.23 |
| RBI | .000241 | 0.35 | .0005624 | 0.72 | .0007453 | 0.36 | .0045564 | 1.56 |
| RC | .000287 | 0.36 | .0031005 | 2.83* | .0018866 | 0.80 | .0092429 | 3.31* |
| Avg | 9.979166 | 0.93 | 24.56249 | 1.70* | 10.39711 | 0.55 | 76.53192 | 0.67 |
| OPS | 12.8886 | 1.17 | 2.544007 | 0.16 | 19.34364 | 1.18 | 113.4643 | 1.02 |
| wOBA | 23.43315 | 1.35 | 8.359464 | 0.33 | 18.46234 | 0.86 | 349.7148 | 1.99* |
| SecA | .9833255 | 0.36 | 3.411589 | 1.05 | 4.960112 | 0.88 | 88.30981 | 2.58* |
| BBKratio | .4354201 | 1.36 | .0969974 | 0.17 | .0418472 | 0.10 | 11.75767 | 5.50* |
| ISO | .3225364 | 0.06 | 7.886789 | 0.99 | 3.547758 | 0.46 | 87.17831 | 2.27* |
| YearsinMLB | .2537545 | 10.61* | .273931 | 9.14* | .2091591 | 4.97* | .3969874 | 2.90* |
| YearsinMLBsq | .0119577 | 9.74* | .014342 | 10.63* | .0102489 | 4.43* | .0221777 | 6.02* |
| Observations | 4,677 (953 clusters) | | 2,768 (648 clusters) | | 1,213 (240 clusters) | | 337 (164 clusters) | |
| R^2 | 0.5180 | | 0.5012 | | 0.5135 | | 0.0609 | |

* means significant at the 10% level
T-tests were performed on each variable in the aggregate model. Using a 10%
significance level with infinite degrees of freedom, the critical region was
greater than 1.645 or less than -1.645. Working under the null hypothesis that
each variable's coefficient equals zero (that is, the variable has no influence
on salary), the null was rejected for the variables GSavg, runs, RC, batting
average, SecA, ISO, yearsinMLB, and yearsinMLBsq. Rejecting the null indicates
that these variables are determinants of salary, while failing to reject it for
the other variables means their lack of influence cannot be ruled out. However,
because preliminary research showed that the other variables are likely to
affect salary as well, variables that did not appear significant (such as home
runs, RBI, and OPS) were still included in this model.
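The decision rule described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the author's code: the helper names `critical_value` and `is_significant` are hypothetical, the standard normal distribution stands in for a t distribution with infinite degrees of freedom, and the two example t-stats come from the infielder column of Table 6c rather than from the aggregate model.

```python
from statistics import NormalDist


def critical_value(alpha: float = 0.10) -> float:
    """Two-sided critical value under the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def is_significant(t_stat: float, alpha: float = 0.10) -> bool:
    """Reject the null (coefficient = 0) when |t| exceeds the critical value."""
    return abs(t_stat) > critical_value(alpha)


# Example t-stats taken from the infielder column of Table 6c.
for name, t in [("GSavg", 13.72), ("HR", 1.23)]:
    print(name, "significant" if is_significant(t) else "not significant")
```

Taking the absolute value of the t-stat covers both tails of the critical region at once.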
Aggregate vs. Disaggregate models: To determine whether pooling the
different position player types was appropriate, an F-test was performed on the
pooled model at the 5% significance level. The resulting F-stat of 6.47 exceeded
the critical value of 1.67 (13 degrees of freedom in the numerator,
infinite degrees of freedom in the denominator), allowing the pooled model
to be rejected at both the 5% and 1% significance levels. Thus, there was strong
evidence of structural differences, implying aggregation was not appropriate.
To make valid conclusions about determinants of salary, disaggregate models were
used for infielders, outfielders, catchers, and designated hitters. These
models made it possible to isolate how the variables influenced salary in each
of the four player groups and compare differences.
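The mechanics of this kind of pooling test can be sketched as follows. This is a generic restricted-versus-unrestricted F-test, not necessarily the exact procedure the author ran. The function names are hypothetical and the sums of squared residuals are made-up numbers used only to show the arithmetic. Only the final comparison uses figures reported in the text.

```python
def pooling_f_stat(ssr_pooled, ssr_separate, num_restrictions, df_residual):
    """F statistic comparing a pooled (restricted) model with separately
    estimated group models (unrestricted)."""
    numerator = (ssr_pooled - ssr_separate) / num_restrictions
    denominator = ssr_separate / df_residual
    return numerator / denominator


def reject_pooling(f_stat, critical_value):
    """Pooling is rejected when the F statistic exceeds the critical value."""
    return f_stat > critical_value


# Made-up sums of squared residuals, only to show the mechanics.
toy_f = pooling_f_stat(ssr_pooled=120.0, ssr_separate=100.0,
                       num_restrictions=13, df_residual=1000)
print(toy_f)

# Figures reported in the text: F-stat of 6.47 against a critical value of 1.67.
print(reject_pooling(6.47, 1.67))
```

The larger the drop in squared residuals from estimating the groups separately, the larger the F statistic and the stronger the case against pooling.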
Russell Martin notwithstanding, catchers tend to be underpaid (FoxSports) 
Infielder model: Performing t-tests on this group showed that average
games, home runs, batting average, and yearsinMLB were significant from 1985
through 2013, while SecA was nearly so. As expected, the market has rewarded
durability, power, the ability to hit for average, and experience. It was
surprising to see that RBI, RC, and wOBA had negative correlations, while OPS was
close to 0. The disproportionate weight given to traditional metrics and
general disregard for sabermetrics in this group seems to suggest that the
offensive skill sets of infielders are not being valued properly.
The data shows that after Moneyball, traditional statistics like home runs and batting
average had less effect on salary (they were significant before 2003, but not
after) while advanced stats like BB/K and OPS became more correlated with
salary. This shift is consistent with the belief that Moneyball popularized sabermetrics and altered the way front
offices evaluated players. However, wOBA and RC still had negative correlations
after the publication of Moneyball, possibly
indicating that these stats have not yet caught on and could be exploited by a
general manager.
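Because the dependent variable is ln(salary), each coefficient can be read as an approximate proportional change in salary per one-unit change in the statistic. A minimal sketch of that conversion follows. The helper name `pct_change_in_salary` is hypothetical, and the example value is the infielder GSavg coefficient from Table 6c, assumed here to be positive.

```python
import math


def pct_change_in_salary(coefficient: float, delta: float = 1.0) -> float:
    """Percent change in salary implied by a ln(salary) coefficient
    when the regressor changes by `delta` units."""
    return (math.exp(coefficient * delta) - 1.0) * 100.0


# Infielder GSavg coefficient from Table 6c (sign assumed positive).
gsavg_coef = 0.0251963
print(round(pct_change_in_salary(gsavg_coef), 2))      # one more unit of GSavg
print(round(pct_change_in_salary(gsavg_coef, 10), 2))  # ten more units
```

For small coefficients the result is close to 100 times the coefficient, so a GSavg estimate of roughly .025 corresponds to a salary difference of about 2.5 percent per unit.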
Outfielder model: Performing t-tests on career statistics for
outfielders revealed average games, RC, batting average, SecA, ISO, and
yearsinMLB as significant at the 10 percent level between 1985 and 2013. Therefore,
outfielders appear to be more accurately valued than infielders, as more
attention is paid to their secondary average, runs created totals, and isolated
power. Again, it was not surprising to see durability, power, run production,
batting average, and experience be significant. It was very surprising,
however, to see negative correlations between salary and RBI, OPS, and BB/K
ratio. Teams tend to hide strong hitters at corner outfield spots, so one would
expect them to be sluggers with high RBI and walk totals in addition to robust
OPS figures.
Looking at changes over time, the most recent 10 years of
data show less emphasis on batting average and power (home runs and ISO) and
more emphasis on BB/K and OPS. This pattern supports the idea that Moneyball alerted baseball to the value
of walks and sabermetrics. It is surprising, then, that there has been less
attention paid to SecA, wOBA, and RC since 2003. These statistics should be
getting more attention, and consequently may be undervalued.
Catcher model: Performing t-tests on catchers revealed average
games and yearsinMLBsq to be the only significant determinants of salary from
this model at the 10 percent level. That was very surprising given the wide
range of statistics used in this paper, leaving one to wonder what statistics are important for catchers aside from
durability. It could be that general managers don’t put much stock in the
offensive numbers of catchers because they usually aren’t as impressive as
those of non-catchers due to the daily wear and tear that catchers must
withstand. As a result, they don’t play as often and their offense suffers.
Good-hitting catchers are extremely rare, and so most teams will tolerate a
mediocre offensive player behind the plate so long as he’s a dependable backstop
who works well with the pitching staff.
Overall, there was not much change in the correlations
before and after Moneyball. As
anticipated, BB/K ratio and OPS had stronger correlations with salary after the
book’s release, but most of the other correlations hardly changed and the
correlation between wOBA and salary actually went down, which was not expected.
As in the other groups, wOBA is being undervalued with catchers.
Designated hitter model: Performing t-tests on designated hitters
showed that average games, RBI, RC, wOBA, BB/K ratio, and ISO were significant
at the 10 percent level. Thus, players in this group appear to be fairly
valued, which is encouraging since all of their value stems from their batting
prowess. It was surprising, however, that many of the statistics were
negatively correlated with salary, including runs, home runs, batting average,
OPS, and SecA. The latter two statistics in particular should play a larger
role in determining salary given their influence on runs and team success. This
group was also the only one with a negative correlation for yearsinMLB, but
that makes sense considering most designated hitters are thirty-something
veterans who can no longer play the field (e.g., David Ortiz, Victor Martinez, Adam Dunn).
Comparing pre-2004 data with post-2004 data reveals that
home runs, OPS, and experience have become more correlated with salary while
the correlations for most other statistics have remained similar or gone down.
This would seem to imply that not much has changed with regard to how hitting is
evaluated, since offense is the only way designated hitters are measured.
However, the sample size for this group is much smaller than the other groups,
since only American League teams can have DHs and most teams only carry one, so
we should be cautious about drawing conclusions from this group’s data.
Heteroskedasticity and Autocorrelation: Because the panel data used is
both time-series and cross-sectional, there is a high likelihood of
heteroskedastic and autocorrelated errors. There’s also the possibility
that teams sign players more for their intangibles than for their actual
skills (e.g., good clubhouse guys valued for their strong work ethics and
leadership abilities), which would lead to omitted variable bias and thus yield
unreliable estimates. To account for these possibilities the model was
estimated with fixed effects. The author also clustered by hitter, in the form
of [xtreg lnsalary GSavg R…yearsinMLBsq, fe cluster(lahmanid)], to handle
heteroskedasticity and autocorrelation, correcting the standard errors in the process.
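To illustrate what fixed effects and clustering do, here is a minimal single-regressor sketch in plain Python. It is not a reproduction of the Stata command above. The toy panel, the function names, and the use of a single regressor are all assumptions made for illustration, and the standard error omits the finite-sample adjustments that statistical packages typically apply, so its values would not match packaged output.

```python
import math
from collections import defaultdict

# Hypothetical toy panel: (player_id, regressor, ln_salary); not the study's data.
panel = [
    ("A", 1.0, 10.0), ("A", 2.0, 10.6), ("A", 3.0, 10.9),
    ("B", 2.0, 12.1), ("B", 4.0, 12.9), ("B", 6.0, 14.0),
]


def within_transform(rows):
    """Subtract each player's own mean from x and y (the fixed-effects step)."""
    groups = defaultdict(list)
    for pid, x, y in rows:
        groups[pid].append((x, y))
    demeaned = []
    for pid, obs in groups.items():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        demeaned.extend((pid, x - x_bar, y - y_bar) for x, y in obs)
    return demeaned


def fe_slope_and_cluster_se(rows):
    """Single-regressor fixed-effects slope with a cluster-robust standard
    error (clusters = players, no finite-sample correction)."""
    d = within_transform(rows)
    sxx = sum(x * x for _, x, _ in d)
    beta = sum(x * y for _, x, y in d) / sxx
    # Sum x * residual within each player, then square across players.
    scores = defaultdict(float)
    for pid, x, y in d:
        scores[pid] += x * (y - beta * x)
    variance = sum(s * s for s in scores.values()) / (sxx * sxx)
    return beta, math.sqrt(variance)


beta, se = fe_slope_and_cluster_se(panel)
print(beta, se)
```

Demeaning within each player removes any time-invariant trait, such as a reputation for leadership. Summing the score within each player before squaring allows a player's errors to be correlated across seasons.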
Designated hitters like Billy Butler appear properly valued (ChatSports) 
Conclusion:
This data shows that while some
statistics are properly valued, there are several that are still being
overvalued relative to their relationship with team success while others
continue to fly under the radar.
Not surprisingly, the results showed increased
appreciation for BB/K ratio, which after Moneyball
became more correlated with salary in each of the four groups. This
supports the theory that teams were quick to embrace Billy Beane’s approach of
targeting selective hitters. Based on BB/K’s impact on team success, they were
right to copy him. Dan Weigel conducted a study using data from 2004 through
2013 to determine which offensive statistics have the highest correlation to
scoring runs, and found that BB/K ratio was significantly correlated with team
run totals in 2013 with a correlation of .6598 and over the 10-year period with
a correlation of .6791 (Weigel, 2013).
The data also showed an increased emphasis on OPS after
the publication of Moneyball in three
of the four groups, reflecting the
statistic’s increased popularity as well as greater appreciation for OBP, one
of the two components of OPS along with slugging percentage. Beyond Batting Average showed that “As a
team’s OPS increased, their runs went up in proportion,” leading the author to
conclude that “the OPS statistic is a strong predictor of runs scored” (Panas
34). This conclusion has been confirmed by several recent studies, including
one by Dan Fox. Using five years of data from 2000 through 2004, Fox found that
OPS and Runs Created were the two statistics most strongly associated
with runs scored, as both had coefficients of determination of 0.964. His
conclusion was that OPS is the best statistic for measuring offensive
performance because “it happens to be a kind of linear approximation of more
complex run estimation formulas. And of course it’s so simple that even my new
biggest fan can use it” (Fox, 2006). Another study found that OPS was second
only to Runs Created in predicting team success, confirming the former’s
importance (Baseball and Sabermetrics, 2013).
It was very surprising to see that home runs were not chief determinants of salary
(except in the infielder model, which corresponds to teams historically paying
a premium for slugging middle infielders such as Alex Rodriguez, Miguel Tejada,
and Robinson Cano). Dating back to the days of Babe Ruth, sluggers have
typically been well-rewarded for their prodigious power, leading Hall of Famer
and seven-time National League home run champion Ralph Kiner to claim “Home run
hitters drive Cadillacs; singles hitters drive Fords.” However, the data
indicates that home runs aren’t as much of a factor in salary anymore, at least
in the post-Moneyball/post-steroid era. This could be explained by the
recent trend of teams preferring fast, defensively skilled players over
plodding, one-dimensional sluggers (Gregory, 2014). It might also be that teams
have discovered home runs aren’t necessary to win ballgames, for long balls are
not strongly correlated with runs created and team success (Weber, 2014). There
is evidence supporting ISO’s relationship with team success in Weigel’s study, however,
which showed ISO having a strong effect on run totals with a 0.8183 R value over
10 years of data from 2004 to 2013. So if a general manager is looking to add a
power bat to his lineup, he should consult the player’s ISO (which was only
significant in two of the four groups) rather than his home run totals.
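The studies cited above report fit in two different ways: Weigel gives R values, while Fox gives coefficients of determination (R squared). A small sketch puts them on the same scale by squaring R. The helper name is hypothetical, and the sketch assumes Weigel's R values are simple correlation coefficients.

```python
def r_to_r_squared(r: float) -> float:
    """Convert a correlation coefficient R to a coefficient of determination."""
    return r * r


# R values reported by Weigel for 2004-2013, as quoted in the text.
weigel_r = {"ISO": 0.8183, "BB/K": 0.6791}
for stat, r in weigel_r.items():
    print(stat, round(r_to_r_squared(r), 4))
```

On that scale, ISO's R of 0.8183 corresponds to an R squared of about 0.67 and BB/K's R of .6791 to about 0.46. Both are below the 0.964 Fox reports for OPS and Runs Created, though the two studies cover different seasons.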
It was also surprising that runs scored was negatively correlated with salary for
three of the four groups, indicating runs are an undervalued statistic. This
has long been the case in baseball, as players who drive in runs tend to be
overvalued (especially in MVP voting[6]) whereas
players who score runs are frequently undervalued. This could be because big
RBI guys bat in the middle of the order and tend to have high home run totals
and batting averages, while players who score lots of runs are typically good
baserunners with excellent OBPs and thus bat near the top of the order. The
data confirmed this bias, as RBI was shown to be more strongly correlated with
salary than runs. However, Weber’s study showed runs and RBI as having
nearly identical correlations with runs created, indicating they should be
valued equally (but not much given their team-dependent nature and weak
correlations with runs created per his study).
As anticipated, the data shows that batting average is
still being overvalued, as it was a significant determinant of salary for
infielders and outfielders. However, teams appear to be wising up to the fact
that batting average is not a great statistic, as it became less correlated
with salary in three of the four groups after Moneyball. This trend is encouraging because Dan Fox’s study found
that of the three triple slash stats (AVG/OBP/SLG), batting average had the
lowest correlation with runs scored (Fox, 2006). Dan Weigel’s study confirmed Fox’s finding
that of the three triple slash numbers, batting average has the lowest
correlation to offensive success. His study thus disagreed with how valuation
of players and teams is often done by simply ranking batting average. “The goal
of baseball is not to hit for a high average,” Weigel wrote, “it is to score runs,
and the low R value for AVG shows that a different and more accurate valuation
tactic should be used for hitters, ideally wOBA” (Weigel, 2013). Beyond Batting Average reached a similar
conclusion by showing team OPS was more correlated with its run total than
batting average in 2008, noting that “the OPS measure is a better predictor of
runs scored than batting average,” and “On an individual level, a player’s OPS
reveals more about his contribution to his team’s runs than his BA” (Panas 34).
It was also alarming to see that the correlations between
secondary average and salary decreased following Moneyball’s publication in
each of the four groups. It was equally troubling that SecA was only positively
significant for one group (outfielders) when looking at the entire data set
(1985-2013). Given SecA’s proven relationship with team success, it should be a
significant factor in player salary. For instance, one study found that
secondary average predicted the success of eight teams most effectively from
1998 through 2012, with “success” defined by team record and run total
(Baseball and Sabermetrics, 2013). SecA hasn’t made its way into the mainstream
the same way walks and OPS have, so it appears to be an undervalued statistic
for now.
Another underappreciated sabermetric is wOBA, which was a
significant determinant of salary in just one of the four groups (designated
hitters). Furthermore, in three of the four groups wOBA had lower correlations
with salary after Moneyball than before
even though wOBA wasn’t even invented until 2008, five years after Moneyball’s publication. This development was puzzling since Baseball Examiner recently championed
wOBA as “the best all-inclusive offensive stat.” This claim was reinforced by Weigel’s
study, which found that the correlation between wOBA and team runs “is higher
than any other individual statistic that does not require an additional formula
to reach,” at over 0.98 for just 2013 as well as the 10-year period his study
covered. Arguably the best statistic for measuring a player’s offensive
contributions, wOBA should be one of the first statistics a general manager
considers and is thus extremely undervalued.
Therefore, it appears that Moneyball has had a noticeable effect on the way in which teams
conduct player evaluation. The data supports the notion that the book’s
emphasis on the importance of OBP led to a greater appreciation for plate
discipline and patient hitters, confirming the conclusion of the Hakes and
Sauer paper. This shift is reflected by how OPS and BB/K ratio have had a
greater impact on salary in the aftermath of the book’s release. As a result,
batting average has become less relevant, since it is already accounted for in
OBP and is not strongly correlated to scoring runs. Teams appear to have
recognized the value of reaching base and are rewarding players accordingly.
However, market inefficiencies still exist with regard to power, run-scoring ability, and
composite offensive statistics like wOBA and SecA. While the statistics
popularized in Moneyball are now commonplace,
newer sabermetrics have emerged as potential sources of value. With so many
teams embracing analytics and looking to exploit inefficiencies like these,
it’s becoming increasingly difficult to find them, especially since teams with
high payrolls can spend more to exploit them. When inefficiencies are
discovered, general managers must act on them immediately because the market will
correct itself before long[7].
With everyone looking for an edge, “Secrets are hard to keep…To play
consistently successful Moneyball, you have to stay ahead of the curve, and
that’s hard” (Cowen and Grier, 2011).
Hard, but not impossible.
Works Cited
Appelman, David. “Get to Know: Runs
Created.”
Armstrong, Shane. “Why Catchers Should
be Paid More.”
Associated Press. “MLB Average Salary is
$3.39 M.”
(18 December 2013).
Baseball and Sabermetrics. “Which
Sabermetric Predicts a Team’s Success Most Effectively?”
predictsateamssuccessmosteffectively/ (7 August 2013).
Baseball Examiner. “The Best Offensive
Stat.”
(18 September 2014).
Booth, Chuck. “The Most Underrated
Statistic: Extra Base Hits (XBH).”
(25 November 2008).
Chen,
Albert. “Mega
television deals are changing baseball’s economic landscape.”
http://sportsillustrated.cnn.com/2012/writers/albert_chen/04/23/baseballtelevisiondeals/
(23 April 2012).
Cowen, Tyler and Kevin Grier.
“The Economics of Moneyball.”
(12 January 2006).
Gaines, Cork. “CHART: How Much Every MLB
Team Has Spent On Free Agents This Winter.”
(7 April 2014).
Gray, Scott. The Mind of Bill
James: How a Complete Outsider Changed Baseball. New York:
Doubleday, 2006. Print.
Gregory, Sean. “The Kansas City Royals
are the Future of Baseball.”
Hakes, Jahn and Raymond Sauer. “An
Economic Evaluation of the Moneyball
Hypothesis.”
Journal of Economic Perspectives (Summer 2006): 173-185.
Houser, Adam. “Which Baseball Statistic
is the Most Important When Determining Team
(25 Sept. 2012).
Mittler, Doug. “Increase in defensive
shifts.”
(10 June 2014).
Panas, Lee. Beyond Batting Average. Self-published, 2010. Print.
SABR. “A Guide to Sabermetric Research:
A Primer on Statistics.”
Sawchik, Travis. “Production shift
changes MLB free agency.”
http://triblive.com/sports/pirates/496232174/freeplayersagency (2 November 2013).
Schaal,
Eric. “MLB
Power Outage: 5 Signs of a New Dead Ball Era in 2014.”
2014.html/?a=viewall (25 August 2014).
Swartz, Matt. “Searching for Racial
Earnings Differentials in Major League Baseball.”
baseball/ (14 August 2014).
Tango, Tom. “The Book—Playing The
Percentages In Baseball.”
(9 February 2010).
Weber, Roger. “Triple Crown Stats Not
the Best Way to Grade a Player.”
Weigel, Dan. “Finding the Cause of
Offensive Success.”
(11 November 2013).
[1] ESPN’s Doug Mittler found that teams in 2014 shifted five times more often than they did in 2011
[2] Players have historically peaked between ages 25 and 29, but during the steroid era peaked between 28 and 32
[3] Ruth justified his $80,000 salary by quipping that he “had a better year than (President Hoover) did”
[4] The only group with a lower average salary than catchers was relief pitchers
[5] The average DH made $10.5 million in 2013, $4 million more than the next closest position (first base)
[6] From 1993 to 2007 all but one AL MVP had at least 100 RBI and every NL MVP had at least 90 RBI (BaseballRef)
[7] Within a year of Moneyball’s publication, the market’s OBP inefficiency was “substantially if not completely eroded” (Hakes and Sauer, 184).