Baseball is mathematically based. It is the best link between the generally nerdy domain of mathematics and the generally manly domain of professional sports. The one downside to this statistically driven machine is that stats can be selected and used to support almost any argument. Major League Baseball keeps such extensive records that there is always an obscure stat in support of your argument. Right now the league is in an uproar over the Toronto Blue Jays sign-stealing controversy. Some anonymous players claimed that the Jays were stealing signs with the help of a third party and that this affected the number of home runs they hit at home. An ESPN article uses the witnesses' personal accounts to bring up the topic, and then claims that:
"Colin Wyers, a contributor to ESPN Insider who writes for Baseball Prospectus, provided independent analysis that showed statistical deviations in Toronto's hitting stats that he considered too great to be random chance."
If you want the full story, visit the original source. This post will make much more sense if you do. Take note of the statistics used and how convincing they are. You will be hard-pressed not to believe the story. When you are done, come back here and read this; hopefully the same effect occurs. The goal of this post is to selectively use statistics (much like Mr. Wyers) to make a very convincing case, and to show the dangers of a statistically driven world. There is a public myth that mathematics is completely objective, that numbers don't lie. In this case, the numbers themselves don't lie, but they may or may not contribute to one.
Link: ESPN's Article
I am going to begin my "independent analysis" by detailing the four sections of my argument. It should be known that I am not privy to the virtual cornucopia of stats that the nice folks at ESPN are, but the internet (specifically baseball-reference.com) provided all I needed. I will address the following points through my statistical lens:
Jose Bautista's Numbers
Vernon Wells' Trends
The Trade of Yunel Escobar
AL East Analysis
1) Jose Bautista's Numbers
Anyone who follows baseball knows of the emergence of @JoeyBats19. The reigning home-run king and highest vote-getter for this year's All-Star Game is keeping up that pace. The article claims that there is a significant difference between his home and road on-base plus slugging (OPS) numbers, and that this discrepancy is too wide. This raises the first question: "How wide is, mathematically, too wide?" The answer (coming from statistics) is: "As wide as you need it to be." Adjusting your significance threshold (the alpha value) is a key ploy for statisticians to get the result they want. But politics aside, what stat can counter the seemingly overwhelming evidence proposed by ESPN?
Let's look at the home runs. Bautista hit 33 homers at home and 21 on the road. That is a difference of 12. Is that large? Or does it prove that he has serious power? I say the latter. If we eliminate the home-field home runs (and the supposed sign stealing) and pretend that JoeyBats19 played every game on the road, he would have hit 42 home runs; that is the statistical projection, twice his road total. Even with those stats, he still would have finished first in the home-run race league-wide. This shows Jose's power in all ballparks and renders the 12 extra homers at home, about 22% of his total of 54, insignificant. Twenty-two percent could be explained by many factors: more rest at home, comfort, knowledge of the ballpark, or even hitting in the bottom of the inning. Sure, ESPN's stats truthfully show the discrepancy in homers, but they fail to illuminate the insignificance of the gap.
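For readers who want to check the arithmetic, here is a rough Python sketch that uses only the home run totals quoted above. Two things in it are assumptions made for illustration, not anything taken from ESPN's analysis: the road-only projection simply doubles the road total (assuming an even split of home and road games), and the significance check is a one-sided exact binomial test against a 50/50 home/road split, which ignores park effects entirely. The function and variable names are made up for this example.

```python
from math import comb

# Totals quoted in this post.
HOME_HR = 33
ROAD_HR = 21


def road_only_projection(road_hr: int) -> int:
    """Project a full season at the road rate.

    Assumption: half of all games are road games, so doubling works.
    """
    return 2 * road_hr


def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact probability of seeing at least `successes` in `trials`.

    Assumption: each homer is an independent coin flip that lands
    "home" with probability p (0.5 means no home-field effect at all).
    """
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


total = HOME_HR + ROAD_HR
projection = road_only_projection(ROAD_HR)
extra_share = (HOME_HR - ROAD_HR) / total  # extra home homers as a share of all homers
p_value = binomial_upper_tail(HOME_HR, total)

print(f"Road-only projection: {projection} HR")
print(f"Extra home homers as share of total: {extra_share:.1%}")
print(f"One-sided p-value vs. a 50/50 split: {p_value:.3f}")

# The same data gives a different verdict depending on the chosen alpha.
for alpha in (0.01, 0.05, 0.10):
    verdict = "significant" if p_value < alpha else "not significant"
    print(f"alpha = {alpha:.2f}: {verdict}")
```

Under these assumptions the verdict flips between the 0.05 and 0.10 thresholds, which is the "as wide as you need it to be" problem in miniature.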
2) Vernon Wells' Trends
The article claims that the descent of Vernon Wells' numbers is proof that he is no longer getting help from the sign stealing at home. Although his average has dropped since the trade to Los Angeles, there are more statistics in play here than ESPN wants to divulge. They cite his OPS and home runs as evidence of his immediate decline, but fail to inform readers of many other statistical factors.
First, Vernon turned 32 when he was traded to the Angels. I am sure MLB has some pretty juicy statistics comparing players' ages and their slugging percentage, but I will have to stick to what I have access to. ESPN doesn't mention the sample size for their stats. They take Wells' Blue Jay stats from a city where he was a 3-time All-Star and won 3 Gold Gloves. He also finished as high as 8th in MVP voting. If you chart his trends in Toronto, they go steadily down in both categories. ESPN wants to give the impression that the Angels inherited a top-notch player, but the stats (conveniently left out) show otherwise; Wells was on the down-slope of his career.
Second, it didn't help that the Angels are also on the decline. Wells' stats were hurt by the overall make-up of the team. In the season before Wells' arrival, the Angels got steadily worse. They lost Matsui (21 HR in 2010), Napoli (led the team with 26 HR in 2010) and Morales (11 HR in 2010). This was only the continuation of a trend that saw them lose Vladimir Guerrero the season before. Guerrero still holds several significant Angels records, including the highest slugging percentage, batting average, and OPS. Such significant losses clearly show that Wells wasn't hurting from the lack of signs from the outfield, but rather from the lack of a solid team around him. A waning star on a waning team cannot be held up as evidence.
3) The Trade of Yunel Escobar
Yunel Escobar came to the Jays mid-season. He provides the neatest sample of evidence because he split the season almost in half between two teams. If there were significant changes, we would see them. ESPN only mentions his rise in OPS, but nothing else. Why would that be? Maybe because if you look at his stat line during the season of the switch (available here), the 2010 season contains no major statistical differences. So what about the 2011 increase mentioned by ESPN? Look back at his OPS in 2007 and 2009 with Atlanta. His .837 and .812 average out to .825. ESPN conveniently chose to compare Yunel's worst year (when he had off-field troubles in Atlanta) with his best year. Of course the stats will show improvement when you do that. Why not compare his best year to his current year? The full story would paint a different picture than the one ESPN is authoring.
4) AL East Analysis
Finally, let's look at two teams mentioned in the article: the Red Sox and the Yankees, the two proverbial beasts of the AL East. Both were seen giving multiple signs at Rogers Centre, so let's assume they kept that up starting in 2010. This would theoretically defeat the Blue Jays' advantage as hitters, so the Jays should have done worse against these opponents in 2010. Let's look at the stats. In 2009 (before the teams were aware), the Sox were 5-4 in Toronto and the Yankees were 6-3. Pretty good for teams who are being cheated. In 2010 (after they figured it out), the Sox were 7-2 and the Yankees were 3-6. When you take the sums of the records:
Playing Against Cheating Blue Jays (2009): 11-7
Playing Against Fair Play Blue Jays (2010): 10-8
The two teams actually did worse once they began guarding themselves against the Jays' "tactics". This stat doesn't even include the Jays' steady improvement over that year. The Jays' combined road record against the same two teams was 6-12 in 2009 and improved to 8-10 in 2010. These stats show that the Yankees and Sox actually did worse when they played the Jays without the alleged cheating. The Blue Jays didn't cheat at home; their wins tell us that. And any sign stealing that was occurring didn't significantly alter what matters most in professional sports: wins.
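The tally above is easy to reproduce. Here is a minimal Python sketch that adds up the records quoted in this section and converts them to winning percentages. The Record class is a hypothetical helper written for this post, not part of any stats library, and the only inputs are the win-loss figures already given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A simple win-loss record (hypothetical helper for this post)."""

    wins: int
    losses: int

    def __add__(self, other: "Record") -> "Record":
        # Combine two records by summing wins and losses separately.
        return Record(self.wins + other.wins, self.losses + other.losses)

    @property
    def win_pct(self) -> float:
        return self.wins / (self.wins + self.losses)

    def __str__(self) -> str:
        return f"{self.wins}-{self.losses}"


# Visitors' records in Toronto, as quoted above (Red Sox + Yankees).
visitors_2009 = Record(5, 4) + Record(6, 3)  # before the teams were aware
visitors_2010 = Record(7, 2) + Record(3, 6)  # after they started guarding signs

# Blue Jays' combined road records against the same two teams.
jays_road_2009 = Record(6, 12)
jays_road_2010 = Record(8, 10)

rows = [
    ("Sox + Yankees in Toronto, 2009", visitors_2009),
    ("Sox + Yankees in Toronto, 2010", visitors_2010),
    ("Jays on the road vs. both, 2009", jays_road_2009),
    ("Jays on the road vs. both, 2010", jays_road_2010),
]
for label, record in rows:
    print(f"{label}: {record} ({record.win_pct:.3f})")
```

Each combined record covers 18 games, so the difference between the two seasons in Toronto comes down to a single game.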
When all is said and done, the mathematical argument put forward by ESPN is shoddy at best. For such an esteemed broadcaster to claim statistical differences is a cruel trick on the innumerate public. Sports fans and mathematicians both have to be aware of the dangers of statistics. Oftentimes an "independent analysis" showing significant "statistical deviations" is not telling you the whole truth.