A Talent / Value Adjustment You’ve Never Thought About

September 28, 2009

(Exclusive to this web site!)

One of the great missing adjustments to even our most advanced stats is the adjustment for the quality of the opposition. A given stat line put up in the AL East means something rather different from the same line put up in the NL Central. Compounding the problem is that the adjustment needs to be iterative; once you have calculated the opposition quality for everyone and adjusted all the stats, you have to paste the adjusted stats over the originals, recalculate the opposition quality, and so on, again and again until the values stabilize. I do that with my adjusted standings at SoSH, but no one, AFAIK, has ever done that with individual hitting and pitching lines.
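For the programmers in the audience, here is a rough Python sketch of that paste-and-recalculate loop. Treat all of it as assumption: the function name `iterate_adjustment`, the convention that every raw number is expressed relative to league average (0.0 = average), and the update rule itself (raw value plus the playing-time-weighted adjusted quality of the opposition) are for illustration only, and the three-team league is made up. It is not the code behind the adjusted standings.

```python
def iterate_adjustment(raw, schedule, tol=1e-9, max_iter=1000):
    """Repeat the adjust-and-recalculate loop until ratings stop moving.

    raw:      name -> raw rating relative to league average (0.0 = average)
    schedule: name -> {opponent: share of that name's playing time}, shares sum to 1
    """
    adjusted = dict(raw)  # first pass: treat the raw numbers as the best guess
    for _ in range(max_iter):
        updated = {}
        for name, raw_value in raw.items():
            # Opposition quality, measured with the *current* adjusted ratings.
            opp_quality = sum(share * adjusted[opp]
                              for opp, share in schedule[name].items())
            updated[name] = raw_value + opp_quality
        biggest_change = max(abs(updated[n] - adjusted[n]) for n in raw)
        adjusted = updated  # carry the new values into the next pass
        if biggest_change < tol:
            break
    return adjusted


# Hypothetical three-team league; every number here is made up.
raw = {"A": 1.0, "B": 0.0, "C": -1.0}
schedule = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"A": 0.5, "C": 0.5},
    "C": {"A": 0.5, "B": 0.5},
}
for team, rating in iterate_adjustment(raw, schedule).items():
    print(team, round(rating, 3))
```

In this toy league the schedule is perfectly balanced, yet A’s +1.0 settles at about +0.67, because the two teams A actually plays are, between them, below average. With a real, unbalanced schedule the same loop applies, though how quickly (or whether) it settles depends on the schedule, hence the `max_iter` cap.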

How exactly would you do this? You could just take opponent overall quality, which would in fact be your best adjustment for value. But an interesting and in some ways better alternative would be to include handedness. For instance, each LHB would get one adjustment based on the numbers versus LHB of the LHP he faced, and a separate adjustment based on the numbers versus LHB of the RHP he faced.
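A minimal sketch of the handedness version, again in Python and again hypothetical throughout: the function name, the choice of OPS allowed as the quality measure, and the three pitchers are all invented for illustration. The idea is just a PA-weighted average of what each pitcher allows to batters of this hitter’s handedness, kept separate for LHP and RHP.

```python
def opposition_quality_by_hand(batter_hand, matchups, pitcher_splits):
    """PA-weighted quality of the pitchers a hitter faced, split by pitcher hand.

    batter_hand:    "L" or "R"
    matchups:       list of (pitcher, pitcher_hand, plate_appearances)
    pitcher_splits: pitcher -> {"L": OPS allowed to LHB, "R": OPS allowed to RHB}
    """
    totals = {}  # pitcher_hand -> (PA-weighted OPS sum, PA)
    for pitcher, pitcher_hand, pa in matchups:
        weighted, pa_sum = totals.get(pitcher_hand, (0.0, 0))
        # Use only what this pitcher allows to batters of the hitter's hand.
        weighted += pa * pitcher_splits[pitcher][batter_hand]
        totals[pitcher_hand] = (weighted, pa_sum + pa)
    return {hand: weighted / pa_sum
            for hand, (weighted, pa_sum) in totals.items() if pa_sum > 0}


# Invented pitchers and numbers, for illustration only.
pitcher_splits = {
    "Lefty A":  {"L": 0.600, "R": 0.800},
    "Lefty B":  {"L": 0.700, "R": 0.750},
    "Righty C": {"L": 0.800, "R": 0.700},
}
matchups = [("Lefty A", "L", 10), ("Lefty B", "L", 30), ("Righty C", "R", 20)]
print(opposition_quality_by_hand("L", matchups, pitcher_splits))
```

Comparing each of the two results with the league-wide figure for the same matchup type would give the two separate adjustments.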

(As an aside, if you’re studying whether some hitters have persistent quality-of-opposition splits, you have to do it this way. AFAIK, no one ever has — the few studies showing no such persistence have used opponent ERA, which adds a huge amount of noise. Jon Lieber in his prime was a great pitcher vs. RHB and a lousy one vs. LHB; why count him as average vs. everyone?)

This notion of adjusting for opposition quality by handedness immediately suggests a value adjustment I’ve never heard mentioned. As a rule, elite LHB face a better quality of LHP than do average LHB. Not only are they more likely to not get benched against a C. C. Sabathia, they are hugely more likely to face a nasty LHR in the late innings. The differences among LHB are mitigated (slightly? more than that? I don’t think anyone knows) by this. The quality-of-opposition adjustment I just outlined would put the proper distance between the Alex Coras and Adrian Gonzalezes of the world. And this would be hugely desirable and interesting — when you’re assessing talent, that is.

Now, the funny thing is this: when assessing value, this can probably be safely overlooked. It’s built into the way the game is played now that this will happen. That we are underestimating how much better Gonzalez is than Cora is pretty much negated by the better pitching Gonzalez faces as a result.

However, there is probably a small but very interesting class of exceptions to this rule. You would expect there to be some hitters who get too much or not enough respect from opposing managers, and thus face more or fewer LHR than they ought to based on their own platoon splits. You would adjust for this by finding the correlative relationship between LHB platoon splits and the percentage of time they are at the platoon disadvantage, and then calculate the expected number and quality of LHR they faced versus the actual. The players with the biggest differences, in both directions, would make for very interesting lists. It’s possible that some of the “noise” in platoon splits is actual signal; as LHB establish reputation, managers begin to match them up with their lefty-killers. But reputations lag behind reality, both at the start and end of careers (David Ortiz may now be seeing tougher LHP than other LHB of his quality).
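One way to sketch the expected-versus-actual comparison in Python is an ordinary least-squares line. The assumptions are loud ones: a straight-line relationship, platoon split measured in OPS points lost against LHP, and invented hitters and numbers. It also covers only the share of PA against LHP, not the quality of those LHP, which would need the handedness-adjusted opposition numbers described earlier.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def platoon_residuals(hitters):
    """hitters: name -> (platoon split, actual share of PA vs. LHP).

    Returns name -> actual share minus the share the fitted line expects.
    Positive means more LHP than the hitter's split would predict.
    """
    names = list(hitters)
    xs = [hitters[n][0] for n in names]
    ys = [hitters[n][1] for n in names]
    slope, intercept = fit_line(xs, ys)
    return {n: y - (intercept + slope * x) for n, x, y in zip(names, xs, ys)}


# Made-up LHB: (OPS points lost vs. LHP, share of PA vs. LHP).
hitters = {
    "Hitter 1": (50, 0.30),
    "Hitter 2": (100, 0.26),
    "Hitter 3": (150, 0.25),
    "Hitter 4": (200, 0.18),
}
ranked = sorted(platoon_residuals(hitters).items(), key=lambda kv: kv[1])
for name, gap in ranked:
    print(f"{name}: {gap:+.3f}")
```

Sorting the residuals gives the two lists: the hitters at one end see fewer LHP than their splits predict, the ones at the other end see more.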

(As an aside, I know that Trot Nixon’s career path of splits vs. LHP was made completely nonsensical by the genius of Jimy Williams, who benched Nixon against even the easiest LH starters but never pinch hit for him against even the toughest LHR. So he was probably leading the league in toughest quality of LHP faced despite being nowhere near the top of the list for overall LHB quality. That’s the sort of guy it would be neat to identify and adjust.)


Matsuzaka’s Journey Back

September 27, 2009

Two things we all know (or is that “know”?) about pitching:

— There’s no such thing as a pitcher with the skill of pitching out of jams. If a pitcher has much better numbers with runners on (or RISP) than with the bases empty, that’s luck and it will normalize, sooner than later.

— There’s no such thing as a pitcher with the skill of getting lots of easy outs on balls in play. While there are real differences in BABIP skill (with knuckleballers leading the pack), if a pitcher has a low enough BABIP, that’s luck and it will normalize, sooner than later.

What, then, do we make of a pitcher who consistently pitches out of jams by improving his rate of easy outs on balls in play?

I think we all knew that Dice-K has both a crazy bases empty / runners on split (and hence strand rate) and a crazy BABIP. What I didn’t know until I ran the numbers is that the crazy BABIP happens only with runners on:

Matsuzaka Tricks

Split             PA    BA   OBP    SA   OPS  CR/27    K%   BB%   HRC  BABIP  XBH/BIP  1B/(1B+OIP)  XBH/HIP
Bases Empty     1002  .271  .357  .429  .787   6.02  .211  .111  .039   .330     .088         .265     .268
Men On           835  .215  .310  .355  .665   3.31  .229  .104  .039   .259     .073         .200     .281
% Improvement                                          9%    6%    0%    22%      17%          24%      -5%

The improvement in strike zone command seems modest, but it’s crucial: even without the change in BABIP it would be enough to reduce his component ERA with runners on from 6.02 to 4.58. But that guy would still be lousy; he needs the low BABIP with runners on to succeed. And the improvement in strike zone command with runners on shows that the notion that he nibbles more to avoid harder contact is just wrong; when runners get on he comes after hitters more — and with more success.
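For anyone checking the % Improvement row in the table above, here is the arithmetic as a short Python sketch. The sign convention is a reading of the table rather than something stated in it (positive means better for the pitcher, so K% counts up and everything else counts down), and the function name is made up. Because it works from the rounded rates printed in the table, a column can land a point away from the published row, which was presumably computed from unrounded numbers.

```python
def pct_improvement(base, new, higher_is_better=False):
    """Fractional change from base to new, positive when the pitcher is better off."""
    if higher_is_better:
        return (new - base) / base
    return (base - new) / base


# Rates copied from the table: (bases empty, men on, higher is better for the pitcher?)
rates = {
    "K%":          (0.211, 0.229, True),
    "BB%":         (0.111, 0.104, False),
    "HRC":         (0.039, 0.039, False),
    "BABIP":       (0.330, 0.259, False),
    "XBH/BIP":     (0.088, 0.073, False),
    "1B/(1B+OIP)": (0.265, 0.200, False),
    "XBH/HIP":     (0.268, 0.281, False),
}
for stat, (empty, men_on, higher_is_better) in rates.items():
    print(f"{stat:12s} {pct_improvement(empty, men_on, higher_is_better):+.0%}")
```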

It’s not merely that he’s doing it with smoke and mirrors, it’s like the smoke is in front of the mirrors and nowhere else.

One might begin to think that he simply pitches better out of the stretch. But then what do we make of this already legendary split, even with its eenie-weenie-teenie-tiny sample size?

Matsuzaka Tricks Reloaded

Split             PA    BA   OBP    SA   OPS  CR/27    K%   BB%   HRC  BABIP  XBH/BIP  1B/(1B+OIP)  XBH/HIP
Other Men On     778  .218  .315  .361  .676   3.40  .226  .105  .039   .263     .074         .204     .281
Bases Full        57  .163  .246  .265  .511   2.20  .263  .088  .028   .200     .057         .152     .286
% Improvement                                         16%   17%   30%    24%      23%          26%      -2%

Or this more obscure and even more puzzling one?

Matsuzaka Tricks Revolutions

Split             PA    BA   OBP    SA   OPS  CR/27    K%   BB%   HRC  BABIP  XBH/BIP  1B/(1B+OIP)  XBH/HIP
Leadoff Batter   441  .290  .379  .464  .842   7.10  .200  .111  .044   .347     .088         .285     .253
Other Empty      561  .256  .340  .402  .743   5.27  .219  .111  .035   .316     .089         .249     .281
% Improvement                                         10%    1%   20%     9%      -1%          12%     -11%

Let’s just combine the last two for easy scanning:

Matsuzaka Tricks, The Box Set

Split             PA    BA   OBP    SA   OPS  CR/27    K%   BB%   HRC  BABIP  XBH/BIP  1B/(1B+OIP)  XBH/HIP
Leadoff Batter   441  .290  .379  .464  .842   7.10  .200  .111  .044   .347     .088         .285     .253
Other Empty      561  .256  .340  .402  .743   5.27  .219  .111  .035   .316     .089         .249     .281
Other Men On     778  .218  .315  .361  .676   3.40  .226  .105  .039   .263     .074         .204     .281
Bases Full        57  .163  .246  .265  .511   2.20  .263  .088  .028   .200     .057         .152     .286

Pure science fiction.

I definitely intend to attack this question with PITCHf/x data over the winter. Right now, I’m open to any possibility.

Taken from: Sons of Sam Horn

Let’s Take a Look at Pedro, Circa 2000

September 23, 2009

I have friends who are complete non-baseball fans, but big-time math-science geeks, i.e., they’re folks who understand the normal distribution of natural phenomena and can appreciate exceptions. I have entertained them greatly by reading them the leader boards from 1999 and 2000 in descending order, especially 2000, when the second- through fifth-place finishers in most stat categories were tightly clustered.

Let’s play “what’s the next term in this sequence?”!

ERA: 4.17, 4.14, 4.13, 4.12, 4.11, 3.88, 3.79, 3.79, 3.70 . . .

How many people had 1.74?

Baserunners per 9 IP: 11.52, 11.48, 11.18, 10.79 … 7.22

Opposition OBP: .306, .303, .298, .291 … .213.

Opposition SLG: .392, .384, .374, .371 … .259.

This (as well as 1999, of course) is an essentially superhuman performance. If there had been a league as much better than MLB as MLB is than AAA, and then another league better than that one by the same margin, Pedro would have still been the best pitcher of that league.


Streakiness and the “Fog”

September 9, 2009

An aside on the notion of streakiness (part 1B of the series, if you will).

There have been many studies showing that all streakiness in sports is random. There was an exhaustive study a couple of years ago on Retrosheet that found that hot and cold streaks had no predictive power.

The problem is in the assumption that if a hot streak or cold streak is real, it must have significant predictive power.

In baseball, the standard test of the reality of hitting streaks is serial correlation. Is a player’s performance in one game predictive of his performance in the next? The problem with this is that there’s an enormous amount of “noise” added to the signal. A hot hitter will face Sabathia and go 0-4; a cold one will face a AAA callup and/or get two bloop hits. Red Sox and Yankee fans may remember a series in NYC (May 27-29, 2005) where Manny Ramirez, one of baseball’s truly streaky hitters, came in 1 for his last 12 and looking awful and went 7-for-13, every hit a cheap single; he then left town and put up a 562 OPS in his next 10 games.

The further problem is that in a serial correlation, the end of each streak and the beginning of the next form a pair of points included in the correlation, when our hypothesis is that they should anti-correlate. That further reduces the strength of the measured correlation.
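A toy simulation makes both problems visible. This is a sketch under loud assumptions, not a model of any real hitter: true skill flips between two levels in fixed ten-game streaks, each game adds independent Gaussian noise, and all names and parameter values are invented.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def lag1_correlation(values, streak_ids, within_streak_only=False):
    """Correlate each game with the next one.

    With within_streak_only=True, pairs that straddle the end of one streak
    and the start of the next are dropped.
    """
    pairs = [(values[i], values[i + 1])
             for i in range(len(values) - 1)
             if not within_streak_only or streak_ids[i] == streak_ids[i + 1]]
    return pearson([a for a, _ in pairs], [b for _, b in pairs])


def simulate(n_streaks=200, streak_len=10, noise_sd=1.5, seed=0):
    """Alternating hot (1.0) and cold (0.0) streaks, plus per-game noise."""
    rng = random.Random(seed)
    skill, streak_ids = [], []
    for s in range(n_streaks):
        level = 1.0 if s % 2 == 0 else 0.0
        skill.extend([level] * streak_len)
        streak_ids.extend([s] * streak_len)
    observed = [v + rng.gauss(0.0, noise_sd) for v in skill]
    return skill, observed, streak_ids


skill, observed, streak_ids = simulate()
print("true skill, all pairs:    ", round(lag1_correlation(skill, streak_ids), 3))
print("true skill, within streak:", round(lag1_correlation(skill, streak_ids, True), 3))
print("observed, all pairs:      ", round(lag1_correlation(observed, streak_ids), 3))
```

With these settings the noise-free series should show a strong game-to-game correlation and the noisy one only a small fraction of it, even though the streaks are built into the data by construction. Dropping the pairs that straddle a streak boundary pushes the noise-free figure up to essentially 1.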

In fact, if you take Manny’s career with Boston and divide it into apparent hot (actually just normal) and cold streaks and remove the anti-correlated data pairs, you do get a significant or nearly significant serial correlation (of linear weights / PA). And as I have noted elsewhere, player seasons often divide into chunks that chi-square tells us are unlikely to be random.

Standard statistical tests of streakiness just aren’t up to the task of demonstrating it’s real. That doesn’t mean it isn’t real — a perfect example of what Bill James calls the “fog.”

I’m 100% certain that a study using experienced baseball scouts could prove the existence of streakiness by having them significantly outperform chance in their ability to predict the end of slumps by streaky players like Manny (as Jerry Remy used to do). IOW, they’d say, “OK, today player X fixed his mechanics and should perform better over the next N games than the last N.” And they would be right most of the time.


Understanding Player Streakiness #1: The Epic Slump of Big Papi

August 26, 2009

Why do we believe that David Ortiz’s unimaginably bad April and May should nearly be ignored when projecting his performance for the rest of the year? Why should we believe he has the skills of a .259 / .352 / .566 hitter (his numbers from June 6 to August 24) rather than those of a .227 / .320 / .439 hitter (his overall season numbers)? Isn’t this “cherry-picking” of the worst sort? Why would you toss out a significant chunk of a season like that?

It’s not a trivial question: if Ortiz really is a 759 OPS hitter now, he should be getting no PT for the Red Sox, not when Casey Kotchman is sitting on the bench.  In fact, some on SonsofSamHorn.net (from which much of this post is adapted) have been arguing just that: bench Papi, he’s toast.  But if he’s actually a 918 OPS hitter, that would be crazy.

The first thing to understand is that you could play Strat-O-Matic Baseball or Diamond Mind from now until the day you die and not see a .439 slugger put up a .288 SA in his first 221 PA and a .566 SA in his next 264. Ortiz’s HR / Contact has gone from .007 to .106; the odds against seeing that in a random simulation are something like 3,823 to 1 (chi-square, p < .0003). [NB: Yes, I know that chi-square is not exactly accurate when any of the cell counts is below 5, but it’s close enough for sabermetrics.]
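One plausible version of such a test can be sketched in a few lines of Python. The 2x2 Pearson chi-square and its one-degree-of-freedom p-value are standard; the function name is made up, no continuity correction is applied, and the counts in the example are placeholders chosen only to land near the .007 and .106 rates. They are not Ortiz’s actual home run and contact totals.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df, the chi-square tail probability equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Placeholder counts, NOT real data: (HR, other contact) for each stretch.
early_hr, early_other = 1, 142    # about .007 HR / Contact
late_hr, late_other = 19, 160     # about .106 HR / Contact
statistic, p_value = chi_square_2x2(early_hr, early_other, late_hr, late_other)
print(f"chi-square = {statistic:.2f}, p = {p_value:.5f}")
```

Note that with one cell this small the usual caveat about low counts applies; an exact test would be the more careful choice.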

It’s important to understand that streakiness is real. Player seasons which divide like this and give ridiculous odds of happening randomly according to chi-square are commonplace.  That serial day-to-day correlation is not significant does not disprove the notion of streakiness; it just fails to give positive evidence.  Remember that a correlation does not measure the strength of a relationship; it measures the strength of a relationship minus the noise. Add sufficient noise, and any real relationship can fail to show a significant correlation.

The second thing that’s crucial is that we have a very good explanation for why Ortiz (or any hitter and especially any aging hitter) could have such a miserable stretch of season. When looking at big splits, “do I understand how this happened?” is one of the two crucial interpretive questions you must ask yourself. (The other, and sometimes it’s the same question, is “did I look for this split to confirm an existing hypothesis or suspicion, or did I stumble on it while examining all of his splits?”)

The explanation (and here we shift from sabermetric mode to scouting mode, and it’s something that every sabermetrician needs to be able to do) begins with the psychological contrast between the “Big Papi” of legend and the Ortiz of April and May. From the ’04 post-season to the end of ’06 Ortiz may have been the most confident athlete you’ll ever have the pleasure to watch.  I’ve done studies which showed that his success in walk-off situations literally had millions-to-one (maybe billions-to-one) improbability. He not only knew he could hit, he knew they couldn’t get him out when it really counted.

This absolute confidence disappeared when he started suffering the health problems that come with ordinary aging. In ’07 and ’08 his clutch differential was actually negative, which is just what average clutch ability plus some bad luck looks like.

Compare the guy who knew that no pitcher on the planet could get him out when the game was on the line to the guy who told the press “Papi sucks.” Ortiz suffered a complete collapse of confidence, complete self-doubt.

Now, the way this affects hitting is that it causes you to think about mechanics while you’re up there. That’s the last thing you want to do; it’s got to be what people call “muscle memory.” I suspect former Sox #1 prospect Lars Anderson has struggled this year in part because he’s so damn smart, and I suspect that the success of guys like Wade Boggs and Manny Ramirez is directly correlated to, shall we say, their unlikelihood of ever joining Mensa. I think it took Dwight Evans half his career to stop thinking too much while he was up there. (NB: I’m not talking about the “what’s he going to throw me next” thoughts between pitches, just whether the hitter can shut out conscious thoughts about swing mechanics.)

It’s important to note that there were a few weeks where we had persistent reports that Ortiz was having great BP but was still struggling in games. BP gives you a chance to work on mechanics and, having made an adjustment, get it out of your conscious mind, let it settle into muscle memory, and take a bunch of repetitions. Bringing that to games can be a big challenge. It’s the reason why slumps last as long as they do despite hitting coaches, video study, and extensive extra BP. It’s absolutely like the “don’t think about elephants” dilemma. It’s not just that you have to get past the stage where you’re actively thinking about mechanics, you have to get past the stage where you’re thinking that you shouldn’t think about your mechanics. That takes repetitions and confidence. You have to literally forget you’re slumping.  You can’t be trying really hard to relax.

To sum up:

Age and declining health -> increased likelihood of mechanics getting out of whack, at the purely physical level. Your knees hurt, you lessen the depth of your crouch, suddenly the swing is just a bit off.

Declining health -> loss of general confidence. You know you’re not physically the player you used to be.

Loss of confidence -> increased likelihood of thinking about swing mechanics while at the plate. Once the thought even crosses your mind that the swing might not be right, thinking about the swing while up there just makes things worse.

That’s how slumps start. And then the bad performance of the slump creates a further loss of self-confidence which leads to more thinking which leads to yet worse performance.

The reason why we can ascribe Ortiz’s epic slump to these psychological processes (writ much larger than usual) rather than to a fundamental loss of skills is obvious and trivial: his performance after the slump is over.  The numbers are arguably even more dramatic than the ones I noted at the beginning, because in the 11 games after the PED story broke, Ortiz hit .114 / .204 / .136 in 49 PA, and according to his own testimony he wasn’t sleeping at night.  Those 49 PA can actually be excluded by the same logic, leaving us with a “true maximum skill level” of something like .293 / .386 / .668, which is basically his season projections with a big power boost.

The reason why you don’t project him to hit like that the rest of the year is equally obvious and trivial: he is not immune to further slumps. There is even a small probability that the next slump will be extended, like the first one, but that is mitigated by two factors: he is much more likely to fear that he has lost his skills when he’s slumping at the start of the season rather than in the middle, and the April and May slump was exacerbated by the pressure of not having hit his first home run of the year. (From the date of his first homer on May 20 to the end of the slump on June 5, he showed real signs of life, with a huge increase in pulled line drive percentage. This was precisely the period where he was reported to be having great BP.)

In terms of pure, peak, hitting skills, David Ortiz is probably 90% the hitter he used to be a few years ago.  He probably has something like a 500% higher probability of getting himself into a serious sustained slump, especially at the start of the season (April ’08 was also terrible).  The specter of these extended slumps diminishes his overall value, but they do not much affect our sense of what he’s likely to do in the short run.