Evaluating Pitcher Talent

Dave · August 29, 2006 at 7:45 am · Filed Under Mariners 

The discussion of what statistics are useful in evaluating a pitcher came up in the game thread, again, last night. This issue comes up quite a bit around here, since I use a lot of non-conventional numbers, and new readers often don’t know what they mean, where to find them, or why they should bother. So, last night, I decided to write something of a primer on why I like to use the statistics that I use, what their usefulness is, and why I don’t really care about things like ERA, WHIP, or batting average against.

All the stats referenced, by the way, can be found at the Hardball Times, and detailed game logs using these numbers can be found at Fangraphs, which are two of the most awesome sites out there right now.

The mainstream tools for evaluating a pitcher’s success and abilities are won-loss record and earned run average, with fantasy baseball players often adding WHIP (walks+hits per inning pitched) to the discussion, since it’s one of their categories. These statistics attempt to sum up pitcher effectiveness in total, giving an overview of the totality of his performance with just a few numbers.

I, personally, think they fail in that regard. ERA and WHIP group together a large string of individual events made by multiple players, making it extremely tough to separate out the credit for the pitcher, hitter, or defense. WHIP and ERA tell you there is no difference in an inning where three batters drive the ball to the fence and end up with three long flyouts or an inning where a pitcher strikes out the side. Clearly, they’re drastically different, but WHIP and ERA fail to account for the actual contributions of the pitcher. So, if the goal is to actually find out how well a pitcher threw, why not look at a micro level, instead of a macro level? That’s what I prefer to do.

For instance, what are the possible events in an at-bat that can occur?

A pitch can be thrown for a ball.
A pitch can be thrown for a strike.
A pitch can be swung at and missed.
The ball can be hit on the ground.
The ball can be hit on a line.
The ball can be hit in the air.

On any given pitch, those are the options. There are a few sub-categories under those options (outfield fly or infield fly, bunt grounder or normal grounder, etc…), but we can sum up every possible outcome of each pitch with those six options. Those outcomes might lead to wildly different events, but we’ll get to that later.

Which of these six outcomes are positive for the pitcher? Called strike, swinging strike, and groundball.

Which of these six outcomes are positive for the hitter? Called ball, line drive, and flyball.

If we can effectively determine which pitchers maximize their value in the “good outcomes” and minimize their harm in the “bad outcomes”, we can get a pretty firm grasp on who has pitching talent and who does not. Thankfully, Dave Studeman wrote a fantastic article called “Whats A Batted Ball Worth” in the 2006 Hardball Times Annual, and it includes the following run value chart. This chart will give a context to those good and bad outcome categories:

Line Drive: .356 – in other words, an average line drive is worth about 36% of one run.
HBP: .342
Non-Intentional Walk: .315
Intentional Walk: .176
Outfield Fly: .035
Groundball: -.101
Bunts: -.103
Infield Fly: -.243
Strikeout: -.287

These run values were taken from real life play-by-play data, so this is an actual representation of events, not some theoretical formula. As you can see, a hit-by-pitch is a better event for the offense than a walk, even though they both simply put the batter on first base. Why? Because a hit-by-pitch is pretty much random, and can occur both at times when it is a critical situation and times when it isn’t. A walk, conversely, is far more likely to put a runner on first base in a run-scoring situation, lowering its run value compared to the HBP.

As you can see, the difference between an outfield fly and a groundball isn’t huge, but it’s real, and it adds up over the course of the season. This is why, all things equal, a groundball pitcher is better than a flyball pitcher. All things are almost never equal, and flyball pitchers tend to have higher strikeout rates than groundball pitchers, but the theoretical best pitcher alive would be a groundball pitcher, not a flyball pitcher.
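To put a rough number on "it adds up," here is a small Python sketch that applies the run values from the chart above to two made-up pitchers. The pitcher lines and the `expected_runs` helper are hypothetical, invented purely for illustration; only the run values come from the chart.

```python
# Run values per event, copied from the chart above (runs for the offense).
RUN_VALUES = {
    "line_drive": 0.356,
    "hbp": 0.342,
    "walk": 0.315,
    "intentional_walk": 0.176,
    "outfield_fly": 0.035,
    "groundball": -0.101,
    "bunt": -0.103,
    "infield_fly": -0.243,
    "strikeout": -0.287,
}


def expected_runs(event_counts):
    """Sum the run value over a dict of event counts (keys must match RUN_VALUES)."""
    return sum(RUN_VALUES[event] * count for event, count in event_counts.items())


# Hypothetical pitchers: identical except that 50 outfield flies become grounders.
flyball_pitcher = {"groundball": 200, "outfield_fly": 250}
groundball_pitcher = {"groundball": 250, "outfield_fly": 200}

gap = expected_runs(flyball_pitcher) - expected_runs(groundball_pitcher)
print(f"Runs saved by trading 50 outfield flies for grounders: {gap:.1f}")
```

Each swap is worth .035 minus -.101, or about .136 runs, so 50 of them come to a little under seven runs in this toy example.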

Also, bunting = bad.

So, now that we have some understanding of the possible outcomes and their relative value, instead of using statistics like ERA or WHIP that leave out critical information, our best bet is to try to quantify the six potential outcomes, and the events that result from those outcomes as best as we can.

BB% (Walks per Total Batters Faced) does a nice job evaluating how often a pitcher throws the ball in the strike zone. The average walk rate is 8% for a major league pitcher, though the DH makes the AL a higher walk league than the NL. Anything under 5% is tremendous, and anything over 11% is a problem. The Hardball Times publishes BB% and K% in a slightly different manner, calling it BB/G or K/G to make it scale more like the per nine innings numbers people are used to seeing. BB/G (and BB%, its derivative) is more effective than BB/9 because it accounts for the actual number of batters faced rather than using a proxy like innings pitched. It’s just more accurate.

K% (Strikeouts per Total Batters Faced) does a decent job evaluating how often a pitcher induces swings and misses or called strikes. 16% is league average, with 20% being terrific and 12% being a problem.

GB% (Groundballs per Balls In Play) does a very good job of telling us how often a pitcher induces a groundball. 42% is league average, and anything over 50% is terrific, with the best sinkerball pitchers posting rates in the 60-65% range, while anything below 35% can be a problem if it’s not offset with a high strikeout rate.

LD% (Line Drives per Balls In Play) does a very good job of telling us how often a pitcher gives up line drives. 20% is league average, 17% is good, and 23% is a serious problem. Because of the way line drives have been scored by Baseball Info Solutions the past couple of years, this number is hard to use for year to year analysis, and right now, it’s not a very effective tool. We don’t use it very often.

FB% (Flyballs per Balls In Play) does a very good job of telling us how often a pitcher gives up flyballs that leave the infield, and is basically the corollary to GB%. 36% is league average, while 32% is good and 40% could be a problem.

So we have five statistics that cover each of the six possible outcomes pretty effectively. Not perfect, but they do a credible job. They aren’t park adjusted (and yes, parks have an effect on things you might not expect, such as walk rates, strikeout rates, and groundball rates), but they’re pretty close for the majority of cases.
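For anyone who wants to compute these rates from raw counts, a minimal Python sketch follows. The function name, the argument names, and the sample line are all hypothetical, and I am assuming that "balls in play" is simply the sum of the four batted-ball counts passed in; the Hardball Times may define the denominator slightly differently.

```python
def rate_stats(batters_faced, walks, strikeouts,
               groundballs, line_drives, outfield_flies, infield_flies):
    """Return BB% and K% per batter faced, and GB%, LD%, FB% per ball in play.

    Assumption: balls in play = grounders + liners + outfield and infield flies.
    """
    balls_in_play = groundballs + line_drives + outfield_flies + infield_flies
    return {
        "BB%": walks / batters_faced,
        "K%": strikeouts / batters_faced,
        "GB%": groundballs / balls_in_play,
        "LD%": line_drives / balls_in_play,
        "FB%": outfield_flies / balls_in_play,  # flyballs that leave the infield
    }


# Hypothetical season line, chosen to land on the league averages quoted above.
stats = rate_stats(batters_faced=800, walks=64, strikeouts=128,
                   groundballs=252, line_drives=120,
                   outfield_flies=216, infield_flies=12)

for name, value in stats.items():
    print(f"{name}: {value:.1%}")
```

Dividing walks and strikeouts by batters faced, rather than innings, is the whole point of BB% and K%: the denominator is the actual number of chances the pitcher had.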

Thanks to the work of guys like Voros McCracken, Tom Tippett, Keith Woolner, and Dave Studeman, we also now know that the result of a particular ball in play is also not very consistent, and is due more to the actions of the hitter than the pitcher. So, when evaluating a pitcher’s talent, we need to adjust for outlier type performances on converting outs on balls in play. If a pitcher has a lot of flyballs that are being caught on the warning track, or groundballs that are going right to infielders, that’s not likely to continue, and we shouldn’t assume that it will.

Not all balls in play are created equal, however, and so when we’re adjusting for outs on balls in play, we need to make sure we’re adjusting back to the type of ball in play the pitcher is giving up, since we’ve noted that they certainly do have control over their groundball or flyball tendencies.

An outfield fly becomes an out 77.7% of the time. A groundball becomes an out 74.8% of the time. A line drive becomes an out only 26.4% of the time, which is why it’s the worst possible outcome for a pitcher. An infield fly becomes an out 98.8% of the time. Because of this, flyball pitchers will post more outs on balls in play than groundball pitchers, and it won’t be a fluke. However, the non-outs that flyball pitchers give up are more harmful, and thus, the quality of the hits against flyball pitchers outweighs the relative lack of quantity. This is shown in the run value chart above, where an average groundball is a positive event for the pitcher and the outfield flyball is not.
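The out rates above can be turned into an expected out rate for a given batted-ball mix. The Python sketch below does that for two made-up pitchers; the batted-ball counts and the helper names are hypothetical, and only the four out rates come from the paragraph above.

```python
# Share of each batted-ball type that becomes an out, from the paragraph above.
OUT_RATES = {
    "outfield_fly": 0.777,
    "groundball": 0.748,
    "line_drive": 0.264,
    "infield_fly": 0.988,
}


def expected_outs(batted_balls):
    """Expected number of outs for a dict of batted-ball counts."""
    return sum(OUT_RATES[kind] * count for kind, count in batted_balls.items())


def expected_out_rate(batted_balls):
    """Expected outs divided by total balls in play."""
    return expected_outs(batted_balls) / sum(batted_balls.values())


# Hypothetical mixes: same line drives and infield flies, 600 balls in play each.
groundball_pitcher = {"groundball": 300, "outfield_fly": 180,
                      "line_drive": 110, "infield_fly": 10}
flyball_pitcher = {"groundball": 180, "outfield_fly": 300,
                   "line_drive": 110, "infield_fly": 10}

gb_rate = expected_out_rate(groundball_pitcher)
fb_rate = expected_out_rate(flyball_pitcher)
print(f"Groundball pitcher expected out rate: {gb_rate:.3f}")
print(f"Flyball pitcher expected out rate:    {fb_rate:.3f}")
```

The flyball mix comes out a few outs ahead over 600 balls in play, which is the "more outs, and not a fluke" effect described above. It says nothing about how damaging the non-outs are; that is what the run value chart captures.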

Infield flies are automatic outs, essentially, so it’s best to separate them from outfield flies for analysis like this. Since evidence has shown that pitchers don’t have a strong year to year control over their infield fly percentage, however, when evaluating true talent levels, it’s best to assume something like a normal infield fly percentage for a pitcher, rather than the one he’s posting at the moment.

Two other big factors that we’ve identified that can have a great effect on run scoring are home run rates and stranding runners. In general, flyball pitchers give up more home runs than groundball pitchers, which is why a groundball is a positive event for the pitcher and a flyball is not.

We’ve seen very little evidence that major league pitchers have significant control over how often their flyballs go over the wall, so occasionally you’ll see a wild swing in performance that is not indicative of a player’s true talent level, simply because a pitcher is having more or fewer flyballs go over the wall than should be expected. Felix Hernandez in April and May of this year was a great example of a guy who allowed a lot of home runs per flyball, and that rate has steadily dropped as the season wore on. The average major league pitcher gives up home runs in about 11-12% of his outfield flies – significant variation from that is probably not an indicator of talent for a major league quality pitcher.
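A quick way to spot this kind of swing is to compare a pitcher's actual home runs to what a league-average HR/FB rate would predict. In the Python sketch below, the 11.5% figure is just the midpoint of the 11-12% range quoted above (my assumption), and the pitcher's numbers are hypothetical.

```python
# Assumption: midpoint of the 11-12% league-average range quoted above.
LEAGUE_HR_PER_FB = 0.115


def hr_per_fb(home_runs, outfield_flies):
    """Actual home runs per outfield fly."""
    return home_runs / outfield_flies


def expected_home_runs(outfield_flies, league_rate=LEAGUE_HR_PER_FB):
    """Home runs we'd expect if flyballs left the park at the league rate."""
    return outfield_flies * league_rate


# Hypothetical pitcher: 100 outfield flies, 18 of them home runs.
outfield_flies, home_runs = 100, 18
actual_rate = hr_per_fb(home_runs, outfield_flies)
expected = expected_home_runs(outfield_flies)
excess = home_runs - expected

print(f"Actual HR/FB: {actual_rate:.1%}")
print(f"Expected HR at league rate: {expected:.1f}")
print(f"Home runs above expectation: {excess:.1f}")
```

A large positive gap like this one suggests the home run total is likely to come down, not that the pitcher has suddenly lost his stuff.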

Stranding runners is also a big key, and a bit of a different animal. Naturally, good pitchers will strand more runners than bad pitchers. Since they’re good pitchers, they’re more likely to create an out in any situation, including with men on base, than if they weren’t a good pitcher. While the league average Left on Base Percentage is 70%, the bad pitchers often live in the low-to-mid-60% range, and the good pitchers live in the mid-to-high-70% range.

However, it’s not uncommon for bad pitchers to have flukily high strand rates that significantly lower their ERAs, and vice versa. Jarrod Washburn’s 2005 ERA was almost completely due to his high strand rate, as he posted the highest LOB% in the American League. That hasn’t held true in 2006, and we’ve seen his ERA rise a full run because of it. So, when you find a pitcher who is stranding runners at an unexpected rate when compared to his talent derived by BB%, K%, and GB%, it is prudent to expect that rate to regress back towards a more normal rate in the future.

So, looking at this breakdown, we see value in BB%, K%, GB%, HR/FB%, and LOB%. Those five statistics will tell you almost everything you need to know about what goes into why a pitcher is performing like he is, and all these statistics are easily available at The Hardball Times. There’s nothing that ERA or WHIP will tell you that those component statistics do not, but ERA and WHIP certainly leave a lot of the underlying information out.

However, it is understandable that people want one number that sums up pitcher performance. If you really prefer to not look through the prism of BB/K/GB/HR-FB/LOB percentages, you can always use FIP, or Fielding Independent Pitching (which I often call Fielding Independent ERA, since it’s scaled to look like ERA), which gives you an expected ERA for a pitcher based on his walk, strikeout, and home run rates. FIP isn’t perfect, either – it assumes that HR/FB is indeed a skill, and it assumes that all pitchers are equal at stranding runners, neither of which is true, but it’s better than ERA for summing up a pitcher’s total contributions to run prevention.

If you want to get really crazy, you can even use Expected FIP, or xFIP, which substitutes the league average home run per fly ball rate for the pitcher’s actual home run rate, giving a more accurate picture of how we’ll expect a pitcher to perform going forward as his HR/FB rate regresses towards the mean.
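For the curious, here is a rough Python sketch of both numbers. The post doesn't spell out the formulas, so treat everything here as an assumption: the weights (13 for home runs, 3 for walks, 2 for strikeouts) follow the commonly cited form of FIP, the 3.2 constant is an approximate league scaling factor that actually varies by year, the 11.5% HR/FB rate is the midpoint of the range quoted earlier, and the pitcher's line is made up.

```python
# Assumptions, not taken from the post: approximate scaling constant and
# the midpoint of the 11-12% league HR/FB range.
FIP_CONSTANT = 3.2
LEAGUE_HR_PER_FB = 0.115


def fip(home_runs, walks, strikeouts, innings, constant=FIP_CONSTANT):
    """Commonly cited FIP form: (13*HR + 3*BB - 2*K) / IP + constant."""
    return (13 * home_runs + 3 * walks - 2 * strikeouts) / innings + constant


def xfip(outfield_flies, walks, strikeouts, innings,
         league_hr_per_fb=LEAGUE_HR_PER_FB, constant=FIP_CONSTANT):
    """FIP with actual home runs replaced by flyballs times the league HR/FB rate."""
    expected_hr = outfield_flies * league_hr_per_fb
    return fip(expected_hr, walks, strikeouts, innings, constant)


# Hypothetical pitcher who has been unlucky on flyballs (30 HR on 200 flies).
pitcher_fip = fip(home_runs=30, walks=60, strikeouts=150, innings=200)
pitcher_xfip = xfip(outfield_flies=200, walks=60, strikeouts=150, innings=200)

print(f"FIP:  {pitcher_fip:.2f}")
print(f"xFIP: {pitcher_xfip:.2f}")
```

The gap between the two numbers is entirely the home run adjustment: this made-up pitcher has allowed 30 homers where a league-average rate would predict 23.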

As I said, both FIP and xFIP have flaws, especially when it comes to evaluating relief pitchers, but if you’re insistent on using one number to sum up a pitcher’s contribution to run prevention, those would be your best bet.

In this age of wonderful information, there’s just no reason to use ERA and WHIP for serious analysis of a pitcher’s ability. We have better tools at our disposal. We’re doing ourselves an injustice if we continue to lean on inferior information.

Comments

91 Responses to “Evaluating Pitcher Talent”

  1. DoesntCompute on August 29th, 2006 7:58 am

    Very nice pitching analysis primer Dave. Has this been linked to the FAQ yet?

  2. strong silence on August 29th, 2006 8:09 am

    You should demand that future posters take and pass a test based on this essay.

    ERA and WHIP group together a large string of individual events made by multiple players, making it extremely tough to separate out the credit for the pitcher, hitter, or defense.

    Very well said.

  3. JI on August 29th, 2006 8:14 am

    That’s one hell of a post.

  4. pdb on August 29th, 2006 8:14 am

    Thanks for not only writing this up, but for also including what are normal, low, and high rates for each stat you write about. I find that’s usually the missing piece in a great many explanations of statistical analysis – most people don’t say “well, for stat X, a rate of Y is league average, Q is phenomenal, and L, not so hot”. It really helps the not-quite-numbers-savvy of us out here to make sense of what’s being explained.

    (and yes, parks have an effect on things you might not expect, such as walk rates, strikeout rates, and groundball rates),

    Just out of curiosity, how? If it’s a hugely detailed explanation, don’t sweat it here, but I’d be interested in knowing at some point.

    And, I have to be a pedant for a second – it’s *independent*, not *independant*. Doesn’t detract from the analysis, though, great job and I appreciate it…

  5. Dave on August 29th, 2006 8:20 am

    Just out of curiosity, how? If it’s a hugely detailed explanation, don’t sweat it here, but I’d be interested in knowing at some point.

    We don’t know for sure, but we have some decent guesses.

    Altitude – strikeout rates are consistently lower in Colorado, probably due to the thin air’s effect on breaking balls.

    Sightlines – If the hitter’s backdrop isn’t very effective, players can have a hard time picking up the ball out of the pitcher’s hand, and this will lead to lower BB and higher K rates.

    Mental Adjustments – I have no doubt that Safeco Field got into Mike Cameron’s head. He could hit the ball as hard as he possibly could, and it’d still be an out. That messed with his approach, and he just couldn’t figure out how to reinvent his swing to succeed at home. This is probably a bigger factor than we expect. I would imagine there are many players or pitchers who make adjustments in their swing or approach to account for a ballpark’s effects.

    Colorado and Florida have shown sustained significant effect on strikeout rates. Cleveland has shown a sustained positive effect on groundball rates. Safeco Field has induced more flyballs than expected. The differences are there, even if we don’t have concrete, quantifiable reasons why.

  6. msb on August 29th, 2006 8:25 am

    #2– I’d fail. As I read, I can feel my brain cells pulling down the shades and shutting off the lights, not unlike their reaction to the introduction of mathematics into their little lives.

  7. The Ancient Mariner on August 29th, 2006 8:30 am

    Curiosity question. BB and K are taken as a percentage of total batters faced–why not do the same with GB, FB, and LD, rather than taking them as a percentage of a subset (balls in play)? I would think it might be more helpful to take them all as percentages of the same global set; that way, you could see at a glance the percentages for each outcome per plate appearance. At least, it would seem to me to be useful to be able to look at a pitcher (let’s call him Felix) and say, for example, that 15% of the plate appearances against him end in a walk, 20% in a strikeout, 40% end in a ground ball, 10% end in a line drive, and 15% end in a fly ball.

  8. robbbbbb on August 29th, 2006 8:31 am

    Great post, Dave. Thanks for the summary. I’d come to some of these conclusions by reading this site (among others) over the last few years, but having it laid out in one place like that is a great summation of the basic concepts.

    Are you willing to do a flipside post on hitters? One major point would be: “Grounders are, in general, bad for hitters. However there are specific hitters (think Ichiro, or to a lesser extent, YuBet) who benefit from above-average groundball rates.”

  9. Dave on August 29th, 2006 8:36 am

    Curiosity question. BB and K are taken as a percentage of total batters faced–why not do the same with GB, FB, and LD, rather than taking them as a percentage of a subset (balls in play)?

    That’s how I’d prefer to do it as well. The guys at THT don’t agree with me. So, since this is supposed to be a primer to help the average fan abandon WHIP and ERA, I didn’t want to encourage them to go to fangraphs, copy the game log into excel, and then create their own formulas. I’m the only one nerdy enough to do that.

    Are you willing to do a flipside post on hitters? One major point would be: “Grounders are, in general, bad for hitters. However there are specific hitters (think Ichiro, or to a lesser extent, YuBet) who benefit from above-average groundball rates.”

    Eventually, sure, but it’s not as necessary for hitters. OPS correlates to runs scored at .97. We don’t have any tool that evaluates pitcher effectiveness as well as OPS (or, its better versions, OPS+, EqA, or VORP) does for hitters. Since we have a summation tool for offense that is so effective, the need for breaking them down by ball-in-play type isn’t as necessary.

    It can still be interesting, however, and I may do something like that during the offseason. To note, though, groundballs aren’t as bad for hitters as you may think. Miguel Tejada, Derek Jeter, and Carl Crawford are all extreme groundball hitters.

  10. arbeck on August 29th, 2006 8:39 am

    Just out of curiosity, how? If it’s a hugely detailed explanation, don’t sweat it here, but I’d be interested in knowing at some point.

    I can’t give a total be all end all explanation, but I can give you some examples.

    One of the reasons Coors Field has been such an offensive park is that the thin air causes less movement on pitches. When a pitch moves less, you are going to have fewer swings and misses and a lower SO%. Safeco originally had a batter’s eye that players hated. Certain parks have better or worse batter’s eyes, different amounts of shadow, etc. All of that is going to have an effect on the SO%.

    GB% is a little harder to quantify in my head, but if I were to guess, I’d say the same effects from above would tend to decrease the GB%. When the players see the ball better they hit the ball harder, which would increase LD% and FB%. Also, certain parks tend to favor ground balls. A left-handed hitter travelling to the Metrodome may be more likely to try to hit the ball on the ground and scoot it through on the turf than try to deal with the outfield and the way it suppresses left-handed sock.

    BB% is going to be similar to SO%. When batters see the ball better, they take more borderline pitches and get fooled less. They hit the ball harder, so the pitcher throws more pitches. This increases fatigue, which also increases BB%.

  11. arbeck on August 29th, 2006 8:40 am

    Damn, dave is quicker than me.

  12. gwangung on August 29th, 2006 8:41 am

    As you can see, a hit-by-pitch is a better event for the offense than a walk, even though they both simply put the batter on first base. Why? Because a hit-by-pitch correlates pretty well with “struggling pitcher”, and so more struggles are likely to follow.

    Something to remember…correlation is not causation. What we see here are not necessarily explanations or causes of events, but they’re big clues.

    And as Dave points out, they’re not park adjusted, so there’s still more analysis to be done.

    Mental Adjustments – I have no doubt that Safeco Field got into Mike Cameron’s head. He could hit the ball as hard as he possibly could, and it’d still be an out. That messed with his approach, and he just couldn’t figure out how to reinvent his swing to succeed at home. This is probably a bigger factor than we expect. I would imagine there are many players or pitchers who make adjustments in their swing or approach to account for a ballpark’s effects.

    Another salient point to remember.

  13. DCMariner on August 29th, 2006 8:41 am

    It’s posts like these that make me feel like I am stealing from this website…

  14. pdb on August 29th, 2006 8:45 am

    Thanks all for the clarifications. I feel smarter already.

  15. AQ on August 29th, 2006 8:46 am

    #13 – I hear that. I used to pray at the altar of ERA and WHIP until I started to regularly visit USSM about 2 years ago. This post will help me explain to others about why I believe that WHIP and ERA are not valuable indicators of pitching prowess. I knew that they weren’t good tools to use, but I couldn’t fully articulate why that was.

    Thanks Dave!

  16. Ralph Malph on August 29th, 2006 9:23 am

    Also, as far as park adjustments, don’t forget the amount of foul territory, which I think is a bigger factor than most people realize.

  17. Steve Nelson on August 29th, 2006 9:36 am

    #16: Also, as far as park adjustments, don’t forget the amount of foul territory, which I think is a bigger factor than most people realize.

    Foul territory also favors fly ball pitchers. Since a foul flyball ends an AB, that is part of the reason why a greater proportion of flyballs result in outs.

    I’ve sometimes considered that foul ball fly outs should be considered differently from balls in play. They’re not really a ball in play, because if the fielder fails to make a play, the batter does not reach base.

    If a pitcher has a skill of inducing weak fly balls that can produce outs without the possibility of a runner reaching base, I think that skill should be noted. In that respect, foul ball fly outs have seemed to me more akin to strikeouts than to balls in play.

  18. Dan W on August 29th, 2006 9:45 am

    Awesome stuff Dave. I have been completely cured of looking at ERA as a valuable indicator of a pitcher’s true ability.

    Would it be fair to say, however, that ERA IS a valid indicator of past ACTUAL performance, factoring in all variables including those he does not control?

  19. Dan W on August 29th, 2006 9:52 am

    I have to add that the timing of this discussion has been perfect, given Felix’s living, breathing demonstration of these principles last night. Living here in Halo-ville has not been much fun from a baseball standpoint lately, except for the perverse enjoyment of wathing obnoxious Redsox fans gloat over Angel misery last week.

  20. Dave on August 29th, 2006 9:53 am

    Would it be fair to say, however, that ERA IS a valid indicator of past ACTUAL performance, factoring in all variables including those he does not control?

    Maybe, if you have any faith in official scorers (I don’t), and you really don’t care about park effects or team defense. ERA does a decent job of telling you how well the team prevented their opponents from scoring when that pitcher was on the mound, but it doesn’t really help you figure out how to spread the credit around, and it doesn’t adjust for context – mainly, the park being played in.

    So, even in that limited value, it still has significant flaws.

  21. Dan W on August 29th, 2006 9:53 am

    watching, not wathing

  22. arbeck on August 29th, 2006 9:54 am

    Steve Nelson,

    The problem is that foul ball flyouts are almost all the equivalent of infield pop out. If pitchers don’t have much skill in inducing infield pop ups, then why would they have skill in inducing foul ball fly outs? I agree that it should be taken into consideration when calculating park effects, but I don’t think it has much to do with pitcher ability.

  23. The Ancient Mariner on August 29th, 2006 9:55 am

    I’m not Dave, obviously, but from where I sit, I’d think it would be fair to say that ERA is a partially-valid descriptor of past actual results; after all, variables beyond a pitcher’s control are not a part of his performance, but rather elements which interact with his performance to affect the results of that performance. Also, ERA can only be partially valid given that it excludes runs which the scorer decides are unearned — these are also a part of said pitcher’s results, and they are a part for which he likely bears some responsibility even if the scorer labels them “unearned runs.”

  24. theberle on August 29th, 2006 10:03 am

    Dave: OPS correlates to runs scored at .97. We don’t have any tool that evaluates pitcher effectiveness as well as OPS.

    Dave, this article is awesome. One quibble about the above comment though: I’m certain ERA also correlates pretty well to runs allowed.

    One key to understanding why ERA is not a great tool for evaluating talent is its (weak) predictive value
