Monday, December 6, 2010

wOBA by Ball-Strike Count

I am a big fan of graphs and baseball. Fangraphs got me excited because putting complex data into reasonably easy-to-understand graphs helps open up sabermetrics to more fans. I love statistical analysis, but after a while a table full of numbers starts running together and stops making sense. That's what makes graphs such an effective tool.

I've dabbled in graphs myself. When people were creating WAR graphs to compare Hall of Famers, I made a sample graph on Tom Tango's Book Blog showing cumulative WAR by age:


[Graph: cumulative WAR by age]

Of course, soon after, Fangraphs came out with a far better-looking one, saving me the headache of figuring out how to automate it.

Here is my latest foray into the world of graphs, looking at wOBA by count:


[Graph: wOBA by ball-strike count]

Let me explain the mess you see above. The horizontal X-axis shows the number of pitches thrown so far: the first pitch (0-0) is all the way to the left, and the full count (3-2) is all the way to the right. The vertical Y-axis shows the wOBA of all at-bats that pass through that count.
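To make the layout concrete, here is a minimal Python sketch (not the Excel workbook actually used) that lists all 12 ball-strike counts with a horizontal position, assuming that position is simply balls plus strikes, i.e. the number of pitches already thrown. The function name `count_layout` is illustrative.

```python
def count_layout():
    """Return every ball-strike count paired with its x-position.

    Assumption: x = balls + strikes (pitches already thrown), so 0-0 sits
    at x = 0 and the full count (3-2) sits at x = 5.
    """
    layout = [((balls, strikes), balls + strikes)
              for balls in range(4)      # 0-3 balls (ball four ends the PA)
              for strikes in range(3)]   # 0-2 strikes (strike three ends it)
    # sorted() is stable, so within a column counts stay in ball order
    return sorted(layout, key=lambda item: item[1])

for (balls, strikes), x in count_layout():
    print(f"x={x}: {balls}-{strikes}")
```

The columns hold 1, 2, 3, 3, 2 and 1 counts, which is why the graph is widest in the middle and narrows to a single bubble at each end.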

Since every at-bat goes through the first pitch, the wOBA at 0-0 is simply the league average, .330. The higher a count sits on the graph, the better hitters do from that count. As you can see, the best count for hitters is 3-0 and the worst is 0-2. At 3-0 the average hitter is better than 2001-2002 Barry Bonds, and at 0-2 hitters bat more like Adam Wainwright did in 2010.
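The idea of "wOBA for all at-bats that go through a count" can be sketched in Python as below. This is an illustration, not the process used for the graph: it assumes each plate appearance arrives as a pitch string in a made-up encoding ('B' ball, 'S' called or swinging strike, 'F' foul, 'X' ball in play) together with its unscaled wOBA weight (for example 0.89 for a single, 0 for an out, using the weights listed in the comments), and that intentional walks have already been dropped. Multiply the results by the scale factor from the comments to put them on the OBA scale.

```python
from collections import defaultdict

# Assumed (made-up) pitch codes: 'B' ball, 'S' called/swinging strike,
# 'F' foul, 'X' ball in play. The last pitch of a string ends the PA.

def counts_seen(pitches):
    """Set of (balls, strikes) counts in which a pitch was thrown."""
    balls = strikes = 0
    seen = set()
    for p in pitches:
        seen.add((balls, strikes))
        if p == "B":
            balls += 1
        elif p == "S" or (p == "F" and strikes < 2):
            strikes += 1  # a two-strike foul leaves the count alone
    return seen

def woba_by_count(plate_appearances):
    """plate_appearances: iterable of (pitch_string, weight) pairs, where
    weight is the PA's unscaled wOBA value (0 for an out). Intentional
    walks are assumed to be filtered out beforehand."""
    totals = defaultdict(float)
    visits = defaultdict(int)
    for pitches, weight in plate_appearances:
        for count in counts_seen(pitches):
            totals[count] += weight
            visits[count] += 1
    return {count: totals[count] / visits[count] for count in visits}

sample = [("BX", 0.89), ("SSS", 0.0), ("BBBB", 0.71)]  # toy data, not real
print(woba_by_count(sample)[(1, 0)])
```

Because every plate appearance visits 0-0, its value is just the overall average, which is why the first pitch lands on the league-average line.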

The area of each count's bubble is proportional to the number of times that count occurred. There were 936,848 plate appearances (PA) in my sample, so the first pitch is the biggest bubble. There were only 47,488 3-0 counts, so that is the smallest. Each bubble is also a pie chart of its own, showing what happened on the pitch thrown in that count.
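Sizing bubbles by area rather than by width matters. If the radius were proportional to frequency, the 0-0 bubble would be about 20 times as wide as the 3-0 bubble (and roughly 390 times the area). Scaling by area keeps the area ratio at about 20, with a radius ratio of about 4.4. A small sketch of area scaling using the two frequencies above (the `max_radius` parameter is an illustrative assumption, not something from the original workbook):

```python
import math

def bubble_radius(freq, max_freq, max_radius=1.0):
    """Radius for a bubble whose AREA is proportional to freq.

    max_radius (hypothetical) is the radius given to the most frequent
    count; since area = pi * r**2, r has to scale with the square root.
    """
    return max_radius * math.sqrt(freq / max_freq)

first_pitch = bubble_radius(936_848, 936_848)  # every PA sees 0-0
three_oh = bubble_radius(47_488, 936_848)      # rarest count in the sample
print(f"0-0: {first_pitch:.3f}, 3-0: {three_oh:.3f}")  # 0-0: 1.000, 3-0: 0.225
```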

Blue is a ball, red is a strike, and gray means the at-bat ended on that pitch. With two strikes, another strike ends the at-bat, so the two-strike pies show only balls and ended at-bats.
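The pie slices can be tallied from pitch-by-pitch data. The sketch below is not how the original graph was built. It assumes a made-up pitch encoding ('B' ball, 'S' called or swinging strike, 'F' foul, 'X' ball in play) and credits each pitch to the count it was thrown in. The last pitch of a plate appearance counts as "ended". A two-strike foul, which leaves the count unchanged, is simply skipped; that handling is also an assumption.

```python
from collections import Counter, defaultdict

# Assumed (made-up) pitch codes: 'B' ball, 'S' called/swinging strike,
# 'F' foul, 'X' ball in play. The last pitch of each string ends the PA.

def next_pitch_shares(pitch_sequences):
    """Map each (balls, strikes) count to the share of pitches thrown in it
    that were a ball, a strike, or ended the plate appearance."""
    tallies = defaultdict(Counter)
    for pitches in pitch_sequences:
        balls = strikes = 0
        for i, p in enumerate(pitches):
            count = (balls, strikes)
            if i == len(pitches) - 1:
                tallies[count]["ended"] += 1      # gray slice
            elif p == "B":
                tallies[count]["ball"] += 1       # blue slice
                balls += 1
            elif p == "S" or (p == "F" and strikes < 2):
                tallies[count]["strike"] += 1     # red slice
                strikes += 1
            # Otherwise (a two-strike foul) the count is unchanged and the
            # pitch is skipped; that handling is an assumption.
    return {
        count: {kind: n / sum(tally.values()) for kind, n in tally.items()}
        for count, tally in tallies.items()
    }

print(next_pitch_shares(["BX", "SSS", "BBBB"])[(0, 2)])  # {'ended': 1.0}
```

With two strikes, any strike other than a foul is necessarily the final pitch, so those counts never get a "strike" slice, matching the graph.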

So What?

 

I made this graph for my own use. It's a nice, easy reference tool for tracking what's happening pitch by pitch: I can follow along to see whether a batter's chances went up or down, and (very roughly) how likely the at-bat is to end on each pitch.

Ideally I would make one for each team, so you could pull up your own team's graph while watching games, or even one for each player, so you could compare and contrast Vladimir Guerrero with Kevin Youkilis, or the Twins with the Yankees, etc. There's also a good chance you can think of other uses for this graph, so please let me know in the comments.

References:

The graph was initially made in Excel to get the bubble positions and sizes, then imported into Adobe Illustrator to add the pie graphs, connecting lines, etc. Images are licensed under a Creative Commons Attribution-NonCommercial license.

4 comments:

  1. Love the visual here. Out of curiosity, does the wOBA--say in a 3-2 count--converge to that point if the count just before the most recent pitch was either 2-2 or 3-1, or does it differ based on the path taken to 3-2?

  2. Salb asked the same question on the Book Blog. I don't have information on which count preceded each count, so I have no real way of knowing. If you have that info and care to share, I'll do my best to represent it graphically.

  3. Well, I have the granular data from Pitch F/X (2007-10), but I'm not certain about the wOBA calculation. No big deal, just curious if you had that data as well. Like I said, love the graphics at this site.

  4. If you go to my spreadsheet, you can see my wOBA calculation:
    https://spreadsheets.google.com/ccc?key=tvtCT3UELSFHszUetzEckHQ&authkey=CP-r724&hl=en#gid=12

    It's essentially:
    wobaBB  wobaHB  woba1B  woba2B  woba3B  wobaHR  Scale
    0.71    0.74    0.89    1.26    1.58    2.02    1.01474297439276

    (.71*(BB-IBB)+.74*HBP+.89*1B+1.26*2B+1.58*3B+2.02*HR)/(PA-IBB)*1.01474297439276

    (The scale value makes the league wOBA equal to the league OBA; the number changes depending on the year.)

    To get the data for each different "route" to a pitch count, I'd need a whole army of data: all 10 ways you can get to a 3-2 count, the 6 ways to get to a 2-2 count, etc. I'm also not exactly sure what format would work best. If you want to take a shot at it together, let me know, and I'll be happy to think it through with you. I just don't have access to the data at the moment.

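For anyone wanting to check the numbers in that last comment, here is a minimal Python transcription of the wOBA formula as written there (the weights and scale are the ones listed; as noted, the scale changes by year), plus a count of distinct routes to a given count. The route count assumes only balls and strikes move the count, so it is the binomial coefficient C(balls + strikes, balls), which gives the 10 routes to 3-2 and 6 routes to 2-2 mentioned above. Function names are illustrative.

```python
from math import comb

# Linear weights and scale as listed in the comment (one season's values).
WEIGHTS = {"BB": 0.71, "HBP": 0.74, "1B": 0.89, "2B": 1.26, "3B": 1.58, "HR": 2.02}
SCALE = 1.01474297439276

def woba(bb, ibb, hbp, singles, doubles, triples, hr, pa):
    """wOBA per the comment's formula; intentional walks are removed
    from both the walk total and the plate-appearance denominator."""
    numerator = (WEIGHTS["BB"] * (bb - ibb)
                 + WEIGHTS["HBP"] * hbp
                 + WEIGHTS["1B"] * singles
                 + WEIGHTS["2B"] * doubles
                 + WEIGHTS["3B"] * triples
                 + WEIGHTS["HR"] * hr)
    return numerator / (pa - ibb) * SCALE

def routes_to_count(balls, strikes):
    """Distinct ball/strike orders that reach a count. Two-strike fouls
    are ignored, since they do not change the count."""
    return comb(balls + strikes, balls)

print(routes_to_count(3, 2), routes_to_count(2, 2))  # 10 6
```

Summing `routes_to_count` over every count shows how many separate route-by-route breakdowns a full "path" version of the graph would need.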