Sunday, March 13, 2011

Mark Roulo on baseball statistics

re: the 32-parameter value-added model in NYC
One thing I like doing for discussions like this is to try to find a sports analogy. People tend not to get so hung up on non-PC conclusions in sports, but also often care a lot. This can lead to enlightenment.

So ... baseball:

(1) You can get a *VERY* good handle on how valuable a batter is with just two values, which can be combined into one number. You need on-base percentage (OBP), which is, for every 100 times he comes to the plate, how often does he get on base? And you need slugging percentage (SLG), which says how many bases he gets each time he has an at-bat. In both cases, more is better. And you can combine them with this: (OBP*3 + SLG)/2 to get a number that works the way most people who follow baseball can understand.

There ARE more sophisticated models, but they don't improve on this one by much. So ... two parameters, both of which are pretty easily understood.
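
To make the two-number batting model in (1) concrete, here is a minimal Python sketch. The on-base and slugging calculations follow the standard box-score definitions; the combining step is the (on-base * 3 + slugging) / 2 formula from the comment. The function and argument names, and the sample stat line, are made up for illustration.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """Fraction of counted plate appearances in which the batter reached base."""
    reached = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sacrifice_flies
    return reached / chances if chances else 0.0


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats if at_bats else 0.0


def combined_value(obp, slg):
    """Combine the two rates using the weighting given in the comment."""
    return (obp * 3 + slg) / 2


# Hypothetical season line, invented purely for illustration.
obp = on_base_percentage(hits=150, walks=50, hit_by_pitch=5,
                         at_bats=500, sacrifice_flies=5)
slg = slugging_percentage(singles=100, doubles=30, triples=5,
                          home_runs=15, at_bats=500)
print(round(obp, 3), round(slg, 3), round(combined_value(obp, slg), 3))
```

Two inputs, one output, and each step is something a fan can check by hand from a box score.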

(2) For pitchers it is a bit more complicated, but you can basically track strikeouts, walks and home runs and then put them together to get a single number. Again, one can improve (for starting pitchers, you also care about how "efficient" they are), but basically you'll get the right answer for ranking pitchers with just these three.
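
The comment in (2) does not say how the three pitching numbers get combined. One well-known way to do it is Fielding Independent Pitching (FIP), so the sketch below uses that as an assumption, not as the commenter's own formula. Standard FIP also counts hit batters alongside walks; this version leaves them out to stick to the three stats named above. The additive constant changes from season to season, and 3.10 here is only a rough placeholder.

```python
def fip_style_score(home_runs, walks, strikeouts, innings_pitched, constant=3.10):
    """FIP-style score from strikeouts, walks and home runs. Lower is better.

    Assumption: the 13 / 3 / 2 weights are those of standard FIP; `constant`
    is a placeholder that only shifts the scale and does not affect rankings.
    """
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    raw = 13 * home_runs + 3 * walks - 2 * strikeouts
    return raw / innings_pitched + constant


# Hypothetical pitchers, invented for illustration: (HR, BB, K, IP).
pitchers = {
    "Pitcher A": (20, 50, 200, 200),
    "Pitcher B": (30, 70, 150, 200),
}
ranking = sorted(pitchers, key=lambda name: fip_style_score(*pitchers[name]))
print(ranking)
```

Because the constant is the same for everyone, the ranking depends only on the three tracked stats and innings pitched.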

I get that teaching is more complicated. But 32 parameters is nuts.

