Friday, April 1, 2011

MIT Quant Picks Red Sox Over Yankees

As the Major League Baseball season begins, a Massachusetts Institute of Technology professor is predicting the Boston Red Sox will easily beat out divisional rival the New York Yankees in 2011, reports Reuters.

Professor Dimitris Bertsimas used quantitative models based on player analytics to predict the Red Sox will win 101 games this season, eight more than the Yankees.

"A player is a vector of numbers and from that, we can forecast overall team statistics," said Bertsimas, co-director of MIT's Operations Research Center and admitted Red Sox fan.

In a new paper "The Analytics Edge in Baseball," Bertsimas and doctoral student Allison O'Hair developed three models to determine the outcome of teams' 162-game seasons.

As a lifelong member of Red Sox Nation, I will not argue with his conclusion. But I do think that the absurdity of quantitative forecasting, where there are many variables that cannot be known in advance, is much easier to see in baseball than in the economy or investments.

There is no way, for example, that Bertsimas knows who will get injured during the season and for how long. Yet when it comes to the economy and investments, people take specific forecasts much more seriously, even though there are even more variables to deal with.

It's not that you can't make any forecasts about the economy, but they have to be made in terms of tendencies rather than exact numbers. In the same way, in meteorology you might be able to say right now that snow will tend to fall in January 2012 and not in July 2012, but this far in advance you would not be able to give the specific dates in January 2012 on which it will snow.
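To put a rough number on the gap between a tendency and an exact forecast, here is a minimal Python sketch. It assumes, purely for illustration, that every game is an independent coin flip with the same win probability, which is not how Bertsimas's models work and not how real seasons behave. The function name `win_total_spread` is made up for this example. Under that assumption, even a team whose true strength is exactly 101 wins per 162 games would see its actual win total scatter widely.

```python
import math

GAMES = 162          # length of a regular season, as stated in the Reuters piece
FORECAST_WINS = 101  # the Red Sox forecast quoted above

def win_total_spread(games: int, expected_wins: float) -> float:
    """Standard deviation of season wins, ASSUMING independent games
    with a constant win probability (a simple binomial model)."""
    p = expected_wins / games
    return math.sqrt(games * p * (1 - p))

spread = win_total_spread(GAMES, FORECAST_WINS)

# Rough two-standard-deviation band around the forecast.
low = FORECAST_WINS - 2 * spread
high = FORECAST_WINS + 2 * spread

print(f"expected wins: {FORECAST_WINS}, standard deviation: {spread:.1f}")
print(f"rough two-sigma range: {low:.0f} to {high:.0f} wins")
```

Under these assumptions the standard deviation comes out to about six wins, so pure chance alone is of the same order as the eight-game margin the forecast puts between the two teams, and that is before injuries or anything else unknowable enters the picture.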

1. As a Yankees fan, I respectfully disagree with the professor's conclusions, but I guess that's why they play the games. :)

The statistical revolution in baseball worked because the outcomes of the game are few and fixed and the sample sizes are enormous.

This is where the planners go wrong and their "solutions" become more draconian. In a complex economy where the rules aren't fixed and human demands are infinite, time-dependent and always dynamic, it's the height of arrogance for central planners to assume that by building a regression model they can predict or, even worse, "control" and "solve for" outcomes of human behavior with any degree of certainty.

2. This is the problem with statistics: past data only predict future performance if the underlying relationships being modeled can be presumed constant across time. And there's no way that assumption can be expected to hold for something like athletic performance.