As I type this, I'm sitting in seat 5E on a plane headed to Boston, home of the MIT Sloan Sports Analytics Conference. Before I arrive, though, and begin an event with a panel entitled "Revenge of the Nerds", I want to say a little bit about the role of stats in our modern sports world.
Clearly, the penetration of stats in sports is higher than ever in 2013: mainstream sports outlets are using more statistics in their broadcasts, stats are ubiquitous in sports arguments, and teams are even using advanced stats in their own press releases to justify why a trade was made.
Still, though, I suspect we're using stats incorrectly in sports fairly often, and furthermore, I believe doing so is dangerous: not only to the "Moneyball" movement, but to our own understanding of sports. To be sure, stats have their place; indeed, the entire SSAC, with thousands of attendees, is dedicated to them. But we also need to consider how they apply to reality. The alternative, stats without context, is still interesting but somewhat meaningless (basically, it's pure mathematics: a fascinating exercise, but esoteric regardless).
Fans especially, but also occasionally writers and analysts (including myself), are prone to making arguments like this: "Gordon Hayward has a higher PER than Marvin Williams, and therefore deserves to start." These arguments mean well, and this sort of thinking is really tempting: it uses advanced statistics, so it can appeal to some higher authority of expertise, which makes it feel more likely to be right. Heck, the argument probably is right.
The problem is that the assertion is right about the wrong thing. We've managed to get the answer to one sort of question ("Does Hayward have a higher PER than Williams?") and not to the question we actually want to answer ("Who should be starting?"). This sort of thing happens all the time: we look at evidence and then draw a conclusion, rather than really concerning ourselves with the question we're asking. Who has the higher PER, as used in this example, has no real relevance to the question asked.
Instead, we need to put questions first. Remember the scientific method: forming a hypothesis comes first, then evidence gathering, then making a conclusion. By focusing on asking the questions first, we can help ourselves in two ways:
- It means that we're forced to figure out what questions, exactly, we want to answer. This leads us to think logically about the problems we encounter and what a solution would look like.
- It brings context to the statistic. Gordon Hayward's PER is a telling statistic, sure, but it certainly doesn't take into account the context around Hayward himself.
At the 2011 SSAC, Mark Cuban explained that he gets hundreds of emails from job-hunters interested in plying their statistical tools for the Dallas Mavericks. Most, he said, contained a "statistical resume" of some sort: i.e., "look at this work, see how brilliant I am?" The problem, though, is that by and large these all contained new statistics or ranking systems designed to answer the question of which players were truly better than others. These systems and statistics all answer questions that never actually come up when making decisions for an NBA team. These are questions like:
- Who's a better player between player X and player Y?
- Which player should win the MVP award?
- Which player is better defensively?
- Is team X better than team Y?
They're fun questions, and ones we can make a lot of progress on, but they're simplistic: they ignore the other variables of the real world. Instead, teams ask themselves contextual questions, such as:
- Is this player a good fit with the players we currently have?
- What player skills, if acquired, would make this team better? Which players have those skills, and would they help more than the cost of acquiring them?
- Is player X a good investment for Y dollars? Does there exist a player Z who would be a better use of those funds?
- Which is more likely to help our team, player X or draft pick Y? Does draft pick Y's higher potential make it worthwhile to take a risk, even though player X is more likely to be better?
You'll notice that the questions in the second category are more likely to contain follow-up questions. That's because these questions recognize that there's a context outside the question worth exploring: namely, real life.
The statistical revolution, along with the rise of big data, means that it's incredibly easy to pull up tables of stats and compose a story based on them. I'm certainly guilty of this. To be fair, these stats may lead us to ask informative and contextual questions (Zach Lowe's work is a great example of this). But taking a stat and composing a storyline around it says more about the statistic and the storyteller than it does about the subject.
Indeed, a conference like Sloan means that we're going to be assaulted with stats galore; I'm incredibly excited for this bombardment. But it'll be best if we evaluate our evaluations based on the questions they answer, not only on how they relate to sports-statistics land. This weekend, when you're presented with a new stat, ranking, or way of thinking, ask "What kind of question does this answer?" before jumping to conclusions.