What The %$!# Is Data Science?


Dr. Brian Anderson

Data science is certainly a buzzword, perhaps THE buzzword, from the C-Suite to the front line. The term conjures visions of dashboards, big data, and whiz-kid programmers and statisticians turning ones and zeros into insights and ideas that will unleash new strategies, productivity gains, and innovation.

Except that’s not what data science is.

Yes, applied statistics and computer science are a part of data science, but they are not its defining element. Data science is fundamentally about workflow: how an organization goes about collecting, wrangling, organizing, analyzing, and communicating data. This workflow is dynamic, proceeds in fits and starts, and, when done well, yields insights that inform decision-making but do not drive the decision itself.

What is the C-Suite’s Role in the Data Science Workflow?

“Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.” ~ John Tukey

John Tukey, a professor at Princeton University and a researcher at Bell Laboratories, was also a pioneer of what we now call data science. To Professor Tukey, the most important consideration in data analysis was understanding the question you really want to answer with your data. This is also the most important role for senior executives in the data science workflow: identifying, defining, and refining salient questions that data can answer.

Asking the right question underpins the entire data science workflow, and it is also the hardest part of the process. If we don’t start with the right question, any data and analyses coming later will have limited usefulness to the organization.

The irony is that the right question typically emerges only after we have collected and analyzed data. The reason is that the process of collecting, wrangling, analyzing, and communicating data uncovers hidden assumptions, biases, and errors in the data and in the process that generated it. This is a very good thing, because it reinforces the iterative nature of the data science workflow.

Consider analyzing why a product failed. At first glance, the easy questions are whether it was a pricing problem, a lack of market understanding, or a product without the right features. The problem with all of these questions is that they are both too vague and too nuanced to yield a specific answer, and no amount of "big data" would help answer such a causally ambiguous question.

One mistake I often see among senior executives is the "one-shot" notion of data collection and analysis. The mistake is assuming that we already know the right question to ask and that a single round of data collection will yield the "correct" answer to inform a business decision. That is simply asking too much of an inherently noisy, stochastic, and dynamic process. Leveraging data and data science in the enterprise requires embracing uncertainty and variation, and a willingness to engage in continuous improvement, knowing that the firm will never truly reach the finish line.

Turning back to our product failure example, rather than asking whether our pricing strategy was the culprit, we might ask a simpler question: did we use a similar pricing strategy with a similar product in a similar market, and did we observe a different result than with our failed product? If the answer is yes, we might tentatively conclude that pricing was not the culprit. Some data and some logic, along with an understanding of causal inference, help guide us toward a more productive question.

A Computer Can't Replace Your Judgement

Data science is simply a tool for the executive. Like the output of any tool, the analyses produced by a data science workflow can be more or less useful, depending on the knowledge, skills, and abilities of the decision-maker.

Despite rapid advances in machine learning and artificial intelligence, computers remain very far from being able to replicate the decision-making ability of a human. The most important resource in a data science workflow is, by a wide margin, its people: the creators and consumers of data. For executives managing a data science workflow, the leadership challenge remains the same: finding the right people, doing the right things, in the right places, and asking the right questions.

Ultimately, the most important thing for executives to remember when considering where data and data science can—and cannot—add value is that no computer and no amount of data replaces the judgement and critical thinking skills of the people in the organization.
