Data Science: What the World Needs Is Answers, Not Just Insights, Part 2 (of 3)
By Marilyn Craig, Managing Director, Insight Voices
As you may or may not know, we are in the midst of a 3-part series on data science, covering roles, skills, and so on: generally, what you should think about as well as what’s not as important (no matter what the latest articles say!). For Part 2, we have a guest poster, Marilyn Craig of Insight Voices. Marilyn is what I like to call a “classic quant.” She was at the forefront of big data and data science long before most people knew these terms (and spaces) existed, and she has been my go-to person whenever I had an analytics question (see title) that I needed answered. In this post, Marilyn looks at insights and makes the case for why we should all care far more about answers. Take it away, Marilyn!
Here’s an interesting question for this new world order of Big Data Analytics: what’s an Insight and what’s an Answer? Sometimes they are the same, sometimes not. An insight is a piece of information or understanding. It may or may not be useful. It may or may not help your business improve, solve world hunger, or even make sense. An answer is always useful. It is the result of asking a question. And the best kinds of answers are those that address the questions you really care about.
A lot of the excitement about Big Data is about the “insights” we can get now that we couldn’t before (often because of the limitations of our earlier tools). But just having someone wander through a ton o’ data looking for interesting “insights” is potentially a huge waste of time, because an insight is not necessarily the answer to a question. There have been zillions of questions over the last couple of decades that I’ve never been able to answer for the businesses I’ve worked in. The insights I really want are the answers to those questions.
That’s why I completely agree with Mary’s assessment of where we should start looking for “data science” resources: inside our own organizations. Those folks are much more likely to know the important questions that need answers. The risk in bringing in a bright, new, shiny Data Scientist from outside is that they might spend time digging through your data looking for insights that aren’t really important. Now, I’m not saying that all data scientists are undisciplined practitioners who don’t listen to the questions and hypotheses presented by business decision makers. Good DSs know to ask the right questions first and only then approach the data and analysis. My point is this: asking the right questions is the most important part of the data science process. This has always been the case and always will be. Just wandering around in the data may prove useful, but finding answers to wrong or unhelpful questions is a waste of time and money. (This is also why I shudder when vendors come in and try to convince me that the petabytes per second they can process should be the most important thing to me; obviously, they’ve never walked in my shoes!)
A good example of asking the right question (or the wrong one, depending on your reaction to the “creepy quotient”) is the Target pregnancy prediction kerfuffle. On one hand, the marketers at Target were absolutely asking the right question: How can I communicate with new parents before anyone else does, to get a jump on selling them appropriate products? However, any experienced marketer (or data scientist with the appropriate mindset) who was focused on the customer experience would have said “Stop!” when presented with the outcome of the first round of research. Target learned very quickly that it is not enough just to reach these prospects first. They needed to know how to reach them first without freaking them out!! You have to have the right customer-focused people involved in your analytics process, asking the right questions AND thinking through how you are going to use the answers.
I guarantee you that you have individuals in your organization (not just statisticians or analysts, but marketers, salespeople, and accountants) who are perfectly capable of, and very willing to, fill the data scientist role, provided that you give them a toolset designed for them and not for some über-geek programmer. (No offense to the geeks out there; I am married to one, which makes for some very interesting conversations about things like Bloom filters.) These folks have a passion for figuring out what drives your business and have been repeatedly frustrated by the inability of the tools available to them (traditional BI and Big Data 1.0 tools being the worst) to help them get the answers they seek. Give them the right tools, set them loose, and Eureka! You will have achieved two important objectives: answers about your business and greater employee satisfaction in your ranks.
This is what I know: data science is not the sole purview of specialists. Listen, math and statistics can be learned (and in the big data age, all of us should become better at both disciplines), but very, very few of us will need to get PhDs. A company is always best served by first enhancing the skills of the people who already understand its business and then supplementing them with “specialists.”
I’ve been in and around the world of data for almost 25 years. IMHO, big data is only kind of new. Many industries, like financial services, retail, medicine, and logistics, have been generating “big data” forever. And those of us working in those businesses as marketers, product developers, financial analysts, and decision makers have always been frustrated by not being able to harness all that data to drive our businesses, not to mention the unrealized dreams of folks in other industries. We have always grappled with these two issues:
- We are saddled with tools that don’t respect our time. You can’t do big data with Excel, but you can do amazingly complex analysis on small data with it. So don’t tell me I have to become an advanced Java programmer to solve my big data problems (Hadoop vendors, are you listening?).
- We didn’t collect and store the data we had access to. To make our IT partners happy, we aggregated our data within an inch of its life, thereby losing a lot of the interesting variation we needed. Thankfully, storage costs are now within a normal person’s (i.e., company’s) budget.
As Cosma Shalizi, my favorite statistics professor, puts it: our theories (and desires for answers) ran way ahead of what we were able to do with the data and tools available. He goes on to point out that statisticians have always had the skills now lauded as what a data scientist needs: statistics, computer science, data visualization, and the social sciences.
We are at a crossroads. Today, many companies are collecting the data, and with Big Data 2.0 companies (like PatternBuilders) we finally have the tools we need to use it. But the discipline of statistics is not new, and there have always been talented, articulate practitioners willing, and certainly able, to tackle any analysis we could throw at them. As we move forward into the big data age, keep this in mind: focus on the answers, not just the “insights”!
Do you agree or disagree? Let me know in the comments! Next up, Part 3 of 3: Mary’s take on the data science team and some other roles.