Big Data Project: Start with a Question that You Want to Answer
A top-level view of our data project over a series of posts.
Welcome to the second post of a series on a big data project that will (Mary and I hope) provide clarity and insights on how to successfully complete a big data initiative. Now, just in case you’ve forgotten the first two rules in our Big Data Playbook, I am going to repeat them here because they play into our topic of the day, which is all about “starting” your big data project:
Rule #1: Big Data IS NOT rocket science.
Yes, far too often those lucky internal folks tasked with managing a big data project fall into the trap of data science paralysis, which is a close cousin of analysis paralysis. By this I mean that there are so many moving pieces to capture, so many technology decisions to make, so many skill positions to fill, and so many blanks to fill in that you never actually get started. Which leads me to our second rule:
Rule #2: Garbage in, garbage out.
This is an as-old-as-the-digital-technology-hills maxim that still holds true. And it does not apply only to “clean” datasets or a deep understanding of your company’s processes and operations. Rather, think of it as a reminder that before you do anything you need to understand what it is you are trying to accomplish. If you don’t spend time upfront thinking long and hard about what you are trying to do, you will end up with garbage.
Before I share our third rule and start talking about a specific project we are working on, a quick note: Mary and I decided not to provide you with a complete overview in this post. Instead, over a series of posts we will walk you through how a project gets underway in stages, letting you experience how a project plan takes shape and the decisions that need to be made at the beginning of the planning stage (and how the plan itself changes as we gather more information). And now, without further ado, it’s time for our third rule:
Rule #3: Start with a question that you want an answer to.
More times than I can count, I’ve sat in a conference room with a company’s data team and asked them this simple question: What question do you need answered? Sounds simple, right? Not so fast. What I usually get in return is a litany of responses that fall into several different categories:
- So much data. “We’ve got all this internal and external data—surely there’s some value there.”
- Everybody’s doing it. “We’ve been reading all these articles about how Netflix/Amazon/Disney/Walmart/Target are successfully using big data to drive their business models and we want to do the same.”
- My boss wants to do it. “The CEO/CFO/CMO/CIO has tasked us with leveraging our big data to derive business insights.”
Now, while all these responses could start a big data conversation, none of them contains a specific question that needs to be answered, which is something I spent a great deal of time writing about in our series on the data science team. However, Harvard Business Review is much more succinct in identifying the five key steps successful companies undertake to derive the most value from their big data initiatives and, not surprisingly, number one is:
Identify the Five Most Critical Business Questions to Answer. Companies must start somewhere. We see successful companies beginning by defining a set of the most critical business questions that require an answer. The initial set of questions should be limited and manageable. By addressing a small subset of critical questions, executives can demonstrate an initial set of quick wins that provide business value and enable additional funding to ask additional business questions. Starting small, and building from that foundation, is critical to ensuring successful business adoption.
While HBR talks about five questions, I encourage you to start with one. This limits the scope of your project to a manageable level, and the project can then serve as a testbed for future big data projects and for implementing big data best practices. In fact, I am now going to amend our third rule to:
Rule #3: Start with ONE question that you want an answer to.
When Dr. Donnel Briley, a Professor of Marketing at the University of Sydney Business School, approached PatternBuilders with a request to collaborate on a research project, he kicked off our initial meeting with this observation and question:
“It is pretty well understood that a company’s stock price is impacted by news from primary sources, and while others have done some research on the impact of social media, no one has separated out the impact that social media comments have and whether this impact partially mediates the influence of primary news. Is a company’s stock price influenced by both, and can we isolate and study the impact of those distinct sources on that stock price?”
I can hear some of you saying that this is more than one question, but I would argue that Dr. Briley’s specific focus was on categorizing comments and analyzing the impact of each category on a company’s stock price. Which leads me to our fourth rule:
Rule #4: Ask questions about the question.
In other words, until you are absolutely certain that you fully understand the question you want to answer, ask questions about “the question.” I am not going to subject you to a transcript of my “back and forth” with Dr. Briley (as it would be quite long) but you will notice in the quote above that I underlined key phrases that I wanted to explore further. The following table provides you with more context regarding the underlined parts of the question.
Perhaps you will now understand why the first rule in our playbook states quite clearly that big data is not rocket science. So far, no rocket science has been involved, just the application of stage 1 of any project: the discovery phase. Or as Mary likes to put it: “What information do we need to know in order to define the project?” Well, in order to define the project, first we need to understand the question we want to answer!
Next up in our series: It’s Time to Make a Plan. And yes, as we discover more about different aspects of the project, that plan will change (as any decent planner will tell you) because it’s organic (more on that in our next post).