Big Data Project: Let’s Start at the Very Beginning—The Big Data Playbook
In my last post, I wrote about the three V’s of big data and why there are only three. There is a messaging pile-on happening in the big data space that even I, a long-time marketer, find disconcerting. So, over the course of a number of posts, my colleague, Marilyn Craig, and I are going to de-mystify a big data project, taking apart each stage of a real big data initiative as if it were a release post-mortem. We will be talking about roles and responsibilities, data governance, project and process management, what went right, what went wrong, and what we should have done differently. Except in this case, it will not be after the fact but rather a stage-by-stage review as we work on a real-world project. For your sanity and ours, we have created a special category, Big Data Project, as well as a tag with the same name. If you search on either, you will see all posts related to the project. Additionally, all posts about the project will start with Big Data Project in the title. Who knows? Maybe when we’re done, we’ll write a book (knowing what I know now about writing a book, I can’t believe I just said that)!
We’ll talk more about the project in the next post, but first I wanted to take a look at a big data failure that anyone involved in a major enterprise application deployment could have seen coming, one that illustrates Rule #1 in our big data playbook:
Rule #1: Big Data IS NOT rocket science.
This may be sacrilege to some of the big data pundits out there but many of the big data failures can be directly linked to the planning stage:
- You’ve come up with some great goals and objectives about what you want to accomplish—or as we like to put it, you’ve come up with great questions that you’d like to answer.
- You’ve decided on the technology platform and all the accompanying bells and whistles in the form of applications and toolsets.
- You know how large your data sets are and where they’re coming from.
In essence, you’ve dealt with all the complicated issues, come up with a great plan, are confident of your success and you’re feeling pretty good about the pending return on your big data investment. This is when I usually ask, “Who’s on your project team?” And I ask this question because most businesses and organizations treat big data projects like rocket science projects—the people on these projects are very, very smart, they are your best analysts, your best programmers, your best technologists (see our series on big data talent, Part 1, Part 2, and Part 3). What’s missing from this group? Business users—you know, the people responsible for the everyday running of operations, the folks who know the customer best, the folks closest to the data, the folks who actually know what works and what really doesn’t. If some of these people aren’t on your team, you are quickly going to run into Rule #2 in our playbook:
Rule #2: Garbage in, garbage out.
And in this case, garbage not only applies to dirty data (see our post on the Big Data Showdown) but also to an understanding of how business processes REALLY work in a company. I am sure that all of us, at one time or another, have sat in a meeting where our bosses talk about what we do and how we do it. And as they’re talking, we know that it really does not work that way. Almost every department in a company has one or more people who are “the keepers of institutional knowledge.” These are the folks who know how things really work, and if they are not represented on your big data project team, it is very likely that your project will fail.
Certainly, California’s failed 911 big data upgrade project is an example of good intentions versus a real understanding of how “things” work. In 2009, the state Emergency Medical Services Authority decided that it needed to centralize millions of emergency medical response reports. They wanted to capture everything that happened between a call into 911 and the victim actually reaching a hospital. But they ran into a host of problems, all centered around the data collection process:
- Information reported inconsistently. For example, when do you start the clock for victim response times? Is it the moment the 911 call is answered? Or when the rescue crew is dispatched?
- Incomplete data sets. Almost half of the state’s emergency medical agencies have not contributed data to the system (some are still running on paper while others are using outmoded technology that is not compatible with the state’s system).
- Voluntary participation. Local agencies don’t have to participate if they don’t have available funds to upgrade or replace their systems in order to use the new state system.
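To make the first of those problems concrete, here is a minimal sketch; the timestamps and agency conventions are invented for illustration. Two agencies can each report a perfectly “valid” response time for the same incident and still produce numbers that cannot be compared, simply because they start the clock at different moments:

```python
from datetime import datetime

# Hypothetical timestamps for a single incident.
call_answered = datetime(2012, 6, 1, 14, 0, 0)
crew_dispatched = datetime(2012, 6, 1, 14, 2, 30)
arrived_on_scene = datetime(2012, 6, 1, 14, 10, 0)

# Agency A starts the clock when the 911 call is answered.
response_a = (arrived_on_scene - call_answered).total_seconds() / 60

# Agency B starts the clock when the rescue crew is dispatched.
response_b = (arrived_on_scene - crew_dispatched).total_seconds() / 60

print(response_a)  # 10.0 minutes
print(response_b)  # 7.5 minutes -- same incident, "faster" on paper
```

Aggregate those two figures in one database and you are comparing apples to oranges, exactly the problem the state ran into.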
The upshot of all of this:
“…three years and about $1.6 million later, the voluntary project is plagued by lack of participation, money problems and inconsistent information. The root of the problem is that some agencies rely on outmoded, ‘home-brewed’ computer systems, said Tom McGinnis, the state official overseeing the project… ‘We can’t compare apples to apples. We compare apples to oranges and peaches,’ McGinnis said.”
I think that we can all agree that those problems are not about rocket science (Rule #1) but show a lack of understanding about how things “really” work. And of course, if your data collection process is flawed and incomplete, then you are dealing with garbage coming into your system, which means that only garbage can come out of it (Rule #2).
Up next in our series, a general overview of our big data project, the questions we want to answer, and what you need to do first—and yes, more rules for our Big Data Playbook!