A Big Data Showdown: How many V’s do we really need? Three!
Marilyn Craig (Managing Director of Insight Voices, frequent guest blogger, marketing colleague, and analytics guru) and I have been watching the big data “V” pile-on with a bit of bemusement lately. We started with the classic 3 V’s, codified by Doug Laney, a META Group and now Gartner analyst, in early 2001 (yes, that’s correct, 2001). Doug puts it this way:
“In the late 1990s, while a META Group analyst (Note: META is now part of Gartner), it was becoming evident that our clients increasingly were encumbered by their data assets. While many pundits were talking about, many clients were lamenting, and many vendors were seizing the opportunity of these fast-growing data stores, I also realized that something else was going on. Sea changes in the speed at which data was flowing mainly due to electronic commerce, along with the increasing breadth of data sources, structures and formats due to the post Y2K-ERP application boom were as or more challenging to data management teams than was the increasing quantity of data.”
Doug worked with clients on these issues and spoke about them at industry conferences. He then wrote a research note (February 2001) entitled “3-D Data Management: Controlling Data Volume, Velocity and Variety,” which is available in its entirety here (pdf too).
Now, believe it or not, Marilyn and I were both working in high tech at the time and were both trying to manage the chaos that the 3 V’s wrought upon our respective companies. Enter “big data” (you know, the exponential growth of the 3 V’s due to eCommerce, social media, mobile, the Internet of Things, etc.) and many are now contending that it is living up to Gartner’s famous (or infamous, depending upon your point of view) Hype Cycle. Certainly, Edd Dumbill raises that very issue in a recent Forbes article. He acknowledges that the term is imprecise but defends it:
“As one of the progenitors of the term ‘big data’, I’m happy with its imprecision. It wasn’t coined for marketing purposes, but as a descriptor for an emerging technology phenomenon. In the broader business consciousness, at the end of 2012, ‘big data’ really means ‘smart use of data’.”
Dumbill goes on to call out big data vendors on their marketing hyperbole:
“What is an unhelpful turn-off is software vendors competing on their own interpretation of the phrase. ‘You’ve not got real big data until you’ve got our software!’, ‘We concentrate on Verisimilitude—the 8th ‘V’ of big data that everyone else has missed!’ I’ve a larger essay to write about this, but the supply-side nature of the big data industry is a bit disappointing, a constant flow of product that doesn’t really meet needs from a business perspective. In that sense, the big data marketing push is pernicious—the same old things just rebrand themselves into the new trend.”
As a marketer it pains me to say this, but Dumbill has a point. There has been quite a bit of heavy-handed messaging about all things big data. Marilyn and I pointed this out in a series of blog posts (Part 1, Part 2, and Part 3) that sought to bring some pragmatism to two things: the ever-looming, hyperbolic talk of data scientist sexiness and shortage (pushed by the media, analysts, and vendors), and the ongoing confusion between the discipline of data science and the specific roles and functions within it (that is, the data scientist is a role and data science is a discipline; they are not the same).
This is the same case that can be made for all the V’s that are being pushed as add-ons to the original 3 V’s. Let’s start with Value as Stephen Swoyer sums up here:
“The industry seems to have settled on 3 Vs — volume, variety, and velocity — to describe the big data problem. What’s missing? Let’s start with another V: value.”
Swoyer points out that the 3 V’s have been around for more than a decade (thanks to Doug Laney and company) and argues that they do not accurately describe the big data problem:
“There’s a disconnect here, however. In practice, big data is hyped on the basis of its real or imagined outputs — e.g., for the breathtaking possibilities of big data analytics; the gleaning or unearthing of dramatic new insights: the intelligibility, coherence, and sensibility that we believe to be embedded in the dizzying volumes, varieties, and velocities of our data. A focus on volumes, varieties, and velocities, some say, misses the point.”
Like the other V’s, value has also been around for a long time. Throughout my career, I have asked the classic value question thousands of times: What is the return on my investment (ROI) for this program/product/software application/fill-in-the-blank? Value can be thought of as an input (as in why should I invest in this?) and an output (how do I measure the success of my investment?).
“… big data elements have the requisite level of veracity (or integrity). In other words, specific controls must be put in place to ensure that the integrity of the data is not impugned. Otherwise, any subsequent usage (particularly for a legal or regulatory proceeding, like e-discovery) may be unnecessarily compromised.”
Again, another V (albeit with a different name) that has been around since we had enterprise software applications. This aligns with the classic idea of garbage in, garbage out (or GIGO)—you need to ensure that your data is properly cleansed (the input part) or you will get unreliable results (the output part). Marilyn and I are very familiar with GIGO as we have spent thousands of hours “scrubbing” data to ensure that it was clean before we began any analysis. Unfortunately, we have also spent thousands of hours trying to figure out why our analysis was “inaccurate” and the culprit usually was “dirty” data.
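For readers who want to see GIGO in miniature, here is a rough sketch in Python. Everything in it is made up for illustration: the order records, the field names, and the cleansing rules (drop duplicates, blanks, and a negative "unknown" sentinel) are assumptions, not a description of any particular tool or of how we actually scrub data. It simply shows how a handful of dirty records can wreck a basic average.

```python
from statistics import mean

# Hypothetical order records; field names and values are invented for this example.
raw_orders = [
    {"order_id": 1, "amount": "120.00"},
    {"order_id": 2, "amount": "80.00"},
    {"order_id": 2, "amount": "80.00"},   # duplicate record
    {"order_id": 3, "amount": ""},        # missing value
    {"order_id": 4, "amount": "-999"},    # sentinel someone used for "unknown"
    {"order_id": 5, "amount": "100.00"},
]


def naive_average(orders):
    """Garbage in: blanks are treated as 0, duplicates and sentinels are kept."""
    return mean(float(o["amount"] or 0) for o in orders)


def scrub(orders):
    """Assumed cleansing rules: drop duplicates, blanks, and negative amounts."""
    seen = set()
    clean = []
    for o in orders:
        if o["order_id"] in seen or not o["amount"].strip():
            continue
        amount = float(o["amount"])
        if amount < 0:
            continue
        seen.add(o["order_id"])
        clean.append({"order_id": o["order_id"], "amount": amount})
    return clean


def clean_average(orders):
    """Average computed only over the scrubbed records."""
    return mean(o["amount"] for o in scrub(orders))


dirty_result = naive_average(raw_orders)   # a negative "average order value"
clean_result = clean_average(raw_orders)   # 100.0
print(f"dirty: {dirty_result:.2f}, clean: {clean_result:.2f}")
```

The point is not the specific rules, which will differ for every data set, but that the same analysis gives a nonsensical answer until the input is cleaned.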
Which brings me back to Dumbill’s sarcastic turn on an 8th V. Folks, there are 3 V’s: Volume, Velocity, and Variety. The other V’s entering the discussion are really about classic data and process governance and management. Once again, we are mixing up apples and oranges and saying that they are all oranges. Worse, we are doing prospects and customers a tremendous disservice by implying that this is a new paradigm.
As a big data application provider, we (PatternBuilders) try to stay away from the hyperbole and focus on how we solve the problems associated with big data analytics for companies and organizations of all sizes. We do talk about the 3 V’s because you need to understand how our solutions handle those data dimensions. And we do talk about data governance and management, privacy, security, the cloud, streaming versus batch analytics and a host of other topics (just select a tag from our blog—there are multiple posts on all these topics) because you need to understand how our solutions handle them as well. But we are not in the business of marketing messages instead of products—we’ll leave that up to all those other vendors.
As you can see, I am a bit riled up about the V debate—as is Marilyn. So we are taking this one head on. In upcoming posts, Marilyn and I will break out a big data project and look at the inputs (data, goals, etc.), processes (technology and applications that address the 3 V’s and other important considerations), and outputs (how to measure success). It’s time to get beyond the hype and focus instead on what it really takes to get a big data project off the ground!