What is Big Data and Why Should You Care?
Data ownership, privacy, and security: we are all in this together.
There’s been a lot of marketing “noise” about the exponential growth of digital data (and yes, we are partially responsible for some of it), and there’s even a sound bite for it: big data (we did not coin the term, but we have used it over and over again). In my defense, I thought the term made complete sense and was the “perfect” name for the problem we are all facing. Of course, I forgot an important marketing axiom: test the term with folks outside the industry to make sure the meaning is not lost. It’s always fun to spend time with friends and family, especially when they ask, “What is it exactly that you do?” In the course of one such conversation, I discovered that to them “big” means, well, big, which does not quite “do justice” to the challenges of the “big data” world that we all live in.
So, what exactly is big data and why should you care? Well, big data really is big, which is how the term came to be. For example, IDC’s research on the size of our “digital universe” revealed the following:
- In 2009, the digital universe grew 62% to almost 800,000 petabytes (for those of you “size-challenged folks,” a petabyte is a million gigabytes, and 800,000 of them translates into a stack of DVDs reaching from the earth to the moon and back).
- In 2010, it was projected to grow to 1.2 million petabytes (final counts are not in yet).
- By 2020, it is projected to be 44 times as big as it was in 2009 (those DVDs would be stacked up halfway to Mars).
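For anyone who wants to check the math, here is a minimal sketch of the arithmetic behind those figures. It assumes decimal (SI) units, where a zettabyte is a million petabytes, and the function and variable names are ours, chosen purely for illustration; the 2008 and 2020 sizes are simply what the quoted percentages imply, not additional numbers from IDC.

```python
# Decimal (SI) storage units (an assumption): 1 PB = 1,000,000 GB, 1 ZB = 1,000,000 PB.
GB_PER_PB = 1_000_000
PB_PER_ZB = 1_000_000

# Figures quoted from the IDC research above.
UNIVERSE_2009_PB = 800_000     # "almost 800,000 petabytes"
GROWTH_2009 = 0.62             # the universe grew 62% during 2009
PROJECTED_2010_PB = 1_200_000  # projected size for 2010
MULTIPLIER_2020 = 44           # 2020 projected at 44 times the 2009 size


def pb_to_zb(petabytes):
    """Convert petabytes to zettabytes (decimal units)."""
    return petabytes / PB_PER_ZB


def implied_prior_size(size_after, growth_rate):
    """Size before a period of growth, given the size after it."""
    return size_after / (1 + growth_rate)


# Derived values: implied by the quoted figures, not reported by IDC here.
universe_2008_pb = implied_prior_size(UNIVERSE_2009_PB, GROWTH_2009)
universe_2020_pb = UNIVERSE_2009_PB * MULTIPLIER_2020
implied_growth_2010 = PROJECTED_2010_PB / UNIVERSE_2009_PB - 1

print(f"2008 (implied):   {universe_2008_pb:,.0f} PB")
print(f"2009:             {pb_to_zb(UNIVERSE_2009_PB):.1f} ZB")
print(f"2010 (projected): {pb_to_zb(PROJECTED_2010_PB):.1f} ZB, "
      f"growth of {implied_growth_2010:.0%}")
print(f"2020 (projected): {pb_to_zb(universe_2020_pb):.1f} ZB")
```

In other words, the projection for 2020 works out to roughly 35 zettabytes, if you take the 2009 figure and the 44x multiplier at face value.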
Note the use of the term “digital universe.” This refers to data that is stored in digital form. For example, all data that is stored in a computer is digital. So when we talk about the digital universe, we are in essence talking about all the data in the world that is stored on some sort of computer (big or small), most likely in some sort of database.
Now, when we (those of us in the big data, big analytics industry) talk about big data (and at PatternBuilders we talk about it a lot) we are not just talking about the size of our digital universe. In fact:
“Big Data,” claims GigaOm analyst Derrick Harris, is a bit of a misnomer; it’s really about data from different sources, including social networks and even cell phones. “It’s coming from sensors, it’s coming from computers, it’s coming from the Web,” he says.
Put another way, big data has many dimensions: size, number of sources, and even types of data. And it is all available in digitized format (the digital universe).
What role does analytics play in all of this? Well, alone, data is just data. But add the ability to aggregate and analyze it all and suddenly every business, organization, and government agency is paying attention. Why? Well, there’s “gold in them thar hills!” Consider this:
“Wal-Mart, a retail giant, handles more than 1m customer transactions every hour, feeding databases estimated at more than 2.5 petabytes—the equivalent of 167 times the books in America’s Library of Congress … Facebook … is home to 40 billion photos. And decoding the human genome involves analyzing 3 billion base pairs—which took ten years the first time it was done, in 2003, but can now be achieved in one week. All these examples tell the same story: that the world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account.”
So, this all sounds great, but how does any of this impact you in your daily life (this is the “why you should care” part)? Well, you may be surprised by this, but 70% of the digital universe is actually generated by you through email, Facebook, Twitter, LinkedIn, uploaded digital photos and videos, and the list goes on and on. As consumers, we play a central role in the expanding digital universe, and our personal data is now considered more valuable than gold.
This is where the law of unintended consequences comes into play (where an action may have unexpected benefits and drawbacks). In the old digital world (before data from sources such as social media, mobile, and sensors was readily available), digital data was much less ubiquitous. And, as outlined in a previous post, the technology didn’t exist to fully “use” it. In this new digital world, where multiple data sources of any size can be easily aggregated and analyzed (using platforms like the PatternBuilders Analytic Framework), everyone wants our personal data. In a previous post I asked the following question: “Who owns you?” Today, ownership and privacy boundaries are being questioned, and rightfully so.
This new digital world is radically different from what we’re used to, and we (as an industry, business, organization, government agency, or consumer) all have a stake in figuring out what data ownership, privacy, and security mean. We are all in this together. So stay tuned, as PatternBuilders is actively participating in this conversation and we will be sharing some “big” (yes, the pun is intended whether you think it’s funny or not!) news on our blog within the next few days.