IDC’s Latest Digital Data Study: A Deep Dive
The PatternBuilders team has been “crazy busy” the last couple of weeks! Terence and I continue to work on our Ebook (plugged again!), I am still working my way through the McKinsey study on big data (long but incredibly interesting), the team is putting the final touches on a very cool analytics demo (that’s all I am going to say right now but you’ll hear more about it over the next couple of weeks), and we are all testing the latest release of our platform. That being said, when the IDC paper on “Extracting Value from Chaos” came out, I set everything aside to read through it (and you should too).
Before I begin my deep dive into the paper, I must say something about IDC: when it comes to research, nobody does it better. As a marketer, I am often asked about the different analyst firms and where a company should “spend” its analyst budget. IDC is always on my “short list” because I find its research to be both broad and deep and filled with useful insights. (Full disclosure: we are not an IDC client but hope to be one in the future.)
First, let’s cover some of IDC’s key takeaways on our (yes, our) digital universe and then I am going to do a deeper dive into some areas:
- The digital universe continues to grow (no surprise) at a rapid pace. In 2011, IDC estimates it will exceed 1.8 zettabytes (that’s 1.8 trillion gigabytes), “growing by a factor of 9 in just five years.”
- 75% of the information is generated by individuals (that’s you and me) but enterprises have “some liability for 80% of information in the digital universe at some point.” In other words, our generated information is captured, collected, and used by all the businesses/apps/services we use as we wander around the digital universe.
- Metadata (data about the data containers plus data about the data) is growing at an even faster clip. In other words, the amount of information we create ourselves when we save and send documents, take and upload pictures, or download music from our favorite sites is “far less than the amount of information being created about them in the digital universe.”
- Less than half of the data that should be protected is protected. This is why data security is an action item for all of us. If enterprises are not sufficiently protecting our data, it is up to us to implement and follow safeguards (for more on this topic, see my many posts tagged with “data security”).
This is how IDC sums it all up:
“So, like our physical universe, the digital universe is something to behold—1.8 trillion gigabytes in 500 quadrillion files—and more than doubling every two years. That’s nearly as many bits of information in the digital universe as stars in our physical universe.”
Yikes! Now, to put this in visual perspective for you, check out EMC’s infographic, which pretty much “says it all.”
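If you like to check numbers for yourself, here is a quick back-of-the-envelope sketch in Python of the figures quoted above. The variable names are mine, and I am assuming decimal (SI) units, where a zettabyte is 10^21 bytes and a gigabyte is 10^9 bytes; IDC's exact conventions and rounding may differ.

```python
import math

# Assumption: decimal (SI) units; the source does not state which convention it uses.
GIGABYTE = 10**9                 # bytes
ZETTABYTE = 10**21               # bytes

# 1.8 zettabytes, kept as an exact integer (18 tenths of a zettabyte).
universe_bytes = 18 * ZETTABYTE // 10

# "1.8 zettabytes (that's 1.8 trillion gigabytes)"
gigabytes = universe_bytes // GIGABYTE

# "growing by a factor of 9 in just five years"
growth_factor = 9
years = 5
annual_growth = growth_factor ** (1 / years)                      # yearly multiplier
doubling_years = years * math.log(2) / math.log(growth_factor)    # time to double

# "1.8 trillion gigabytes in 500 quadrillion files"
files = 500 * 10**15
avg_file_bytes = universe_bytes / files

print(f"Gigabytes:            {gigabytes:,}")
print(f"Annual growth factor: {annual_growth:.2f}x")
print(f"Doubling time:        {doubling_years:.2f} years")
print(f"Average file size:    {avg_file_bytes:,.0f} bytes")
```

Under these assumptions, a ninefold increase over five years works out to a doubling time of less than two years, which squares with IDC's "more than doubling every two years." The implied average file size is also strikingly small, a few kilobytes, which fits the point above that much of the digital universe is metadata rather than the documents and photos we think of as "our" data.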
A Big Data Definition for the Ages
There has been a lot of big data imagery thrown around (and I am as guilty of doing this as everyone else) focusing on surfing big data waves or riding the big data tsunami or fill-in-the-blank to try and give people a sense of the scale of the issue. Thankfully, IDC has decided it’s time to formalize a definition for big data that focuses on far more than the size:
“Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.”
Please note that the underlines are IDC’s and not my own, but they certainly do crystallize what any discussion of big data must include: the big data tools, while new, must be budget-friendly and capable of easily extracting insights from very large data sets, and they must also enable very fast (the velocity part) capture and analysis. At PatternBuilders, we often talk about being able to capture large data sets as well as deal with the speed, or velocity, at which the data is captured and then analyzed (see our posts on the importance of streaming analytics and building a streaming analytics engine, and Terence’s next post, which addresses the velocity issue in depth). Put simply: big data is an ecosystem that provides the tools and platforms that deal with every aspect of the collection, storage, and use of large data sets.
Big Data is Everywhere and Everything
It is often difficult, even for me (as a card-carrying member of the big data industry), to get my arms around all the possible sources of big data, but IDC does an excellent job of reminding us that it is a “horizontal cross-section of the digital universe and can include transactional data, warehoused data, metadata, and other data residing in ridiculously large files.” New growth segments include media/entertainment, health care, and video surveillance. Social media sites, like Facebook, Foursquare, and Twitter, have continuous streams of data from their users. Then there is consumption information (how we use devices to get at other stuff) to deal with. For example, smartphone data includes geographic location, text messages, browsing history, and directional information (via GPS). Is it any wonder that our digital universe is growing at such a rapid pace?
A Final Word (or Two): Beware the Digital Shadow
Now, if you are a regular reader of this blog, you know that I often post about data privacy from pretty much every angle (to see any of those posts, just search on the data privacy tag). As data about us is collected, aggregated, and mined, it is possible for businesses and organizations of all kinds to build very detailed profiles and even predict, fairly accurately, our future behavior. IDC calls this our digital shadow:
“Our digital shadow is made up of information we may deem public but also data that we would prefer remain private. Yet it is within this growing mist of data where big data opportunities lie—to help drive more personalized services, manage connectivity more efficiently, or create new businesses based on valuable, yet-to-be-discovered intersections of data among groups or masses of people.”
While the possibilities are great, the big data industry as a whole should also ensure that no harm is done. Or as Peter Parker (Spider-Man) said: “With great power comes great responsibility.” The PatternBuilders team takes digital privacy and security as seriously as we take performance and ease of use and encourages all big data industry participants to do the same.