Big Data Tools Need to Get Out of the Stone Age: Business Users and Data Scientists Need Applications, Not Technology Stacks
Things have been crazy at PatternBuilders recently. The excitement and positive reactions to FinancePBI, our Financial Services big data analytics solution, from media, analysts, venture folks, cloud infrastructure partners, and users have been amazing. Our new cross-industry graphical big data correlation mashups are generating a lot of excitement as well—we like to call this feature Google Correlate on steroids. Check out how our newest partner, the analytics consultancy InsightVoices, has used it to find relationships between stock prices and traffic sensor data.
Mary’s recent post on Strata West 2012 provides a great overview of how hot the hype cycle around big data has become (while managing to work in a plug for her favorite gory TV series as well). In case you’re still not convinced, here are some additional nuggets:
- The market for big data technology worldwide is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015, a compound annual growth rate (CAGR) of 40% (hat tip to IDC).
- The amount of big data being generated continues to grow exponentially and is now expected to double every two years. This is largely driven by social networks, smartphones, and really cool IP-enabled devices like the Fitbit and the iPhone-based brain scanning device from our new Strata buddy Tan Le at Emotiv Lifesciences. Yes, she is much smarter than us but we like her anyway!
- The White House is even doing its share, investing $200 million a year in funding and access programs to help propel big data sets, techniques, and technologies, while giving a shout-out to our friends at Data Without Borders.
Yes, big data is here to stay and getting bigger by the day. But I wouldn’t break out the champagne just yet because we have two very big hurdles to overcome: talent and tools. Brian Deagon in Investor’s Business Daily puts it succinctly:
“A recent survey of companies looking to exploit the promise of Big Data reveals a rough road ahead…The study makes clear that while companies agree that data analytics can provide valuable customer insights that would boost sales and improve business operations, the tools and talent needed to perform such analytics is lacking.”
And he is not the only one to point this out. Lisa Arthur, in a recent Forbes article, makes the case that “data is the new oil.” There are just one or two challenges to address:
“Data is the new oil. I’m hearing that declaration more and more, and to me, it means simply this: Companies are learning to turn Big Data into Big Dollars…The era of Big Data is here. You’re going to need the right tools – and you’re going to need the right people – to put Big Data to work for you. All that “new oil” won’t have value unless it can be used to gather the actionable insights that drive business growth.”
There’s a theme here and it’s getting louder every day: where are the tools and talent? And let’s be very clear that a major talent gap is developing:
“Currently, every company of appreciable size, roughly 1,000 employees or more, has at least as much data as the Library of Congress, and the market is full of inefficient structures meant to compensate for imperfect understanding of data, such as the current model of car insurance which puts drivers in broad categories rather than looking at granular driving habits. Given the growth of Big Data in industry, the [McKinsey] report said that the American workforce would need 1.5 million more data-literate workers and between 140,000 and 190,000 with deep analytics skills, a major gap.”
That talent gap is only exacerbated by the poor usability of the current set of big data tools. It’s hard enough to find folks who understand statistics and math well enough to solve interesting problems. The current state of big data demands that, to be effective, data scientists also become expert Java programmers and figure out whether to use Pig or Hive for data access. No wonder a lot of organizations are saying to hell with it and sticking with Excel, SAS, and R.
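To make that usability gap concrete, here’s a toy sketch in plain Python (no Hadoop required; the trade data and function names are invented purely for illustration) of the map/shuffle/reduce ceremony a data scientist has to think in, versus the one declarative question an analyst actually wants to ask:

```python
from collections import defaultdict

# Toy transaction log: (ticker, trade value) pairs -- illustrative data only.
trades = [("AAPL", 100.0), ("GOOG", 250.0), ("AAPL", 50.0), ("GOOG", 150.0)]

# --- The MapReduce way: the analyst must decompose "average value per ticker"
# --- into explicit map, shuffle, and reduce phases.

def mapper(record):
    ticker, value = record
    yield (ticker, value)                       # emit key/value pairs

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:                    # group values by key, as the framework would
        groups[key].append(value)
    return groups

def reducer(key, values):
    return key, sum(values) / len(values)       # average per key

mapped = (pair for record in trades for pair in mapper(record))
averages = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(averages)  # {'AAPL': 75.0, 'GOOG': 200.0}

# --- What the analyst actually wants to write is one declarative line,
# --- e.g. in Hive: SELECT ticker, AVG(value) FROM trades GROUP BY ticker
```

Even this trivial aggregation forces the author to reason about framework plumbing; a real Hadoop job adds Java boilerplate, job configuration, and cluster deployment on top of it.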
Big Data is at a crossroads: we can either keep coming up with toolkits that glory in their complexity so we can all brag about how smart our users have to be, or we can start building tools and applications that make the expanding universe of data more accessible. Guess which road I’d choose.
Look, big data will not “change the world” until anyone with a college-level understanding of statistics can derive useful information from large data sets. After all, it’s not about the data itself. It’s about analytics and the insights those analytics enable. While big data projects propelled by those rocket-scientist programmers are nice and all (and of course have solved some very interesting problems), that doesn’t change the fact that the current set of big data tools is inaccessible to, and unusable by, the majority of public and private organizations.
Today, most big data “success stories” are one-off projects developed by very expensive expert programmers working with advanced data scientists. These projects are deployed on Hadoop or a proprietary hardware platform. (See my earlier post on why I don’t believe Hadoop will ever be an enterprise analytics system.) Hadoop reminds me of a C++ compiler: a very powerful tool that, given enough time, can be coerced by a skilled programmer into solving almost any problem. But even if you are geek enough to love C++ (I am), the most hardcore C++ lover understands that there is a reason why C++ compilers account for only a tiny fraction of sales compared to Excel. Make no mistake: Excel is still a programming environment, but its goal was usability, not flexibility or power—USABILITY! Unlike C++ programmers, though, a lot of big data folks seem to think that inaccessibility is a selling point:
“<Well-known Hadoop vendor> doesn’t want to talk about how to make it easier. All they talk about is how they will be able to process more petabytes. Who gives a @#$@#$ about petabytes? I just want answers!”
The quote above came from one of our prospects during a sales call. The CEO of Cloudera (BTW, they weren’t the vendor mentioned above) made a similar point at GigaOM’s Structure:Data conference. He wanted independent software developers to know that if you:
“… come to me with apps, I’ll get you money.”
Listen, if you have to entice (find money for) ISVs to use your platform, you have a huge problem. For example, we use MongoDB as our primary persistence engine. We use it because it meets our performance needs, stays out of our way while we do our statistical magic over really large data sets, and supports our fan-out scaling needs very well. They didn’t have to pay us. They were, and remain, the most convenient tool that meets both our needs and those of hundreds of other developers, including folks like Foursquare. As Oracle and a ton of other very rich folks will tell you: if you have a good platform, ISVs will pay you.
If the big data/analytics technology vendors can’t create platforms that attract ISVs, they can forget about the majority of IT shops. Those guys have businesses to run. Yeah, Facebook and Goldman can afford to bring in Hadoop gurus and have them write code for every analytics request. But even they will stop that once there are solutions that are more respectful of their time and resources (in much the same way that Java and C# have taken a large portion of the programming market from C++).
If the majority of conversations with your customers and prospects are about how cool and scalable your technology is and how large teams of brilliant folks worked together for months to produce a single result, you aren’t a platform! You are a technology! And if you want to build a sustainable business that will change the world you need to get back to work!