No, Hadoop Doesn’t Own Big Data Analytics!
A number of folks have asked me if I was concerned about Microsoft’s recent announcement that they would be partnering with Hortonworks and abandoning their own distributed processing technology in favor of Hadoop. I thought this was an unfortunate choice on Microsoft’s part (the Dryad project’s implementation of multi-server LINQ was pretty compelling), but since HPC is a small part of Microsoft’s business, it probably made sense from a business standpoint. In any case, we (as in all of us at PatternBuilders) are not concerned. Just to be clear: we don’t believe that this announcement (or any other) means that the many Hadoop ecosystem players own the still-forming big data analytics market.
That is not to say that the announcement isn’t proof of the strength of the Hadoop ecosystem. Hadoop is a nifty technology that offers one of the best distributed batch processing frameworks available, although there are other very good ones that don’t get nearly as much press, including Condor and Globus. All of these systems fit broadly into the high-performance, parallel, or grid computing categories, and all have been or are currently used to perform analytics on large data sets (as well as to solve other types of problems that benefit from the power of multiple computers). The SETI project is probably the best known (and IMHO, the coolest) application of these technologies outside of that little company in Mountain View indexing the Internet.
But just because a system can be used for analytics doesn’t make it an analytics system. This is something that many Hadoop users discover after seeing the cost of the professional services required to turn a Hadoop distribution into an enterprise analytics system. Keep in mind that many things are required to store big data and perform analytics on it. Of course, being able to scale across machines to deliver computing power is a significant issue, but as you can see here, it is just the tip of the iceberg.

PatternBuilders is in the real-time analytics business. We built our own technology on both the front and back ends to support analytics on large streaming data sets. We did this because, while there were a lot of interesting technologies that service companies had built their businesses around, there weren’t any PRODUCTS focused on helping the enterprise meet real-time, big data analytics needs that could be implemented and maintained without a huge services effort. Since our engine was deliberately focused on real-time/streaming analytics, we built in integration points to make it easy to bring in data from batch systems (including Hadoop) when it made sense. For example, an individual’s influence in a social graph is an ideal calculation to do in batch mode.
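To make the batch/streaming split a bit more concrete, here is a minimal Python sketch of the idea. It is not PatternBuilders code: the function names, the PageRank-style influence formula, and the (user, topic) event shape are all assumptions chosen for illustration. The batch step recomputes influence over the whole graph, and the streaming step simply looks up the latest batch result as each event arrives.

```python
from collections import defaultdict


def batch_influence(follows, damping=0.85, iterations=50):
    """Batch step (hypothetical): PageRank-style influence scores.

    `follows` maps each user to the list of users they follow.
    The whole graph is reprocessed on every run, which is why this
    kind of calculation suits a batch system.
    """
    users = set(follows) | {t for targets in follows.values() for t in targets}
    n = len(users)
    score = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        nxt = {u: (1.0 - damping) / n for u in users}
        for u in users:
            targets = follows.get(u, [])
            if targets:
                # A user passes influence to the people they follow.
                share = damping * score[u] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # A user who follows nobody spreads influence evenly.
                share = damping * score[u] / n
                for t in users:
                    nxt[t] += share
        score = nxt
    return score


def weighted_mentions(events, influence):
    """Streaming step (hypothetical): running influence-weighted totals.

    `events` is any iterable of (user, topic) pairs. Each event is
    handled as it arrives; users missing from the batch output count
    as zero until the next batch run picks them up.
    """
    totals = defaultdict(float)
    for user, topic in events:
        totals[topic] += influence.get(user, 0.0)
        yield topic, totals[topic]
```

In this sketch, the output of `batch_influence` would be refreshed on whatever schedule the batch system runs, while `weighted_mentions` keeps consuming events in between refreshes.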
In summary: Hadoop is a great batch-focused distributed processing engine, and I am glad that the work of that community is paying off for them, but it is not an enterprise analytics system! BTW, if this very mild piece gets any Hadoop loyalists screaming “hatchet job,” go read my Twitter buddy Colin Clark’s piece on killing the elephant.
As a final note, Microsoft seems to be hedging its bets with another Microsoft Research project called Daytona, which offers a distributed processing framework for their cloud platform, Azure, and looks pretty interesting. You will notice that Daytona ships with an analytics front end (an Excel plugin). We have been playing with the bits and will definitely be integrating it when it’s released.