Posts tagged ‘Mongo’
I had to miss Strata due to a family emergency. While Mary picked up the slack for me at our privacy session, and by all reports did her usual outstanding job, I also had to cancel a Tuesday night Strata session sponsored by 10gen on how PatternBuilders has used Mongo and Azure to create a next-generation big data analytics system. The good news is that I should have some time to catch up on my writing this week, so look for a version of what would have been my 10gen talk shortly. In the meantime, to get me back in the groove, here is a very short post inspired by a Forbes post written by Dan Everett of SAP on “Hadoopla.”
As the CEO of a real-time big data analytics company that occasionally competes with parts of the Hadoop ecosystem, I may have some biases (you think?). But I certainly agree that there is too much Hadoopla (a great term). If our goal as an industry is to move Big Data out of the lab and into mainstream use by anyone other than the companies that thrive on, and have the staff to support, high-maintenance and very high-skill technologies, Hadoop is not the answer – it has too many moving parts and is simply too complex.
To quote from a blog post I wrote a year ago:
“Hadoop is a nifty technology that offers one of the best distributed batch processing frameworks available, although there are other very good ones that don’t get nearly as much press, including Condor and Globus. All of these systems fit broadly into the High Performance, Parallel, or Grid computing categories and all have been or are currently used to perform analytics on large data sets (as well as other types of problems that can benefit from bringing the power of multiple computers to bear on a problem). The SETI project is probably the best known (and IMHO, the coolest) application of these technologies outside of that little company in Mountain View indexing the Internet. But just because a system can be used for analytics doesn’t make it an analytics system…”
Why is the industry so focused on Hadoop? Given the huge amount of venture capital that has been poured into various members of the Hadoop eco-system and that eco-system’s failure to find a breakout business model that isn’t hampered by Hadoop’s intrinsic complexity, there is ample incentive for a lot of very savvy folks to attempt to market around these limitations. But no amount of marketing can change the fact that Hadoop is a tool for companies with elite programmers and top-of-the-line computing infrastructures. And in that niche, it excels. But it was not designed for, and in my opinion will never see, broad adoption outside of that niche despite the seemingly endless growth of Hadoopla.
As you all know, Tim and I spoke at MongoSF recently. Our session was focused on how to build a streaming analytics system with Mongo. For those of you who might have missed this post thread, here are the highlights (with the appropriate links):
- We wanted to make our beta version of PatternBuilders Social Media Analytics demo publicly available on the web.
- We looked at cloud-based deployments as a way to make this economically viable.
- As part of our move to the cloud, we made significant changes to the PatternBuilders Platform architecture, including the adoption of MongoDB (a choice that the PatternBuilders development team is very happy with).
Our session was videotaped and I am happy to announce that it is now available on the 10gen site. You’ll notice that we got a lot of great questions. If, after viewing the video, you have some thoughts or questions please send them my way through comments or email—it may take me some time (we are, as Mary said in her last post, crazy busy right now), but I will follow up!
When we started PatternBuilders, we made what was then an unusual decision: to avoid multi-tenancy, as I talked about here. However, we also decided to avoid the cloud because we wanted to have predictable costs and felt that, given the high level of expertise we had internally with managing data centers, we would be better off investing in top-tier colocation facilities. This made a lot of sense given the security sensitivities of our initial target markets: internal IT at the Fortune 500, large retail suppliers, and hospital groups. It was also an economically viable choice because our business model provisions hardware and bandwidth for each customer after the sale to manage cash flow. We also knew that we would be able to reduce both the cost and maintenance headaches of separate customer provisioning by aggressive use of virtualization technology, much like the cloud server vendors Rackspace, Amazon, and others do today.
It’s a legitimate question that we get asked a lot: why can’t I use my multi-billion dollar BI system to manage my big data/real-time analytics problem(s)? I have found that my tongue-in-cheek answer, “we need you to buy our software and services because we have families to feed,” though true, is not as compelling to customers as I would like, which leads to today’s post.
To put the question another way: what can PatternBuilders and the rest of the new approaches to data and analytics like Hadoop or Mongo (which, by the way, our platform uses and is a great technology) offer you over and above what large BI company X offers?