Posts tagged ‘Hadoop’
There’s a sad but true statistic that every entrepreneur knows by heart: 9 out of 10 startups fail. Unfortunately, PatternBuilders is adding its name to that pile. We have been procrastinating writing this post because shutting down a company is hard. When you put your heart and soul into something, you need time to process, reflect, and eventually get to the point where you can move on.
But moving on does not mean that we are disappearing; after all, shutting down the company does not end our passion for big data, privacy, and all things tech-related (especially IoT). To that end, we will be maintaining this blog as our main place to write and comment about those issues. We are also consulting around all areas involving big data and/or privacy (via our existing consulting organization, Ludloff-Craig Associates) and are working on some other things that we are keeping under wraps for now. But if you follow our blog, @terencecraig, or @mludloff, you will be the first to know. And if you have interesting opportunities, consulting projects, or – for the right company – a full-time job, please get in touch.
There are a number of reasons why we are shutting our doors, but suffice to say, we made some decisions we knew might have an adverse effect on the company. And we stand by those decisions.
For the second post on AnalyticsPBI for Azure (first one here), I thought I would give you some insight on what is required for a modern real-time analytics application and talk about the architecture and process that is used to bring data into AnalyticsPBI and create analytics from them. Then we will do a series of posts on retrieving data. This is a fairly technical post so if your eyes start to glaze over, you have been warned.
In a world that is quickly moving towards the Internet of Things, the need for real-time analysis of high-velocity, high-volume data has never been more pronounced. Real-time analytics (aka streaming analytics) is all about performing analytic calculations on signals extracted from a data stream as they arrive—for example, a stock tick, RFID read, location ping, blood pressure measurement, or clickstream data from a game. The one guaranteed component of any signal is time (the time it was measured and/or the time it was delivered). So any real-time analytics package must make time and time aggregations first-class citizens in its architecture. This time-centric approach provides a huge number of opportunities for performance optimizations. It amazes me that people still try to build real-time analytics products without taking advantage of them.
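To make the time-centric idea concrete, here is a minimal sketch (not PatternBuilders' actual implementation; all names here are hypothetical) of the core pattern: each incoming signal is floored to a time bucket, and per-bucket aggregates are updated incrementally as signals arrive, so queries over a window never rescan the raw stream.

```python
from collections import defaultdict

def bucket(ts, width_s=60):
    """Floor a Unix timestamp to the start of its time window."""
    return ts - (ts % width_s)

class StreamingStats:
    """Incremental per-window count/sum/min/max for one signal stream."""

    def __init__(self, width_s=60):
        self.width_s = width_s
        # window start -> running aggregates, created on first signal
        self.windows = defaultdict(lambda: {"count": 0, "sum": 0.0,
                                            "min": float("inf"),
                                            "max": float("-inf")})

    def add(self, ts, value):
        # O(1) update per signal; no raw data is retained
        w = self.windows[bucket(ts, self.width_s)]
        w["count"] += 1
        w["sum"] += value
        w["min"] = min(w["min"], value)
        w["max"] = max(w["max"], value)

    def mean(self, window_start):
        w = self.windows[window_start]
        return w["sum"] / w["count"]

stats = StreamingStats(width_s=60)
stats.add(120, 10.0)   # lands in window starting at t=120
stats.add(130, 20.0)   # same window
stats.add(185, 5.0)    # next window (t=180)
print(stats.mean(120))  # prints 15.0
```

Because the aggregates are additive, coarser aggregations (hourly from minutes, daily from hours) can be rolled up from finer windows without touching the original signals—which is exactly the class of optimization a time-centric architecture makes possible.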
Until AnalyticsPBI, real-time analytics were only available if you built a huge infrastructure yourself (for example, Wal-Mart) or purchased a very expensive solution from a hardware-centric vendor (whose primary focus was serving the needs of the financial services industry). The reason that the current poster children for big data (in terms of marketing spend at least), the Hadoop vendors, are “just” starting their first forays into adding support for streaming data (see Cloudera’s Impala, for example) is that calculating analytics in real-time is very difficult to do. Period.
It has been a while since I’ve done posts that focus on our technology (and big data tech in general). We are now about 2 months out from the launch of the Azure version of AnalyticsPBI.
But before I start exercising my inner geek, it probably makes sense to take a look at the development philosophy and history that forms the basis of our upcoming release. Historically, we delivered our products in one of two ways:
- As a framework which morphed (as of release 2.0) into AnalyticsPBI, our general analytics application designed for business users, quants, and analysts across industries.
- As vertical applications (customized on top of AnalyticsPBI) for specific industries (like FinancePBI and our original Retail Analytics application) which we sold directly to companies in those industries.
I had to miss Strata due to a family emergency. While Mary picked up the slack for me at our privacy session, and by all reports did her usual outstanding job, I also had to cancel a Tuesday night Strata session sponsored by 10Gen on how PatternBuilders has used Mongo and Azure to create a next-generation big data analytics system. The good news is that I should have some time to catch up on my writing this week, so look for a version of what would have been my 10Gen talk shortly. In the meantime, to get me back in the groove, here is a very short post inspired by a Forbes post written by Dan Everett of SAP on “Hadoopla.”
As the CEO of a real-time big data analytics company that occasionally competes with parts of the Hadoop ecosystem, I may have some biases (you think?). But I certainly agree that there is too much Hadoopla (a great term). Our goal as an industry should be to move big data out of the lab and into mainstream use by anyone other than the companies that thrive on, and have the staff to support, high-maintenance, high-skill technologies. Hadoop is not the answer to that goal: it has too many moving parts and is simply too complex.
To quote from a blog post I wrote a year ago:
“Hadoop is a nifty technology that offers one of the best distributed batch processing frameworks available, although there are other very good ones that don’t get nearly as much press, including Condor and Globus. All of these systems fit broadly into the High Performance, Parallel, or Grid computing categories and all have been or are currently used to perform analytics on large data sets (as well as other types of problems that can benefit from bringing the power of multiple computers to bear on a problem). The SETI project is probably the best known (and IMHO, the coolest) application of these technologies outside of that little company in Mountain View indexing the Internet. But just because a system can be used for analytics doesn’t make it an analytics system…”
Why is the industry so focused on Hadoop? Given the huge amount of venture capital that has been poured into various members of the Hadoop ecosystem, and that ecosystem’s failure to find a breakout business model that isn’t hampered by Hadoop’s intrinsic complexity, there is ample incentive for a lot of very savvy folks to attempt to market around these limitations. But no amount of marketing can change the fact that Hadoop is a tool for companies with elite programmers and top-of-the-line computing infrastructures. In that niche, it excels. But it was not designed for broad adoption outside of that niche and, in my opinion, will never see it, despite the seemingly endless growth of Hadoopla.
In Search of Elusive Big Data Talent: Is Science Big Data’s Biggest Challenge? Or Are We Looking in the Wrong Places? (Part 1 of 3)
When we talk to prospects about their big data initiatives, our conversations usually revolve around issues of complexity and go something like this:
“Big data is so big (no pun intended), there’s such a variety of sources, and it’s coming in so fast. How can we develop and deploy our big data projects when everyone is telling us that we need lots and lots of data scientists and oh, by the way, there aren’t enough?”
Admittedly, many media outlets and pundits are positioning the search for skilled big data resources as what I can only characterize as the battle for the brainiacs. Don’t get me wrong, I am not disputing McKinsey’s report on big data last year that made it clear a talent shortage was looming, estimating that the U.S. would need 140,000 to 190,000 folks with “deep analytical skills” and 1.5 million managers and analysts to “analyze big data and make decisions based on their findings.” But the hype surrounding the data scientist is getting a bit absurd and we seem to be forgetting that those 1.5 million managers and analysts may already be “walking amongst us.” Is a shortage of data scientists really big data’s biggest challenge?
Big Data Tools Need to Get Out of the Stone Age: Business Users and Data Scientists Need Applications, Not Technology Stacks
Things have been crazy at PatternBuilders recently. The excitement and positive reactions to FinancePBI, our financial services big data analytics solution, from media, analysts, venture folks, cloud infrastructure partners, and users have been amazing. Our new cross-industry graphical big data correlation mashups are generating a lot of excitement as well—we like to call this feature Google Correlate on steroids. Check out how our newest analytics consultancy partner, InsightVoices, has used it to find relationships between stock prices and traffic sensor data.
Mary’s recent post on Strata West 2012 provides a great overview of how hot the hype cycle around big data has become (while managing to work in a plug for her favorite gory TV series as well). In case you’re still not convinced, here are some additional nuggets:
- The market for big data technology worldwide is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015, a compound annual growth rate (CAGR) of 40% (hat tip to IDC).
- The amount of big data being generated continues to grow exponentially and is now expected to double every two years. This is largely driven by social networks, smartphones, and really cool IP-enabled devices like the Fitbit and this iPhone-based brain scanning device by our new Strata buddy Tan Le at Emotiv Lifesciences. Yes, she is much smarter than us but we like her anyway!
- The White House is even doing its share, investing $200 million a year in access and funding to help propel big data sets, techniques, and technologies while giving a shout out to our friends at Data Without Borders.
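As a quick sanity check on the first of those figures, the 40% CAGR follows directly from the 2010 and 2015 numbers:

```python
start, end, years = 3.2, 16.9, 5  # $B in 2010, $B in 2015, elapsed years
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%}")  # roughly 40%
```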
I’ve been meaning to blog about Strata West for the last week or so but felt the need to take a step back and look at the conference objectively. Of course, we’ve also been very busy at PatternBuilders working on our latest release (where correlation is the king and financial services is the queen or vice versa), engaging with potential partners and customers, and all the other activities that make up a startup’s life. In other words, during and after the conference we’ve barely been able to catch our collective breath (as well as get some much needed rest)!
So before I talk about the conference as a whole as well as some of the sessions and folks that caught my eye and of course, our book signing event (yes, Terence and I signed many books for conference attendees), I wanted to give a final shout out to our stellar Big Data and SCM panelists: Lora Cecere, Pervinder Johar, and Marilyn Craig. Thank you all for participating and for taking on this very broad topic! Much ground was covered, including the need for more rigorous cold chain management to ensure the efficacy of drugs, the amount of food that is spoiled and thrown away (one out of every three fruits and vegetables and two out of every five chickens) due to poor logistics management, and how big data can be used to transform the auto repair industry. What I loved about this panel (and yes, I am admittedly biased) was that it focused on real world problems that companies, industries, and societies are dealing with today. By the way, our panel was part of Strata Jumpstart—billed as the missing MBA for Big Data and it certainly lived up to its billing!