Posts tagged ‘batch processing’
Since Disqus seems to have completely eaten (bleh) my comment on @davidlinthicum’s very interesting InfoWorld post, “Big data and the cloud: A far from perfect fit,” I decided to expand my comments into a short blog post. IMHO, the problems David describes are more a reflection of the problems that batch-oriented technologies like Hadoop (more on my take on Hadoop here) have in the cloud than a general problem for cloud-based big data solutions.
Computing always has, and probably always will have, a bias toward batch-focused technologies at the beginning of any large paradigm shift. But as new technologies are absorbed, understood, and move from early adopters to more mainstream use, the batch paradigm inevitably starts to shift to streaming and real-time. We have seen this again and again: from punch cards to touch-sensitive tablets, downloaded media to streaming media, DOM to SAX parsers, HTML to Ajax, and paper maps to real-time GPS. The reason this evolution almost always occurs is simple: humans live and think in real time, and when our tools do as well, we are more productive and happier. So why do we have this bias toward batch processing in our first-generation computational technologies? Simply put, because batch processing is a lot easier.
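To make the “batch is easier” point concrete, here is a minimal Python sketch (our own illustration, not code from any product mentioned on this blog). In batch mode the whole data set is already in hand, so mean and variance take two simple passes. In streaming mode each value is seen once and then gone, so the code has to carry running state; this sketch uses Welford's online algorithm, one standard way to do that. The names `batch_mean_variance` and `StreamingMeanVariance` are hypothetical.

```python
from dataclasses import dataclass


def batch_mean_variance(values):
    """Batch: all data is available up front, so two simple passes suffice."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n  # population variance
    return mean, variance


@dataclass
class StreamingMeanVariance:
    """Streaming: each value is seen once, so we keep running state and
    update it incrementally (Welford's online algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean           # deviation from the old mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)  # uses old and new mean

    @property
    def variance(self):
        # Population variance, to match batch_mean_variance
        return self.m2 / self.count if self.count else 0.0
```

Both versions produce the same population variance. The streaming version holds only three numbers no matter how much data flows through it, which is what makes it suitable for unbounded data, but it takes noticeably more care to get right, which illustrates why first-generation tools tend to start out batch.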
I have been a little quiet on the blogging front recently because the rest of the PatternBuilders team and I have been focused on getting ready to launch our new financial services application, FinancePBI. It is the first cloud-based analytical platform for the financial services market. While this is our first public announcement of our entry into the market, behind the scenes we have been gearing the company up for a big splash for several months:
- We have partnered with ActiveFinancial, one of the premier real-time stock ticker vendors in the world. Look for more data partnerships shortly.
- We have added Doug Jeffrey to our board of advisors and board of directors. Doug is an executive with deep Wall Street and startup expertise who has already done outstanding things in the short time he has been with us.
- We have also partnered with the University of Sydney to use our technology to examine how primary sources (NY Times, etc.) and secondary social media content (Twitter, etc.) influence a company’s stock price over a 12-month period. This project will be done exclusively in the cloud, and we hope to convince our commercial partners to make this PatternBuilders instance available to the general public once the research is published.
Today one of our server engineers is going to give you a deep dive into our architecture. As always on our blog, all of the data is simulated and all trademarks are the property of their respective owners.
Hello everyone! I am going to get fairly technical in this post and go over how the PatternBuilders Analytics Framework (PAF) does what it does so well. As Terence has said in the past couple of posts, we have a new architecture built around scalability, streaming, and ease of use. That’s not quite the whole story, though: the development of this architecture was in fact driven primarily by performance.
In my last post, I gave an overview of the difference between batch and streaming analytics approaches. It was a very popular post and was mentioned on the excellent MyNOSQL blog, which was really appreciated. Its able proprietor, Alex Popescu, had the following comment:
“I cannot put my finger on it right now, but I don’t think stream processing can cover exactly the same wide range of computations available in batch processing:
While I haven’t had the chance to play with real big data, I believe it is not a matter of either or. An ideal system would need to support:
- piping incoming data through a combination of filters, preprocessors/transformers, and calculators/extractors
- preserve (all/relevant) data for later computation
- allow processing of stored data in either streams or batches”
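Alex’s three requirements can be made concrete with a small sketch. This is a hypothetical illustration, not PAF’s actual design: the names `Pipeline`, `RunningSum`, `ingest`, `replay_stream`, and `replay_batch` are invented here, and records are assumed to be plain Python dicts. It pipes each record through filters and transformers into a calculator, preserves the raw records, and lets the stored data be re-processed either one record at a time or as a single batch.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Record = dict
Filter = Callable[[Record], bool]
Transformer = Callable[[Record], Record]


class Pipeline:
    """Hypothetical pipeline: filter -> transform, preserving raw input."""

    def __init__(self, filters: List[Filter], transformers: List[Transformer]):
        self.filters = filters
        self.transformers = transformers
        self.store: List[Record] = []  # raw records kept for later computation

    def _process(self, record: Record) -> Optional[Record]:
        # Drop the record if any filter rejects it
        for keep in self.filters:
            if not keep(record):
                return None
        # Apply transformers in order
        for transform in self.transformers:
            record = transform(record)
        return record

    def ingest(self, records: Iterable[Record]) -> Iterator[Record]:
        """Streaming path. Lazy: each raw record is stored and processed
        only as the caller consumes this generator."""
        for record in records:
            self.store.append(record)
            processed = self._process(record)
            if processed is not None:
                yield processed

    def replay_stream(self) -> Iterator[Record]:
        """Re-process stored data one record at a time."""
        for record in self.store:
            processed = self._process(record)
            if processed is not None:
                yield processed

    def replay_batch(self) -> List[Record]:
        """Re-process all stored data as a single batch."""
        return list(self.replay_stream())


class RunningSum:
    """Hypothetical calculator: per-key running total, updated per record."""

    def __init__(self, key: str, field: str):
        self.key = key
        self.field = field
        self.totals: Dict[object, float] = {}

    def update(self, record: Record) -> None:
        k = record[self.key]
        self.totals[k] = self.totals.get(k, 0) + record[self.field]
```

Transformers here return new dicts rather than mutating their input, so the preserved records stay raw and can be replayed with different logic later; storing everything is the simplest choice for a sketch, and a real system would need retention rules.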
Our new streaming analytics engine.
As promised, I am going to spend the next few posts discussing some of the new features in our analytics framework, otherwise known as PAF. This is our largest and most complex release so far. We are very proud of it, both of how far the framework has come and of how closely it matches the vision of a world-class analytics system that we had when we started the company a few years ago.
One of my favorite features, and certainly the biggest change in this release, is that our analytics engine is now completely streaming-based. I think that this, along with our improved ad-hoc analysis support, is going to improve our customers’ day-to-day experience with both calculating and using analytics in their businesses.