How “Real” is Real-Time and What the Heck is Streaming Analytics?
I have a confession to make. I am in marketing and as such, my profession often tries to make the complex sound simple. We look for sound bites that will help folks to understand, in five words or less, a complicated story, process, service, or feature. I am sure that you are familiar with this as it is everywhere—politicians, news organizations, television, radio, and companies of every size struggle to reduce the very complex down to the very simple. Quite often, something gets lost in the translation.
This is the case with real-time analytics. As you may recall, in a previous post Terence pointed out that “real-time,” as it is applied to analytics, does not meet the computer science standard. Am I splitting hairs? Yes and no. Unfortunately, we (as in marketers in the big data and big analytics space, as well as all the tertiary spaces such as business intelligence, data warehousing, etc.) coined the term “real-time analytics” when what we really meant, for all intents and purposes, was pretty darn fast analytics. (By the way, if you want to make developers go crazy, say “real-time” and then sit back and watch as they carefully, rationally try to explain why there is no such thing as a real-time analytics system.)
In fact, my techie friends (computer science geeks of the highest order) tell me that real-time does not mean fast or slow processing; speed has nothing to do with it. It simply means that once an event happens, the result is guaranteed to be returned within a predictable timeframe—what engineers call a deadline. Have I lost you yet? For example, a car’s anti-lock brakes are considered a real-time computing system. The predictable timeframe is the “time” within which the brakes must be released to prevent the wheel from locking.
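To make the distinction concrete, here is a minimal Python sketch (my own illustration, not anything from a real brake controller; the 50 ms deadline and the `handle_event` helper are made-up assumptions). The point it demonstrates: real-time is about meeting a bound, not about raw speed.

```python
import time

DEADLINE_MS = 50  # hypothetical hard deadline for responding to an event

def handle_event(event_ts: float, respond) -> bool:
    """Run the response and report whether the deadline was met.

    Real-time correctness is about the bound, not the average speed:
    a 49 ms response meets a 50 ms deadline; a 51 ms response fails,
    no matter how fast the system usually is.
    """
    respond()
    elapsed_ms = (time.monotonic() - event_ts) * 1000
    return elapsed_ms <= DEADLINE_MS

# A handler that does ~1 ms of work comfortably meets the deadline.
start = time.monotonic()
print(handle_event(start, lambda: time.sleep(0.001)))
```

“Fast analytics,” by contrast, makes no such guarantee: a query that usually returns in two seconds but occasionally takes two minutes is quick, but it is not real-time.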
Now, I suspect you’re beginning to understand that there is no such thing as real-time analytics. There is a time lag, and depending upon what you’re doing, it can take a couple of seconds, a couple of minutes, or a couple of hours to return an analytic result. The question becomes: how long are you willing to wait before a result (or partial result) is returned? Keep in mind that data is coming at us 24/7 and in larger and larger quantities. Think about this for a minute (or two):
- There are one trillion unique URLs in Google’s index and two billion Google searches every day.
- There are 70 million videos available on YouTube and they are viewed 100 million times on a daily basis.
- There are 133 million blogs.
- There are more than 29 billion tweets and three million are added every day.
- There are more than 500 million active Facebook users and they spend over 700 billion minutes per month on the site.
Now combine this with all the government, economic, financial, demographic, and geographic data sets (the list goes on and on) that are available for aggregation and analysis, and I think you get the picture. Is it any wonder that the retail and financial services industries have spent billions to develop and support analytic solutions? And why other industries (and companies) want this capability? There are just two little problems: analyzing these ever-growing data sets takes time, lots of time, and money, lots of money (don’t forget the infrastructure costs); and by the time you’ve figured out “something” (such as out-of-stocks on store shelves in retail or the possibility of a flu pandemic in healthcare), it may be too late to change the situation.
In my last post on Capital One and fraud detection, I mentioned that the 99 cent iTunes purchase was discovered in less than five minutes (from purchase to analysis to result to phone call to me). This is not just an example of fraud detection but one of streaming analytics. Put simply, data is analyzed as it comes in to predict an outcome (potential fraud), and then that information is used to alter the outcome (preventing more fraud).
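Here is a toy sketch of that idea in Python—my own illustration, not how Capital One (or PatternBuilders) actually does it. The hypothetical rule is that card-testing fraud often begins with a tiny purchase from a location the card has never used; scoring each event as it arrives, rather than in a nightly batch, is what makes a five-minute phone call possible.

```python
from collections import defaultdict

SMALL_AMOUNT = 1.00  # assumed threshold for a "test" purchase

# Incrementally built profile: card_id -> locations seen so far.
profiles = defaultdict(set)

def score_event(event: dict) -> bool:
    """Score one transaction as it streams in; True means suspicious."""
    card, amount, location = event["card"], event["amount"], event["loc"]
    suspicious = amount <= SMALL_AMOUNT and location not in profiles[card]
    profiles[card].add(location)  # update the profile with every event
    return suspicious

stream = [
    {"card": "A", "amount": 42.00, "loc": "CA"},
    {"card": "A", "amount": 0.99, "loc": "RU"},  # tiny charge, new location
]
flags = [score_event(e) for e in stream]
print(flags)  # → [False, True]
```

The key design point is that the profile is updated event by event, so the analysis never waits for the full data set to land before producing a result.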
This is why streaming analytics is so important and why PatternBuilders Analytics Framework (PAF) supports it. We wanted to offer the rest of the world the same sort of computing power that the financial and retail industries bring to bear (hey, I am in marketing and upon occasion am allowed to wax lyrical), so that everyone has the capability to alter outcomes or, at the very least, to see trends develop and respond to them faster. And as pointed out in this post, we believe that a streaming analytics engine is the best way to cost-effectively accomplish this.
Take Google flu trends, for example. What if you could track Google flu trends on, let’s say, an hourly basis, looking for spikes in flu search activity by demographic, and then use that information to direct how vaccines are distributed, make public service announcements, etc.? And let’s say the pharmaceutical company supplying the vaccine knew where each truck was located within a five-minute window, so that it could divert its distribution in near real-time to deliver more vaccine to the impacted region ASAP. Again, you might be able to change the outcome, which would result in fewer cases of flu.
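The “looking for spikes” step above can be sketched in a few lines of Python. Everything here is an illustrative assumption—the six-hour baseline window, the 2x threshold, and the sample counts are made up, not epidemiology and not Google’s method.

```python
from collections import deque

def detect_spikes(hourly_counts, window=6, factor=2.0):
    """Flag hours whose count exceeds `factor` times the trailing average."""
    history = deque(maxlen=window)
    spikes = []
    for hour, count in enumerate(hourly_counts):
        if len(history) == window:
            baseline = sum(history) / window
            if count > factor * baseline:
                spikes.append(hour)
        history.append(count)  # the baseline rolls forward every hour
    return spikes

counts = [100, 110, 95, 105, 100, 90, 400, 120]
print(detect_spikes(counts))  # → [6]: hour 6 jumps well above baseline
```

An alerting pipeline would then map a flagged hour and region to an action—rerouting a truck, issuing an announcement—while the spike is still unfolding.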
Now you may wonder why we use the term streaming analytics rather than real-time analytics. When PatternBuilders was founded, the executive team and I made a pact: sound bites must be accurate. If they’re not, we don’t use them and trust that our customers and prospects can follow, and appreciate, our logic. So, we use the term streaming analytics because that is precisely what our analytics platform does. We leave the real-time analytics sound bites to those other companies and now you know what they’re really talking about!