Last week we read an interesting paper, and a post about it, covering Google Flu Trends (GFT) and its foibles. The paper points out a couple of lessons that those of us living in the big data analytics world have learned the hard way, but the dangers are worth revisiting as tools like ours (AnalyticsPBI for Azure) begin to move big data analytics into the mainstream of organizational practice. After all, our tool (and others like it) makes it easy, and even fun, for analytics junkies to use all those available zettabytes of data to answer questions they've long wondered about. But the paper also reminded us of the dangers of ignoring the natural cycles of an analytics process, which we talked about in this recent post. If Google had followed the PatternBuilders Analytics Methodology, they might have avoided many of the errors that GFT is now spitting out. In fact, the authors of the paper point out that:
“Although not widely reported until 2013, the new GFT has been persistently overestimating flu prevalence for a much longer time. GFT also missed by a very large margin in the 2011-2012 flu season and has missed high for 100 out of 108 weeks starting with August 2011… This pattern means that GFT overlooks considerable information that could be extracted by traditional statistical methods.”
This overestimation is attributed to two primary factors: data hubris and algorithm dynamics.
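The authors' point that "traditional statistical methods" could have extracted much of this signal can be made concrete with a toy example. The sketch below is purely illustrative: the synthetic prevalence series, the lag structure, and the least-squares fit are our assumptions, not the paper's actual model or data.

```python
import numpy as np

# Synthetic weekly flu-prevalence series standing in for CDC surveillance
# data (roughly seasonal, with noise) -- illustrative only.
rng = np.random.default_rng(0)
weeks = 200
t = np.arange(weeks)
prevalence = 2.0 + 1.5 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 0.2, weeks)

# A "traditional statistical method": predict this week's prevalence from the
# two previous weeks via an ordinary-least-squares autoregression.
X = np.column_stack([np.ones(weeks - 2), prevalence[1:-1], prevalence[:-2]])
y = prevalence[2:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ coef
mae = np.mean(np.abs(pred - y))  # in-sample mean absolute error
```

Even a baseline this simple tracks a seasonal series closely, which is the paper's core critique: GFT left information on the table that cheap, well-understood methods would have captured.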
A recent conversation with a client reminded me that no matter how crazy and exciting the Big Data world gets, it is still critical to understand what your goals are and where you are in the process of reaching them. Grounding yourself in "what's important" is essential before you jump into the wild world of Big Analytics.
For example, in big data (well, actually all data, but I digress), "Reporting" and "Analytics" are very different functions. But I often find our customers and prospects grappling with how to distinguish one from the other and, as a result, confusing reporting with analysis and losing track of their real goals.
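One way to make the distinction concrete: reporting summarizes what happened, while analytics probes why it happened or what might happen next. Here is a minimal sketch using made-up sales numbers (the columns and values are hypothetical, not customer data):

```python
import pandas as pd

# Hypothetical sales data -- every number here is invented for illustration.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "ad_spend": [10, 20, 15, 30],
    "revenue": [100, 180, 140, 260],
})

# Reporting answers "what happened?" -- a descriptive rollup.
report = df.groupby("region")["revenue"].sum()

# Analytics answers "why?" or "what if?" -- here, how revenue moves with spend.
relationship = df["ad_spend"].corr(df["revenue"])
```

The rollup is something you put on a dashboard; the correlation (or, in practice, a real model) is something you act on. Conflating the two is how teams end up with beautiful reports and no decisions.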
New Year (2014) Rumination: Death of privacy as we know it? Or inflection point signaling better things to come?
I am, and always have been, a glass half-full kind of gal. In fact, way back in September 2011 when Terence and I published our book on Privacy and Big Data, I was far more optimistic than he was on the future of privacy—of course, it’s easy to sound optimistic when your co-author states that privacy is dead. (And yes, we are still working on our book update but we do have day jobs and a significant release in the works so it is slow going but going it is.)
At that time, those in the "digital privacy know" characterized our book as a decent overview. Our intent was to help those NOT in the "digital privacy know" get their arms around the privacy issues from a legislative, corporate, and government perspective. To our surprise, those not in the know included lots of folks in the high tech community! We did a number of interviews and dealt with informed and somewhat uninformed media folk—those in the mainstream focused on social media, while those on the fringes (left and right) wanted to do deep dives into legal issues, government uses of data, and Fourth Amendment rights. Some seemed to think that we were members of the tin foil hat brigade, others that we were naïve, and still others that we were on point.
This past weekend the Philippines and surrounding areas were devastated by super typhoon Haiyan. The storm was estimated to have sustained winds of 195 mph with wind gusts up to 235 mph. It is believed to be one of the strongest storms ever recorded.
For those of us who have lived through typhoons and hurricanes (myself included) and felt the power of those storms, what we experienced was nothing compared to what the Philippines endured. The death toll is now estimated to be above 10,000 and climbing, and the devastation is likened to the destruction wrought by the 2004 Indian Ocean tsunami.
It will take years to rebuild and recover from such a disaster, and our hearts go out to the people impacted by the typhoon as well as the relief workers who are providing much-needed aid to the region. For those of you who would like to help, the New York Times and Reuters have provided lists of organizations (with links) that are providing aid. I would also point you to a local organization (in the Washington area), World Vision, that is intent on providing relief to children and families devastated by the typhoon.
In times like these we are all reminded of the fragility of life, the power of nature, and the undeniable fact that we are truly all in this together. Please help if you can.
Privacy, Anonymity, and Judicial Oversight are on the Endangered List
An age-old debate has once again reared its very ugly head due to whistleblower Edward Snowden's revelations about NSA surveillance, PRISM, and the astounding lack of any rigorous oversight of the NSA's vast data collection apparatus. While PatternBuilders has been incredibly busy, in our non-copious amounts of spare time Terence and I have also been working on our update to Privacy and Big Data (which is undergoing another rewrite due to new government surveillance revelations that for a while happened hourly, then daily, then weekly, but are certainly far from over). It's important to note that pre-revelations our task was already herculean: mainstream media's pickup of "all stories related to privacy" (a good thing) often missed the mark on the technical side of the house (we often find ourselves explaining to non-techies just what metadata is, usually after someone on CNN, Fox, NBC, ABC, etc., butchers the definition) or got tripped up by the various Acts, Amendments, state laws, EU Directives, etc., that apply to aspects of privacy.
Over the last few weeks, as details about PRISM emerged, it's become clear to me that main street America may still not understand the seismic shift that big data and analytics brings to the privacy debate. Certainly the power of big data and analytics has been lauded or vilified in the press—followers of our twitter feed are used to seeing the pros and cons of big data projects debated pretty much every day. We've (Terence and I) talked and tweeted about privacy issues as they apply to individuals, companies, and governments. Heck, we even wrote a book about privacy and big data.
As regular readers of this blog know, Terence and I spend a great deal of time talking about the state of the big data analytics industry and what is needed before mainstream adoption becomes an actual fact (as opposed to the hyperbolic reporting on any and all things related to big data). Recently, we sat down with the Microsoft BizSpark (a partner of ours) team to talk about the state of big data analytics today and why we decided to co-found PatternBuilders. To read the full story, go here. And because I can never resist a great quote (I am in marketing after all), here’s what Terence had to say during our interview:
“We found it disconcerting that there was such a huge divide between big data excitement and actual adoption rates. Taking advantage of big data analytics often requires a budget, toolset and in-house expertise far beyond what most enterprises can muster. Mary and I founded PatternBuilders because we thought there must be a better approach.”
For more information on our technology choices and why we are unabashed fans of Microsoft technologies, you may find these posts helpful:
- Introducing AnalyticsPBI for Azure—A Cloud-Centric, Components-Based, Streaming Analytics Product
- AnalyticsPBI for Azure: Turning Real-Time Signals into Real-Time Analytics
- Enterprise Software in the Cloud: Why We Chose Azure as our First PaaS Platform
A top-level view of our data project over a series of posts.
By Mary Ludloff
Welcome to the third post in our series on a big data project. Our goal is to walk you all the way through a big data project from its inception through its completion (or depending on the project, through deployment and maintenance). Those of you familiar with our series know that we include our Big Data Playbook rules as we address specific topics—we may repeat some as we go along but if you need to refresh your memory on where we are, go to Part 1 and Part 2.
You now know that we are working with the University of Sydney on a project that looks at the impact social media comments have on a company’s stock and whether this mediates the influence of primary news. Specifically: Is a company’s stock price influenced by both and can we isolate and study the impact of those distinct sources on that stock price?
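The mediation question above is the classic "total effect versus direct effect" decomposition from statistics. Here is a minimal sketch with synthetic data; the variable names, effect sizes, and ordinary-least-squares approach are illustrative assumptions on our part, not the actual design or data of the University of Sydney study.

```python
import numpy as np

# Synthetic stand-ins: the relationships and magnitudes below are invented.
rng = np.random.default_rng(1)
n = 500
news = rng.normal(size=n)                                 # primary-news sentiment
social = 0.6 * news + rng.normal(0, 1, n)                 # social-media sentiment
price = 0.3 * news + 0.5 * social + rng.normal(0, 1, n)   # stock-return proxy

def ols(X, y):
    """Ordinary least squares with an intercept; returns coefficients."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

total = ols(news, price)[1]                               # news alone
direct = ols(np.column_stack([news, social]), price)[1]   # controlling for social
indirect = total - direct                                 # portion mediated by social
```

If the indirect effect is a meaningful slice of the total, social media is doing real mediating work rather than just echoing the news; isolating the two channels is exactly what the project sets out to do on real data.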
Sadly, this week we were reminded once again of the fragility of life and the resilience of the human spirit. Terence, myself, and the PatternBuilders team send our condolences to all who were impacted by this tragedy. For those who would like to help, donations can be made to:
- The One Fund—specifically formed to help those most affected by the bombings.
- The New England Patriots Charitable Foundation—all donations denoted with the words “Boston Marathon” will be earmarked for The One Fund.
- Boston’s First Responders Fund—also specifically established to benefit the victims of the bombings.
A number of resources can also be found here.
Much as it pains me to say this, beware of bogus Boston Marathon charity websites. Melanie Hicken of CNNMoney offers some advice on what to look out for.
Finally, there have been many moving tributes made by people via blogs, Twitter, and other media sources. We leave you with this simple statement projected on the wall of the Brooklyn Academy of Music:
In my last post, I wrote about the three V’s of big data and why there are only three. There is a messaging pile-on happening in the big data space that even I, a long-time marketer, find disconcerting. So, over the course of a number of posts, my colleague, Marilyn Craig, and I are going to demystify a big data project, taking apart each stage of a real big data initiative as if it were a release post-mortem. We will be talking about roles and responsibilities, data governance, project and process management, what went right, what went wrong, and what we should have done differently. Except in this case, it will not be after the fact but rather a stage-by-stage review as we work on a real-world project. For your sanity and ours, we have created a special category, Big Data Project, as well as a tag with the same name; if you search on either, you will see all posts related to the project. Additionally, all posts about the project will start with “Big Data Project” in the title. Who knows? Maybe when we’re done, we’ll write a book (knowing what I know now about writing a book, I can’t believe I just said that)!
We’ll talk more about the project in the next post, but first I wanted to take a look at a big data failure that anyone involved in a major enterprise application deployment could have seen coming, one that illustrates Rule #1 in our big data playbook:
Rule #1: Big Data IS NOT rocket science.
Marilyn Craig (Managing Director of Insight Voices, frequent guest blogger, marketing colleague, and analytics guru) and I have been watching the big data “V” pile-on with a bit of bemusement lately. We started with the classic 3 V’s, codified by Doug Laney, a META Group and now Gartner analyst, in early 2001 (yes, that’s correct, 2001). Doug puts it this way:
“In the late 1990s, while a META Group analyst (Note: META is now part of Gartner), it was becoming evident that our clients increasingly were encumbered by their data assets. While many pundits were talking about, many clients were lamenting, and many vendors were seizing the opportunity of these fast-growing data stores, I also realized that something else was going on. Sea changes in the speed at which data was flowing mainly due to electronic commerce, along with the increasing breadth of data sources, structures and formats due to the post Y2K-ERP application boom were as or more challenging to data management teams than was the increasing quantity of data.”
Doug worked with clients on these issues and spoke about them at industry conferences. He then wrote a research note (February 2001) entitled “3-D Data Management: Controlling Data Volume, Velocity and Variety,” which is available in its entirety here (pdf too).