Last week we read an interesting paper and post about Google Flu Trends (GFT) and its foibles. The paper points out a couple of lessons that those of us living in the big data analytics world have learned the hard way, but the dangers are worth revisiting as tools like ours (AnalyticsPBI for Azure) begin to move big data analytics into the mainstream of organizational practices. After all, our tool (and others like it) makes it easy and even fun for analytics junkies to use all those available zettabytes of data and answer questions that they’ve long wondered about. But the paper also reminded us of the dangers of ignoring the natural cycles of an analytics process that we talked about in this recent post. If Google had followed the PatternBuilders Analytics Methodology, they might have avoided many of the errors that GFT is now spitting out. In fact, the authors of the paper point out that:
“Although not widely reported until 2013, the new GFT has been persistently overestimating flu prevalence for a much longer time. GFT also missed by a very large margin in the 2011-2012 flu season and has missed high for 100 out of 108 weeks starting with August 2011… This pattern means that GFT overlooks considerable information that could be extracted by traditional statistical methods.”
This overestimation is attributed to two primary factors: big data hubris and algorithm dynamics.
A recent conversation with a client reminded me that no matter how crazy and exciting the Big Data world gets, it is still critical to understand what your goals are and where you are in the process of reaching those goals. You need a good foundation in “what’s important” before you jump into the wild world of Big Analytics.
For example, in big data (well, actually all data but I digress) “Reporting” and “Analytics” are very different functions. But I often find our customers and prospects grappling with how to distinguish one from the other and, as a result, confusing reporting with analysis and losing track of their real goals.
New Year (2014) Rumination: Death of privacy as we know it? Or inflection point signaling better things to come?
I am, and always have been, a glass half-full kind of gal. In fact, way back in September 2011 when Terence and I published our book on Privacy and Big Data, I was far more optimistic than he was on the future of privacy—of course, it’s easy to sound optimistic when your co-author states that privacy is dead. (And yes, we are still working on our book update but we do have day jobs and a significant release in the works so it is slow going but going it is.)
At that time, those in the “digital privacy know” characterized our book as a decent overview. Our intent was to help those NOT in the “digital privacy know” get their arms around the privacy issues from a legislative, corporate, and government perspective. To our surprise, those not in the know included lots of folks in the high-tech community! We did a number of interviews and dealt with informed and somewhat uninformed media folk: those in the mainstream focused on social media, and those on the fringes (left and right) wanted to do deep dives into legal issues, government uses of data, and Fourth Amendment rights. Some seemed to think that we were members of the tin foil hat brigade, others that we were naïve, and still others that we were on point.
This past weekend the Philippines and surrounding areas were decimated by super typhoon Haiyan. The storm was estimated to have sustained winds of 195 mph with wind gusts up to 235 mph. It is believed to be one of the strongest storms ever recorded.
For those of us who have lived through typhoons and hurricanes (myself included) and felt the power of those storms, what we saw was nothing compared to what the Philippines endured. The death toll is now estimated to be above 10,000 and climbing and the devastation it left is likened to the destruction wrought by the 2004 Indian Ocean tsunami.
It will take years to rebuild and recover from such a disaster, and our hearts go out to the people impacted by the typhoon as well as the relief workers who are providing much-needed aid to that region. For those of you who would like to help, the New York Times and Reuters have provided lists of organizations (with links) that are providing aid. I would also point you to a local organization (in the Washington area), World Vision, that is intent on providing relief to children and families devastated by the typhoon.
In times like these we are all reminded of the fragility of life, the power of nature, and the undeniable fact that we are truly all in this together. Please help if you can.
Privacy, Anonymity, and Judicial Oversight are on the Endangered List
An age-old debate has once again reared its very ugly head due to whistleblower Edward Snowden’s revelations about NSA surveillance, PRISM, and the astounding lack of any rigorous oversight of the NSA’s vast data collection apparatus. While PatternBuilders has been incredibly busy, in our non-copious amounts of spare time Terence and I have also been working on our update to Privacy and Big Data (which is undergoing another rewrite due to new government surveillance revelations that for a while happened hourly, then daily, then weekly, and are certainly far from over). It’s important to note that even before the revelations our task was already herculean. Mainstream media had picked up on “all stories related to privacy” (a good thing) but often missed the mark on the technical side of the house (we often find ourselves explaining to non-techies just what metadata is, usually after someone on CNN, Fox, NBC, ABC, etc., butchers the definition) or got tripped up by the various Acts, Amendments, state laws, EU Directives, etc., that apply to aspects of privacy.
Over the last few weeks as details about PRISM emerged, it’s become clear to me that Main Street America may still not understand the seismic shift that big data and analytics bring to the privacy debate. Certainly the power of big data and analytics has been lauded or vilified in the press; followers of our Twitter feed are used to seeing the pros and cons of big data projects debated pretty much every day. We’ve (Terence and I) talked and tweeted about privacy issues as they apply to individuals, companies, and governments. Heck, we even wrote a book about privacy and big data.
As regular readers of this blog know, Terence and I spend a great deal of time talking about the state of the big data analytics industry and what is needed before mainstream adoption becomes an actual fact (as opposed to the hyperbolic reporting on any and all things related to big data). Recently, we sat down with the team at Microsoft BizSpark (a partner of ours) to talk about the state of big data analytics today and why we decided to co-found PatternBuilders. To read the full story, go here. And because I can never resist a great quote (I am in marketing after all), here’s what Terence had to say during our interview:
“We found it disconcerting that there was such a huge divide between big data excitement and actual adoption rates. Taking advantage of big data analytics often requires a budget, toolset and in-house expertise far beyond what most enterprises can muster. Mary and I founded PatternBuilders because we thought there must be a better approach.”
For more information on our technology choices and why we are unabashed fans of Microsoft technologies, you may find these posts helpful:
- Introducing AnalyticsPBI for Azure—A Cloud-Centric, Components-Based, Streaming Analytics Product
- AnalyticsPBI for Azure: Turning Real-Time Signals into Real-Time Analytics
- Enterprise Software in the Cloud: Why We Chose Azure as our First PaaS Platform
A top-level view of our data project over a series of posts.
By Mary Ludloff
Welcome to the third post in our series on a big data project. Our goal is to walk you all the way through a big data project from its inception through its completion (or depending on the project, through deployment and maintenance). Those of you familiar with our series know that we include our Big Data Playbook rules as we address specific topics—we may repeat some as we go along but if you need to refresh your memory on where we are, go to Part 1 and Part 2.
You now know that we are working with the University of Sydney on a project that looks at the impact social media comments have on a company’s stock and whether this mediates the influence of primary news. Specifically: Is a company’s stock price influenced by both and can we isolate and study the impact of those distinct sources on that stock price?