Big Data Project: Objectives First, Plan Second (Part 3)
A top-level view of our data project over a series of posts.
By Mary Ludloff
Welcome to the third post in our series on a big data project. Our goal is to walk you through a big data project from its inception to its completion (or, depending on the project, through deployment and maintenance). Those of you familiar with the series know that we include our Big Data Playbook rules as we address specific topics. We may repeat some as we go along, but if you need to refresh your memory on where we are, go to Part 1 and Part 2.
You now know that we are working with the University of Sydney on a project that looks at the impact social media comments have on a company’s stock price and whether those comments mediate the influence of primary news. Specifically: is a company’s stock price influenced by both, and can we isolate and study the impact of each of these distinct sources on that price?
Enterprise Software in the Cloud: Why We Chose Azure as our First PaaS Platform
I’ve been absent from the blog for too long, but if you’ve been following my colleagues’ (Mary’s and Marilyn’s) posts, you’ll see it’s been a very busy and fruitful time at PatternBuilders. While I’m still overdue for the next segment of the architecture blog series, I thought I would take a break and talk a bit about some of the things we learned as we moved our product and business model to Microsoft Azure.
I have worked with Microsoft technology and partnered with the company off and on over the last two decades (even flirting with going to work there a couple of times), and my most surprising discovery was how serious Microsoft has become about the cloud, open source, and being an active and supportive partner for startups. As those of you who have been around as long as I have will no doubt remember, this is a very different, some would say revolutionary, move for the world’s most powerful proprietary software company. We had some concerns when we joined BizSpark Plus, Microsoft’s Azure startup program, and subsequently the more exclusive BizSpark One, but it has turned out to be a great experience for us on both the business and technical level.
Boston Marathon Bombings: How To Help
Sadly, this week we were reminded once again of the fragility of life and the resilience of the human spirit. Terence and I, along with the PatternBuilders team, send our condolences to all who were affected by this tragedy. For those who would like to help, donations can be made to:
- The One Fund: specifically formed to help those most affected by the bombings.
- The New England Patriots Charitable Foundation: all donations marked with the words “Boston Marathon” will be earmarked for The One Fund.
- Boston’s First Responders Fund: also specifically established to benefit the victims of the bombings.
A number of resources can also be found here.
Much as it pains me to say this, beware of bogus Boston Marathon charity websites. Melanie Hicken of CNNMoney offers some advice on what to look out for.
Finally, there have been many moving tributes shared on blogs, Twitter, and other media. We leave you with this simple statement projected on the wall of the Brooklyn Academy of Music:
Big Data Project: Start with a Question that You Want to Answer
A top-level view of our data project over a series of posts.
Welcome to the second post in a series on a big data project that will (Mary and I hope) provide clarity and insight into how to successfully complete a big data initiative. Now, just in case you’ve forgotten the first two rules in our Big Data Playbook, I am going to repeat them here because they play into our topic of the day, which is all about “starting” your big data project:
Rule #1: Big Data IS NOT rocket science.
Yes, far too often those lucky internal folks tasked with managing a big data project fall into the trap of data science paralysis, which is similar to analysis paralysis. By this I mean that there are so many moving pieces to capture, so many technology decisions to make, so many skill positions to fill, and so many blanks to fill in that you never actually get started, which leads me to our second rule:
Rule #2: Garbage in, garbage out.
Big Data Project: Let’s Start at the Very Beginning—The Big Data Playbook
In my last post, I wrote about the three V’s of big data and why there are only three. There seems to be a messaging pile-on happening in the big data space that even I, a long-time marketer, find disconcerting. So, over the course of a number of posts, my colleague Marilyn Craig and I are going to demystify a big data project, taking apart each stage of a real big data initiative as if it were a release post-mortem. We will be talking about roles and responsibilities, data governance, project and process management, what went right, what went wrong, and what we should have done differently. In this case, however, it will not be an after-the-fact review but a stage-by-stage one as we work on a real-world project. For your sanity and ours, we have created a special category, Big Data Project, as well as a tag with the same name. If you search on either, you will see all posts related to the project. Additionally, all posts about the project will start with Big Data Project in the title. Who knows? Maybe when we’re done, we’ll write a book (knowing what I know now about writing a book, I can’t believe I just said that)!
We’ll talk more about the project in the next post, but first I want to look at a big data failure that anyone involved in a major enterprise application deployment could have seen coming, and that illustrates Rule #1 in our Big Data Playbook:
Rule #1: Big Data IS NOT rocket science.
Strata West, Law, Ethics, and Open Data: Smart People Solving Some Very Hard Problems
Last week the Bay Area was treated to another great Strata West hosted by the O’Reilly team. For those of you who weren’t able to make it, keep checking strataconf.com for updates on the videos and speaker slides. One of the great things about this conference is that the videos and slides for many of the sessions are available to anyone.
I had the pleasure of co-hosting the Law, Ethics, and Open Data track with my friend and fellow O’Reilly author (and Civilization devotee) Alex Howard. Alex is O’Reilly’s government reporter, and his book, Data for the Public Good, is a must-read. Our two-day track featured thoughtful sessions and speakers, bringing together people who are solving difficult technology problems and showing us how those problems and solutions are affecting lives and society. If you check out my tweets from last week, you’ll see my 140-character attempts to highlight some of the sessions. Here is a “longer” version of the highlights from the sessions I hosted:
- Fred Trotter and DocGraph: Fred actually tweeted his presentation as he was giving it, so check out @fredtrotter for last Thursday starting around 10:40 am PST. A presentation of 140-character sound bites made for a very succinct message. He’s done some amazing work creating the DocGraph, probably the largest public social graph in the world, which shows the referral relationships between doctors in the US. You can view a nice visualization his team has done here.
A Big Data Showdown: How many V’s do we really need? Three!
Marilyn Craig (Managing Director of Insight Voices, frequent guest blogger, marketing colleague, and analytics guru) and I have been watching the big data “V” pile-on with a bit of bemusement lately. It all started with the classic 3 V’s, codified by Doug Laney, a META Group and now Gartner analyst, in early 2001 (yes, that’s correct, 2001). Doug puts it this way:
“In the late 1990s, while a META Group analyst (Note: META is now part of Gartner), it was becoming evident that our clients increasingly were encumbered by their data assets. While many pundits were talking about, many clients were lamenting, and many vendors were seizing the opportunity of these fast-growing data stores, I also realized that something else was going on. Sea changes in the speed at which data was flowing mainly due to electronic commerce, along with the increasing breadth of data sources, structures and formats due to the post Y2K-ERP application boom were as or more challenging to data management teams than was the increasing quantity of data.”
Doug worked with clients on these issues and spoke about them at industry conferences. He then wrote a research note (February 2001) entitled “3-D Data Management: Controlling Data Volume, Velocity and Variety,” which is available in its entirety here (pdf too).