Posts tagged ‘big data’
A top-level view of our data project over a series of posts.
Welcome to the second post of a series on a big data project that will (Mary and I hope) provide clarity and insights on how to successfully complete a big data initiative. Now, just in case you’ve forgotten the first two rules in our Big Data Playbook, I am going to repeat them here because they play into our topic of the day which is all about “starting” your big data project:
Rule #1: Big Data IS NOT rocket science.
Yes, far too often those lucky internal folks tasked with managing a big data project fall into the trap of data science paralysis, which is similar in thought to analysis paralysis. By this I mean that there are so many moving pieces to capture, so many technology decisions to make, so many skill positions to fill, and so many fill-in-the-blanks to complete that you never actually get started, which leads me to our second rule:
Rule #2: Garbage in, garbage out.
Marilyn Craig (Managing Director of Insight Voices, frequent guest blogger, marketing colleague, and analytics guru) and I have been watching the big data “V” pile-on with a bit of bemusement lately. We started with the classic 3 V’s, codified by Doug Laney, a META Group and now Gartner analyst, in early 2001 (yes, that’s correct, 2001). Doug puts it this way:
“In the late 1990s, while a META Group analyst (Note: META is now part of Gartner), it was becoming evident that our clients increasingly were encumbered by their data assets. While many pundits were talking about, many clients were lamenting, and many vendors were seizing the opportunity of these fast-growing data stores, I also realized that something else was going on. Sea changes in the speed at which data was flowing mainly due to electronic commerce, along with the increasing breadth of data sources, structures and formats due to the post Y2K-ERP application boom were as or more challenging to data management teams than was the increasing quantity of data.”
Doug worked with clients on these issues and spoke about them at industry conferences. He then wrote a research note (February 2001) entitled “3-D Data Management: Controlling Data Volume, Velocity and Variety” which is available in its entirety here (pdf too). (more…)
Greetings one and all! 2012 was a breakout year for PatternBuilders and we are very grateful to all of you for helping to make that happen. But we would also like to take a minute to extend our condolences and share the grief of parents across the world who lost young children to violence. Newtown was singularly horrific but similar events play out all too often across the globe. We live in an age of technical wonders; surely we can find ways to protect the world’s children.
This is our last post of 2012 and in the spirit of the season, we decided to do something a little different this year. Recently, the Wall Street Journal asked 20 of its “friends” to tell them what books they enjoyed in 2012 and the responses were equally eclectic and interesting. Not to be outdone, Adam Thierer published his list of cyberlaw and info-tech policy books for 2012. Many of the recommendations culled from both sources ended up on our reading lists for 2013 (folks, 2012 is almost over and between launching AnalyticsPBI for Azure and working on our update for Privacy and Big Data, not a lot of “other” reading is going to happen during the holiday season!) and spurred an interesting discussion about our favorite reads of the year. One caveat: Our lists may include books we read but were not necessarily published this year. So without further ado, I give you our favorite reads of 2012! (more…)
I had to miss Strata due to a family emergency. While Mary picked up the slack for me at our privacy session, and by all reports did her usual outstanding job, I also had to cancel a Tuesday night Strata session sponsored by 10Gen on how PatternBuilders has used Mongo and Azure to create a next-generation big data analytics system. The good news is that I should have some time to catch up on my writing this week, so look for a version of what would have been my 10Gen talk shortly. In the meantime, to get me back in the groove, here is a very short post inspired by a Forbes post written by Dan Everett of SAP on “Hadoopla.”
As a CEO of a real-time big data analytics company that occasionally competes with parts of the Hadoop ecosystem, I may have some biases (you think?). But I certainly agree that there is too much Hadoopla (a great term). If our goal as an industry is to move Big Data out of the lab and into mainstream use by anyone other than the companies that thrive on and have the staff to support high maintenance and very high skill technologies, Hadoop is not the answer – it has too many moving parts and is simply too complex.
To quote from a blog post I wrote a year ago:
“Hadoop is a nifty technology that offers one of the best distributed batch processing frameworks available, although there are other very good ones that don’t get nearly as much press, including Condor and Globus. All of these systems fit broadly into the High Performance, Parallel, or Grid computing categories and all have been or are currently used to perform analytics on large data sets (as well as other types of problems that can benefit from bringing the power of multiple computers to bear on a problem). The SETI project is probably the most well known (and IMHO, the coolest) application of these technologies outside of that little company in Mountain View indexing the Internet. But just because a system can be used for analytics doesn’t make it an analytics system…..”
Why is the industry so focused on Hadoop? Given the huge amount of venture capital that has been poured into various members of the Hadoop eco-system and that eco-system’s failure to find a breakout business model that isn’t hampered by Hadoop’s intrinsic complexity, there is ample incentive for a lot of very savvy folks to attempt to market around these limitations. But no amount of marketing can change the fact that Hadoop is a tool for companies with elite programmers and top-of-the-line computing infrastructures. And in that niche, it excels. But it was not designed for, and in my opinion will never see, broad adoption outside of that niche, despite the seemingly endless growth of Hadoopla.
Let me tell you a little secret: I always know when I am talking (and working) with a company that has successfully launched big data initiatives. There are three characteristics that these companies share:
- A C-level executive runs the “[big] data operations.”
- The Chief Data Officer (even if they are the CIO) has a heavy business/operations background.
- The data team is focused on the “business,” not the data.
Did you notice that technology and data science are not reflected in any of the characteristics? Some of you may consider this sacrilege. After all, we are operating in a world where technology (and I happily work for a technology company) has changed the data collection, usage, and analysis game. Colleges and universities are now offering master’s degrees in analytics. The role of the data scientist has been pretty much deified (I refer you to Part 1 of this series). And we all need to be very worried about the “talent shortage” and our ability to recruit the “right analytical team” (I refer you to Part 2 of this series).
Yes, technology has had a tremendous impact on how much data we can collect and the ways in which we can analyze it, but not everyone needs to be a senior computer programmer. Yes, we all should strive to be more mathematically inclined, but not all of us need master’s degrees or PhDs in statistics or analytics. Yes, some companies, based on their business models, may have a staff of data scientists, but others may get along just fine without one (with the occasional analytics consultant lending a hand). (more…)
By Marilyn Craig, Managing Director, Insight Voices
As you may or may not know, we are in the midst of a 3-part series on data science, covering roles, skills, etc.—generally what you should think about as well as what’s not as important (no matter what the latest articles say!). For Part 2, we have a guest poster—Marilyn Craig of Insight Voices. Marilyn is what I like to call a “classic quant.” She has been at the forefront of big data and data science before most people knew these terms (and spaces) existed and has been my go-to person whenever I had an analytics question (see title) that I needed an answer to. In this post, Marilyn looks at insights and makes the case for why we should all care far more about answers. Take it away Marilyn!
Here’s an interesting question for this new world order of Big Data Analytics: what’s an Insight and what’s an Answer? Sometimes they are the same, sometimes not. An insight is a piece of information or understanding. It may or may not be useful. It may or may not help your business improve, solve world hunger, or even make sense. An answer is always useful. It is the result of asking a question. And the best kinds of answers are those that answer the questions that you really care about. (more…)
I apologize for falling behind on blogging, but between several new hires, major partnerships, and the industry finally starting to understand the need for product-driven (instead of project-driven) big data, things have been very hectic. Good, but hectic.
I did want to pull my head off my keyboard for a minute to tell you that I am participating in the big data & real estate panel this Thursday at Connect San Francisco. Our panel will be moderated by industry luminary Brad Inman @bradInman.
Real estate has always been a data-driven business and is relying more and more on the insights and operational nimbleness provided by big data. For those of you who are scratching your heads and going, “Huh, Real Estate and big data?” – think about it for a minute. The real estate industry is “using” big data to do all kinds of things and drive all kinds of business models, such as:
- Commercial landlords using smart thermostats and smart windows adjusted in real-time to save energy.
- Capturing real-time parking meter data to make real-time decisions about how long to leave a retail location open.
- Using real-time video analysis to stop vandalism before it happens.
- Offering sophisticated analytics – see consumer-facing sites like Trulia and Zillow.
- Risk Modeling – check out RMS. Like most of the PatternBuilders team, they were “doing” Big Data before the term was invented.
If you are attending the show, stop by and say hi. If you are interested in Big Data & Real Estate, look for our post-Connect blog next week. In it, we will talk about some great insights about the New York real estate market derived from a ton of data we grabbed from the NYC public data market which was then spun up in the PatternBuilders framework on our brand spanking new Microsoft Azure cloud beta release.
A few years ago, Terence and I were trying to get our arms around the world of big data and how to effectively communicate its size and challenges as we were talking to analysts, media, prospects, partners, and, of course, the venture community. Hopefully, most of our readers would acknowledge that we have a fine grasp of language (we like to engage in a Scrabble battle of wits upon occasion) but I must admit that we have been eclipsed by many in the race to illustrate just how “big” big really is. My favorite is this:
“We are engulfed by a supernova of data.”
Now, Webster’s defines supernova in two ways:
“1: the explosion of a star in which the star may reach a maximum intrinsic luminosity one billion times that of the sun or 2: one that explodes into prominence or popularity.”
I have purposely given myself a week to reflect on pii2012 before blogging about it because there was a lot of information to absorb. As I look back on the conference, two themes come to mind: trust and transparency. Where to begin? Well, as Jim Adler (@Jim_Adler) tweeted:
“#pii2012 is the best collection of geeks, wonks, and suits.”
This was a conference chock full of interesting ideas, opposing views, and, at times, heated debate. One of the most interesting panels featured four people from very different backgrounds and points in their lives discussing the effect social media and the digital world have on how they conduct themselves online. There is a great post by Marina Ziegler (@Marina_Z) that covers it in depth, but like Marina, I felt that this was the key takeaway:
“The panel, while made up of experts, provided some very direct and honest understanding of how the average person is unaware, and how there is still work to be done in helping average folks understand how to navigate their social online lives, even down to the applications they use online.”