Posts filed under ‘O’Reilly’

Strata West, Law, Ethics, and Open Data: Smart People Solving Some Very Hard Problems

By Terence Craig

Last week the Bay Area was treated to another great Strata West hosted by the O’Reilly team. For those of you who weren’t able to make it, keep checking strataconf.com for updates on the speaker slides and videos. One of the great things about this conference is that the slides and videos for many of the sessions are available to anyone.

I had the pleasure of co-hosting the Law, Ethics, and Open Data track with my friend and fellow O’Reilly author (and Civilization devotee), Alex Howard. Alex is O’Reilly’s government reporter, and his book, Data for the Public Good, is a must-read. Our two-day track featured thoughtful sessions and speakers, bringing together people who are solving difficult technology problems and showing us how those problems and solutions are affecting lives and society. If you check out my tweets from last week, you’ll see my 140-character attempts to highlight some of the sessions. Here is a “longer” version of the highlights of the sessions I hosted:

  • Fred Trotter and DocGraph: Fred actually tweeted his presentation as he was giving it, so check out @fredtrotter for last Thursday starting around 10:40 am PST. A presentation of 140-character sound bites made for a very succinct message. He’s done some amazing work creating DocGraph, probably the largest public social graph in the world, which shows the referral relationships between doctors in the US. You can view a nice visualization his team has done here.

March 8, 2013 at 6:02 pm

“Hadoopla”

© Marqin Cook

By Terence Craig

I had to miss Strata due to a family emergency. Mary picked up the slack for me at our privacy session and, by all reports, did her usual outstanding job, but I also had to cancel a Tuesday night Strata session, sponsored by 10Gen, on how PatternBuilders has used MongoDB and Azure to create a next-generation big data analytics system. The good news is that I should have some time to catch up on my writing this week, so look for a version of what would have been my 10Gen talk shortly. In the meantime, to get me back in the groove, here is a very short post inspired by a Forbes post on “Hadoopla” written by Dan Everett of SAP.

As the CEO of a real-time big data analytics company that occasionally competes with parts of the Hadoop ecosystem, I may have some biases (you think?). But I certainly agree that there is too much Hadoopla (a great term). If our goal as an industry is to move Big Data out of the lab and into mainstream use, beyond the companies that thrive on high-maintenance, high-skill technologies and have the staff to support them, Hadoop is not the answer: it has too many moving parts and is simply too complex.

To quote from a blog post I wrote a year ago:

“Hadoop is a nifty technology that offers one of the best distributed batch processing frameworks available, although there are other very good ones that don’t get nearly as much press, including Condor and Globus. All of these systems fit broadly into the High Performance, Parallel, or Grid computing categories, and all have been or are currently used to perform analytics on large data sets (as well as to solve other types of problems that benefit from bringing the power of multiple computers to bear). The SETI project is probably the best-known (and, IMHO, the coolest) application of these technologies outside of that little company in Mountain View indexing the Internet. But just because a system can be used for analytics doesn’t make it an analytics system…”

Why is the industry so focused on Hadoop? Given the huge amount of venture capital that has been poured into various members of the Hadoop ecosystem, and that ecosystem’s failure to find a breakout business model that isn’t hampered by Hadoop’s intrinsic complexity, a lot of very savvy folks have ample incentive to market around these limitations. But no amount of marketing can change the fact that Hadoop is a tool for companies with elite programmers and top-of-the-line computing infrastructures. In that niche, it excels. But it was not designed for broad adoption outside of that niche and, in my opinion, will never see it, despite the seemingly endless growth of Hadoopla.

October 24, 2012 at 1:39 pm

Privacy and Big Data: Speaking at Strata East (NYC), Book Update, and Upcoming O’Reilly Webcast

By Mary Ludloff

There are times when Terence and I look at each other and say, “What on earth were we thinking?” And this is one of those times! PatternBuilders is crazy busy right now: we are putting out release 3.0 of our Analytics Platform (the secret sauce for our analytics applications that we like to call data-science-in-a-box), ramping up a funding round, working with partners on a University of Sydney research project on the impact of social media on a company’s stock price (a really fun project, and a post about it is in the works), and, of course, supporting customers and prospects on their big data initiatives. So… since we did not have enough to do (sarcasm on), we decided it was time to update our book, participate in a pre-Strata East webcast, and speak at both the Strata Conference and the MongoDB User Group (co-located with Strata) in New York City! In the words of the immortal Bette Davis in All About Eve (ever so slightly revised):

“Fasten your seat belts, it’s going to be a bumpy ride!”

Really, what were we thinking?!

September 20, 2012 at 5:49 pm

Big Data and Cloud Not a Fit? Comments on InfoWorld Article

By Terence Craig

Since Disqus seems to have completely eaten (bleh) my comment on @davidlinthicum’s very interesting InfoWorld post, Big data and the cloud: A far from perfect fit, I decided to expand my comments into a short blog post. IMHO, the problems David describes are more a reflection of problems with batch-oriented technologies like Hadoop in the cloud (more on my take on Hadoop here) than a general problem for cloud-based big data solutions.

Computing always has, and probably always will have, a bias toward batch-focused technologies at the beginning of any large paradigm shift. But as new technologies are absorbed, understood, and move from early adopters to mainstream use, the batch paradigm inevitably starts to shift toward streaming and real-time. We have seen this again and again: from punch cards to touch-sensitive tablets, downloaded media to streaming media, DOM to SAX parsers, HTML to Ajax, and paper maps to real-time GPS. The reason this evolution almost always occurs is simple: humans live and think in real time, and when our tools do as well, we are more productive and happier. So why do we have this bias toward batch processing in our first-generation computational technologies? Simply put, because batch processing is a lot easier.
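To make the batch-versus-streaming distinction a bit more concrete, here is a minimal Python sketch of one simple calculation, an average, done both ways. It is a hypothetical illustration, not PatternBuilders code; the names `batch_mean` and `StreamingMean` are made up for this example. The batch version needs the whole data set in hand before it can answer. The streaming version keeps a small running state and has an up-to-date answer after every event, using the standard incremental mean update.

```python
def batch_mean(values: list[float]) -> float:
    """Batch style: wait until all the data is collected, then compute once."""
    if not values:
        raise ValueError("no data to average")
    return sum(values) / len(values)


class StreamingMean:
    """Streaming style (hypothetical helper): update a running mean per event."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean  # a current answer is available after every event


if __name__ == "__main__":
    readings = [4.0, 8.0, 6.0, 2.0]

    # Batch: one answer, only after the whole data set has arrived.
    print("batch:", batch_mean(readings))

    # Streaming: an answer after each reading, without storing the history.
    stream = StreamingMean()
    for r in readings:
        print("streaming so far:", stream.update(r))
```

Both approaches end at the same value, but the streaming one has to carry state correctly across every event, which hints at why first-generation tools tend to start out batch-oriented.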


February 23, 2012 at 3:01 pm

O’Reilly Webcast On “The Evolution from Private to Public: Is There Privacy in the Digital Age?” Scheduled for October 28 (And it’s free)!

By Mary Ludloff

For those of you who attended our webcast on Privacy and Big Data (replays are available), you may remember a little teaser at the end about an upcoming privacy panel that we are sponsoring. Well, details on the panel are now available, and you can register for it here. And I have to say that it features a great group of privacy experts:

  • Natalie Fonseca, the moderator, is the co-founder of Tech Policy Summit and the Privacy Identity Innovation Conference.
  • Jim Adler, panelist, is the Chief Privacy Officer and General Manager of Data Systems at Intelius.
  • danah boyd, panelist, is a Senior Researcher at Microsoft Research (amongst many other affiliations) and is known for her work on youth engagement, privacy, and risky behaviors (some of her research is discussed in our book).
  • Terence Craig, panelist, is the CEO/CTO of PatternBuilders, a frequent blogger and speaker on privacy issues, as well as my esteemed co-author on our book “Privacy and Big Data.”
  • Betsy Masiello, panelist, is a Policy Manager on Google’s public policy team and is one of the leads for Google’s privacy efforts.

This esteemed panel is going to address “The Evolution from Private to Public: Is There Privacy in the Digital Age?” Tune in on October 28 at 10:00 AM PDT for what promises to be a very lively discussion, as panelists who are never shy about sharing their (varied) opinions take on the issue of how our private and public worlds are colliding in the digital age!

October 5, 2011 at 6:53 pm

Privacy and Big Data: Webcast Now Available!

By Terence Craig

Mary and I had a great time (and a couple of good arguments) during our webcast “Privacy and Big Data: Is there room for privacy in the age of big data?” last week. The kind folks at O’Reilly have just made a recording available in case you missed it; you can find it here. O’Reilly is also offering 50% off all its ebooks, including ours (offer expires September 28), so go grab it. The discount code is B2SDEAL. We would love to hear your comments on the webcast and the book, either as comments on this post, on Twitter (@terencecraig, @mludloff), or by email at bigprivacy@patternbuilders.com.

September 19, 2011 at 11:34 am

