Author Archive

Strata West, Law, Ethics, and Open Data: Smart People Solving Some Very Hard Problems

By Terence Craig

Last week the Bay Area was treated to another great Strata West hosted by the O’Reilly team. For those of you who weren’t able to make it, keep checking strataconf.com for updates on the videos and speaker slides. One of the great things about this conference is that many of the sessions are available to anyone, as are the videos and slides.

I had the pleasure of co-hosting the Law, Ethics, and Open Data track with my friend and fellow O’Reilly author (and Civilization devotee), Alex Howard. Alex is O’Reilly’s government reporter and his book, Data for the Public Good, is a must-read. Our track was two days long and featured thoughtful sessions and speakers, bringing together people who are solving difficult technology problems and then showing us how those problems and solutions are impacting lives and society. If you check out my tweets from last week you’ll see my 140-character attempts to highlight some of the sessions. Here is a “longer” version of the highlights of the sessions I hosted:

  • Fred Trotter and DocGraph: Fred actually tweeted his presentation as he was giving it, so check out @fredtrotter for last Thursday starting around 10:40 am PST. A presentation of 140-character sound bites made for a very succinct message. He’s done some amazing work creating the DocGraph, probably the largest public social graph in the world, showing the referral relationships between doctors in the US. You can view a nice visualization his team has done here. (more…)

March 8, 2013 at 6:02 pm 1 comment

AnalyticsPBI for Azure: Turning Real-Time Signals into Real-Time Analytics

By Terence Craig

For the second post on AnalyticsPBI for Azure (first one here), I thought I would give you some insight on what is required for a modern real-time analytics application and talk about the architecture and process that is used to bring data into AnalyticsPBI and create analytics from them. Then we will do a series of posts on retrieving data. This is a fairly technical post so if your eyes start to glaze over, you have been warned.

In a world that is quickly moving towards the Internet of Things, the need for real-time analysis of high velocity and high volume data has never been more pronounced. Real-time analytics (aka streaming analytics) is all about performing analytic calculations on signals extracted from a data stream as they arrive: for example, a stock tick, RFID read, location ping, blood pressure measurement, clickstream data from a game, etc. The one guaranteed component of any signal is time (the time it was measured and/or the time it was delivered). So any real-time analytics package must make time and time aggregations first class citizens in its architecture. This time-centric approach provides a huge number of opportunities for performance optimizations. It amazes me that people still try to build real-time analytics products without taking advantage of them.
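To make the idea of time as a first class citizen concrete, here is a minimal sketch of a tumbling-window aggregation in Python. It is not how AnalyticsPBI is implemented; the `Signal` type, the `TumblingWindowAggregator` class, and the 60-second window width are all assumptions invented for illustration. The point it tries to show is that because every signal carries a timestamp, each one can be folded into a running per-window total as it arrives, so nothing has to be rescanned later.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """A hypothetical signal: where it came from, when, and its value."""
    source: str
    timestamp: float  # seconds since some epoch
    value: float


def window_start(timestamp, width_seconds):
    """Return the start time of the fixed-width window containing timestamp."""
    return int(timestamp // width_seconds) * width_seconds


class TumblingWindowAggregator:
    """Keeps a running count and sum per (source, window) as signals arrive."""

    def __init__(self, width_seconds=60):
        self.width_seconds = width_seconds
        # Maps (source, window start) to [count, sum].
        self._totals = defaultdict(lambda: [0, 0.0])

    def add(self, signal):
        """Fold one signal into the window its timestamp falls in."""
        key = (signal.source, window_start(signal.timestamp, self.width_seconds))
        bucket = self._totals[key]
        bucket[0] += 1
        bucket[1] += signal.value

    def mean(self, source, timestamp):
        """Mean value for the window containing timestamp, or None if empty."""
        key = (source, window_start(timestamp, self.width_seconds))
        if key not in self._totals:
            return None
        count, total = self._totals[key]
        return total / count


if __name__ == "__main__":
    agg = TumblingWindowAggregator(width_seconds=60)
    for reading in (Signal("bp", 0.0, 120.0),
                    Signal("bp", 30.0, 130.0),
                    Signal("bp", 61.0, 140.0)):
        agg.add(reading)
    print(agg.mean("bp", 10.0))
    print(agg.mean("bp", 65.0))
```

Each update touches a single bucket, which is one simple example of the kind of optimization a time-centric design allows. A real system would also need to handle late or out-of-order signals and expire old windows, which this sketch leaves out.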

Until AnalyticsPBI, real-time analytics were only available if you built a huge infrastructure yourself (for example, Wal-Mart) or purchased a very expensive solution from a hardware-centric vendor (whose primary focus was serving the needs of the financial services industry). The reason that the current poster children for big data (in terms of marketing spend at least), the Hadoop vendors, are “just” starting their first forays into adding support for streaming data (see Cloudera’s Impala, for example) is that calculating analytics in real-time is very difficult to do. Period.

(more…)

December 12, 2012 at 5:22 pm 8 comments

Introducing AnalyticsPBI for Azure—A Cloud-Centric, Components-Based, Streaming Analytics Product

By Terence Craig

It has been a while since I’ve done posts that focus on our technology (and big data tech in general). We are now about 2 months out from the launch of the Azure version  of our analytics application, AnalyticsPBI, so it is the perfect time to write some detailed posts about our new features. Consider this the first in the series.

But before I start exercising my inner geek, it probably makes sense to take a look at the development philosophy and history that forms the basis of our upcoming release. Historically, we delivered our products in one of two ways:

  • As a framework which morphed (as of release 2.0) into AnalyticsPBI, our general analytics application designed for business users, quants, and analysts across industries.
  • As vertical applications (customized on top of AnalyticsPBI) for specific industries (like FinancePBI and our original Retail Analytics application) which we sold directly to companies in those industries.

(more…)

November 29, 2012 at 8:38 am 8 comments

Even Geniuses Pass Away

By Terence Craig

Today, I got the sad news that a dear friend and an early contributor to PatternBuilders passed away.

Andrew (Andrei) Leman was a gruff, kind and generous man who will be deeply missed.  Andrei was also a very talented mathematician and software engineer who created some of the fundamental theories around the mathematics of graphs.  His papers on that subject are still heavily cited.

More importantly Andrei was a loving husband to his wife Elena and a great friend  and mentor to many, many folks.

He will be missed but his work and the respect and affection he engendered will endure.

пухом my friend.

November 8, 2012 at 6:19 pm Leave a comment

“Hadoopla”

© Marqin Cook

By Terence Craig

I had to miss Strata due to a family emergency. While Mary picked up the slack for me at our privacy session, and by all reports did her usual outstanding job, I also had to cancel a Tuesday night Strata session sponsored by 10Gen on how PatternBuilders has used Mongo and Azure to create a next generation big data analytics system. The good news is that I should have some time to catch up on my writing this week so look for a version of what would have been my 10Gen talk shortly. In the meantime, to get me back in the groove, here is a very short post inspired by a Forbes post written by Dan Everett of SAP on “Hadoopla.”

As a CEO of a real-time big data analytics company that occasionally competes with parts of the Hadoop ecosystem, I may have some biases (you think?).  But I certainly agree that there is too much Hadoopla (a great term).  If our goal as an industry is to move Big Data out of the lab and into mainstream use by anyone other than the companies that thrive on and have the staff to support high maintenance and very high skill technologies, Hadoop is not the answer – it has too many moving parts and is simply too complex.

To quote from a blog post I wrote a year ago:

“Hadoop is a nifty technology that offers one of the best distributed batch processing frameworks available, although there are other very good ones that don’t get nearly as much press, including Condor and Globus.  All of these systems fit broadly into the High Performance, Parallel, or Grid computing categories and all have been or are currently used to perform analytics on large data sets (as well as other types of problems that can benefit from bringing the power of multiple computers to bear on a problem). The SETI project is probably the best known (and IMHO, the coolest) application of these technologies outside of that little company in Mountain View indexing the Internet. But just because a system can be used for analytics doesn’t make it an analytics system…..

Why is the industry so focused on Hadoop? Given the huge amount of venture capital that has been poured into various members of the Hadoop eco-system and that eco-system’s failure to find a breakout business model that isn’t hampered by Hadoop’s intrinsic complexity, there is ample incentive for a lot of very savvy folks to attempt to market around these limitations.  But no amount of marketing can change the fact that Hadoop is a tool for companies with elite programmers and top of the line computing infrastructures. And in that niche, it excels.  But it was not designed for, and in my opinion will never see, broad adoption outside of that niche despite the seemingly endless growth of Hadoopla.

October 24, 2012 at 1:39 pm 1 comment

Black Founders Conference

I am thrilled to be a mentor at the Black Founders Conference in San Francisco.  The event is sponsored by Black Founders, a group that is attacking the digital divide by promoting entrepreneurship. With luminary speakers such as Mitch Kapor (Lotus), Steve Blank (E.piphany) and Charles Hudson (SoftTech VC), it is sure to be a great event.

September 5, 2012 at 3:20 pm Leave a comment

Speaking on Inman Connect Panel on Real Estate and Big Data

By Terence Craig

I apologize for falling behind on blogging, but between several new hires,  major partnerships, and the industry finally starting to understand the need for product-driven (instead of project-driven) big data, things have been very hectic. Good, but hectic.

I did want to pull my head off my keyboard for a minute to tell you about participating in the big data & real estate panel this Thursday at Connect San Francisco.  Our panel will be moderated by industry luminary Brad Inman @bradInman.

Real estate has always been a data-driven business and is relying more and more on the insights and operational nimbleness provided by big data.  For those of you who are scratching your heads and going, “Huh, Real Estate and big data?” – think about it for a minute.  The real estate industry is “using” big data to do all kinds of things and drive all kinds of business models, such as:

  • Commercial landlords using smart thermostats and smart windows adjusted in real-time to save energy.
  • Capturing real-time parking meter data to make real-time decisions about how long to leave a retail location open.
  • Using real-time video analysis to stop vandalism before it happens.
  • Offering sophisticated analytics – see consumer facing sites like Trulia and Zillow.
  • Risk Modeling – check out RMS. Like most of the PatternBuilders team, they were “doing” Big Data before the term was invented.

If you are attending the show, stop by and say hi. If you are interested in Big Data & Real Estate, look for our post-Connect blog next week. In it, we will talk about some great insights about the New York real estate market derived from a ton of data we grabbed from the NYC public data market which was then spun up in the PatternBuilders framework on our brand spanking new Microsoft Azure cloud beta release.

August 1, 2012 at 9:37 pm Leave a comment
