Posts tagged ‘NOSQL’
For the second post on AnalyticsPBI for Azure (first one here), I thought I would give you some insight into what is required for a modern real-time analytics application and talk about the architecture and process used to bring data into AnalyticsPBI and create analytics from it. Then we will do a series of posts on retrieving data. This is a fairly technical post, so if your eyes start to glaze over, you have been warned.
In a world that is quickly moving towards the Internet of Things, the need for real-time analysis of high velocity and high volume data has never been more pronounced. Real-time analytics (aka streaming analytics) is all about performing analytic calculations on signals extracted from a data stream as they arrive—for example, a stock tick, RFID read, location ping, blood pressure measurement, clickstream data from a game, etc. The one guaranteed component of any signal is time (the time it was measured and/or the time it was delivered). So any real-time analytics package must make time and time aggregations first class citizens in their architecture. This time-centric approach provides a huge number of opportunities for performance optimizations. It amazes me that people still try to build real-time analytics products without taking advantage of them.
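To make the idea of time as a first-class citizen concrete, here is a minimal sketch of time-bucketed streaming aggregation: each incoming signal is folded into a running aggregate for its time window as it arrives, rather than being batch-recomputed later. This is an illustrative example only, not AnalyticsPBI's actual implementation; the one-minute bucket size and the `StreamAggregator` name are assumptions.

```python
from collections import defaultdict

BUCKET_SECONDS = 60  # hypothetical one-minute aggregation window

def bucket_for(timestamp):
    """Map a measurement timestamp (epoch seconds) to its time bucket."""
    return int(timestamp) // BUCKET_SECONDS * BUCKET_SECONDS

class StreamAggregator:
    """Keeps running count/sum/min/max per time bucket, so each
    signal is incorporated the moment it arrives."""
    def __init__(self):
        self.buckets = defaultdict(lambda: {"count": 0, "sum": 0.0,
                                            "min": float("inf"),
                                            "max": float("-inf")})

    def add(self, timestamp, value):
        # O(1) update of the bucket this signal's time maps to
        b = self.buckets[bucket_for(timestamp)]
        b["count"] += 1
        b["sum"] += value
        b["min"] = min(b["min"], value)
        b["max"] = max(b["max"], value)

    def mean(self, timestamp):
        b = self.buckets[bucket_for(timestamp)]
        return b["sum"] / b["count"]

# Usage: two stock ticks landing in the same one-minute bucket
agg = StreamAggregator()
agg.add(1000, 10.0)
agg.add(1010, 20.0)
print(agg.mean(1005))  # both ticks fall in the bucket starting at 960
```

Because every signal carries a timestamp, keying the aggregates by time bucket is what enables the performance optimizations mentioned above: updates are constant-time and queries for a window never touch raw history.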
Until AnalyticsPBI, real-time analytics were only available if you built a huge infrastructure yourself (for example, Wal-Mart) or purchased a very expensive solution from a hardware-centric vendor (whose primary focus was serving the needs of the financial services industry). The reason that the current poster children for big data (in terms of marketing spend, at least), the Hadoop vendors, are “just” starting their first forays into adding support for streaming data (see Cloudera’s Impala, for example) is that calculating analytics in real-time is very difficult to do. Period.
We have recently made a big architectural change concerning our storage back-end and I wanted to talk about it.
Storage is key to any Big Data problem. As we’ve mentioned in prior posts, most of our performance bottlenecks and optimizations have to do with storage performance and architecture, as opposed to computation. Our architecture for the last few years has consisted of a hybrid approach with “no-SQL” analytics storage using MongoDB and “non-transactional” data stored in a traditional RDBMS, primarily SQL Server. There were a couple of reasons for this architecture. First, we started off entirely in RDBMS-land, because our initial design was done before no-SQL systems were really at a production level of maturity. Second, most of our customers and prospects had traditional schemas and data organization, making integration easier if we could just use the same object model. (more…)
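The hybrid split described above can be sketched as a simple routing decision: flexible, nested analytics results go to a document store, while fixed-schema data stays relational. This is a hypothetical illustration with in-memory stand-ins; in the real system the two sides would be MongoDB collections and SQL Server tables, and the class and field names here are assumptions.

```python
class DocumentStore:
    """Stand-in for a MongoDB collection: schemaless dicts keyed by id."""
    def __init__(self):
        self.docs = {}

    def upsert(self, doc_id, doc):
        self.docs[doc_id] = dict(doc)

class RelationalStore:
    """Stand-in for an RDBMS table: fixed columns, rows as tuples."""
    def __init__(self, columns):
        self.columns = columns
        self.rows = []

    def insert(self, row):
        # A rigid schema: rows must match the declared columns exactly
        if set(row) != set(self.columns):
            raise ValueError("row does not match table schema")
        self.rows.append(tuple(row[c] for c in self.columns))

analytics = DocumentStore()                    # the "no-SQL" side
customers = RelationalStore(["id", "name"])    # the RDBMS side

# Nested, evolving analytics results fit the document model...
analytics.upsert("aapl-stats", {"symbol": "AAPL",
                                "stats": {"mean": 331.2, "count": 9042}})
# ...while stable reference data keeps its traditional schema.
customers.insert({"id": 1, "name": "Acme"})
```

The design point the sketch illustrates: the document side can change shape per record without migrations, while the relational side rejects anything that does not match its schema, which is why the two kinds of data ended up in different stores.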
I am excited to announce that I will be speaking at MongoSF 2011 with my fellow data wrangler, Tim. Our talk will cover how we used Mongo to build the PatternBuilders Analytics Framework. The official title for our talk is: Building a Streaming Analytics System with Mongo.
In a previous post, I talked about the impact our Social Media Analytics solution had on our deployment choices. Briefly, we wanted to make a beta version of our solution publicly available on the web and to do that, we needed to ensure sufficient capacity. Since we did not want to make a massive investment in the infrastructure to support it, we investigated the state of cloud servers. Long story short, as part of our move from our colo to the cloud we made a significant change in architecture, fully embraced some of MongoDB’s more advanced capabilities, and created a radically improved product – although the previous version was pretty cool too! (more…)
It’s that time of year, so I guess I will follow the herd and create my completely arbitrary, end-of-the-year, best-of list. Don’t know if it will be a top ten list yet or not, but let’s see how many I come up with. Warning for the non-programmers: this list is focused on technology alone. (more…)