Top Signs You Need NoSQL For Your Data
When your relational database takes longer to process your data than to collect it, it’s time to call in big data technology, said panelists at Interop.
Not everyone is sure whether they have big data or not, or whether they need a NoSQL system to handle it. One way to find out, said one adopter of a NoSQL approach, is to ask yourself whether it is taking you longer to process your data than it did to collect it.
At the Enterprise Cloud Summit on Monday at Interop 2011 in Las Vegas, a UBM TechWeb event, Jeremy Edberg, senior product developer at Reddit.com, and Bradford Stephens, founder and CEO of Drawn to Scale, a big data consulting firm, were called on to address the confusion.
Reddit.com is the social news site where anyone may submit a post of either self-created content or linked content and let other viewers vote on it. With enough positive votes versus negative, a blog, news story, or other item gets positioned on Reddit.com’s front page.
Reddit.com collects so much information and records so many user interactions that Edberg realized at one point that its relational database system was taking nearly as long to process the data as the site spent collecting it. He started tracking the processing time and later found that it was taking 25 hours to process the data collected over 24 hours.
He concluded that the situation was untenable. If the time the database system took to extract, transform, and load the data was growing longer than the collection phase, “pretty soon we were going to be in the infinite pit of despair.”
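The arithmetic behind that worry can be sketched in a few lines of Python. This is only an illustration, not Reddit.com's actual tooling: the function name `backlog_after` and its parameters are hypothetical, and the sketch assumes a single pipeline that runs around the clock at a constant rate, whereas Edberg described processing time that was still growing.

```python
def backlog_after(days, collect_hours=24.0, process_hours=25.0):
    """Return the hours of processing work still pending after `days` days.

    Assumption (not from the article): each day's data needs `process_hours`
    of work, and the pipeline can only do `collect_hours` of work per day
    because it already runs continuously.
    """
    backlog = 0.0
    for _ in range(days):
        # Each day adds the shortfall between work required and work done.
        backlog += process_hours - collect_hours
        # A pipeline that keeps up cannot bank spare capacity for later.
        backlog = max(backlog, 0.0)
    return backlog


if __name__ == "__main__":
    # With the figures Edberg cited, the pipeline falls one hour
    # further behind every day.
    for days in (1, 7, 30):
        print(days, "days ->", backlog_after(days), "hours behind")
```

Under these assumptions the backlog never shrinks on its own: as long as processing a day of data takes more than a day, the gap only widens.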
Stephens said his experience as lead platform engineer at Visible Technologies, a firm producing business intelligence for social media, was similar to Edberg’s. The main problem is that relational databases function most effectively when they sit on one large server. Relational systems do not easily distribute data across a cluster without introducing latencies into the database’s operations.
Stephens said he tried to solve the problem through sharding, or distributing subsets of data around a cluster, each with its own database system to manage it as a discrete unit, “but we still couldn’t get reads fast enough.”
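A minimal sketch of the sharding idea, in Python, may help show why reads can stay slow. It is not the system Stephens built at Visible Technologies: the `ShardedStore` class is hypothetical, plain dictionaries stand in for the independent database instances, and hash-based routing is just one common way to assign records to shards.

```python
import hashlib


class ShardedStore:
    """Toy key-value store split across several independent shards."""

    def __init__(self, num_shards):
        # Each dict stands in for a separate database server (assumption).
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash so the same key always routes to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        # A lookup by key touches exactly one shard.
        return self.shards[self._shard_for(key)].get(key)

    def scan(self, predicate):
        # A query that is not keyed must visit every shard and merge results.
        results = []
        for shard in self.shards:
            results.extend(v for v in shard.values() if predicate(v))
        return results


if __name__ == "__main__":
    store = ShardedStore(4)
    for i in range(100):
        store.put(f"post:{i}", {"id": i, "votes": i % 10})
    print(store.get("post:42"))
    print(len(store.scan(lambda post: post["votes"] >= 8)))
```

In this sketch a single-key lookup is cheap, but any analytical read that cuts across keys has to fan out to every shard and combine the answers, which is one plausible reason sharding alone may not make reads fast enough.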