NoSQL Is for the Birds

Scale breaks everything. Scale even breaks your assumptions about how best to store and query data. Scale does not care about your personal engineering preferences, or about SQL vs. NoSQL. The demands of rapid growth and ever-higher expectations for availability, performance, and cost efficiency force you to re-evaluate and re-imagine what you need, what is possible, and how to best achieve your business goals. This is the context in which non-relational databases like Dynamo, BigTable, Memcache, and Membase were conceived and built. However, even when relational databases are used to build large-scale services, they are unrecognizable as relational. Instead, they look almost exactly like a NoSQL database.

This perspective is rare outside of the companies forced to embrace it. The explosion of open-source databases, including those from major online services, presents a great opportunity to see how things look when engineers are faced with the demands of enormous scale. Let’s embrace that opportunity, as those companies have, by examining a production data storage service and seeing what SQL really looks like at scale.

Dissecting the Bird

Twitter began as a monolithic relational database accessed by a monolithic Rails application. Facebook began as a monolithic relational database accessed by a monolithic PHP application. Amazon began as a monolithic relational database accessed by a monolithic C++ application. From these humble beginnings, through many painful lessons in growth, all have developed the tools they need to thrive at enormous scale. The tools encode those lessons, so examining them can be instructive.

We’ll take as our example a system that is both used in production at large scale and is open source: FlockDB, the storage service that maintains the Twitter social graph. While billed as a graph database, FlockDB is better described as a set database: it stores sets of adjacencies and supports a small number of operations over those sets. FlockDB is typical of storage systems at large online services: the interface is extremely narrow, and clients are very loosely coupled to the service. This structure is common because broad, complex interfaces and synchronous dependencies, like transactions, are hard to scale. If you’ve ever seen lock pile-ups in a relational database, you have some idea of this already. At scale, such things are lethal.
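To make the "set database" idea concrete, here is a minimal in-memory sketch in Python. It is not FlockDB's actual API: the class name AdjacencyStore, its method names, and the (source, graph) keying are all hypothetical, chosen only to show how narrow such an interface can be. Each key maps to one set of destination ids, and the only operations are add, remove, membership, count, and intersection.

```python
from collections import defaultdict


class AdjacencyStore:
    """Toy set store (hypothetical, not FlockDB's API).

    Holds one set of destination ids per (source, graph) key.
    """

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, source, graph, destination):
        # Adding an edge that already exists is a no-op, so retries are safe.
        self._edges[(source, graph)].add(destination)

    def remove(self, source, graph, destination):
        # discard() does not raise if the edge is absent, so retries are safe.
        self._edges[(source, graph)].discard(destination)

    def contains(self, source, graph, destination):
        # .get() avoids creating an empty set for unknown keys.
        return destination in self._edges.get((source, graph), set())

    def count(self, source, graph):
        return len(self._edges.get((source, graph), set()))

    def intersect(self, key_a, key_b):
        """Return the sorted ids present in both sets; keys are (source, graph)."""
        return sorted(self._edges.get(key_a, set()) & self._edges.get(key_b, set()))


# Usage: which accounts do users 1 and 4 both follow?
store = AdjacencyStore()
store.add(1, "follows", 2)
store.add(1, "follows", 3)
store.add(4, "follows", 3)
common = store.intersect((1, "follows"), (4, "follows"))
```

Nothing here spans more than one set per write, and every write can be repeated without changing the result. That is the kind of property that lets clients stay loosely coupled to the service, in contrast to a multi-statement transaction that holds locks while it runs.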


© 2011 Third Eye Consulting Services & Solutions LLC.
