Google has recently announced that its flagship wide-area database named Spanner has been made available on the Google Cloud. Google Spanner is the next generation globally-distributed database built inside Google and announced to the world through the paper published in OSDI 2012 . This article explores the implication of Google Spanner, in particular to the NoSQL world.
CAP Theorem: A Quick Recap
The three important aspects of a distributed system include:
– Consistency: It is defined w.r.t. the memory being similar to atomic read-write memory – each operation works as if it has full control on the data item and operations are sequenced, one after the other. Any read operation that begins after a write operation completes, must return the value of that write operation or the value of a later write operation.
– Availability: It is defined as every request received by a non-failing node in the system must result in a response, i.e., every request eventually must be responded to. This can also be seen as a strong definition, as even in the face of network partitions, every request must eventually terminate or receive a response.
– Partition: When a network is partitioned, nodes are split into two groups and any message sent from one group to another is lost.
Brewer in his PODC 2000 talk conjectured that out of the three Consistency, Availability and Partition tolerance, only two are achievable in any distributed system. Seth Gilbert and Nancy Lynch formalized this notion and gave a theorem to the effect that in any asynchronous distributed system it is impossible to build an atomic read-write object that can guarantee both properties of availability and atomic consistency in the face of arbitrary failures a.k.a partitions .
The last few years have seen significant advances made by NoSQL databases such as Cassandra
among others and their adoption across enterprises. The key features of NoSQL databases include schema flexibility, linear scalability, key-based sharding of data across a distributed set of nodes etc. They were primarily intended for high read-write ratio based workloads and were constrained by the CAP theorem – implying that under a network partition, the database can achieve only one out of availability or consistency. The NoSQL databases such as MongoDB, Redis, Google BigTable as well as HBase (and HDFS) sacrifice availability and ensure consistency in the face of partitions (CP systems). Cassandra, Riak and Voldermort were based on Amazon Dynamo and made the system available, sacrificing strict consistency (AP systems). They provide eventual consistency, a relaxed form of consistency compared to the stricter forms provided by relational databases.
Google Spanner: An Introduction
Google Spanner, on the other hand, provides ACID consistency across a wide-area based distributed system. It provides a strict form of consistency known as Linearizability . It is the first system to do so at global scale . Spanner assigns global timestamps to transactions across a distributed set of nodes; timestamps reflect serialization order. The key to Spanner’s global timestamps are the TrueTime API and its implementation. The TrueTime API abstracts and exposes clock uncertainty and allows applications to reason with uncertainty, while the TrueTime API implementation in Google’s datacenters restricts the uncertainty to less than 10 milliseconds. The uncertainty is small compared to say NTP where the deltas between different clocks across a distributed system can be as high as 250 milliseconds. Google’s TrueTime API implementation has achieved that by having two physical clocks on each node: atomic and GPS.
Source: Google Spanner: Beginning of the End of the NoSQL World? – ACM SIGMOD Blog