What is Distributed SQL?
SQL has been the de facto language for relational databases (aka RDBMS) for almost four decades, which is why relational databases are also known as SQL databases. However, the original SQL databases such as Oracle, PostgreSQL and MySQL are monolithic from an architectural standpoint: they cannot automatically distribute data and queries across multiple instances. NewSQL databases emerged to make SQL scalable, but they introduced painful compromises of their own.
After Docker containers and Kubernetes orchestration began enabling flexible, composable infrastructure starting in 2015, microservices-based applications have been on the rise. Cloud-native principles of built-in scaling, resilience and geo-distribution are at the center of this architectural shift. The time was ripe for a new class of databases called “Distributed SQL”. The defining characteristic of a distributed SQL database is that the entire database cluster (irrespective of the number of nodes in it) appears to applications as a single logical SQL database.
Distributed SQL databases have a three-layer architecture.
1. SQL API
As the name implies, a distributed SQL database must expose a SQL API for applications to model relational data and run queries involving those relations. Data modeling constructs typical of SQL databases include indexes, foreign key constraints, JOIN queries and multi-row ACID transactions.
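These constructs can be seen together in a minimal sketch. The example below uses Python's built-in sqlite3 (a single-node SQL engine, standing in for any SQL API) with hypothetical `users` and `orders` tables to illustrate a foreign key, a secondary index, a multi-row transaction and a JOIN:

```python
import sqlite3

# In-memory database; any SQL database would accept similar DDL/DML.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Relational modeling: a foreign key constraint ties orders to users.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    amount REAL)""")
# A secondary index speeds up lookups on a non-key column.
conn.execute("CREATE INDEX idx_orders_user ON orders(user_id)")

# A multi-row ACID transaction: both inserts commit atomically.
with conn:
    conn.execute("INSERT INTO users VALUES (1, 'alice')")
    conn.execute("INSERT INTO orders VALUES (10, 1, 99.5)")

# A JOIN query across the two relations.
row = conn.execute("""SELECT u.name, o.amount
                      FROM users u JOIN orders o ON o.user_id = u.id""").fetchone()
print(row)  # ('alice', 99.5)
```

In a distributed SQL database the same statements work unchanged; the distribution of the rows and indexes behind them is the database's job, not the application's.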
2. Distributed Query Execution
Queries should be automatically distributed across multiple nodes of the cluster so that no single node becomes a bottleneck for query processing. Any node in the cluster should be able to accept an incoming query and then ask the other nodes to process their portions of the query in a way that minimizes processing latency, including the amount of data transferred between nodes over the network. The node that accepted the request then sends the aggregated results back to the client application.
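This scatter-gather pattern can be sketched in a few lines. The node names and per-node data below are hypothetical; each "node" filters and partially aggregates its own shard so only small partial results cross the network, and the coordinator combines them:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each "node" holds one shard of a table.
NODES = {
    "node-a": [{"region": "us", "amount": 10}, {"region": "eu", "amount": 5}],
    "node-b": [{"region": "us", "amount": 7}],
    "node-c": [{"region": "eu", "amount": 3}],
}

def execute_fragment(node, predicate):
    # Each node filters and partially aggregates its own shard,
    # minimizing the data sent back over the network.
    return sum(r["amount"] for r in NODES[node] if predicate(r))

def coordinator_sum(predicate):
    # The node that accepted the query fans out fragments in parallel
    # and combines the partial sums for the client.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda n: execute_fragment(n, predicate), NODES)
    return sum(partials)

print(coordinator_sum(lambda r: r["region"] == "us"))  # 17
```

A real query planner pushes down filters, projections and partial aggregates in exactly this spirit, just with far more operator types.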
3. Distributed Data Storage
Data, including indexes, should be automatically distributed (aka sharded) across multiple nodes of the cluster so that no single node becomes a bottleneck for performance or availability. Additionally, the database cluster should support strongly consistent replication and multi-row (aka distributed) ACID transactions in order to preserve the single-logical-database abstraction.
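As a rough illustration of such placement, here is a toy hash-sharding sketch (the node names, shard count and round-robin replica placement are all assumptions for the example, not any particular database's algorithm):

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]
NUM_SHARDS = 16
REPLICATION_FACTOR = 3

def shard_for(key: str) -> int:
    # Hash sharding: the hash of the primary key picks the shard,
    # spreading rows evenly without any manual partitioning.
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return h % NUM_SHARDS

def replicas_for(shard: int) -> list:
    # Each shard is replicated on REPLICATION_FACTOR distinct nodes,
    # here by simple round-robin placement.
    start = shard % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

s = shard_for("user:42")
print(s, replicas_for(s))
```

Because the mapping is deterministic, any node can route a request for a given key straight to the replicas that hold it.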
Strongly Consistent Replication
Supporting a powerful SQL API layer inherently requires the underlying storage layer to be built on strongly consistent replication across the nodes of the database cluster. This means writes are synchronously committed on multiple nodes in order to guarantee availability during failures. Reads should either serve the last committed write or return an error. This property is commonly known as linearizability. In terms of the famous CAP Theorem, distributed SQL databases are classified as Consistent and Partition-tolerant (CP).
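The essence of this can be sketched with a toy leader-based replica group (a deliberate simplification: real systems use Raft or Paxos with terms, log matching and leader leases, none of which are modeled here):

```python
class ShardReplicaGroup:
    # Toy leader-based replication: a write is acknowledged only after
    # a majority of replicas have persisted it, so a committed write
    # survives minority failures and leader reads see the latest commit.
    def __init__(self, replica_names):
        self.logs = {name: [] for name in replica_names}
        self.leader = replica_names[0]
        self.majority = len(replica_names) // 2 + 1

    def write(self, value, reachable):
        acks = 0
        for name, log in self.logs.items():
            if name in reachable:
                log.append(value)
                acks += 1
        # Without a majority the write is rejected: the system chooses
        # consistency over availability (CP in CAP terms).
        if acks < self.majority:
            raise RuntimeError("not committed: majority unreachable")

    def read(self):
        # The leader serves the last committed write.
        return self.logs[self.leader][-1]

group = ShardReplicaGroup(["a", "b", "c"])
group.write("v1", reachable={"a", "b"})  # 2 of 3 acks -> committed
print(group.read())  # v1
```

If a partition leaves the leader with only a minority, writes fail with an error rather than silently diverging, which is exactly the CP trade-off described above.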
Distributed ACID Transactions
The database storage layer should also support distributed ACID transactions, where transaction coordination spans multiple rows located on multiple nodes. This usually requires a two-phase commit (2PC) protocol. Isolation levels, the I in ACID, signify how strict the database is with respect to concurrent data access. Distributed SQL databases are expected to support Serializable as the strictest isolation level, as well as additional weaker isolation levels such as Snapshot.
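The two phases of 2PC can be sketched as follows. The `Participant` class and coordinator function are hypothetical stand-ins for shard leaders and a transaction coordinator; real implementations also make the coordinator's decision durable and handle participant recovery:

```python
class Participant:
    # A hypothetical shard leader taking part in a distributed transaction.
    def __init__(self, name):
        self.name = name
        self.staged = None
        self.committed = {}

    def prepare(self, writes):
        # Phase 1: stage the writes and promise to commit if asked.
        self.staged = writes
        return True  # vote "yes"

    def commit(self):
        # Phase 2: make the staged writes durable.
        self.committed.update(self.staged)
        self.staged = None

    def abort(self):
        self.staged = None

def two_phase_commit(participants, writes_by_participant):
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [p.prepare(writes_by_participant[p.name]) for p in participants]
    # Phase 2: commit everywhere only if all voted yes; otherwise abort
    # everywhere, so the transaction is atomic across nodes.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False

a, b = Participant("a"), Participant("b")
ok = two_phase_commit([a, b], {"a": {"k1": 1}, "b": {"k2": 2}})
print(ok, a.committed, b.committed)  # True {'k1': 1} {'k2': 2}
```

The key property is that no participant applies its writes until every participant has voted yes, which is what makes a multi-node transaction all-or-nothing.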
The above architecture leads to four key benefits.
1. Developer Agility with SQL & Transactions
Even though NoSQL databases such as Amazon DynamoDB, MongoDB and FaunaDB are starting to make some operations transactional, application developers continue to keep SQL databases close to their hearts. One reason for this affinity is the inherent power of SQL as a data modeling language that effortlessly expresses relational and multi-row operations. For example, SQL goes well beyond traditional key-value NoSQL by allowing multi-row transactions both implicitly (through secondary indexes, foreign keys and JOIN queries) and explicitly (through the BEGIN and END TRANSACTION syntax). More importantly, developers love the ease with which they can model (and store) data only once and then change queries simply by changing JOINs as business needs evolve.
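An explicit multi-row transaction looks like this in practice. The sketch again uses Python's sqlite3 as a stand-in SQL engine, with a hypothetical `accounts` table; the transfer either applies to both rows or to neither:

```python
import sqlite3

# Autocommit mode so the transaction is controlled explicitly in SQL.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])

# An explicit multi-row transaction: debit and credit are atomic.
try:
    conn.execute("BEGIN")
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
    conn.execute("COMMIT")
except sqlite3.Error:
    conn.execute("ROLLBACK")  # on any error, neither row changes

balances = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()
print(balances)  # [(70.0,), (80.0,)]
```

In a distributed SQL database the two rows may live on different nodes, yet the same BEGIN/COMMIT block behaves identically thanks to the distributed ACID transactions described earlier.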
2. Ultra Resilience with Native Failover/Repair
Modern distributed systems usually rely on Paxos or Raft distributed consensus to ensure resilience under random failures. “How Does Consensus-Based Replication Work in Distributed Databases?” highlights how such replication algorithms work in practice. Distributed SQL databases apply distributed consensus at a per-shard level to ensure that each shard (and not simply each instance) remains highly available in the presence of failures. Infrastructure failures therefore affect only a subset of data (the shards whose leaders get partitioned away) and never the entire cluster. And, given the ability of the remaining shard replicas to elect a new leader automatically within seconds, the cluster repairs itself, exhibiting self-healing characteristics when subjected to failures. These cluster configuration changes are transparent to the application, which continues to work normally without outages or slowdowns.
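A toy model makes the per-shard blast radius concrete. The shard layout and node names are hypothetical, and the "election" is collapsed to picking a surviving replica (standing in for a real Raft vote):

```python
# Hypothetical cluster state: each shard has three replicas, one leader.
shards = {
    "shard-1": {"replicas": ["n1", "n2", "n3"], "leader": "n1"},
    "shard-2": {"replicas": ["n2", "n3", "n4"], "leader": "n2"},
}

def handle_node_failure(shards, failed):
    # Only shards whose leader ran on the failed node are affected;
    # their surviving replicas elect a new leader among themselves.
    for s in shards.values():
        s["replicas"] = [r for r in s["replicas"] if r != failed]
        if s["leader"] == failed:
            s["leader"] = s["replicas"][0]  # stand-in for a Raft election
    return shards

handle_node_failure(shards, "n1")
print(shards["shard-1"]["leader"])  # n2 (re-elected)
print(shards["shard-2"]["leader"])  # n2 (unaffected by the failure)
```

Note how shard-2 never loses its leader: the failure of n1 touches only the shards that n1 was leading.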
3. Scale On-Demand with Horizontal Write Scalability
“How Data Sharding Works in a Distributed SQL Database” shows how automatic data sharding is usually implemented in a distributed SQL database. The shards remain automatically balanced across all available nodes as new nodes are added or existing nodes are removed. Microservices needing write scalability for transactional applications can now rely on the database directly, as opposed to adding new infrastructure such as an in-memory cache (which offloads read requests so the database is preserved for handling writes) or a NoSQL database (which scales writes but forsakes ACID guarantees).
4. Low User Latency with Geographic Data Distribution
As highlighted in “9 Techniques to Build Cloud-Native, Geo-Distributed SQL Apps with Low Latency”, distributed SQL databases can offer a wide array of techniques to build geo-distributed applications that not only help in tolerating region failures automatically but also lower latency for end users by bringing data closer to their local region.
YugabyteDB adheres to the overall distributed SQL architecture previously described and as a result, delivers on the benefits highlighted above. Additionally, it differentiates itself from others in the distributed SQL category with the following three benefits.
1. Low Total Cost of Ownership with High Performance
DocDB is YugabyteDB’s RocksDB-based distributed document store written in C++. Thanks to the performance engineering behind DocDB, YugabyteDB is a great fit for applications needing either low latency queries (such as distributed OLTP applications like a Retail Product Catalog) or high-volume data ingestion (such as an IoT Analytics Platform). YugabyteDB serves such applications with clusters smaller than other distributed SQL databases. Additionally, every YugabyteDB node can store multiple TBs of data without compromising the core promise of horizontal write scalability for the overall database cluster. “Comparing Distributed SQL Performance – YugabyteDB vs. Amazon Aurora PostgreSQL vs. CockroachDB” benchmarks the performance characteristics of YugabyteDB against two other distributed SQL databases.
2. Cloud Neutral with Kubernetes Native
Software organizations today see cloud-neutral and multicloud design patterns as the freedom to build and manage applications at unparalleled speed. Kubernetes-driven orchestration of containerized applications is a proven way to implement such design patterns. However, stateful transactional databases are among the most complex workloads to run in Kubernetes. The ephemeral nature of Kubernetes pods, and the constant need to reschedule them onto new hosts, requires the underlying database tier to be equally agile. Otherwise, the application will see outages, slowdowns and, worst of all, data loss and incorrect results. YB-Master, YugabyteDB’s built-in configuration management service, guarantees that applications never experience such scenarios by constantly monitoring and re-balancing the data shards across the available nodes, even in an environment as dynamic as a Kubernetes cluster.
3. High Release Velocity with Open Source
In an era of database projects increasingly moving toward proprietary software, and thereby breaking the trust of their communities, YugabyteDB is proud to follow the exact opposite path by releasing the entire database under the permissive Apache 2.0 open source license. “Why We Changed YugabyteDB Licensing to 100% Open Source” highlights the reasons behind this philosophy. The end result is that users get to build and release business-critical applications faster by leveraging advanced database features such as distributed backups, change data capture, two-region deployments, encryption at rest and more. The commercial products, Yugabyte Platform and Yugabyte Cloud, come into consideration only if there is a need for a self-managed or fully-managed Database-as-a-Service that simplifies enterprise operations even further.
Comparing Distributed SQL Databases
Now that we have established the seven benefits that users should demand from their distributed SQL database of choice, let us compare how five such databases fare against these desired benefits. The five databases we have picked for comparison are Amazon Aurora, Google Cloud Spanner, PingCAP’s TiDB, CockroachDB and YugabyteDB. The first two are proprietary managed database services while the last three are cloud-neutral projects.
Amazon Aurora
Generally available since 2015, Amazon Aurora is built on a proprietary distributed storage engine that automatically replicates 6 copies of data across 3 availability zones for high availability. From an API standpoint, Aurora is wire compatible with both PostgreSQL and MySQL. As described in “Amazon Aurora under the hood: quorums and correlated failure”, Aurora’s quorum write approach based on the 6 replicas allows for significantly better availability and durability than traditional master-slave replication. By default, Aurora runs in a single-master configuration where only a single node can process write requests and all other nodes are read-only replicas. If the writer node becomes unavailable, a failover mechanism promotes one of the read-only nodes to be the new writer.
Multi-master configuration is a recent addition to Aurora MySQL (not yet available on Aurora PostgreSQL) for scaling writes that involves a second writer node. However, since all of the data is now present on both nodes, concurrent writes to the same data on the two nodes can lead to write conflicts and deadlock errors that the application has to handle. A long list of limitations includes the inability to scale beyond the original two writer nodes as well as the lack of geo-distributed writes across multiple regions.
Google Cloud Spanner
Spanner is Google’s globally-distributed SQL database that powers many of its business-critical properties such as AdWords, Apps, Gmail and more. Google began building Spanner in 2007, first as a transactional key-value store and then as a SQL database. A subset of the Spanner system was made publicly available in 2017 on the Google Cloud Platform as a proprietary managed service called Google Cloud Spanner. Cloud Spanner offers a proprietary SQL API that is neither compatible with any open source SQL database nor supportive of traditional relational data modeling constructs such as foreign key constraints. However, using Google’s proprietary TrueTime atomic clock, Spanner is able to guarantee an isolation level called External Consistency that is even stricter than Serializability.
PingCAP’s TiDB
TiDB, a MySQL-compatible distributed database built on TiKV, takes design inspiration from Google Spanner and Apache HBase. While its sharding and replication architecture are similar to those of Spanner, it follows a very different design for multi-shard transactions. As described in “Implementing Distributed Transactions the Google Way: Percolator vs. Spanner”, TiDB uses Google Percolator as the inspiration for its multi-shard transaction design. This choice essentially makes TiDB unfit for deployments with geo-distributed writes, since the majority of transactions in a random-access OLTP workload will experience high WAN latency when acquiring a timestamp from the global timestamp oracle running in a different region. Additionally, TiDB lacks support for critical relational data modeling constructs such as foreign key constraints and the Serializable isolation level.
CockroachDB
CockroachDB, a PostgreSQL-compatible distributed database built using Raft and RocksDB, is inspired by Google Spanner as far as sharding, replication and distributed transactions are concerned. The API layer is a reimplementation of the PostgreSQL query layer, resulting in a loss of functional depth. For example, partial indexes, stored procedures and triggers are not supported. Additionally, CockroachDB recently gave up its open source roots by relicensing the core database under the proprietary, source-available Business Source License (BSL) that restricts freedom of use. Last but not least, advanced features such as distributed backups and change data capture are not even included in the freemium edition and require a commercial license.