The Distributed SQL Blog

Thoughts on distributed databases, open source, and cloud native

Geo-Distribution in YugabyteDB: Engineering Around the Physics of Latency

Customers today expect always-on, highly responsive access to services wherever they are in the world. This is driving businesses to deploy globally distributed applications that deliver better customer experiences. Moving to a geo-distributed database is a key part of this. Moving data closer to where the end-users are enables lower latency access. Geo-distribution can make the data service resilient to zone and region failures in the cloud. Recent adverse weather conditions and data center accidents have underscored the need for this.

YugabyteDB offers database architects and operators a rich set of deployment and replication options in geo-distributed environments. While there is no getting around the physics of network latency, Yugabyte customers like Admiral, Kroger, and Narvar have achieved their resilience, performance, and compliance objectives using the wide array of built-in synchronous and asynchronous data replication and granular geo-partitioning capabilities. It is impossible to predict your applications' future data needs. With a versatile database like YugabyteDB, you can future-proof your data infrastructure and pick a deployment option that matches your application's needs.

In this blog post, we look at geo-distributed deployment topologies that engineering teams can use with YugabyteDB. In a follow-up post, we will cover best practices for deploying and running YugabyteDB in a geo-distributed environment to get the most out of the database.

Before we dive into the details, here is a TL;DR summary of deployment options that YugabyteDB offers:

[Table: Summary of YugabyteDB geo-distributed deployment options]

Let us look at each of these deployment options in detail.

Option 1: Single region multi-zone cluster

A YugabyteDB cluster consists of three or more nodes that communicate with each other and across which data is distributed. You can place the nodes of a YugabyteDB cluster across different zones in a single region. Let us look at what this topology gives you.

Resilience: Cloud providers like AWS and Google Cloud design zones to minimize the risk of correlated failures caused by physical infrastructure outages like power, cooling, or networking. In other words, single failure events usually affect only a single zone. By deploying nodes across zones in a region, you get resilience to a zone failure as well as high availability.

Consistency: YugabyteDB automatically shards the tables of the database, places the data across the nodes, and replicates all writes synchronously. The cluster ensures strong consistency of all I/O and distributed transactions.

Latency: Because the zones are close to each other, you get low read and write latencies for clients located in the same region as the cluster, typically 1–10 ms for simple row access.
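The quorum arithmetic behind this zone-level resilience can be sketched in a few lines. This is an illustrative model, not YugabyteDB code: Raft commits a write once a majority of replicas acknowledge it, so with one replica per zone, a cluster survives the loss of any minority of zones.

```python
# Sketch (not YugabyteDB code): why an RF=3 cluster spread across three
# zones survives one zone failure. Raft commits a write once a majority
# of replicas acknowledge it.

def majority(replication_factor: int) -> int:
    """Number of replicas that must ack a write before it commits."""
    return replication_factor // 2 + 1

def faults_tolerated(replication_factor: int) -> int:
    """Replicas (here: zones, one replica per zone) that can fail while
    the survivors still form a majority."""
    return replication_factor - majority(replication_factor)

print(majority(3))          # 2 replicas must ack each write
print(faults_tolerated(3))  # survives 1 zone outage
print(faults_tolerated(5))  # RF=5 survives 2 zone outages
```

The same arithmetic explains why clusters use an odd replication factor: going from RF=3 to RF=4 raises the write quorum without tolerating any additional failures.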


Figure: Single cluster deployed across three zones in a region

In summary, this deployment mode is great for applications that:

  • Need strong consistency
  • Need resilience and HA – zero RPO and near zero RTO
  • Have clients in the same region for low read and write latency

Tradeoffs with multi-AZ deployments:

  • Apps accessing data from remote regions may experience higher read/write latencies
  • This deployment mode is not resilient to region-level outages, e.g., caused by natural disasters like floods or ice storms

Option 2: Multi-region “Stretched” clusters with Synchronous Replication

The second option is similar to the first one except that the nodes of the cluster are deployed in different regions rather than in different zones of the same region.


Figure: Single cluster deployed across three regions

Resilience: Putting cluster nodes in different regions provides an even higher degree of failure independence than in different zones. In the event of a failure, the database cluster continues to serve data requests from the remaining regions while automatically replicating the data in the background to maintain the desired level of resilience.

Consistency: As with the previous option, all writes are synchronously replicated, and distributed transactions remain globally consistent.

Latency: Latency in a multi-region cluster depends on the distance/network packet transfer times between the nodes of the cluster and between the cluster and the client. As a mitigation, YugabyteDB offers tunable global reads that allow read requests to trade off some consistency for lower read latency. By default, read requests in a YugabyteDB cluster are handled by the leader of the Raft group associated with the target tablet to ensure strong consistency. In situations where you are willing to sacrifice some consistency in favor of lower latency, you can choose to read from a tablet follower that is closer to the client rather than from the leader. YugabyteDB also allows you to specify the maximum staleness of data when reading from tablet followers.

Write latencies in this deployment mode can be high. This is because the tablet leader replicates write operations across a majority of tablet peers before sending a response to the client, and in this topology the peers live in different regions, so every write involves cross-region communication.
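The leader-versus-follower read decision can be sketched as a simple cost model. This is hypothetical client-side logic, not the YugabyteDB API; the replica names, round-trip times, and lag values are made up for illustration:

```python
# Illustrative sketch of the tunable-read trade-off: read from a nearby
# follower only if its replication lag is within the staleness the
# application can tolerate; otherwise go to the (possibly distant) leader.

from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    rtt_ms: float   # network round trip from the client
    lag_ms: float   # how far the replica trails the leader (0 for leader)

def choose_read_target(leader: Replica, followers: list[Replica],
                       max_staleness_ms: float) -> Replica:
    """Pick the lowest-latency replica whose data is fresh enough."""
    fresh_enough = [f for f in followers if f.lag_ms <= max_staleness_ms]
    return min([leader] + fresh_enough, key=lambda r: r.rtt_ms)

leader = Replica("us-east-leader", rtt_ms=70.0, lag_ms=0.0)
followers = [Replica("eu-west-follower", rtt_ms=5.0, lag_ms=40.0)]

# Strict reads (zero staleness) must go to the leader...
print(choose_read_target(leader, followers, 0).name)    # us-east-leader
# ...but allowing 100 ms of staleness serves the read locally.
print(choose_read_target(leader, followers, 100).name)  # eu-west-follower
```

The staleness bound is the knob: tightening it pushes reads back to the leader, loosening it keeps them local.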

In summary, this deployment mode offers:

  • Resilience and HA – zero RPO and near zero RTO
  • Strong consistency of writes, tunable consistency of reads

Tradeoffs with multi-region deployments:

  • Write latency can be high (it depends on the distance/network packet transfer times between the regions)
  • Follower reads trade off consistency for latency

Option 3: Multi-region Clusters with Single-Direction Asynchronous replication

The previous two options offered ways to deploy a single YugabyteDB cluster across zones or regions. In situations where applications want to keep data in multiple clouds or in remote regions, YugabyteDB offers asynchronous replication across two data centers or cloud regions.


Figure: Multi-region deployment with single-direction asynchronous replication between clusters

Here’s how it works in steps:

1. You deploy two YugabyteDB clusters (typically) in different regions. Each cluster automatically replicates data within the cluster synchronously for strong consistency.

2. You then set up xCluster asynchronous replication from one cluster to another. This can be either bi-directional in an active-active configuration, or single-directional in an active-passive configuration. We discuss the active-passive configuration here, and the active-active configuration in the next option.

Sink clusters can serve timeline-consistent, low-latency reads to nearby clients. They can also be used for disaster recovery: in the event of a source cluster failure, clients can simply connect to the sink cluster.

xCluster replication is ideal for use cases such as DR, auditing, and compliance. Customers can also use xCluster replication to migrate data from a data center to the cloud or from one cloud to another. In situations that tolerate eventual consistency, clients in the same region as the sink clusters can get low latency reads.

Resilience: If you deploy the nodes of each cluster across zones, you get zone-level resilience. In addition, this topology also gives you disaster recovery in the event that the source cluster goes down.

Consistency: Reads and writes within the source cluster are strongly consistent. Because replication across clusters is asynchronous, reads from the sink cluster are timeline consistent.

Latency: With xCluster, replication to the remote cluster happens outside the critical path of a write operation, so replication does not materially impact the latency of reads and writes. In essence, you are trading off consistency for latency. Reads served within a cluster's own region have low latency.
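Timeline consistency can be pictured as the sink always holding a prefix of the source's commit history: possibly stale, but never reordered. A minimal sketch (illustrative Python, not xCluster internals; the keys and values are made up):

```python
# Sketch: the sink cluster applies the source's committed writes in
# commit order, so at any moment it holds a prefix of the source's
# history. It may lag, but it never shows a later write without the
# earlier ones.

source_wal = [("k1", "v1"), ("k2", "v2"), ("k1", "v3"), ("k3", "v4")]

def apply_prefix(wal, n):
    """State of a sink that has replicated the first n WAL records."""
    state = {}
    for key, value in wal[:n]:
        state[key] = value
    return state

# A sink lagging by one record has seen k1 -> v3 but not k3 yet.
print(apply_prefix(source_wal, 3))  # {'k1': 'v3', 'k2': 'v2'}
# Once caught up, it matches the source exactly.
print(apply_prefix(source_wal, 4))  # {'k1': 'v3', 'k2': 'v2', 'k3': 'v4'}
```

This is the sense in which timeline consistency is stronger than plain eventual consistency: stale reads are reads of a real past state, not an arbitrary mix of old and new writes.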

In summary, this deployment mode offers:

  • Disaster recovery – non-zero RPO and non-zero RTO
  • Timeline consistency in the sink cluster, strong consistency in the source cluster
  • Low latency reads and writes within the source cluster region

Tradeoffs with unidirectional async replication:

  • The sink cluster does not handle writes. Writes from clients outside the source cluster region can incur high latency
  • Since xCluster replication bypasses the query layer for replicated records, database triggers won’t fire, which can lead to unexpected behavior

Option 4: Multi-region Clusters with Bi-directional Asynchronous replication

As we saw in the previous option, YugabyteDB offers asynchronous replication between clusters across two data centers or cloud regions. In addition to the active-passive configuration with one-directional replication, YugabyteDB also has an active-active configuration in which both clusters can handle writes to potentially the same data. Writes to either cluster are asynchronously replicated to the other cluster with a timestamp for the update. xCluster with bi-directional replication is used for disaster recovery.


Figure: Multi-region deployment with bi-directional asynchronous replication between clusters

Resilience: If you deploy the nodes of each cluster across zones, you get zone-level resilience. In addition, this topology also gives you disaster recovery if either cluster goes down.

Consistency: Reads and writes within the cluster that handles a write request are strongly consistent. Because replication across clusters is asynchronous, data replicated to the remote cluster is timeline consistent. If the same key is updated in both clusters within a similar time window, the write with the higher timestamp becomes the latest write (last-writer-wins semantics).

Latency: With xCluster, replication to the remote cluster happens outside the critical path of a write operation. So replication does not materially impact latency of reads and writes. In essence you are trading off consistency for latency.
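The last-writer-wins semantics described above can be sketched as a timestamp comparison per key. This is an illustrative model, not xCluster's actual implementation; the keys, values, and timestamps are made up:

```python
# Sketch of last-writer-wins conflict resolution: when both clusters
# update the same key, the replicated write only overwrites the local
# value if it carries a higher timestamp.

def lww_merge(local: dict, remote_writes: list) -> dict:
    """Apply replicated (key, value, timestamp) records, keeping the
    newest timestamp per key. local maps key -> (value, timestamp)."""
    for key, value, ts in remote_writes:
        current = local.get(key)
        if current is None or ts > current[1]:
            local[key] = (value, ts)
    return local

# Cluster A wrote balance=100 at t=10; cluster B wrote balance=250 at t=12.
cluster_a = {"balance": ("100", 10)}
merged = lww_merge(cluster_a, [("balance", "250", 12)])
print(merged["balance"])  # ('250', 12), the later write wins
```

Note what this implies for applications: the write with the lower timestamp is silently discarded, so workloads that update the same keys from both clusters must be designed to tolerate that.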

In summary, this deployment mode offers:

  • Disaster recovery – non-zero RPO and non-zero RTO
  • Strong consistency within the cluster that handles a write request, eventual (timeline) consistency in the remote cluster
  • Low latency reads and writes within either cluster

Tradeoffs with bidirectional async replication:

  • Since xCluster replication bypasses the query layer for replicated records, database triggers won’t fire, which can lead to unexpected behavior
  • Since xCluster replication is done at the write-ahead log (WAL) level, there is no way to check unique constraints. Two conflicting writes in separate universes can violate a unique constraint, leaving the main table with both rows but the index with only one row, resulting in an inconsistent state
  • Similarly, the active-active mode doesn’t support auto-increment IDs, since both universes will generate the same sequence numbers and produce conflicting rows. It is better to use UUIDs instead

Option 5: Geo-partitioning with data pinning

Applications that need to keep user data in a particular geographic region to comply with data sovereignty regulations can use row-level geo-partitioning in YugabyteDB. This feature allows fine-grained control over pinning rows in a user table to specific geographic locations.


Figure: Geo-partitioned cluster deployed across three regions

Here’s how it works:

1. Pick a column of the table to use as the partition column. In a users table, for example, this could be the user’s country or region.

2. Next, create partition tables based on the partition column of the original table. You will end up with a partition table for each region that you want to pin data to.

3. Finally, pin each partition table so that its data lives in different zones of the target region.

With this deployment mode, the cluster automatically keeps each row, and all the shards (known as tablets) that store it, in the specified region. In addition to complying with data sovereignty requirements, you also get low-latency access to the data from users in that region while maintaining transactional consistency semantics. This blog post goes into more detail on how row-level geo-partitioning works.
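The routing implied by the steps above can be sketched as a lookup from the partition column to a region. This is an illustrative model, not YugabyteDB syntax (geo-partitioning is actually declared in SQL); the `geo` column and region names are hypothetical:

```python
# Sketch of row-level geo-partitioning: the partition column decides
# which region's tablets store, and therefore serve, each row.

PARTITIONS = {
    "EU": "eu-west-1",   # hypothetical partition values and region names
    "US": "us-east-1",
    "IN": "ap-south-1",
}

def route_row(row: dict) -> str:
    """Return the region whose tablets hold this row, based on the
    partition column ('geo' here)."""
    return PARTITIONS[row["geo"]]

print(route_row({"user_id": 1, "geo": "EU"}))  # eu-west-1
print(route_row({"user_id": 2, "geo": "IN"}))  # ap-south-1
```

Because every replica of an EU row lives in the EU region, both the data-sovereignty guarantee and the low-latency property fall out of the same placement decision.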

Resilience: Clusters with geo-partitioned tables are resilient to zone-level failures when the nodes in each region are deployed in different zones of the region.

Consistency: Because this deployment model has a single cluster that is spread out across multiple geographies, all writes are synchronously replicated to nodes in different zones of the same region, thus maintaining strong consistency.

Latency: Because all the shard replicas are pinned to zones in a single region, read and write overhead is minimal and latency is low. To insert rows or make updates to rows pinned to a particular region, the cluster needs to touch only shard replicas in the same region.

In summary, use row-level geo-partitioning for:

  • Tables that have data that needs to be pinned to specific geographic regions to meet data sovereignty requirements
  • Low latency reads and writes in the region the data resides in
  • Strongly consistent reads and writes

Tradeoffs with geo-partitioning:

  • Row-level geo-partitioning is useful for specific use cases where the dataset and access to the data is logically partitioned. Examples include users in different countries accessing their accounts, and localized products (or product inventory) in a product catalog.
  • When users travel, access to their data will incur cross-region latency, because their data remains pinned to their home region

Option 6: Read replicas

For applications that have writes happening from a single zone or region but want to serve read requests from multiple remote regions, you can use read replicas. Data from the primary cluster is automatically replicated asynchronously to one or more read replica clusters in the same universe. The primary cluster gets all write requests, while read requests can go either to the primary cluster or to the read replica clusters depending on which is closest.

Resilience: If you deploy the nodes of the primary cluster across zones, you get zone-level resilience. The read replicas do not participate in the Raft consistency protocol and therefore don’t affect resilience.

Consistency: The data in the replica clusters is timeline consistent, which is better than eventual consistency.

Latency: Reads from both the primary cluster and read replicas can be fast (single digit millisecond latency) because read replicas can serve timeline consistent reads without having to go to the shard leader in the primary cluster. The read replica clusters do not handle write requests; instead they are redirected to the primary cluster. So the write latency will depend on the distance between the client and the primary cluster.

Figure: Multi-region deployment with a primary cluster and read replicas
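The routing rule for this topology can be sketched as follows. This is a hypothetical client-side model with made-up cluster names and latencies; in practice YugabyteDB handles the redirection of writes for you:

```python
# Sketch of read-replica routing: reads go to the nearest cluster
# (replica or primary), while writes always land on the primary.

CLUSTERS = {
    "primary-us-east": {"rtt_ms": 80, "writes": True},
    "replica-eu-west": {"rtt_ms": 5,  "writes": False},  # hypothetical names
}

def route(operation: str) -> str:
    if operation == "write":
        # Replicas can't take writes; the primary handles them all.
        return next(name for name, c in CLUSTERS.items() if c["writes"])
    # Reads go wherever is closest, replica or primary.
    return min(CLUSTERS, key=lambda name: CLUSTERS[name]["rtt_ms"])

print(route("read"))   # replica-eu-west
print(route("write"))  # primary-us-east
```

This makes the tradeoff concrete: a client in Europe gets ~5 ms timeline-consistent reads but still pays the full cross-region round trip on every write.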

In summary, use read replicas for:

  • Blazingly fast, timeline-consistent reads from replicas, and strongly consistent reads and writes to the primary cluster
  • Low latency writes within the region

Tradeoffs with read replicas:

  • The primary cluster and the read replicas are correlated clusters, not two independent clusters. In other words, adding read replicas does not improve resilience
  • Read replicas can’t take writes, so write latency from remote regions can be high even if there is a read replica near the client

Summary

YugabyteDB offers the most comprehensive and flexible array of deployment and replication options in geo-distributed environments. Whether you are deploying a globally distributed application to serve customers around the world or looking for greater resilience, you can pick a deployment option that works for your application’s needs.

If you are interested in learning more about geo-distribution in YugabyteDB, here are some additional resources:

Read the blog posts on 9 Techniques to Build Cloud-Native, Geo-Distributed SQL Apps with Low Latency and Geo-partitioning of Data in YugabyteDB.
