Overcoming MongoDB Sharding and Replication Limitations with YugaByte DB
A few of our early users have chosen to build their new cloud applications on YugaByte DB even though their current primary datastore is MongoDB. Starting with the v3.4 release in Nov 2016, MongoDB has made improvements in its sharding and replication architecture that has allowed it to be re-classified as a Consistent and Partition-tolerant (CP) database and move away from its Available and Partition-tolerant (AP) origins. However, significant limitations remain that make it unsuitable for latency-sensitive, highly scalable applications. These limitations, as detailed in this post, have been cited by our users as the one of the key reasons for their switch to YugaByte DB. The other key reason cited is YugaByte DB’s support for distributed ACID transactions, a topic we will review in the next post of this series.
Basic Data Organization
MongoDB organizes data in a Document and a group of documents form a Collection. On the other hand, YugaByte DB’s data organization is similar to that of a RDBMS — the basic unit is a Row and group of rows combine to form a Table.
A method of splitting and storing a single logical dataset in multiple database instances. By distributing data among multiple instances, a group of database instances can store a larger dataset and handle additional requests. Also referred to as horizontal partitioning.
Sharding is optional in MongoDB with the default being unsharded collections grouped together into a standalone mongod instance. For fault-tolerance in a production environment, this single instance has to be manually converted to a Replica Set which is essentially a group of mongod instances that maintain the same dataset. Users new to MongoDB often do not understand the implications of starting out with an unsharded database in the form of single Replica Set. As data sizes grow beyond the limits of a single primary node, they are forced to undertake a complex and manual migration to a new deployment architecture called Sharded Cluster, where each “shard” is modeled as an independent Replica Set. This means each shard has a primary mongod instance and multiple secondary mongod instances.
Let’s take the simple case of a Sharded Cluster with 3 shards. Such a cluster either runs 3 nodes with 3 mongod instances each or the more common production deployment of 9 nodes with 1 mongod instance each — leading to a total of 9 mongod instances. Adding new shards will lead to more such instances. Additionally, each node of the application server running MongoDB client code should now run a mongos instance for query routing to the correct shard.
YugaByte DB treats sharding as a fundamental data organization concept from day 1. Each table is auto-sharded into a configurable number of tablets using a hash-based partitioning scheme as the default. As shown above, a cluster is a group of yb-tserver instances (one instance per node) where each yb-tserver instance is responsible for an equal number of tablets across all tables in all keyspaces.
Note that both YugaByte DB and MongoDB Sharded Clusters use a replicated set of coordinator servers to store metadata and configuration settings for the cluster. These servers are known as yb-masters in YugaByte DB while MongoDB refers to them as config servers.
YugaByte DB’s sharding design is purpose-built for handling ever-growing datasets alongside bounded datasets in the same cluster and that too with the most efficient resource utilization possible.
Benefits of YugaByte DB sharding include:
- YugaByte DB seamlessly handles uneven data growth across tables unlike MongoDB whose rigid sharding mechanism makes the same number of shards available to every collection. E.g., in a Retail ECommerce application running on YugaByte DB, tables with ever-growing data characteristics such as User Activity or Order History can have more number of tablets (and hence more nodes serving writes) as opposed to a table with bounded data characteristics such as User Profile.
- Since each node is responsible for an equal number of tablets, incoming client requests are evenly load balanced across all nodes in the cluster. This leads to the most optimal cluster utilization. The cluster remains auto-rebalanced when new nodes are added or existing nodes are removed (including complete failures). This is in contrast to MongoDB where only the nodes with Replica Set primary mongod instances are responsible for the write requests while the other nodes remain idle wrt such write requests.
- On a given node, a single yb-tserver instance is responsible for optimizing resource allocation across all the tablets it hosts. E.g. it can allocate a larger share of the block cache to a popular tablet serving lots of requests compared to the infrequently accessed tablets. On the other hand, MongoDB’s approach of requiring multiple mongod instances on each node to implement sharding may lead to inefficient resource utilization. Users have to compensate for this inefficiency by over-provisioning resources (e.g. 9 nodes for a cluster with only 3 shards). Additionally, the lack of coordination among the mongod instances hurts performance. E.g., all the instances on a single node can independently decide to do a compaction or garbage collection at the same time leading to latency spikes for the application.
- YugaByte DB client drivers have built-in query routing logic obviating the need for an independent mongos-like instance on the application node. This helps from an operational simplicity standpoint given that any mongos-like instance is in the critical path has to be managed and monitored effectively.
The approach of keeping multiple replicas or copies of the same data in different fault domains in order to tolerate failures in one or more domains. Depending on the type of replication used, read and write requests can be also be scaled using the replica copies.
The Replica Set architecture discussed in the previous section is the main concept powering replication in MongoDB. It uses a Raft-like consensus protocol for election of the primary among its members. However, no consensus protocol is used for the actual data replication between the members, which remains asynchronous (aka master-slave) from primary to the secondary members. MongoDB can still serve linearizable (aka strong) reads using the linearizable read concern that involves a majority quorum by the primary at read time.
The simplest replicated MongoDB cluster is an unsharded 3 node Replica Set where 1 node runs the primary mongod instance (and is responsible for the writes) while the 2 remaining nodes run a secondary mongod instance each.
YugaByte DB’s replication architecture is significantly different than that of MongoDB. It treats replication as a fundamental concept similar to sharding.
Each tablet has its own Raft consensus group for replication. As shown in the simple 3 node cluster above, each tablet’s replica group has 1 leader and multiple followers. The total number of replicas (including the leader) is configured using a Replication Factor (RF), which allows [ceiling(RF/2) –1] failures to be automatically tolerated without any application impact. The tablet leader is responsible for serving linearizable reads and writes.
A leader lease mechanism ensures low latency for linearizable reads by guaranteeing that a tablet can have only 1 leader under any condition. Any follower can become a leader in a matter of seconds upon leader failure using Raft-based leader election. Unlike MongoDB, YugaByte DB uses Raft not only for leader election but also for synchronous data replication for write operations. Using Raft also allows YugaByte DB to be very efficient when adding new nodes or removing current nodes. New nodes can bootstrap their tablets from tablet leaders without having to do expensive quorum reads involving majority replicas. Tablet followers can serve timeline-consistent reads at a latency even lower than linearizable reads and that too without any majority quorum.
YugaByte DB’s replication is designed to serve low latency linearizable and timeline-consistent reads while remaining highly resilient to any kind of failure and achieving high cluster utilization.
Benefits of YugaByte DB replication include:
- Linearizable reads can be served directly from the tablet leader at a latency lower than that of MongoDB. The leader lease mechanism ensures that only 1 leader is possible for a given tablet and hence obviates the need to establish quorum with the followers for serving linearizable reads. On the other hand, MongoDB uses no such leader lease mechanism which means linearizable reads suffer from high latency during intermittent network partitions when there can be more than 1 leader.
- Each node of a cluster not only hosts an equal number of tablet leaders but also an equal number of tablet followers — all in a completely automatic manner. The loss of a node only impacts only those tablets whose leaders were hosted on that node. The followers of those tablets on the remaining nodes elect a new leader for write operation to continue as before. In contrast, MongoDB puts the burden on the user to manually place the Replica Set primary of some shards along side the Replica Set secondary members of some other shards on every node of the cluster. Users either make errors or over-provision their cluster with more hardware/nodes than necessary.
Designing For Performance and Scalability Without Compromising Simplicity
Sharding and replication in MongoDB are difficult for new users to understand because these are not designed to work together at the same data organization level. Sharding is at a higher level than collections (a single shard can house multiple collections). Replication is all the way up at the database instance level. It suffers from the same complexity observed in sharded RDBMS deployments with multiple independent databases each having master-slave replication.
On the other hand, YugaByte DB is designed ground-up for achieving high performance and high scalability while remaining extremely simple for users to understand and operate. Sharding and replication are both automatic and are at the same level, that of a tablet. This level is very close to the basic unit of data organization since a tablet is essentially a group of rows of the same table. This approach allows agile movement of data blocks across all nodes of the cluster while ensuring high resilience and integrity. The various benefits highlighted in the previous sections stem from this agility.
In a follow-up post, we will review the depth of ACID transaction support in MongoDB and YugaByte DB. Meanwhile, you can watch and star YugaByte DB’s GitHub project as well as install a local cluster on your laptop using multiple options including Docker, Kubernetes (Minikube), macOS, CentOS and Ubuntu.