GO-JEK’s Performance Benchmarking of CockroachDB, TiDB & YugaByte DB on Kubernetes
Iqbal Farabi and Tara Baskara, Systems Engineers from GO-JEK Indonesia, recently presented the results of their benchmarking of cloud native databases on Kubernetes at KubeCon Europe in Barcelona. The three databases they benchmarked were CockroachDB, TiDB and YugaByte DB. This post brings their presentation (video recording) and slides (PDF) to the attention of our readers. It also highlights a few areas of collaboration between the GO-JEK team and YugaByte DB Engineering.
Selecting Cloud Native Databases to Benchmark
As we have previously described in “Docker, Kubernetes and the Rise of Cloud Native Databases”, cloud native databases are a new breed of databases that are horizontally scalable, can run in dynamic cloud environments, can be deployed in containers (with Kubernetes as the orchestration engine), are highly resilient to failures and, finally, are easy to observe and manage. On top of these must-have characteristics, the GO-JEK team added the following qualification criteria.
- Open source
- Operational database
- ACID compliance
- Provides SQL-like API
The CNCF Landscape lists many databases and a few of them could qualify for the above criteria. The GO-JEK team selected the following three for their benchmarking exercise.
Yahoo! Cloud Serving Benchmark (YCSB)
Brian Cooper et al. from Yahoo! Research introduced the Yahoo! Cloud Serving Benchmark (YCSB) to the world in June 2010. Since then it has become the de facto standard for benchmarking the performance of databases that act as serving stores for “cloud OLTP” applications.
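At its core, a YCSB client issues a stream of operations whose types are chosen according to configured proportions, against keys drawn from a loaded record space. The following is a hypothetical sketch of that loop (not YCSB's actual code); the function name and signature are illustrative.

```python
import random

# Hypothetical sketch (not YCSB's actual implementation) of how a YCSB
# client drives a workload: each operation is picked according to the
# configured read/update proportions, against keys drawn from the
# loaded record space.
def run_workload(op_count, read_proportion, record_count, seed=42):
    rng = random.Random(seed)
    counts = {"READ": 0, "UPDATE": 0}
    for _ in range(op_count):
        op = "READ" if rng.random() < read_proportion else "UPDATE"
        _key = "user%d" % rng.randrange(record_count)  # YCSB-style key names
        counts[op] += 1
    return counts

# Workload A's 50/50 read/update mix over a 1M-record table.
counts = run_workload(100_000, 0.50, 1_000_000)
```

The real benchmark additionally supports Zipfian and “latest” request distributions and measures per-operation latency, but the proportion-driven operation mix above is the essence of each core workload.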
An Area of Collaboration
The YCSB paper notes the following four design tradeoffs in databases.
- Read performance versus write performance
- Latency versus durability
- Synchronous versus asynchronous replication
- Row versus column data partitioning
The GO-JEK team focused on the last three aspects in their presentation. However, they seem to have misinterpreted YugaByte DB’s replication architecture as asynchronous. We believe the Cassandra roots of the YugaByte Cloud QL (YCQL) API may have led them to this conclusion. As highlighted in the next subsection, YugaByte DB uses synchronous replication inspired by Google Spanner and has no similarity to the asynchronous replication architecture of Cassandra. We would welcome the opportunity to collaborate with the GO-JEK team on a better understanding of YugaByte DB’s architecture.
Synchronous vs. Asynchronous Replication in YugaByte DB
The YCSB paper frames this tradeoff as a choice between updating all replicas synchronously before acknowledging a write (ensuring replicas stay consistent, at the cost of higher write latency) and propagating updates asynchronously after the write completes (lower latency, at the risk of stale reads or data loss on failure).
For the primary cluster responsible for serving writes, YugaByte DB by default uses Raft-based synchronous replication to keep all the replicas of a shard in sync. It goes beyond the other two databases compared by also offering optional Read Replicas in faraway regions — these Read Replicas receive the data from the primary cluster using asynchronous replication. This approach is similar to Google Spanner’s notion of read-write replicas (for the primary cluster driven by synchronous replication) and read-only replicas (driven by asynchronous replication from the primary cluster).
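The essence of Raft-based synchronous replication is that a write commits only once a majority of a shard's replicas have durably applied it, so a minority of failed replicas cannot block or lose writes. The following is a minimal sketch of that quorum rule, under stated assumptions; it is illustrative only and not YugaByte DB's actual implementation (leader election, log indexes, and term handling are all omitted).

```python
# Minimal sketch (not YugaByte DB's actual code) of quorum-based
# synchronous replication: the leader acknowledges a write only after a
# majority of the shard's replicas have durably applied it.
class Replica:
    def __init__(self, healthy=True):
        self.healthy = healthy
        self.log = []

    def apply(self, value):
        # A healthy replica persists the entry and acknowledges it.
        if self.healthy:
            self.log.append(value)
        return self.healthy

def replicate_write(value, replicas):
    acks = sum(1 for r in replicas if r.apply(value))
    majority = len(replicas) // 2 + 1
    return acks >= majority  # commit only with majority acknowledgment

# With a 3-replica shard, one failed replica still leaves a majority,
# so the write commits; two failed replicas block the commit.
replicas = [Replica(), Replica(), Replica(healthy=False)]
committed = replicate_write("row-1", replicas)
```

Asynchronous replication to Read Replicas, by contrast, would acknowledge the write immediately and ship it to the remote replicas afterward, which is why those replicas can serve slightly stale reads.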
The GO-JEK team customized the standard YCSB dataset to 1M records per table, with each workload running 1M operations across a varying number of client threads. Details are listed in the slide below.
There were 5 workloads used for the benchmarking — each workload is representative of a type of application as listed in the slide below.
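For reference, the operation mixes of the five standard YCSB core workloads (as defined by the workload property files shipped with YCSB) can be summarized as follows; the GO-JEK runs customized the record and operation counts but appear to have kept these standard mixes.

```python
# Operation mixes of the standard YCSB core workloads A-E, per the
# property files in the YCSB distribution's workloads/ directory.
YCSB_CORE_WORKLOADS = {
    "A": {"read": 0.50, "update": 0.50},  # update heavy
    "B": {"read": 0.95, "update": 0.05},  # read mostly
    "C": {"read": 1.00},                  # read only
    "D": {"read": 0.95, "insert": 0.05},  # read latest inserted records
    "E": {"scan": 0.95, "insert": 0.05},  # short range scans
}

# Sanity check: each workload's proportions sum to 1.0.
assert all(abs(sum(mix.values()) - 1.0) < 1e-9
           for mix in YCSB_CORE_WORKLOADS.values())
```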
A 3-node Google Compute Engine (GCE) cluster with the following per-node specifications was used for benchmarking.
- n1-standard-16 machine type
- 1000 GB Local SSD
- 60 GB RAM
Since the databases were deployed as StatefulSets in Kubernetes, the following resource settings were applied to each StatefulSet:
- 14 vCPU request, 16 vCPU limit
- 30 GB RAM request, 60 GB RAM limit
- 500 GB SSD local persistent volume
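The settings above map onto a StatefulSet's pod template roughly as in the illustrative fragment below; the object and storage-class names are hypothetical, and the actual manifests used by the GO-JEK team may differ.

```yaml
# Illustrative fragment only (names are hypothetical): the resource
# settings listed above expressed in a StatefulSet spec.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db            # hypothetical name
spec:
  template:
    spec:
      containers:
        - name: db
          resources:
            requests:
              cpu: "14"
              memory: 30Gi
            limits:
              cpu: "16"
              memory: 60Gi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: local-ssd   # hypothetical local PV storage class
        resources:
          requests:
            storage: 500Gi
```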
For Workload A with 50% Reads and 50% Writes, YugaByte DB comes out ahead of the other two databases in both throughput and latency.
For Workload B with 95% Reads and 5% Writes, YugaByte DB is the leader for low latency operations and 2nd best when it comes to throughput.
For Workload C with 100% Read operations, YugaByte DB shows characteristics similar to Workload B. It is the leader for low latency operations and the 2nd best for throughput.
For Workload D with Read Latest operations (95% Read and 5% Write), YugaByte DB again comes out as the lowest-latency database, with throughput quite close to that of TiDB.
Workload E – Another Area for Collaboration
During their presentation, the GO-JEK team noted that they were not able to run Workload E (focused on Short Range Scans) on YugaByte DB. Needless to say, this was disappointing for us to hear. We would love to work with the GO-JEK team to fix the issue and have results published for this workload as well. That would also give us an opportunity to understand why throughput was lower than our expectations for Workloads B–D.
We have run YCSB using the YCQL API multiple times over the last couple of years. “Technical Deep Dive into YugaByte DB 1.1” from September 2018 lists the results of our last run, including results for Workload E. The figure below summarizes the throughput observed in those runs. Since our YCSB tests were performed on VM-based clusters rather than Kubernetes-based clusters, we believe the error observed by the GO-JEK team could be related to Kubernetes configuration and hence should be easily addressed.
The GO-JEK team’s efforts to benchmark three cloud native, distributed SQL databases on Kubernetes are commendable. Given that our users frequently ask us for such benchmarks, we have been exploring this effort ourselves. We are very happy to see an established Kubernetes end user such as GO-JEK take on this initiative and publish benchmarks as a neutral third party. We look forward to working with the GO-JEK team to not only correct the misinterpretations but also fix the benchmarking-related issues observed. Updated results from GO-JEK will certainly help more end users like them make informed choices.