The YugaByte Database Blog

Thoughts on open source, cloud native and distributed databases

Announcing YugaByte DB 1.0! 🍾 🎉

Team YugaByte is delighted to announce the general availability of YugaByte DB 1.0!

It has been an incredibly satisfying experience to, in just two years, build and launch a cloud-scale, transactional and high-performance database that’s already powering real-world production workloads. I wanted to take a moment to share our journey to 1.0 and the road ahead.

The Inspiration

Modern user-facing applications are increasingly moving to a multi-region, cloud-native architecture. They have demanding requirements from the data tier around transactional consistency, high performance, extreme scale and global distribution. Existing databases only meet a subset of these requirements and hence enterprises end up with multiple databases, often even within a single use case (e.g., an RDBMS for transactional needs, a NoSQL DB for scale, and caches such as Redis for performance).

We started YugaByte with the thesis that the key to simplifying data infrastructure for development and deployment of modern applications lies in building a database that’s designed right for the modern cloud. Such a database needs to be transactional, high-performance and planet-scale — all at the same time.

Enter YugaByte DB. It's a multi-API, multi-model database similar to Azure Cosmos DB. It supports Cassandra-compatible and Redis-compatible APIs, and a PostgreSQL-compatible API is in beta. YugaByte DB's consistency and transaction model is inspired by Google Spanner.

Yuga in Sanskrit represents an era or an epoch, and the name YugaByte signifies data without limits. YugaByte DB blends Azure CosmosDB and Google Spanner into a unified database, while being open-source without any proprietary cloud lock-in.

The Road to 1.0

Combining the best of NoSQL and SQL in a single database designed to run on a “shared-nothing architecture” (which is what the modern cloud is) may sound impossible to some. We decomposed this challenging problem into smaller “byte-sized” chunks, each of which took about 6 months to build.

Extending RocksDB and Integrating with Raft

Our first design goal was high performance, which meant each node should be able to sustain high read and write throughput at low latencies. As a starting point, we therefore picked RocksDB, a high-performance key-value store written entirely in C++, for its log-structured design and its ability to handle high read/write rates.

But RocksDB is a monolithic data store, and does not satisfy the design goal of being planet-scale. We needed to build a fault-tolerant and highly available version of RocksDB with strong consistency, an essential building block for transactions. We used Raft, a popular distributed consensus protocol, for implementing strongly consistent replication.

An additional benefit is that the Raft protocol is designed to allow membership changes on the fly. This enables operational capabilities such as changing the machines your database is running on or migrating it to a new region or cloud. These capabilities are also key to running a database in cloud-native environments such as Kubernetes.
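The core of Raft-based replication can be illustrated with a minimal sketch. The `Replica` and `replicate` names below are hypothetical, and this is a deliberately simplified single-process stand-in, not YugaByte DB's implementation: it shows only the commit rule, namely that a write is durable once a majority of the replication group (leader included) has acknowledged it, so the group tolerates minority failures.

```python
# Simplified sketch of Raft-style majority commit (illustrative only).

class Replica:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.log = []

    def append(self, entry):
        """Append the entry and acknowledge; an unhealthy node never acks."""
        if not self.healthy:
            return False
        self.log.append(entry)
        return True

def replicate(leader, followers, entry):
    """Leader appends locally, then reports the entry committed once a
    majority of the full group (leader + followers) has it."""
    leader.append(entry)
    acks = 1  # the leader's own copy counts toward the majority
    for follower in followers:
        if follower.append(entry):
            acks += 1
    majority = (1 + len(followers)) // 2 + 1
    return acks >= majority  # True => entry is committed

leader = Replica("n1")
followers = [Replica("n2"), Replica("n3", healthy=False)]
committed = replicate(leader, followers, ("set", "k", "v"))
print(committed)  # 2 of 3 copies is still a majority, so True
```

With one of three nodes down the write still commits; with two down it does not, which is exactly the availability/consistency trade Raft makes.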

Furthermore, given that the database concepts of transaction logging, recovery, point-in-time backups and multi-version concurrency control had to be implemented in our distributed core, the equivalent capabilities in vanilla RocksDB were pure overhead, and we trimmed them away to keep the design high-performance.

We then extended the core with essential capabilities required for correctness, distribution and scale. These included sharding of tables, leader leases, zone-aware data placement, failure detection and load balancing, to name just a few.
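Of the capabilities above, sharding is the easiest to sketch. The snippet below is a hypothetical, simplified illustration (the function name, tablet count and 16-bit hash space are assumptions for the example, not YugaByte DB's actual scheme): each primary key is hashed into a fixed space, and that space is split into equal ranges, one per tablet, the unit that gets replicated and load-balanced across nodes.

```python
# Illustrative hash-sharding sketch: map each primary key to one tablet.
import hashlib

NUM_TABLETS = 4  # assumed small count for the example

def tablet_for(key, num_tablets=NUM_TABLETS):
    """Hash the key into a 16-bit space and split that space into
    equal ranges, one range per tablet."""
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:2], "big")
    range_size = 65536 // num_tablets
    return min(h // range_size, num_tablets - 1)

# Every key lands deterministically on exactly one tablet.
placement = {k: tablet_for(k) for k in ("user:1", "user:2", "order:42")}
for key, tablet in placement.items():
    print(f"{key} -> tablet {tablet}")
```

Because placement is a pure function of the key, any node can route a request without consulting the data itself, and splitting the hash space more finely redistributes load without rehashing every key's bucket logic.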

Distributed Key-to-Document Store

The next step was to extend the capabilities of the core storage engine. We knew that handling a variety of workloads was critical to making the database useful across a wide variety of applications. Some workloads, such as time series, have ever-growing data, which means the database needs to handle high data density. Other workloads, such as traditional OLTP, may have a high overwrite rate, and the database needs to be efficient at fine-grained updates.

We picked a document-oriented storage model. With careful design and the right optimizations, a document store can deliver high performance. We built DocDB, a log-structured key-to-document storage engine, using RocksDB as a starting point and heavily extending it to natively handle non-primitive and nested types (such as rows, maps, collections and JSON). DocDB supports large objects as well as efficient fine-grained updates.
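The key idea behind a key-to-document engine can be sketched in a few lines. This is not DocDB's actual encoding, just a toy illustration of the principle: flatten a nested document into `(document key, subkey path) -> value` entries, so that updating one nested field touches a single small entry instead of rewriting the whole document.

```python
# Toy sketch of document flattening (not DocDB's real encoding).

def flatten(doc_key, doc, path=()):
    """Yield ((doc_key, subkey_path), value) for every primitive in doc."""
    if isinstance(doc, dict):
        for k, v in doc.items():
            yield from flatten(doc_key, v, path + (k,))
    else:
        yield (doc_key, path), doc

store = dict(flatten("user:1",
                     {"name": "Ada",
                      "address": {"city": "SF", "zip": "94107"}}))

# A fine-grained update touches one entry, not the whole document:
store[("user:1", ("address", "city"))] = "Oakland"
print(store[("user:1", ("address", "city"))])  # Oakland
```

In a log-structured store each such entry is an independent key-value pair, which is why a document model layered this way can handle both large objects and high overwrite rates.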

Pluggable API Layer

During our journey building YugaByte DB, we spoke with a number of customers to understand their pain points and desired features. Based on this customer discovery, we picked three existing, popular APIs: Cassandra (to model NoSQL workloads), Redis (for apps requiring low-latency access), and PostgreSQL (because of SQL's universal appeal).

In the initial phase, we implemented wire-protocol-compatible server-side implementations of the Cassandra and Redis APIs, deferring work on the PostgreSQL API until we had built a few more foundational pieces. Wire compatibility gives developers agility, since they can keep using familiar data models and programming paradigms. An added benefit is the ability to leverage the existing ecosystems for these APIs as-is, such as the Spring application framework, Apache Spark for real-time analytics, or JanusGraph for graph workloads.

Increasing App Development Velocity

Distributed ACID transactions and secondary indexes have always ranked very high on our customer wishlist. These are hard engineering problems, and areas where NoSQL falls short. Support for a native JSON datatype was another common ask. As a result, we created YugaByte Cloud Query Language (YCQL). YCQL extends the Cassandra API with support for distributed transactions, strongly consistent secondary indexes and JSON.
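What "strongly consistent secondary index" means can be shown with a minimal single-process stand-in. The `Table` class below is hypothetical and collapses the distributed transaction into one in-memory step; the point it illustrates is only the invariant YCQL maintains, namely that the base-table write and the index write succeed or fail together, so an index lookup never returns a row the base table has already replaced.

```python
# Toy stand-in for a transactionally maintained secondary index.

class Table:
    def __init__(self):
        self.rows = {}      # primary key -> row dict
        self.by_email = {}  # email -> primary key (secondary index)

    def upsert(self, pk, row):
        """Apply the base-table and index mutations as one unit.
        In YCQL this pairing is done by a distributed ACID transaction;
        here both simply happen in a single step."""
        old = self.rows.get(pk)
        if old is not None:
            self.by_email.pop(old["email"], None)  # retire the stale entry
        self.rows[pk] = row
        self.by_email[row["email"]] = pk

    def lookup_by_email(self, email):
        pk = self.by_email.get(email)
        return self.rows.get(pk) if pk is not None else None

t = Table()
t.upsert(1, {"email": "ada@example.com", "name": "Ada"})
t.upsert(1, {"email": "ada@newmail.com", "name": "Ada"})
print(t.lookup_by_email("ada@newmail.com")["name"])  # Ada
print(t.lookup_by_email("ada@example.com"))          # None (no stale index entry)
```

Doing this across shards on different nodes is what makes it a hard engineering problem: without a distributed transaction, a crash between the two writes would leave the index pointing at data that no longer exists.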

Redis Open Source is not a true database. We created YugaBytE DIctionary Service (YEDIS), a Redis-compatible API, but with built-in persistence, auto-sharding and linear scalability. YEDIS optionally allows for timeline-consistent low-latency reads from the nearest datacenter, while write operations maintain global consistency. Additionally, YEDIS adds a new Time Series data type based entirely on customer feedback.
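The timeline-consistency guarantee mentioned above can be sketched as follows. This is an illustrative model, not YEDIS internals: a nearby follower serves reads from a prefix of the leader's committed log, so its answer may be slightly stale but always corresponds to some earlier point on the same timeline of writes, never to writes applied out of order.

```python
# Illustrative model of timeline-consistent reads from a lagging follower.

leader_log = [("x", 1), ("x", 2), ("y", 7), ("x", 3)]  # committed writes, in order

def state_at(log, upto):
    """Replay the first `upto` log entries into a key-value state."""
    state = {}
    for key, val in log[:upto]:
        state[key] = val
    return state

leader_state = state_at(leader_log, len(leader_log))  # sees x=3, y=7
follower_state = state_at(leader_log, 3)              # lags by one entry: x=2, y=7
print(leader_state["x"], follower_state["x"])  # 3 2
# The follower's answer (x=2) is an earlier state on the same timeline,
# which is the "timeline-consistent" guarantee; it can never report x=3
# without also reflecting every write that preceded it.
```

This is why such reads can come from the nearest datacenter at low latency while writes still go through globally consistent replication.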

Enterprise-Ready 1.0

This brings us to the recent few months in our journey. We wanted to make it easy to deploy a production-grade setup on any IaaS platform — be it bare-metal, VMs or containers, in public or private clouds.

We developed YugaByte DB Enterprise Edition (EE) for this purpose. With YugaByte DB EE, public and private clouds are abstracted away: it can orchestrate, secure and monitor production-grade deployments across multiple regions and multiple clouds in minutes. Distributed backups to a configurable endpoint such as S3 become trivial operations.

Multiple customers have already deployed YugaByte DB 1.0 in production. Turvo, a real-time collaborative logistics platform, moved to YugaByte DB from MongoDB to benefit from transactional consistency and the ability to serve large datasets at very low latencies. Another customer, Narvar, leveraged YugaByte DB to increase app development agility by unifying their cache and DB tiers. Others have chosen YugaByte DB for its multi-cloud infrastructure portability as well as for implementing regulatory compliance mandates such as GDPR.

Looking Ahead

Our mission is to accelerate development and deployment of modern applications by simplifying the data tier.

Launching version 1.0 of YugaByte DB is just the first step in this journey. Work continues in almost every aspect you can imagine: functionality, correctness, performance, security and ecosystem integrations. A key area of focus is to complete the functional capabilities of the PostgreSQL API and battle-harden it to GA.

We are very excited about the progress we have made over the last two years, and are looking forward to the years ahead. We hope to welcome many new users and developers into our community, and to work together to push the boundaries of YugaByte DB with challenging applications and use cases.