YugaByte DB

The YugaByte Database Blog

Thoughts on open source, cloud native and distributed databases

New to Google Cloud Databases? 5 Areas of Confusion That You Better Be Aware of

After billions of dollars in capital expenditure and reference customers in every major vertical, Google Cloud Platform has finally emerged as a credible competitor to Amazon Web Services and Microsoft Azure when it comes to enterprise-ready cloud infrastructure. While Google Cloud’s compute and storage offerings are easier to understand, making sense of its various managed database offerings is not for the faint-hearted. This post introduces app developers to the major Google Cloud database services, highlights 5 key areas of confusion and explains how YugaByte DB enables agile app development and simplified cloud operations without much of the confusion.

Google Cloud Database Offerings Highlighted (Source: ZDNet)

SQL Databases

Cloud SQL

 

 

 

 

Hosted service for monolithic MySQL and PostgreSQL. Single write node that can achieve a max IOPS of 10K with a max storage of 10 TB , max RAM of 416 GB. Good for monolithic OLTP apps such as content management systems, ERP and CRM.

Limitations

If you need to need increase IOPS beyond 10K or store more than 10 TB, then you cannot simply add a new node. This is because there is no horizontal write scalability in Cloud SQL. You have to now re-write your application in one of the following ways.

  • Shard data at an application-level where each independent node becomes responsible for part of your data — this option is very expensive to maintain if you need to increase the number of nodes or number of shards in the future.
  • Port application’s database access code to the significantly-more-expensive Google Cloud Spanner (see below), which is not API compatible with either MySQL or PostgreSQL.

Cloud Spanner

 

 

 

 

Globally distributed, highly available relational database service with both single region and multi-region deployment configurations. Inserts and updates are through a custom API while reads and DDL operations are though a Spanner-specific flavor of SQL. Good for distributed OLTP apps such as retail product catalog, SaaS user identity and online gaming.

NoSQL Databases

Cloud BigTable

 

 

 

Single-region, highly-scalable, wide-column NoSQL service with low latency and high throughput. Deeply integrated with the Hadoop ecosystem including HBase API compatibility. Good for timeseries-like Hybrid Transactional/Analytical Processing (HTAP) apps that do not require multi-region deployments.

Cloud Datastore

 

 

 

 

Highly scalable NoSQL database with document data model, atomic (but not fully ACID) transactions, SQL-like query language and support for single-region/multi-region configurations. Note that Cloud Datastore is being retired in favor of a new product called Cloud Firestore. Good for distributed OLTP apps similar to Cloud Spanner.

Cloud Memorystore (Beta)

 

 

 

 

Redis compatible in-memory data store to build app caches with sub-ms latency. Good for session management, gaming leaderboard and stream processing.

The Battles Within

As we can see by simply reading the standard descriptions of the above databases, there is a good amount of overlap in the use cases these databases serve. Let’s review 5 cases that are not easily answered even by Google Cloud’s Choosing a Storage Option flowchart.

Cloud SQL vs. Cloud Spanner

Need for horizontal write scalability either in the same region or across multiple regions is the single driver for choosing Cloud Spanner over Cloud SQL. But as we know in real-world cases, we can never predict when exactly we need to horizontal scale. Whenever such a situation arises, the application has to be re-written and the database has to be migrated from Cloud SQL to Cloud Spanner. Both these tasks can turn out to be painful in practice.

Cloud Spanner vs. Cloud BigTable

Google’s recommendation is to pick BigTable for single-region analytics use cases and Spanner for multi-region operational use cases. This distinction is also very hard to implement in practice. A time series app for a custom DevOps monitoring use case looks very similar to the early days of a SaaS infrastructure monitoring platform built by New Relic and Datadog. The latter use case requires multi-region scalability and strongly consistent secondary indexes. Going with BigTable on day 1 will not help such a use case but that’s what most users will pick by default.

Cloud Spanner vs. Cloud Datastore

The big distinction here is the nature of the data stored — Spanner is good for structured data that should be stored in relational tables while Datastore is good for unstructured data that should be stored in JSON documents. Another subtle but important distinction is that Spanner is fully ACID compliant whereas Datastore has only Atomic & Durable transactions (which means theres no Consistency and Isolation). What if we need to store the user reviews (unstructured data) along side the item details (structured data) in a product catalog? We need to use two databases in the Google Cloud world.

Cloud BigTable vs. Cloud Datastore

BigTable is optimized for high volumes of data and analytics while Datastore is optimized to serve high-value transactional data to applications. So even though both of them are NoSQL databases, issues similar to what we previously discussed in Cloud Spanner vs. Cloud BigTable arise.

Cloud Memorystore vs. Cloud BigTable

Both BigTable and Memorystore can be used to model key-value datasets, but Memorystore is in-memory only while BigTable has disk-based persistence. Using Memorystore for completely low latency caching situations make sense, but things get murky for use cases such as gaming leaderboards (where cache-to-DB consistency starts to matter) and stream processing (where data volume can get too high to fit into memory). In these cases, BigTable is a better option.

Solving the Confusion with YugaByte DB

YugaByte DB is a Cassandra, Redis & PostgreSQL (beta) compatible open source distributed database with sharding, replication and transactions architecture similar to that of Google Spanner, the original Google-internal system that led to the publicly available Google Cloud Spanner. As a Consistent and Partition-tolerant (CP) database with native JSONB document data typehigh performance secondary indexescloud native operational ease as well as the ability to handle high data density, it serves as an excellent alternative to many of the Google Cloud databases.

Multi-API and Multi-Model App Development

Google Cloud’s database offerings follow the same polyglot persistence pattern that AWS follows. As highlighted in Polyglot Persistence vs. Multi-API/Multi-Model: Which One For Multi-Cloud?, customers are better served instead by a multi-API/multi-model database such as YugaByte DB. Database needs are future-proofed against changing application requirements since each of the APIs solve a fundamental challenge that the original API implementation did not — it does so by adding ACID transactions to Cassandra, fault-tolerance & persistence to Redis and horizontal write scalability to PostgreSQL. In the worst case, changing an API to go from say Cassandra Query Language to PostgreSQL is a still a simpler option than changing the database altogether.

Multi-Region and Multi-Cloud Operations

YugaByte DB is ideal for a variety of multi-zone, multi-region, hybrid cloud and multi-cloud deployment options. YugaByte DB 1.0 — A Peek Under The Hood details many of these deployment options. The key here is to be able to change infrastructure choices as often as business demands — moving from single region to multi-region as well as moving from single cloud to multi-cloud should be zero downtime operations with no application impact.

What’s Next?

Sid Choudhury

VP, Product