The Distributed SQL Blog

Thoughts on distributed databases, open source and cloud native

Getting Started with pgbench and Distributed SQL on GKE

VP Developer Relations

pgbench is a simple program for running benchmark tests on PostgreSQL. It runs the same sequence of SQL commands over and over, possibly in multiple concurrent database sessions, and then calculates the average transaction rate (transactions per second). By default, pgbench tests a scenario that is loosely based on TPC-B, involving five SELECT, UPDATE, and INSERT commands per transaction. However, it is easy to test other cases by writing your own transaction script files.

Before we dive in, we wanted to let you know that the Yugabyte team has been working from home in order to do our part with social distancing and to help with containment efforts. We have also transitioned to online meetings with our customers, partners, candidates, community, and fellow Yugabeings. We will continue with these and other measures until it’s determined to be safe otherwise. We wish everyone the best health and wellness, and our thoughts are with all those around the world who have been affected by COVID-19. If during these unprecedented circumstances, you have some time or would welcome a brief distraction, we invite you to check out this post below.

At YugabyteDB we have forked pgbench and renamed it to ysql_bench. This utility can be found in the YugabyteDB install directory under /postgres/bin/ysql_bench. You can find the source for ysql_bench on GitHub. This fork is identical to pgbench with two minor differences in the set up phase only:

  • When initializing ysql_bench, primary keys are created during the CREATE TABLE statement, not afterwards.
  • When seeding the tables with data during the initialization phase we divide the data into a number of smaller batches instead of a single batch using the COPY command.

What’s YugabyteDB? It is an open source, high-performance distributed SQL database built on a scalable and fault-tolerant design inspired by Google Spanner. Yugabyte’s SQL API (YSQL) is PostgreSQL wire compatible.

In this blog post we are going to walk you through how to deploy YugabyteDB on Google Kubernetes Engine (GKE) and configure ysql_bench to run against it.

Single Node PostgreSQL vs Distributed SQL

Although pgbench is a benchmarking utility designed for single node PostgreSQL deployments, it can still be used against a distributed, multi-node SQL database like YugabyteDB. As a PostgreSQL administrator it is important to remember that you will likely be making some performance trade-offs to get higher availability and geo-data distribution. Without getting into the eventually consistent, master/slave replication schemes you can use with PostgreSQL, as a general rule, PostgreSQL requires you to scale-up to achieve more performance, while YugabyteDB relies on scale-out to deliver more performance without compromising on data consistency guarantees.

Given the above, although PostgreSQL vs YugabyteDB is an “apples to oranges” comparison, pgbench can still provide some valuable insights in regards to offering a comparative starting point for cluster sizing, deployment architecture, and resource allocation. As you might imagine, your ultimate results are going to be highly dependent on your environment, machine types, default isolation levels and what (if any) database optimizations you’ve configured up front. A final bit of guidance is that we’ll be using GKE as our environment. And, as we all know, Kubernetes will introduce its own performance encumbrances, regardless of the software being benchmarked. However, for the purposes of this how-to, Kubernetes provides a relatively simple vehicle for getting a three node cluster of YugabyteDB up and running.

Ok, first things first, let’s get YugabyteDB running on GKE.

Prerequisites

Below is the environment we’ll be configuring for the purposes of this how-to:

Setting Up a Kubernetes Cluster on GCP

To deploy YugabyteDB on the Google Cloud Platform (GCP), we first have to set up a cluster. To create the cluster in Google Kubernetes Engine (GKE):

Go to Kubernetes Engine> Clusters > Create Cluster > Standard cluster. For the purposes of this example we’ll use the default options. As for the machine type, make sure you select the n1-standard-8 (8 vCPU, 30 GB memory) machine type at a minimum.

Connect to the Google Cloud Shell and verify that the nodes are setup and running by using the command:

Installing YugabyteDB on the GKE Cluster Using Helm

Detailed installation steps can be found here, which we’ve summarized below.

Upgrade Helm

Initialize Helm with the upgrade option:

You should now be able to install YugabyteDB using a Helm chart.

Create a service account

Before you can create the cluster, you need to have a service account that has been granted the cluster-admin role. Use the following command to create a yugabyte-helm service account granted with the ClusterRole of cluster-admin.

Initialize Helm

Create a namespace

Add the charts repository

Fetch updates from the repository

Install YugabyteDB

To install our YugabyteDB cluster with a replication factor of 3, we’ll be using a Helm chart and will expose the administrative UI endpoint and YSQL using LoadBalancer. YSQL is YugabyteDB’s PostgreSQL-compatible SQL API. Detailed Helm instructions can be found in the Docs.

To check the status of the YugabyteDB cluster, execute the command below:

You can also verify that all the components are installed and communicating by visiting GKE’s Services & Ingress and Workloads page.

services and ingress - Getting Started with pgbench and Distributed SQL on GKE

You can view the YugabyteDB install in the administrative UI by visiting the endpoint for the yb-master-ui service on port 7000.

YugabyteDB and pgbench how-to - view the YugabyteDB install in the administrative UI

You can also now log into YSQL, the PostgreSQL-compatible shell, by executing:

Next, create the database that will hold the ysql_bench objects and data.

Initializing ysql_bench with YugabyteDB

The next step is to initialize the ysql_bench utility after exiting YSQL and entering a shell on yb-tserver-0.

Then execute:

Note: The default isolation level in PostgreSQL is Read Committed (Weaker), while in YugabyteDB it is Snapshot Isolation (Stronger) which maps to PostgreSQL’s Repeatable Read. Because of this, to run any “apples to oranges” tests between PostgreSQL and YugabyteDB, you might want to set both databases to Repeatable Read or Serializable.

A quick cheat sheet concerning the options we are using:

    • -h – This host IP will be unique to your deployment.
    • -p – This post number should be 5433 which is what the YSQL API listens on.
    • -i – Specifies that this is an initialization run.
    • -U – This section specifies the user and database we are running the initialization against.
    • -s – Specifies the scale factor. The default scale factor for pgbench is 1. This means that the tables initially will contain this many rows:

Specifying a scale factor of 20 (as in our example) would mean an initial row count of the following in ysql_bench:

For additional information on the various options that are available, check out the PostgreSQL Docs concerning pgbench.

After the initialization run, you should see output in your terminal similar to the below:

Running ysql_bench with YugabyteDB

We can now run ysql_bench using the following command:

The output should look something similar to the below:

Now that we can initialize the benchmarking objects and run a basic test, we are ready to start experimenting with the various option flags that are available in pgbench. For example we can rerun the test with more threads, detailed logging, latency limits and target transaction rates. As an example, let’s do a second run specifying the following options:

  • scalefactor = 10
  • jobs = 8
  • runtime = 180 seconds
  • clients = 1

Initialize the test run:

Run the test:

You should see output like this:

That’s it! You are now ready to run your own benchmarking experiments using ysql_bench, YugabyteDB’s version on pgbench!

What’s Next?

  • Compare YugabyteDB in depth to databases like CockroachDB, Google Cloud Spanner and MongoDB.
  • Get started with YugabyteDB on macOS, Linux, Docker and Kubernetes.
  • Contact us to learn more about licensing, pricing, or to schedule a technical overview.

Related Posts

VP Developer Relations