Best Practices and Recommendations for Distributed SQL on Kubernetes
Yugabyte and Kubernetes have very complementary design principles because they both rely on an extensible and flexible API layer, as well as a scale-out architecture for performance and availability. In this blog post we’ll look at best practices and recommendations when choosing Kubernetes as the cluster foundation for a distributed SQL system. This will begin with a review of relevant architectural decisions of the Yugabyte DB. Then we’ll walk you through how to handle the provisioning, management, “Day 2” operations and auditing of a Yugabyte deployment running on a Kubernetes cluster.
Webinar Playback – Sept 11, 2019: Kubernetes Best Practices for Distributed SQL Databases
Guiding Architectural Principles
From a logical perspective, Yugabyte DB is divided into API and storage layers. Similar to Kubernetes, Yugabyte DB supports multiple query languages with a flexible and extensible API layer. Yugabyte DB currently supports YSQL, which is compatible with PostgreSQL and is best suited for use cases that require referential integrity. It also supports YCQL for use cases where semi-relational queries dominate. The API layer is stateless so that queries can be balanced and handled by the storage layer. The storage layer is abstracted by a distributed document store. This DocDB store takes on the responsibilities of sharding and balancing data persistence, data replication, distributed transaction management and multiversion concurrency control (MVCC).
From a virtual process perspective, a Yugabyte DB environment contains master nodes and worker nodes akin to Kubernetes. The masters maintain Raft (or cluster) consensus, metadata, and the replication of data. Tservers represent the worker nodes of a Yugabyte cluster which implement both the front-facing API layer and the bulk of the backend document storage layer. The tserver scales out its storage layer into high-performance nodes as Kubernetes scales out with worker nodes. Tservers can optimize reads and writes across multiple disks targets, as well as across multiple nodes. Understanding the architectural principles of both platforms should help guide the practical deployment of Yugabyte DB on your Kubernetes platform of choice.
With Kubernetes, you normally use a “Kubernetes-as-a-service” offering from one of the major cloud vendors or you can “roll your own” a cloud agnostic tool such as Kubespray. The benefit of using a cloud-based service is that it should ensure you have a reliable foundation to build off of and an integrated AP for reliably creating your Kubernetes cluster so that it adheres to that vendor’s recommendations. With Yugabyte DB, we’ve created a simple checklist for deploying on Kubernetes.
Helm has emerged as a methodical and easily shared method for consistent deployment and release management of applications on to Kubernetes clusters. Yugabyte provides a helm repo that will allow for the standardized deployment of the database environment into your target cluster. With the directions available here, it’s easy to deploy a basic and highly available Yugabyte DB. The default install assumes a replication factor (RF) of 3. You should plan on at least an equal amount of Kubernetes worker nodes to accommodate the RF sizing. With
helm search yugabytedb/yugabyte -l you should see the versions available to install with 1.3.0 being the latest and most current as of this writing. Installing the latest version will typically give the best performance, most features and fixes available. Release information can be found via GitHub here.
With any new technology, benchmarking and testing your own workload will guide the amount of CPU and memory you should allocate. When sizing the nodes expected to run the Yugabyte DB instance, a minimal starting point that you should use are 8 vcpus and 30GB of memory. This can be customized for resource-constrained environments with a Helm option:
helm install yugabytedb/yugabyte --set resource.master.requests.cpu=0.1,resource.master.requests.memory=0.2Gi,resource.tserver.requests.cpu=0.1,resource.tserver.requests.memory=0.2Gi --namespace yb-demo --name yb-demo
These options can also be used to tailor your resource sizing while using the Helm chart for deployment. For previous examples of resource sizes and benchmarks, please refer to either these benchmarking and this scalability blogs.
In regards to persistent disks, the default disk type assigned to new GKE and EKS clusters are HDD-based persistent volumes. It’s important to note that Yugabyte DB is optimized for SSD-based deployments. As a Kubernetes admin, you can add a new storage class to allow SSD assignment, for example on GKE.
After adding an appropriate storage class, you can override the options in the Helm chart, for example:
helm install yugabytedb/yugabyte --set storage.master.storageClass=faster,storage.tserver.storageClass=faster --namespace yb-demo --name yb-demo
You also have the option of changing the default disk type to SSD for all persistent disks. You can find a few examples here.
Master and Tserver intra-cluster communication is necessary for cluster coherency and availability. This occurs via the name resolution of the local environment occur with minimal latency/network hops. Communication within a Yugabyte DB cluster is handled via ClusterIP for internal communications. You can see this using
get services within the relevant namespace. Helm configures a namespace with the
--namespace option. Logical Masters and Tservers need to communicate for their coordination and data replication protocols. This requirement should be considered latency-sensitive for designing their network path.
(base) epsodi:yb-k8s andrewnelson$ kubectl get services -n yb-demo NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE yb-master-ui LoadBalancer 10.104.0.188 184.108.40.206 7000:32241/TCP 47h yb-masters ClusterIP None none 7100/TCP,7000/TCP 47h yb-tservers ClusterIP None none 7100/TCP,9000/TCP,6379/TCP,9042/TCP,5433/TCP 47h yedis-service LoadBalancer 10.104.8.74 220.127.116.11 6379:31683/TCP 47h yql-service LoadBalancer 10.104.12.122 18.104.22.168 9042:31009/TCP 47h ysql-service LoadBalancer 10.104.12.80 22.214.171.124 5433:31442/TCP 47h
Name resolution is handled using the integrated Kubernetes DNS, such as CoreDNS or kube-dns for intra-cluster name discovery, for example:
Multi-region deployment and 2 datacenter deployment (2DC) will be the topic of an upcoming blog. Integration with a service mesh such as Istio or Consul is in progress and can be tracked via our GitHub.
By default, client communication with the API layer external to the Yugabyte cluster is not configured by default. In order to expose these interfaces via
LoadBalancer and public IP, you’ll need to use the
expose-all option with your Helm install command. For example:
helm install yugabytedb/yugabyte -f https://raw.githubusercontent.com/Yugabyte/charts/master/stable/yugabyte/expose-all.yaml --namespace yb-demo --name yb-demo --wait --set "disableYsql=false"
Pod Priority, Preemption, and PodDisruptionBudget
Assuming you have successfully created a Kubernetes cluster and installed Yugabyte DB, there are still a few items worth mentioning about resources while in operation. Even if Yugabyte is the only app running on your Kubernetes cluster, it will also have management components, typically deployed in the
kube-system namespace. This management overhead will be distro-dependent and which components you choose to deploy for management, networking, logging and auditing. If you choose to have additional apps deployed to the same Kubernetes cluster, it is recommended to be familiar with pod priority classes and how preemption may affect the running workloads. Because Yugabyte DB is a stateful app, it is recommended that it have a higher priority for stability and availability purposes. For more information on Kubernetes preemption works, please check out this article.
In our example of a default deployment with RF3, the resiliency of a cluster depends on only 1 master node being down at a time or 1 worker node being down at a time. The Helm chart sets this up using the following configuration:
==> v1beta1/PodDisruptionBudget NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE yb-master-pdb N/A 1 1 2d3h yb-tserver-pdb N/A 1 1 2d3h
For more information, check out the Kubernetes docs.
In the case of multiple apps being deployed on a single Kubernetes cluster, the administrator also has the option to dedicate nodes to specific workloads with nodeSelector & labels. This keeps the performance and availability profile completely separate to a node granularity level. For more information, check out these Kubernetes docs.
Owing to master node and replication factor instantiation, you will start a cluster with the same number of masters as your replication factor, i.e. 3 master pods for an RF3 instance. The number of tservers can be scaled natively with the
kubectl scale command, for example:
kubectl scale --replicas=5 statefulset/yb-tserver -n yb-demo
Performance Monitoring & Benchmarking
We always recommend testing a given environment in order to profile the performance of the given workload. If you don’t already have a database workload generator that is representative of your intended use-case, you can leverage one from Yugabyte here using:
java -jar target/yb-sample-apps.jar --workload SqlInserts --nodes $YCQL_IP --nouuid --value_size 256 --num_threads_read 32 --num_threads_write 4 --num_unique_keys 1000000000
This could be run either from your workstation or with a
kubectl run… command with a container where you have java installed and can reach our GitHub repo.
kubectl run --image=yugabytedb/yb-sample-apps --restart=Never java-client --workload SqlInserts --nodes $YCQL_IP --num_threads_write 1 --num_threads_read 4 -- value_size 4096
YCQL_IP should be your load-balancer IP with the port number for YCQL from:
kubectl get services -n $NAMESPACE
There are a couple out-of-the-box places to view the performance of the Yugabyte DB instance. You can view this view the yb-master-ui external IP on port 7000.
(base) epsodi:yb-k8s andrewnelson$ kubectl get services -n yb-demo NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE yb-master-ui LoadBalancer 10.104.0.188 126.96.36.199 7000:32241/TCP 48h yb-masters ClusterIP None none 7100/TCP,7000/TCP 48h yb-tservers ClusterIP None none 7100/TCP,9000/TCP,6379/TCP,9042/TCP,5433/TCP 48h yedis-service LoadBalancer 10.104.8.74 188.8.131.52 6379:31683/TCP 48h yql-service LoadBalancer 10.104.12.122 184.108.40.206 9042:31009/TCP 48h ysql-service LoadBalancer 10.104.12.80 220.127.116.11 5433:31442/TCP 48h
Performance counters could also be consumed via the metrics page from the YugaByte DB admin UI.
Another popular option is monitoring is to use Prometheus. Yugabyte presents a Prometheus endpoint via the
yb-master-ui port and
/prometheus-metrics URL. For example:
Common deployment issues involving Yugabyte on Kubernetes are often related to deviating from one of the design principles previously discussed. For example, a pod will typically remain in the ‘pending’ state if there is not enough resources. Basic Kubernetes troubleshooting commands will help point you in the right direction either by running a kubectl describe on the pods and nodes, as well as reviewing each of their applicable event logs with:
kubectl logs yb-master-0
You can also view the native Yugabyte DB logs via an interactive login to a node. You can also achieve this via the yb-master-ui IP and logs html endpoint. For example:
Please raise any issues via our GitHub issue page and tag it with “Kubernetes”. This is extremely useful for our team and other users to track and resolve issues.
In regards to additional ongoing work in the Kubernetes ecosystem, please look for future blogs around Istio, including mTLS integration, our Rook.io storage operator, as well as our OpenShift-compatible operator.
- Using YugaByte at your company? Tell us about it and we’ll send you a hoodie!
- Compare YugaByte DB in depth to databases like CockroachDB, Google Cloud Spanner and MongoDB.
- Get started with YugaByte DB on macOS, Linux, Docker, and Kubernetes.
- Contact us to learn more about licensing, pricing or to schedule a technical overview.