The Distributed SQL Blog

Thoughts on distributed databases, open source, and cloud native

Distributed SQL Tips and Tricks – April 21, 2021

Welcome back to our bi-weekly tips and tricks blog where we recap some distributed SQL questions from around the Internet. We’ll also review upcoming events, new documentation, and blogs that have been published recently. Got questions? Make sure to ask them on our YugabyteDB Slack channel, Forum, GitHub, or Stack Overflow. Let’s get into it:

Importing a Large Table Using Smaller Transactions

When first playing around with YugabyteDB, many users import existing data from other databases, CSV files, and so on in order to test functionality and performance. Occasionally, the combination of low memory (a machine with little RAM) and a large import (many rows, or very wide rows, in the file being imported) can produce an error such as the one below:

ERROR - IO error: Errors occurred while reaching out to the tablet servers: . Errors from tablet servers: [Remote error (yb/rpc/ Service unavailable (yb/rpc/ Call rejected due to memory pressure: Call yb.tserver.TabletServerService.Write => (request call id 123), 

Let’s run through an example. Say you are running a single-node cluster with 1 YB-Master server, 1 YB-TServer, and a replication factor of 1. The table we are importing into looks like the following:

CREATE TABLE large_table(
    f1 int PRIMARY KEY,
    f2 char(200),
    f3 text,
    f4 text
);

Here f3 and f4 are long strings. Using the YSQL API and the COPY FROM command, we can import all 1000 rows in the CSV. However, the import breaks at 1500 rows with the error called out above, most likely because the single import transaction is too large for the memory available on the machine. In such a case, use the ROWS_PER_TRANSACTION option so the data is loaded in smaller batches of rows:

COPY large_table FROM '/home/yuga/Desktop/large_table.csv'
    WITH (FORMAT csv, ROWS_PER_TRANSACTION 500);  -- 500 is an illustrative batch size

You can find more info on using the COPY command here, including how to import a large table using smaller transactions.
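To reproduce the scenario above, you can generate a CSV with the same shape as large_table using a short script. This is a minimal sketch; the file path, row count, and column widths are placeholders chosen to match the example schema, not part of the original post.

```shell
# Sketch: generate a CSV matching the large_table schema (f1 int, f2 char(200), f3/f4 long text).
# The row count and column widths below are illustrative.
rows=1500
f2=$(head -c 200 /dev/zero | tr '\0' 'x')    # 200-character f2 column
f3=$(head -c 5000 /dev/zero | tr '\0' 'y')   # long text reused for f3 and f4
: > large_table.csv
for i in $(seq 1 "$rows"); do
  printf '%s,%s,%s,%s\n' "$i" "$f2" "$f3" "$f3" >> large_table.csv
done
```

Loading a file like this with and without ROWS_PER_TRANSACTION is a quick way to see the memory-pressure behavior on a small machine.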

Zero Downtime Migrations with YugabyteDB

When looking to migrate from one database to another, there are typically a handful of questions to ask while evaluating the options: What are my source and target? How much downtime can we afford? How familiar are we with the tools? What is the total cost? That said, there are a number of ways to migrate to YugabyteDB from your source systems. The process largely resembles migrating an application to PostgreSQL, with some minor differences. If downtime is not an issue, the recommended way to migrate your data to YugabyteDB is a CSV export/import, which you can perform either manually or programmatically.

If you have a requirement for zero downtime, one available avenue is Blitzz, a certified YugabyteDB ecosystem partner. Blitzz can migrate from a variety of databases, including Oracle, SQL Server, MySQL, MongoDB, DynamoDB, Cassandra, and PostgreSQL, as outlined in this blog post. Another available solution is Debezium, an open source distributed platform for change data capture.

Apart from this, you can take steps to reduce the downtime window, as well as prepare the target cluster beforehand to avoid any surprises at the time of the actual data migration. The usual pg_dump/pg_restore tools work with YugabyteDB. We are also working on making pgloader available for YugabyteDB; you can track the status of the GitHub issue here.
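As a rough sketch of the pg_dump path, a plain-format dump from PostgreSQL can be replayed into YugabyteDB with ysqlsh (the YSQL shell that ships with YugabyteDB). The hostnames, users, and database name below are placeholders; YSQL listens on port 5433 by default.

```shell
# Sketch: dump a PostgreSQL database and load it into YugabyteDB.
# source-pg-host, yb-host, and mydb are placeholders for your environment.
pg_dump -h source-pg-host -p 5432 -U postgres -d mydb \
    --no-owner --format=plain -f mydb.sql

# Replay the plain-SQL dump against YSQL (default port 5433).
ysqlsh -h yb-host -p 5433 -U yugabyte -d mydb -f mydb.sql
```

Note that plain-format dumps are replayed with a SQL client rather than pg_restore, which is only used for custom-format archives.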

Changing Default Ports on YugabyteDB

When using YugabyteDB, there are default ports that need to be open in order to allow different types of access: client API communication, internode RPC communication, the UI web servers, common firewall rules, and Prometheus monitoring endpoints. To change these default port settings, add the appropriate flags to your configuration file. When would this be necessary? Say, for example, that your application uses port 9000, which is also the default port for the yb-tserver web UI. Unless you change the default port of one or the other, you will run into conflicts. If you decide to change the server’s port rather than your application’s, you can do so by setting the --webserver_port flag for the yb-tserver process as shown here. To keep this consistent across restarts, add the flag to your configuration file as well.
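For illustration, a yb-tserver can be started with a non-default web UI port roughly as follows. The data directory, master address, and the chosen port (9001) are placeholder values for this sketch, not recommendations.

```shell
# Sketch: start yb-tserver with the web UI moved off the default port 9000.
# Paths, addresses, and port 9001 are illustrative placeholders.
./bin/yb-tserver \
  --tserver_master_addrs=127.0.0.1:7100 \
  --fs_data_dirs=/home/yuga/yb-data \
  --webserver_port=9001
```

Adding the same `--webserver_port=9001` line to the flag file you pass to the process keeps the setting consistent across restarts.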

Configuring Helm Chart Values for YugabyteDB Deployment

When using Helm charts to deploy YugabyteDB on Kubernetes, the chart’s default values can be found in the helm/yugabyte/values.yaml file, as noted in the prerequisites section of the “Deploy a YugabyteDB cluster on OSS Kubernetes using Helm Chart” documentation. Instead of relying on these defaults, you can modify the configuration of the YugabyteDB cluster to match your specific requirements. An example of such a modification can be found below:

helm upgrade --set resource.tserver.requests.cpu=8,resource.tserver.requests.memory=15Gi yb-demo ./yugabyte

Also, see the values.yaml file in our GitHub repository for a reference to (most of) the configurable Helm chart values.
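The same overrides can also be kept in a values file and passed with `-f`, which is easier to review and version-control than a long `--set` string. This is a sketch; the file name yb-overrides.yaml is illustrative.

```shell
# Sketch: express the resource overrides as a values file (file name is illustrative).
cat > yb-overrides.yaml <<'EOF'
resource:
  tserver:
    requests:
      cpu: 8
      memory: 15Gi
EOF

# Apply the overrides to the same release as in the --set example above.
helm upgrade yb-demo ./yugabyte -f yb-overrides.yaml
```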

New Documentation, Blogs, Tutorials, and Videos

New and Updated Docs

We’re continually adding to and updating the documentation to give you the information you need to make the most of YugabyteDB. We had so many new and updated docs this month that we wrote a blog post covering the recent content added, and changes made, to the YugabyteDB documentation – What’s New and Improved in YugabyteDB Docs, Mar 2021.

Get Started

Ready to start exploring YugabyteDB features? Getting up and running locally on your laptop is fast. Visit our quickstart page to get started.
