The Distributed SQL Blog

Thoughts on distributed databases, open source and cloud native

Distributed SQL Tips and Tricks – March 13, 2020

Welcome to this week’s tips and tricks blog where we recap some distributed SQL questions from around the Internet. We’ll also review upcoming events, new documentation, and blogs that have been published since the last post. Got questions? Make sure to ask them on our YugabyteDB Slack channel, Forum, GitHub, or Stackoverflow. Ok, let’s dive right in.

How can I UPSERT multiple rows with an update?

PostgreSQL and YSQL enable you to do upserts using the INSERT ON CONFLICT clause. Similar to multi-row inserts, you can also batch multiple upserts in a single INSERT ON CONFLICT statement for better performance.

In case the row already exists, you can access the existing values using EXCLUDED.<column_name> in the query.

In the example below we have a table with where we are keeping counters of different products and incrementing rows in batches:

How can I optimize YugabyteDB for faster integration testing

There are several steps that we can take to make YugabyteDB faster when testing in integration tests, CI/CD, and other testing scenarios:

  • Point gflag --fs_data_dirs to a ramdisk directory
    This will make DML, DDL and create/destroy of a cluster faster because data is not written to disk.
  • Set gflag --yb_num_shards_per_tserver=1
    Reducing the number of shards lowers overhead when creating/dropping YCQL tables.
  • Set gflag --ysql_num_shards_per_tserver here=1
    Reducing the number of shards lowers overhead when creating/dropping YSQL tables.
  • Set gflag --replication_factor=1
    Keeping only the tablet leaders will remove replication overhead.
  • Use TRUNCATE instead of DROP TABLE followed by CREATE TABLE between test cases if you want to keep the same table schema but delete all the rows in it.

These options are all explained in the yb-tserver and yb-master configuration pages.

What is the YSQL equivalent to in SQL Server’s ISNULL()?

The expression ISNULL(expression, replacement) in MS SQL Server returns the replacement value if the expression returns NULL.

The same functionality can be achieved with the coalesce(arg1,arg2,argn) function. The function takes an arbitrary number of arguments and returns the first value that IS NOT NULL. Example in SQL Server:

The same functionality in YSQL:

How can I find duplicate rows using YSQL?

In many cases we want to find duplicate rows in a table to clean them up or process them. This can be easily achieved with the GROUP BY and HAVING clauses.

Suppose we have a table of employees with the following data:

You can find duplicate rows using:

You can also remove duplicate rows using DISTINCT ON:

DISTINCT ON(expression [, ...]) selects only the first row of each set of rows where the given expression (in this case the combination of first_name and last_name) are the same.

New Blogs, Tutorials, and Videos

New Blogs

New Videos

New and Updated Docs

We Are Hiring!

Yugabyte is growing fast and we’d like you to help us keep the momentum going! Check out our currently open positions:

Our team consists of domain experts from leading software companies such as Facebook, Oracle, Nutanix, Google, and LinkedIn. We have come a long way in a short time but we cannot rest on our past accomplishments. We need your ideas and skills to make us better at every function that is necessary to create the next great software company. All while having tons of fun and blazing new trails!

Get Started

Ready to start exploring YugabyteDB features? Getting up and running locally on your laptop is fast. Visit our quickstart page to get started.

High performance distributed SQL by Yugabyte

What’s Next?

  • Compare YugabyteDB in depth to databases like CockroachDB, Google Cloud Spanner and MongoDB.
  • Get started with YugabyteDB on macOS, Linux, Docker and Kubernetes.
  • Contact us to learn more about licensing, pricing or to schedule a technical overview.

Related Posts