The Distributed SQL Blog

Thoughts on distributed databases, open source, and cloud native

Secondary Index Migration in YugabyteDB

Introduction

YugabyteDB has a great feature in strongly consistent secondary indexes. I have been asked a few times now about how to modify existing indexes. These can range from adding another column to the index or adding another column to the INCLUDE clause. The big question at the end of the day is: how to do this without having to take downtime on the indexes? In this blog post, using the Yugabyte CQL API we will look at how to switch a live running application from using one index to another without a performance hit during the transition, and without the need for an application restart or for re-preparing prepared statements.

The main issue

Currently there is no way to modify a secondary index. In order to change the columns that are included you will need to drop and re-create the index with the columns that need to be included. This can take time (to backfill the index), as well as possibly having to cycle your app servers. Thankfully, there’s a way to achieve the desired end state without taking a downtime, and this blog post will describe how to do so.

Migrating the index

Let us suppose that we have a user_profile table where user_id is the primary key. And that we have an index as shown below on the email column:

CREATE TABLE user_profile(
    user_id int PRIMARY KEY,
    first_name text,
    last_name text,
    email text
) WITH default_time_to_live = 0
  AND transactions = {'enabled': 'true'};
 
CREATE INDEX email_idx ON user_profile (email)
  AND transactions = {'enabled': 'true'};

Now, let’s assume we have a query such as:

SELECT first_name, last_name 
  FROM user_profiler 
 WHERE email = '[email protected]';

For the above query, the system would first need to look into the index using the email address to find the primary key. Then, using the primary key, the system would then look up the main table to retrieve the first_name and last_name columns. This can be a little inefficient since we need to hit both the index and the main table for a non primary key query. In order to avoid the second lookup where we hit the main table we can modify the index to include the ability for the index to “cover” the additional columns (first_name, last_name). We can do this by using the INCLUDE clause.

In YCQL, we do not have the ability to modify indexes. You cannot run an ALTER INDEX to try and update the index to include more columns, or modify its current structure. Given that, there are two possible options. One is to drop the index and recreate an index with the same name in the desired format (with the desired set of columns). The rebuilding of the index though could take time, and the application performance might suffer as a result. The other option is to create a new index, and then drop the old index once the new index has backfilled. For example:

CREATE INDEX email_idx_new ON user_profile(email)
       INCLUDE (first_name, last_name);

The above would create the new index on the same table, but this time we will include the first and last name columns as part of the index. As described above, this would make the query more efficient since we are not returning to the main table to find the additional values. We make one query that goes to the index and retrieves everything that we require. Once we have created this index and it has backfilled, we can drop the old one without restarting our applications or taking downtime.  

There are pros and cons to both methods discussed. When dropping and recreating an index your application performance can suffer until the index populates. However, if downtime can be tolerated this is not an issue. The approach of adding a new index first, on the other hand, will lead to more disk space being used temporarily. This is because we are storing the additional index and the current one until we decide to drop the original index. Also, we would have a penalty at write time since we will now be requiring an additional write to our newly created index. However, with this approach the application performance will not suffer during the transition.

Please note that it is best to test this thoroughly in a lower environment before doing this in production. Also note that on larger tables the initial index creation may timeout on the client side, which is OK. It will still be created and backfilled.

Conclusion

Although currently there is no way to alter a secondary index, we can make the transition between new index needs and the original without having to take downtime. YugabyteDB makes migrating secondary indexes easy and seamless with the above example. I always recommend testing this kind of procedure in your development environment before moving to production.  We have another great blog about secondary indexes that you can read about here: A Quick Guide to Secondary Indexes in YugabyteDB

We at Yugabyte are always here to help should you run into any issues. You can connect with us on our community Slack channel: yugabyte.com/slack.

Related Posts