The YugaByte Database Blog

Thoughts on open source, cloud native and distributed databases

YugaByte DB 1.1 New Feature: Speeding Up Queries with Secondary Indexes

Welcome to another post from our ongoing series where we highlight a new feature from the latest 1.1 release! Today we are going to look at secondary indexes.

Defining Secondary Indexes

A database index is a data structure that improves the speed of data retrieval operations on a database table. Typically, databases are very efficient at looking up data by the primary key. A secondary index can be created using one or more columns of a database table, and provides the basis for both rapid random lookups and efficient access of ordered records when querying by those columns. To achieve this, secondary indexes require additional writes and storage space to maintain the index data structure.

The Benefits of Secondary Indexes

Secondary indexes can be used to speed up queries and to enforce uniqueness of values in a column.

Speed up Queries

The predominant use of a secondary index is to make lookups by some column values efficient. Let us take an example of a users table, where user_id is the primary key. Suppose we want to lookup user_id by the email of the user efficiently. You can achieve this as follows.

Next let us insert some data.

You can now query the table by the email of a user efficiently as follows.

Read more about using secondary indexes to speed up queries in this quick guide to YugaByte DB secondary indexes.

Enforce Uniqueness of Column Values

In some cases, you would need to ensure that duplicate values cannot be inserted in a column of a table. You can achieve this in YugaByte DB 1.1 by creating a unique secondary index, where the application does not want duplicate values to be inserted into a column.

Inserts would succeed as long as the email is unique.

But upon inserting a duplicate email, we get an error.

How do Secondary Indexes Work in YugaByte DB?

Secondary indexes in YugaByte DB depend on distributed transactions. They have the following properties:

  • Fully decentralized with no single point of failure. This makes the cluster resilient to various faults when running a workload that utilizes secondary indexes.
  • Linear scalability to enable increasing the throughput/IOPS on demand by simply adding more nodes into the cluster.
  • ACID compliant to ensure always correct results. This means that the primary table and the index table are always in sync.
  • High performance since the index table supports efficient point-queries when looking up by the indexes columns. In addition, you can include additional columns (sometimes referred to as COVERING columns) into the index to further increase performance.
  • Multiple indexes per table are supported to enable use-cases that need to optimize lookups by multiple columns.

The data for each secondary index is an internal table in YugaByte DB. The index tables are sharded into tablets, internally replicated and distributed across nodes much like user tables. When data is inserted or updated into the table, YugaByte DB internally uses a distributed ACID transaction to update both the primary table and all its secondary index tables.

To illustrate the above point, let us take the simple example of a key-value table with an index on the value.

Now let us insert a row into this table.

This effectively results in the following sequence internally.

The high level flow for the above internal transaction is shown below.

Keeping the Secondary Index current using Distributed Transactions

The above ensures that both the tables get updated atomically. Querying by the key reads from the key_value table, while querying by the value would read from the value_idx table.

Next let us update this row.

The above update is equivalent to the following internal transaction.

Note that value_idx’s tablet for v1 and tablet for v2 may be on different nodes altogether, hence the need to delete the old row as well as insert the new row. The above flow diagram will now include updates to the tablet-leader for v2. This highlights the need for database that handles such cases in a manner that is completely transparent from the application developer.

We can verify the update by running the following queries.

What’s Next?

Karthik Ranganathan

Founder & CTO