The Distributed SQL Blog

Thoughts on distributed databases, open source and cloud native

How to Migrate the Sakila Database from MongoDB to Distributed SQL with Studio 3T

VP Developer Relations

Whether it’s consolidation, or modernizing your data infrastructure, data migration is a serious undertaking. Without sufficient planning, countless hours will be spent constructing the proper schema in the target database, and picking the proper ETL tools to help move the data. In this blog we will show you how to migrate the Sakila demo database from MongoDB to a YugabyteDB cluster using the Studio 3T tool.

Why migrate to a distributed SQL database from MongoDB?

There are a variety of reasons to choose a distributed SQL database like YugabyteDB over MongoDB, especially if the current needs of your application no longer match your original requirements.

What’s YugabyteDB? It is an open source, high-performance distributed SQL database built on a scalable and fault-tolerant design inspired by Google Spanner. Yugabyte’s SQL API (YSQL) is PostgreSQL wire compatible.

Here are a few reasons to consider making the switch to a distributed SQL database like YugabyteDB:

Auto-Sharding: In MongoDB, replica sets need to be manually converted into a sharded cluster. YugabyteDB automatically shards data to deliver the right balance of availability and performance the moment the first row of data is inserted.

Synchronous Data Replication: MongoDB offers asynchronous replication where a secondary pulls from a primary. YugabyteDB’s replication is synchronous and Raft-based.

Durable & Fast Writes: In MongoDB, you’ll only get durability with a majority writeConcern, which inherits an asynchronous replication lag. YugabyteDB is durable by default, while delivering lower latency.

Linearizable & Fast Reads: For linearizable reads, MongoDB requires readConcern, which is high latency and requires a quorum at read time. YugabyteDB offers linearizable single-key reads, which are served directly off the shard leader without the need for a quorum.

Distributed Multi-Shard Transactions: MongoDB doesn’t currently offer sharded cluster transactions. YugabyteDB on the other hand, has a built-in transaction manager to coordinate transactions across multiple shards.

Strongly Consistent Global Secondary Indexes: In MongoDB, secondary indexes are not global and have to be updated often, which involves blocking. YugabyteDB’s global secondary indexes leverage distributed transactions and are always consistent.

Prerequisites

Here is the environment and software we’ll be leveraging:

Before we import the Sakila JSON files, let’s create three collections in MongoDB:

  • customers to hold customer JSON data
  • films to hold film JSON data
  • stores to hold stores JSON data

To create the sakila database and the necessary collections in MongoDB, you can execute the following commands:

Sakila sample database

As mentioned, for the purposes of this demo we will be using MySQL’s well known Sakila sample database that features a normalized schema modeling a DVD rental store. It features things like films, actors, film-actor relationships, and a central inventory table that connects films, stores, and rentals. You can get the MongoDB version of Sakila here.

The Sakila Database to Migrate

Now, let’s unpack the compressed JSON files using tar zxvf sakila.tgz and then import the JSON files into MongoDB using the mongoimport utility using the following commands.

Migrating data from MongoDB to YugabyteDB

Now that the customers, films, and stores datasets are loaded into MongoDB, let’s start the process of migrating the data into YugabyteDB. The tool that we will be using for this demo is Studio 3T. However, there are several ETL tools on the market that can serve the same purpose and we’ll be covering a few of them in upcoming blogs.

Installing a target YugabyteDB cluster

For complete instructions on how to get up and running on a variety of platforms including prerequisites, check out our Quickstart Guide. In the following section we’ll cover the basic steps for getting up and running in just a few minutes with a local 3 node cluster on your Mac laptop.

Download and extract YugabyteDB

Note: At the time of this writing, 2.0.10 is the latest release of YugabyteDB. Make sure to check the Quickstart Guide for the latest version.

Configure loopback addresses

Add a few loopback IP addresses for the various YugabyteDB processes to use.

Create a 3 node cluster and Sakila database

With the command below, create a 3 node cluster with a replication factor of 3.

You can verify that your cluster is up and running by checking out the YugabyteDB Admin UI, which is located at:

http://127.0.0.1:7000/

Verify your cluster on the YugabyteDB Admin UI

With the command below, enter the YSQL shell and create the sakila database:

About Studio 3T

Studio 3T is a graphical IDE for MongoDB available for Windows, Mac, and Linux. It helps DBAs speed up tasks like query building, data exploration, import/export, code generation, and more–with or without the knowledge of the MongoDB query language. Studio 3T is a commercial product with various editions available for purchase, all available to try with a free trial period.

Studio 3T migrates from MongoDB to distributed SQL database

Exporting and importing with Studio 3T

With Studio 3T, they’ve made it easy to export data between MongoDB and PostgreSQL. Because YugabyteDB is a PostgreSQL compatible database, you can export the MongoDB collections we created to SQL files. We can then execute these SQL files against YugabyteDB and rebuild the collections in a relational format.

To export a MongoDB collection, use the export option in the menu (shown in red below), and select the collections to be exported. In this case, we are exporting the customers collection.

Export from MongoDB to distributed SQL

Select the target export format. For the purposes of our demo, we’ll select SQL.

Next, let’s name the target table customers and check the boxes to include the ‘CREATE TABLE’ and ‘DROP TABLE IF EXISTS’ statements to the generated SQL script.

Format dataset for distributed SQL database

In the next step, ensure that the mappings of the columns are accurate. For example, Rentals is a JSON subdocument in the source, which is mapped into a JSON type column in the target YugabyteDB database.

Column mappings from MongoDB to distributed SQL YugabyteDB example

Here a snippet from the corresponding SQL file that gets generated for the customers collection:

Note that in the above DDL statement, Rentals is a column with type JSON. This means an entire unstructured JSON document can be put into that column and queried. In YugabyteDB, the JSON data type represents the exact text format of JSON while the JSONB data type represents its binary format. YugabyteDB supports both JSONB and JSON data types. For more information about working with JSON data types in YugabyteDB, check out:

https://docs.yugabyte.com/latest/develop/learn/data-types/

Repeat the above steps for the films and stores collections.

Creating tables and loading data into YugabyteDB

Now that you have your SQL files, return to the YSQL shell and switch to the sakila database:

Execute the sakila.customers.sql file:

You can verify the data by running:

Or, by browsing it in your favorite GUI tool, like DBeaver:

Verify data in DBeaver distributed SQL GUI example

Repeat the above steps for the films and stores SQL files by running:

and

Conclusion

That’s it! With the right database tools such as Studio 3T, you can easily migrate data from MongoDB to YugaByteDB within minutes. Tools like Studio 3T are especially useful if your database requirements have evolved beyond the capabilities of MongoDB to something like a distributed SQL database.

What’s Next?

  • Compare YugabyteDB in depth to databases like CockroachDB, Google Cloud Spanner, and MongoDB.
  • Get started with YugabyteDB on macOS, Linux, Docker, and Kubernetes.
  • Contact us to learn more about licensing, pricing, or to schedule a technical overview.

Related Posts

VP Developer Relations