How to Migrate the Sakila Database from MongoDB to YugabyteDB with Studio 3T

VP Developer Relations

January 23, 2020

Whether it’s consolidation, or modernizing your data infrastructure, data migration is a serious undertaking. Without sufficient planning, countless hours will be spent constructing the proper schema in the target database, and picking the proper ETL tools to help move the data. In this blog we will show you how to migrate the Sakila demo database from MongoDB to a YugabyteDB cluster using the Studio 3T tool.

Why migrate to a distributed SQL database from MongoDB?

There are a variety of reasons to choose a distributed SQL database like YugabyteDB over MongoDB, especially if the current needs of your application no longer match your original requirements.

What’s YugabyteDB? It is an open source, high-performance distributed SQL database built on a scalable and fault-tolerant design inspired by Google Spanner. Yugabyte’s SQL API (YSQL) is PostgreSQL wire compatible.

Here are a few reasons to consider making the switch to a distributed SQL database like YugabyteDB:

Auto-Sharding: In MongoDB, replica sets need to be manually converted into a sharded cluster. YugabyteDB automatically shards data to deliver the right balance of availability and performance the moment the first row of data is inserted.

Synchronous Data Replication: MongoDB offers asynchronous replication where a secondary pulls from a primary. YugabyteDB’s replication is synchronous and Raft-based.

Durable & Fast Writes: In MongoDB, you’ll only get durability with a majority writeConcern, which inherits an asynchronous replication lag. YugabyteDB is durable by default, while delivering lower latency.

Linearizable & Fast Reads: For linearizable reads, MongoDB requires readConcern, which is high latency and requires a quorum at read time. YugabyteDB offers linearizable single-key reads, which are served directly off the shard leader without the need for a quorum.

Distributed Multi-Shard Transactions: MongoDB doesn’t currently offer sharded cluster transactions. YugabyteDB on the other hand, has a built-in transaction manager to coordinate transactions across multiple shards.

Strongly Consistent Global Secondary Indexes: In MongoDB, secondary indexes are not global and have to be updated often, which involves blocking. YugabyteDB’s global secondary indexes leverage distributed transactions and are always consistent.

Prerequisites

Here is the environment and software we’ll be leveraging:

YugabyteDB – Version 2.0.10
MongoDB Community Server – Version 4.2.2
OS – MacOS
Studio 3T – Version 2020.1.2
MongoDB port of the Sakila sample database

Before we import the Sakila JSON files, let’s create three collections in MongoDB:

customers to hold customer JSON data
films to hold film JSON data
stores to hold stores JSON data

To create the sakila database and the necessary collections in MongoDB, you can execute the following commands:

$ mongo sakila

> use sakila
switched to db sakila

> db.createCollection('customers');
{ "ok" : 1 }

> db.createCollection('films');
{ "ok" : 1 }

> db.createCollection('stores');
{ "ok" : 1 }

Sakila sample database

As mentioned, for the purposes of this demo we will be using MySQL’s well known Sakila sample database that features a normalized schema modeling a DVD rental store. It features things like films, actors, film-actor relationships, and a central inventory table that connects films, stores, and rentals. You can get the MongoDB version of Sakila here.

The Sakila Database to Migrate

Now, let’s unpack the compressed JSON files using tar zxvf sakila.tgz and then import the JSON files into MongoDB using the mongoimport utility using the following commands.

$ mongoimport --db sakila --collection customers --file /Users/jguerrero/sakila/customers.json 

2020-01-21T14:36:07.607-0800	connected to: mongodb://localhost/
2020-01-21T14:36:07.739-0800	599 document(s) imported successfully. 0 document(s) failed to import.


$ mongoimport --db sakila --collection films --file /Users/jguerrero/sakila/films.json 

2020-01-21T14:47:17.882-0800	connected to: mongodb://localhost/
2020-01-21T14:47:17.916-0800	1000 document(s) imported successfully. 0 document(s) failed to import.


$ mongoimport --db sakila --collection stores --file /Users/jguerrero/sakila/stores.json 

2020-01-21T14:47:37.097-0800	connected to: mongodb://localhost/
2020-01-21T14:47:37.121-0800	2 document(s) imported successfully. 0 document(s) failed to import.

Migrating data from MongoDB to YugabyteDB

Now that the customers, films, and stores datasets are loaded into MongoDB, let’s start the process of migrating the data into YugabyteDB. The tool that we will be using for this demo is Studio 3T. However, there are several ETL tools on the market that can serve the same purpose and we’ll be covering a few of them in upcoming blogs.

Installing a target YugabyteDB cluster

For complete instructions on how to get up and running on a variety of platforms including prerequisites, check out our Quickstart Guide. In the following section we’ll cover the basic steps for getting up and running in just a few minutes with a local 3 node cluster on your Mac laptop.

Download and extract YugabyteDB

$ wget https://downloads.yugabyte.com/yugabyte-2.0.10.0-darwin.tar.gz
$ tar xvfz yugabyte-2.0.10.0-darwin.tar.gz && cd yugabyte-2.0.10.0/

Note: At the time of this writing, 2.0.10 is the latest release of YugabyteDB. Make sure to check the Quickstart Guide for the latest version.

Configure loopback addresses

Add a few loopback IP addresses for the various YugabyteDB processes to use.

$ sudo ifconfig lo0 alias 127.0.0.2
$ sudo ifconfig lo0 alias 127.0.0.3
$ sudo ifconfig lo0 alias 127.0.0.4
$ sudo ifconfig lo0 alias 127.0.0.5
$ sudo ifconfig lo0 alias 127.0.0.6
$ sudo ifconfig lo0 alias 127.0.0.7

Create a 3 node cluster and Sakila database

With the command below, create a 3 node cluster with a replication factor of 3.

$ ./bin/yb-ctl --rf 3 create

You can verify that your cluster is up and running by checking out the YugabyteDB Admin UI, which is located at:

https://127.0.0.1:7000/

Verify your cluster on the YugabyteDB Admin UI

With the command below, enter the YSQL shell and create the sakila database:

$ ./bin/ysqlsh

yugabyte=# CREATE DATABASE sakila;
CREATE DATABASE

About Studio 3T

Studio 3T is a graphical IDE for MongoDB available for Windows, Mac, and Linux. It helps DBAs speed up tasks like query building, data exploration, import/export, code generation, and more–with or without the knowledge of the MongoDB query language. Studio 3T is a commercial product with various editions available for purchase, all available to try with a free trial period.

Studio 3T migrates from MongoDB to distributed SQL database

Exporting and importing with Studio 3T

With Studio 3T, they’ve made it easy to export data between MongoDB and PostgreSQL. Because YugabyteDB is a PostgreSQL compatible database, you can export the MongoDB collections we created to SQL files. We can then execute these SQL files against YugabyteDB and rebuild the collections in a relational format.

To export a MongoDB collection, use the export option in the menu (shown in red below), and select the collections to be exported. In this case, we are exporting the customers collection.

Export from MongoDB to distributed SQL

Select the target export format. For the purposes of our demo, we’ll select SQL.

Next, let’s name the target table customers and check the boxes to include the ‘CREATE TABLE’ and ‘DROP TABLE IF EXISTS’ statements to the generated SQL script.

Format dataset for distributed SQL database

In the next step, ensure that the mappings of the columns are accurate. For example, Rentals is a JSON subdocument in the source, which is mapped into a JSON type column in the target YugabyteDB database.

Column mappings from MongoDB to distributed SQL YugabyteDB example

Here a snippet from the corresponding SQL file that gets generated for the customers collection:

DROP TABLE IF EXISTS "customers";
CREATE TABLE "customers" (
    "Address" TEXT,
    "City" TEXT,
    "Country" TEXT,
    "District" TEXT,
    "First Name" TEXT,
    "Last Name" TEXT,
    "Phone" TEXT,
    "_id" INT NOT NULL,
    "Rentals" JSON,
    PRIMARY KEY ("_id")
);

Note that in the above DDL statement, Rentals is a column with type JSON. This means an entire unstructured JSON document can be put into that column and queried. In YugabyteDB, the JSON data type represents the exact text format of JSON while the JSONB data type represents its binary format. YugabyteDB supports both JSONB and JSON data types. For more information about working with JSON data types in YugabyteDB, check out:

https://docs.yugabyte.com/latest/develop/learn/data-types/

Repeat the above steps for the films and stores collections.

Creating tables and loading data into YugabyteDB

Now that you have your SQL files, return to the YSQL shell and switch to the sakila database:

yugabyte=# \c sakila
You are now connected to database "sakila" as user "yugabyte".

Execute the sakila.customers.sql file:

sakila=# \i /Users/jguerrero/sakila.customers.sql

You can verify the data by running:

sakila=# select count(*) from customers;
 count 
-------
   599
(1 row)

Or, by browsing it in your favorite GUI tool, like DBeaver:

Verify data in DBeaver distributed SQL GUI example

Repeat the above steps for the films and stores SQL files by running:

\i /Users/jguerrero/sakila.films.sql

and

\i /Users/jguerrero/sakila.stores.sql

Conclusion

That’s it! With the right database tools such as Studio 3T, you can easily migrate data from MongoDB to YugabyteDB within minutes. Tools like Studio 3T are especially useful if your database requirements have evolved beyond the capabilities of MongoDB to something like a distributed SQL database.

What’s Next?

Compare YugabyteDB in depth to databases like CockroachDB, Google Cloud Spanner, and MongoDB.
Get started with YugabyteDB on macOS, Linux, Docker, and Kubernetes.
Contact us to learn more about licensing, pricing, or to schedule a technical overview.

January 23, 2020

How to Migrate the Sakila Database from MongoDB to YugabyteDB with Studio 3T

Related Posts

Explore Distributed SQL and YugabyteDB in Depth