
Announcing the Kafka Connect YugabyteDB Sink Connector

For customers running Kafka as their streaming data platform, the Kafka Connect Sink plugin handles delivery of data from specific topics to a YugabyteDB instance. As soon as new messages are published, the sink forwards them and automatically adds them to a destination table.

YugabyteDB is a high-performance, distributed SQL database built on a scalable and fault-tolerant design inspired by Google Spanner. Yugabyte’s SQL API (YSQL) supports most of PostgreSQL’s functionality and is wire-protocol compatible with PostgreSQL drivers.

Apache Kafka is a community-driven distributed streaming platform capable of handling trillions of events a day. Initially conceived as a messaging queue, Kafka is based on the abstraction of a distributed commit log. Since being created and open sourced by LinkedIn in 2011, Kafka has quickly evolved from a messaging queue into a full-fledged event streaming platform.

Kafka Connect makes it easy to move data in and out of Kafka. In this case, the Sink Connector takes topic data as its source and delivers it to a YugabyteDB instance as its destination. Benefits of this connector include:

  • A simple data abstraction for forwarding data from Kafka to YugabyteDB.
  • Flexible, scalable many-to-many interactions between multiple topics and multiple YugabyteDB instances.
  • A reusable connector that individual customers can customize for their own data pipelines.

Getting Started

There are several easy ways to get started with the Kafka and YugabyteDB integration components, and the Sink Connector is simple to test in a distributed environment. To start your local YugabyteDB cluster, please refer to the quick start guide for your chosen environment.

Prepare your environment to compile the YugabyteDB Kafka Sink Connector. For example, on a basic Debian GNU/Linux 9 GCP image:
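As a minimal sketch, the connector builds with Java and Maven; the exact package names below are assumptions for a stock Debian 9 image and may differ on other distributions:

```shell
# Install a JDK, Maven, and git (package names are assumptions for Debian 9).
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk maven git
```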

Download a copy of the connector from GitHub and set up the connector and its environment libraries:
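A hedged sketch of the clone-and-build steps; the repository name, build flags, and jar destination are assumptions, so check the project README for the authoritative instructions:

```shell
# Clone and build the connector (repo URL and steps are illustrative).
git clone https://github.com/yugabyte/yb-kafka-connector.git
cd yb-kafka-connector
mvn clean package -DskipTests
# Make the built connector jar visible to Kafka Connect (path is an assumption).
cp target/*.jar "$KAFKA_HOME/libs/"
```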

For a non-production environment, start your Kafka instance from the CLI and run the processes in the background with “&”:
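For example, with a standard Apache Kafka distribution unpacked at `$KAFKA_HOME` (an assumed location), the stock startup scripts look like this:

```shell
# Start ZooKeeper and the Kafka broker in the background (non-production only).
cd "$KAFKA_HOME"
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &
```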

Create a test topic in Kafka:
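A typical topic-creation command is shown below; the topic name `test_topic` is an example, and older Kafka releases use `--zookeeper localhost:2181` in place of `--bootstrap-server`:

```shell
# Create a single-partition test topic (name is illustrative).
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
  --replication-factor 1 --partitions 1 --topic test_topic
```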

Ensure your Kafka Sink Connector is configured to point to your YugabyteDB instance:
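A sketch of a sink configuration file follows. The property keys and the connector class name are assumptions; consult the connector's documentation for the exact names your version expects:

```shell
# Write an example sink config (keys and class name are hypothetical).
cat > yugabyte.sink.properties <<'EOF'
name=yugabyte-sink
connector.class=com.yb.connect.sink.YBSinkConnector
topics=test_topic
yugabyte.cql.contact.points=127.0.0.1:9042
EOF
```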

Prepare YugabyteDB for entries made by the Kafka Sink Connector. Ensure $CQLSH_HOST is set to the IP address of your YugabyteDB instance:
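For instance, you might create a keyspace and table with cqlsh; the keyspace, table, and column names here are illustrative and should match whatever your connector configuration targets:

```shell
# Create a destination keyspace and table (names/schema are assumptions).
export CQLSH_HOST=127.0.0.1
./bin/cqlsh -e "CREATE KEYSPACE IF NOT EXISTS demo;"
./bin/cqlsh -e "CREATE TABLE IF NOT EXISTS demo.test_table (key text PRIMARY KEY, value text);"
```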

Load the YugabyteDB Kafka Sink Connector:
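Kafka Connect's standalone mode is sufficient for testing; the worker and sink property file names below are assumptions:

```shell
# Launch Kafka Connect in standalone mode with the sink config (file names are illustrative).
bin/connect-standalone.sh config/connect-standalone.properties \
  yugabyte.sink.properties &
```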

Create some events in the sample topic to be processed:
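One way to publish test events is the stock console producer; the JSON shape shown is illustrative and should match whatever format your sink configuration expects:

```shell
# Publish test messages interactively, one record per line.
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test_topic
# Example input:
#   {"key":"k1","value":"v1"}
```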

Verify that the events were consumed and written to YugabyteDB:
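Assuming the illustrative `demo.test_table` used earlier, a quick SELECT confirms the rows arrived:

```shell
# Read back the rows the sink inserted (table name is an assumption).
./bin/cqlsh -e "SELECT * FROM demo.test_table;"
```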

For more information, please see YugabyteDB’s Kafka documentation or the project’s GitHub repository. For any questions, please join the integration channel on our Slack instance. For support with the Kafka Sink Connector, please file a GitHub issue.

What’s Next?

  • Compare YugabyteDB in depth to databases like CockroachDB, Google Cloud Spanner, and MongoDB.
  • Get started with YugabyteDB on macOS, Linux, Docker, and Kubernetes.
  • Contact us to learn more about licensing, pricing or to schedule a technical overview.
