The YugaByte Database Blog

Thoughts on open source, cloud native and distributed databases

Building a High Growth Business by Monetizing Open Source Software

Whenever a venture-funded software infrastructure startup takes the open source route to market, a few questions emerge:

  • What open source license and project governance model will it choose?
  • How will it monetize the open source project?
  • What if AWS, Microsoft Azure or Google Cloud offer the startup’s open source project as a managed service?

At YugaByte, we answered these questions for the open source YugaByte DB project in the following way:

  • Distributed under the highly permissive Apache 2.0 license and managed under an open self-governance model.
  • Open core monetization with the intent of adding a managed service in the future.
  • Compete in the market by building high value products along with fanatical users and awesomely successful customers.

This post explains how we reached the above answers with the goal of highlighting our philosophy around building a high growth business that is also sustainable in the long run.

State of Open Source Infrastructure Software Today

Before we dive into the details, the table below gives us a high level overview of the state of major open source infrastructure projects today. The list is not exhaustive by any means but simply a way to learn from the success of 15+ popular open source projects in 12 infrastructure categories highlighted below.

  • Databases – SQL : MySQL, PostgreSQL
  • Databases – NoSQL: MongoDB, Apache Cassandra
  • In-Memory Cache: Redis
  • Time Series Data Store: InfluxDB
  • Full-Text Search Engine: Elasticsearch
  • Real-Time Streaming: Apache Kafka
  • Big Data Analytics: Apache Hadoop, Apache Spark
  • Platform-as-a-Service: Cloud Foundry
  • Container Runtime: Docker
  • Container Orchestration: Kubernetes
  • Infrastructure Automation: Terraform, Vault, Consul, Nomad
  • Operating System: Linux

We can draw the following conclusions from the above table.

  • The permissive Apache 2.0 is the most popular license, especially in the last 10 years during which multiple open source projects have become indispensable to the software infrastructure at enterprises.
  • Since MIT is also a permissive license similar to Apache 2.0, we can also conclude that GPL-style protective licenses have fallen out of favor since 2010. In fact, MongoDB is the last major project to have chosen such a protective license.
  • Open source projects that originate at the internet-scale giants (such as Facebook/Google/Yahoo) or in academia (such as UC Berkeley) tend to be released under permissive licenses. Such projects are also subsequently donated to a neutral third-party foundation for governance. On the other hand, projects started by individuals are usually not governed by foundations, since those individuals create a commercial company that defines a self-governance model for the project. The only exception is Linux, which was started by an individual but is still governed by a foundation.
  • Apache Cassandra is the only project in the above list that does not have a primary commercial company. Industry observers would know that DataStax lost that status after falling out with the Apache Software Foundation in 2016 and then later abandoning Apache Cassandra altogether earlier this year (with the introduction of a new proprietary storage engine in DataStax Enterprise 6.0).

Let’s dive deeper into why the above projects made their respective choices and how we approach open source at YugaByte based on those learnings.

Open Source License & Project Governance

Licenses – Permissive vs. Protective

A permissive software license, also known as non-copyleft license, is a free software license with minimal requirements about how the software can be redistributed. Examples include the MIT, BSD, Apple Public Source and the Apache License. On the other hand, a protective software license, also known as copyleft license, is a free software license that allows free distribution of copies and modified versions of a work with the stipulation that the same rights be preserved in derivative works down the line. The most well-known license in this category is GNU General Public License (GPL). Two variants that are also quite popular are Lesser GPL (LGPL) and Affero GPL (AGPL).

Projects pick permissive licenses when the goal is to drive mass adoption by unburdening users from any obligation to contribute back to the project. Protective licenses inherently inhibit such mass adoption, since users who distribute derivative works are forced to release them as open source under a similar protective license.

Project Governance – Foundation vs. Self-Governed

Internet-scale giants such as Facebook and Google usually face software problems that cannot be solved by the mass-market tools of today. Given their ability to deploy extremely smart engineers onto any important problem, they often create an entirely new solution. The solution initially matures inside the company and slowly but steadily becomes valuable to other companies looking to solve similar problems. Additionally, the solution becomes a vehicle to recruit even more smart engineers (e.g. Facebook’s open sourcing of Cassandra) or attract even more customers to their commercial products (e.g. Google’s open sourcing of Kubernetes). This is where open source comes in, typically with a permissive license to ensure uninhibited adoption. If the project becomes hugely successful with a wide community, any notion of single-party control over the governance becomes a sticking point. This leads to donation of the project to a neutral foundation such as the Apache Software Foundation. Google and Pivotal led the creation of entirely new foundations to house their critical projects: the Cloud Native Computing Foundation was created to house Google’s Kubernetes and the Cloud Foundry Foundation was created to house Pivotal’s Cloud Foundry.

The above is not the only way to build and scale open source projects. Another equally common way involves one or more committed individuals creating a new open source project to solve an important problem that they experienced themselves in the real world. MySQL, MongoDB, Elasticsearch, InfluxDB, Docker and the HashiCorp tools are examples. Since there is no internet giant subsidizing their open source project, these individuals have to drive the maturity of their project on their own, and they usually do so by founding a venture-backed startup. In the initial days there is no multi-party community with divergent views, so neutral foundation-driven governance is deemed unnecessary. Even the foundations themselves do not see a need to house a new open source project. Self-governance becomes the natural course of action.

The YugaByte Way

We built YugaByte DB because many of us experienced first-hand how current generation mission-critical databases fell significantly short on the promised benefits of development agility and operational simplicity. The questions that kept us up at night were: Our infrastructure is becoming cloud native, so why aren’t our databases? Why can’t we have both NoSQL and SQL in a single distributed database? We chose to open source YugaByte DB and distribute it under the permissive Apache 2.0 license. We want our users to contribute back to the open source project of their own volition and not because their hand is forced by a restrictive GPL-like license. Given that our open source project is neither subsidized nor marketed by an internet giant, we are reaching our users through our own evangelism and our project is self-governed. We are excited and thankful that Lightspeed Ventures and Dell Technologies Capital, two of the most respected VC firms in the software infrastructure space, have joined us in our mission to redefine the operational database market.

Monetization Models

There are primarily three models for monetizing open source infrastructure software, and they are not mutually exclusive. However, as shown in the figure below, there is a logical order in which open source companies adopt these models.

 

Service, Support & Training

Red Hat is the original pioneer of this model where a company offers paid expertise on an open source project in the form of professional services, customer support and user training. Red Hat Linux, introduced to the market in 1994, is essentially a fork of the community-developed Linux OS that is packaged along with service, training and support for enterprises. Hortonworks and WSO2 are more recent examples of this model. Note that as companies succeed with more scalable monetization models (see below), they usually start deprecating a purely service/support/training model.

Open Core

Open core refers to the approach of commercializing an additional layer of potentially closed source software on top of a core that is fully open source. MySQL (owned by Oracle), MongoDB, Redis Labs, Elastic, Cloudera, InfluxData and many more follow this approach. The additional layer can itself be open source and is not necessarily closed source. For example, Redis Labs’ commercial modules and Elastic’s commercial X-Pack are open source but distributed under protective licenses. Another aspect to note here is that companies often start with the service/support model, learn more about what users are willing to pay for and subsequently offer open core products to serve those needs. For example, Confluent started out by providing expertise on mission-critical Apache Kafka deployments and later introduced its Enterprise offering.

Managed Service

This approach involves running the open source project as a managed service so that users do not have to deal with the challenges involved in operationalizing a mission-critical service. Value-added features usually included in this model are multi-tenancy, hourly billing, analytics dashboards and so on. For example, Databricks offers managed services based on the Apache Spark project. Service/support and open core companies tend to add a managed service offering when the project achieves significant popularity. For example, MongoDB Atlas, Elastic Cloud, Confluent Cloud and InfluxCloud are all managed service offerings from their respective companies.

The YugaByte Way

The service/support/training-based monetization model is human-capital intensive and notoriously difficult to scale. There are additional challenges that impede high growth and customer satisfaction. In the near term, the project has to be popular already for enough people to care about it and need support. Red Hat and Confluent could rely on the existing popularity of Linux and Apache Kafka respectively. In the long term, the sustainability of this model is questionable, as highlighted by Elastic’s Why Open Source? (see below for quote). For these reasons, it was easy for us to drop this option altogether.

Some open source companies base their bottom line on a support-only business model. We believe this approach puts what’s best for the company and what’s best for the user in direct conflict with one another, where one can only succeed if the other struggles. This approach lacks the incentive to make the product easy to use or empower customers to be successful since the company’s revenue is based upon customers needing regular support.

Between the two remaining options, we opted for open core with the intent of adding a managed service offering down the line. We understand that many open source experts see the open core business model as a corrupting influence. The rationale here is that by design, a select set of features (e.g. encryption and backups in a database) have to be reserved for the commercial product and are therefore disallowed in the open source project. Conflict between the user community and the project maintainers arises whenever a user wants to add the same encryption and backup features to the open source project. This is a conflict that none of us desire. At the same time, we want to ensure a sustainable long-term business that avoids the existential challenges faced by companies that came before us. Two examples of database companies that shut down after failing to monetize their open source offerings are RethinkDB in 2016 and Basho (creators of Riak) in 2017. Another example is InfluxData, which in 2016 had to pull the database clustering feature away from open source InfluxDB into its commercial products. Paul Dix, Founder & CTO of InfluxData, explains their rationale in this post on open source business models. The important lesson for single-vendor open source projects is that a healthy commercial business is a must-have for continued investment in open source. The net result is still more open source than otherwise possible!

So how do we at YugaByte strike the necessary balance between open source and commercial features? For our commercial Enterprise Edition, which extends the open source core, we focus on the needs of post-revenue companies. These companies span the spectrum from large enterprises to mid-market startups. They gain a competitive advantage in the market by building user-facing apps on YugaByte DB along with unbelievably easy operations on both public and private clouds, across any number of geographic regions, in a highly secure yet extremely cost-effective manner. Enterprise Edition features such as multi-cloud orchestration, remote region read replicas, in-flight/at-rest encryption, comprehensive monitoring and distributed backups power these benefits. Paying us our fair share of the resulting value created, so that we continue to build a better product, becomes a win-win situation for both sides. For all other companies as well as individual developers, the open source Community Edition is an excellent offering in itself. A detailed comparison between our two editions is listed here.

The Cloud Provider Threat

Over the years major public cloud platforms have built hugely successful managed services by simply running popular open source projects on their platform. This is especially true for data infrastructure where MySQL, PostgreSQL, Redis, Elasticsearch, Apache Hadoop, and Apache Spark are all offered as managed cloud services today. Docker is the most important project outside data infrastructure that is experiencing the same dynamics. Some maintainers and users of these open source projects view the cloud providers as “freeloaders” who are monetizing their hard work without making commensurate contributions back to the project.

Learning From the Past

Cloud providers are able to monetize open source projects only when those projects use permissive licenses such as Apache 2.0 and MIT. For single-vendor open source projects, this presents a critical dilemma: choose a permissive license to drive user adoption, on the assumption that such adoption can lead to monetization down the line, and risk having that monetization severely limited by competition from public cloud platforms hosting your own project.

So how have others dealt with this issue? MongoDB adopted the AGPL license early on so that it could be protected by AGPL’s “Remote Network Interaction” provision, which treats a source-code-modified managed service as equivalent to distribution (for purposes of invoking the copyleft provisions of GPL). This essentially makes MongoDB toxic for the likes of AWS and Google Cloud because they can no longer make the code changes at the storage/replication/administration layers that are must-haves for cost-effective operations at scale. For smaller providers such as mLab and Compose, this is not a problem because they are simply offering MongoDB as black box software. More recently, Redis Labs realized that the AGPL license on its commercial modules does not offer sufficient protection against the cloud provider threat. This is understandable since the cloud providers can avoid invoking AGPL’s copyleft provisions by simply using these modules as black boxes on top of their existing Redis managed services. Redis Labs changed the license of its modules last month from AGPL to a special “Apache 2.0 modified with Commons Clause” license and intense debates have taken over the airwaves since then.

The YugaByte Way

As a company on a mission to redefine mission-critical databases, a closed source core or a crippleware open source project were unacceptable options to us. That is why YugaByte DB is open source with all the critical features one would expect in a transactional, high performance, geo-distributed database. And by combining this feature richness with a permissive Apache 2.0 license, we have chosen to put user adoption above everything else.

So how would we deal with the cloud provider threat if it were to materialize in the future? We believe the answer lies in building high value products that users and customers love to use. The one company we admire in this regard is Elastic, an open source success story and a 2018 IPO candidate (with ~$160M revenue and 81% YoY growth in 2018). Elastic has its own Elastic Cloud offering that competes directly with the Amazon Elasticsearch service. This blog post from Elastic does an excellent job of highlighting why the customer is better off buying Elastic Cloud. The competitive advantage boils down to a potent combination of a “no cloud lock-in” promise, proprietary features, same day releases, higher configurability, lower costs and above all the ability to bank on the expertise of the software creators themselves. If we couple this with a passionate user community and well-known reference customers, it becomes easy to understand how Elastic wins against the cloud providers. In fact, this is how any open source company should compete in the market even if the cloud provider threat were non-existent.

Summary

Matt Klein, engineer at Lyft and creator of the popular Envoy proxy, recently authored an insightful article on the broken economics of open source software. He makes the case for startup-led open source projects moving from a “pure open core” model to a “loose open core” model as their popularity increases. Additionally, engineers who maintain the project should be compensated via fellowships from foundations that are themselves funded by all such “loose open core” companies. Adam Jacob, Founder & CTO of Chef Software, has also called out the need for a “Community Compact”. Joseph Jacks’s Open Consensus blog series is exploring the issues and opportunities of commercial open source software. Overall, there are many excellent ideas on the table for all of us in the open source community to discuss and debate. As with all things new, implementing these ideas in practice will not be easy initially and broader adoption will kick in only after a few success stories. Until then, we believe the approach we have outlined in this post is the most effective one for any company similar to ours. We say so with high confidence based on the overwhelming support we have received from our users, partners, and above all, our customers. We thank them from the bottom of our hearts and look forward to building products they love for a long time!

What’s Next?

  • Compare YugaByte DB to databases like Amazon DynamoDB, Apache Cassandra, MongoDB and Azure Cosmos DB.
  • Get started with YugaByte DB on macOS, Linux, Docker & Kubernetes.
  • Contact us to learn more about licensing, pricing or to schedule a technical overview.
Sid Choudhury

VP, Product