Open Source Databases in the Age of the DBaaS
While I love open source for the freedoms it provides, that was not the reason I started working with and developing free and open source software. At the beginning of my career in 1999, I was a student in Russia launching my first startup, and I needed to do it as cost-effectively as possible. Servers were already commoditized by then and could be acquired relatively inexpensively, especially if you were frugal enough to settle for three-year-old hardware, but software was not cheap. If you wanted to purchase it legally, you had to pay a lot of greenbacks for a Microsoft stack or an Oracle database. This being Russia in the late 1990s, you could also buy a pirated version for a couple of bucks, but I wasn’t interested in doing that.
In truth, my first startup only got off the ground because of open source software. Linux, Apache, PHP and MySQL were all in their infancy, and nowhere near as powerful or as easy to use as most open source technologies are today, but they enabled innovation that would not have happened otherwise.
Now, more than 20 years later, the world has changed. Resourceful founders around the world are unlikely to be racking second-hand servers; they would rather start their technology company using cloud services.
However, the same logic applies: if you’re using commodity cloud technology, you will likely have many low-cost options available, which is not the case when you choose a solution that is only available from one vendor. For example, if you’re looking for inexpensive alternatives to Amazon Web Services EC2, there are providers such as Linode or Hetzner, and if you want a cheaper alternative to S3, Backblaze offers more cost-effective storage. The mere fact that you have plausible alternatives forces vendors to be more restrained in their pricing.
If you are using highly differentiated cloud services, which are far from commoditized, you don’t have the same freedom of choice, so you are forced to accept the pricing and service quality of the vendor who offers the solution. In other words, you’re stuck with vendor lock-in.
If we look at databases, many people now prefer to consume them through DBaaS (Database-as-a-Service) rather than installing and managing database software manually. If this is the approach you have embraced, your choices in the cloud will vary, although none of them will offer the same portability that you get from a solution built on open source software and commoditized services:
Undifferentiated open source solution as DBaaS: If you look at MySQL or PostgreSQL, there are DBaaS options from a variety of vendors, including all the top cloud providers. All of them offer compatibility with more or less the latest open source software and, as such, comparable functionality. However, there will be many minor differences, such as the API you use to provision the database, how exactly high availability is implemented, how you monitor the database, and so on. All of this means that the effort required to move is higher than for do-it-yourself environments. Cloud vendors are aware of this, which is probably why they have been slowly increasing the price premium they charge for the convenience of DBaaS versus doing it yourself.
Differentiated/Enhanced open source solution as DBaaS: Cloud vendors are introducing more differentiation by building proprietary “open source compatible” products, which offer additional functionality compared to the original open source version. Amazon Aurora is perhaps the best-known database of this kind, but it is not alone; HybridDB for MySQL from Alibaba Cloud is another good example. One of the features these databases tend to offer is better performance, which means that even if you do not think you’re relying on any special features in your applications, you probably are. Again, this means vendor lock-in: a golden cage, perhaps, but one that brings greater hardship if you ever want to move to a different vendor.
Proprietary Cloud Native Databases: There are a myriad of database technologies designed for the cloud that do not have any open source (or shared source) equivalent you can run yourself. Examples are DynamoDB, CosmosDB, Google Cloud Spanner, BigQuery and Snowflake. These solutions represent extreme vendor lock-in but, unlike a wolf in sheep’s clothing, at least there’s no mistaking them for something that resembles open source.
Shared Source DBaaS: Over the last few years, many open source database vendors have changed their licenses to protect themselves from competition from Amazon and other cloud vendors. MongoDB Inc. changed its server license to the non-open source Server Side Public License (SSPL) and, as a result, MongoDB Atlas does not have any competition in the cloud. It’s hardly surprising the company’s stock price is on the up, but the pockets of MongoDB Atlas users aren’t faring as well. It would be wrong to single out MongoDB here, as many other “open source” database companies have changed the license of their key components to some form of shared source license, or even a proprietary license, to ensure that competing with them isn’t practical: Redis Labs, Confluent, and Elastic are all pursuing this strategy. Unlike MongoDB, they have not gone as far as changing the license for the whole product, so they do have cloud competition, albeit with reduced functionality. This is a perfectly legitimate strategy, and it often makes business sense for the companies, but it results in vendor lock-in for users and, in the end, will hit those users’ pockets.
The Path Forward
Open source innovation takes time. It took time for Linux to become the leading server operating system, for Apache and Nginx to lead the web server market, and for MySQL and PostgreSQL to become the dominant relational databases for new application development. I am sure that, with time, DBaaS will also be commoditized by truly open source solutions that work on any public or private cloud.
While there is no open source DBaaS solution today that can go toe-to-toe, in terms of simplicity, with the offerings from major cloud vendors, we do see the foundations of these technologies being built. For instance, Kubernetes has emerged as the dominant container orchestration solution for data centers, and Terraform makes it possible to abstract away the nuances of different cloud vendors even further. A variety of open source Kubernetes operators are already available for PostgreSQL, MySQL and MariaDB, and Percona offers a fully supported Kubernetes Operator for Percona XtraDB Cluster (MySQL compatible) and a Percona Kubernetes Operator for Percona Server for MongoDB. If avoiding vendor lock-in is important to you, this is the direction I would pursue to secure the future of open source in the cloud.