
The Perfect Pair: Kubernetes and Distributed SQL

10 Apr 2020 9:45am, by

KubeCon + CloudNativeCon sponsored this post, in anticipation of KubeCon + CloudNativeCon EU, in Amsterdam, Aug. 13-16.

Karthik Ranganathan
Karthik, co-founder and CTO of Yugabyte, was one of the original database engineers at Facebook responsible for building distributed databases including Cassandra and HBase. He is an Apache HBase committer and was an early contributor to Cassandra before Facebook open-sourced it. Yugabyte is the company behind YugabyteDB, a fully open source distributed SQL database for building cloud native and geo-distributed applications.

A number of reasons account for containers’ popularity. Many credit their growth to the numerous benefits containers provide for developers, DevOps teams and enterprises running (or looking to run) modern, microservices-based cloud native applications. Containerization helps organizations become more flexible, move faster and gain freedom in their choice of the underlying infrastructure. In fact, it is predicted that more than 50% of companies will use container technology this year, up from less than 20% in 2017.

The power of containers stems from their portability, agility and ability to enable consistency across application environments. Today, containers and container orchestration technologies enable enterprises to adopt Infrastructure as Code (IaaC) and accelerate the pace at which they can take an app from development to test to production. A decade or so ago, however, the IaaC movement required … a lot more coding. A typical enterprise ran its own data centers alongside Amazon Web Services (AWS) and had to write a lot of code just to deploy its applications to both, because each infrastructure provider had its own nuances to consider. DevOps teams spent a significant amount of time and effort writing code to decouple the app from the underlying infrastructure before they were able to deploy it across different environments.

Then, a few years later, the landscape changed with the arrival of Kubernetes (on the heels of Docker before it) and more public clouds, including Azure and Google Cloud. Almost overnight, the old IaaC era, although better than the “everything manual” era before it, was no longer agile enough for teams that needed to develop, test and deploy applications consistently from the laptop through every application environment, whether hosted in a single cloud, across multiple clouds or on premises.

Kubernetes is the technology that allowed the old era of IaaC to be replaced with a new, faster way of deploying and operating infrastructure. Over the years, much has been written about the war between the top three container orchestration contenders: Kubernetes, Docker Swarm and Apache Mesos. However, there is no denying that Kubernetes has become king, consistently holding the leading position as the most widely deployed container orchestration technology thanks to its open source software and diverse community of developers. Today, using Kubernetes, developers and DevOps engineers are truly able to build once and deploy many times because, for example, the way to ask for disk on AWS is similar to the way to ask for it on Google Cloud.
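To make the "ask for disk" point concrete, here is a minimal sketch in Python that builds a Kubernetes PersistentVolumeClaim manifest as a plain dictionary. The claim name, the size and both storage class names are hypothetical placeholders; the real class names depend on how each cluster is configured. The sketch only illustrates that the request keeps the same shape from one cloud to the next.

```python
import json


def disk_claim(name: str, size_gib: int, storage_class: str) -> dict:
    """Build a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gib}Gi"}},
        },
    }


def differing_paths(a: dict, b: dict, prefix: str = "") -> list:
    """Return the dotted paths at which two nested dicts differ."""
    paths = []
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}.{key}" if prefix else key
        left, right = a.get(key), b.get(key)
        if isinstance(left, dict) and isinstance(right, dict):
            paths.extend(differing_paths(left, right, path))
        elif left != right:
            paths.append(path)
    return paths


# Hypothetical storage class names, used only for illustration.
aws_claim = disk_claim("db-data", 100, "example-aws-ssd")
gcp_claim = disk_claim("db-data", 100, "example-gcp-ssd")

if __name__ == "__main__":
    print(json.dumps(aws_claim, indent=2))
    print(differing_paths(aws_claim, gcp_claim))
```

In this sketch the two requests differ in a single field, the storage class name, which is the sense in which the same deployment code can target more than one cloud.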

The IaaC DevOps movement has evolved hand in hand with trends we also see in application development. Transactional and user-facing applications have increasingly required higher availability, instant scalability, the ability to run anywhere (including multicloud and hybrid cloud environments) and operational simplicity in order to more easily operate an application throughout its lifetime.

As infrastructure and the applications themselves have evolved, so too has the data tier. SQL has been the de facto language for relational databases (aka RDBMS) for decades. However, the original SQL databases like Oracle, PostgreSQL and MySQL are single-node SQL solutions and are unable to distribute data and queries across multiple instances automatically to provide high availability and scale. On the path to scalability and resilience, NoSQL databases like MongoDB and Apache Cassandra came into prominence in the mid-to-late 2000s. They were originally positioned as alternatives to the monolithic SQL databases of the time, and their distributed nature was attractive to applications and application developers.

The various NoSQL languages focused on single-row (aka key-value) data models and gave up on the relational/multi-row constructs of the SQL language. However, enterprises quickly realized that NoSQL databases have to coexist alongside SQL databases rather than replace them. The primary reason SQL databases remained necessary was the need for relational data modeling with support for single-row consistency as well as multi-row ACID transactions. The early 2010s saw the advent of NewSQL databases, also known as “scalable” SQL databases, to support large-scale OLTP workloads where both data correctness and scalability were important; however, even NewSQL databases come with compromises, especially in Kubernetes-native, multicloud deployments.
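A short sketch may help show what a multi-row ACID transaction buys an application. It uses SQLite from the Python standard library purely because it is easy to run; SQLite is a single-node embedded database, not a distributed one, and the `accounts` table and `transfer` helper are invented for illustration. The point is the semantics: either both rows change or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move funds between two rows atomically; return False if rolled back."""
    try:
        # The connection context manager commits on success and
        # rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is undone as part of the same transaction.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balance of every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

Calling `transfer(conn, "alice", "bob", 30)` succeeds and updates both rows, while a later `transfer(conn, "alice", "bob", 500)` fails on the debit and leaves both balances untouched. A single-row key-value model would leave that coordination to the application.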

In turn, enterprises have turned to distributed SQL databases to combine the capabilities of traditional single-node SQL systems (strong consistency, ACID transactions and support for SQL syntax) with the distributed nature of NoSQL and the scalability of NewSQL.

With Kubernetes-driven orchestration of containerized applications, enterprises get the ability to automatically scale services, make them fault-tolerant, deploy upgrades with no downtime and more. This all makes sense when the application is stateless: Kubernetes is in complete control and manages the entire lifecycle.

When it comes to stateful applications (applications that store data; a database is one example), Kubernetes can offer scaling, fault tolerance and other benefits, but the stateful app itself needs to be orchestration ready and deliver on those promises as well. The stateful app has to be ready to be scalable and fault-tolerant, all without losing data.

A SQL database is a stateful application and is one of the most complex workloads to run in Kubernetes. The ephemeral nature of Kubernetes pods and the constant need to reschedule them onto new Kubernetes hosts require the underlying database tier to become equally agile. Otherwise, the application will see outages, slowdowns and, worst of all, data loss and incorrect results. Most stateful SQL databases cannot derive the benefits of Kubernetes; developers essentially have to tell Kubernetes not to apply those benefits because the database can’t handle them.

However, a distributed SQL database can solve these challenges and enable enterprises to take advantage of the inherent benefits that Kubernetes offers. For example, the distributed SQL database YugabyteDB guarantees that applications never experience outages, slowdowns or data loss by constantly monitoring and rebalancing the data shards across the available nodes, even in a highly dynamic environment such as a Kubernetes cluster.
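The idea of rebalancing shards as nodes come and go can be sketched in a few lines of Python. This is a toy greedy pass written for illustration only; it is not YugabyteDB's actual algorithm, it ignores replication, and the shard and node names are made up. It shows the general shape of the problem: shards on nodes that disappeared are reassigned, and load is then evened out across whatever nodes are currently available.

```python
def rebalance(assignment: dict, live_nodes: list) -> dict:
    """Return a shard -> node mapping that uses only live nodes.

    Shards already on a live node stay put where possible; shards whose
    node is gone are reassigned, then load is evened out so that no two
    nodes differ by more than one shard.
    """
    if not live_nodes:
        raise ValueError("at least one live node is required")

    load = {node: [] for node in live_nodes}
    orphans = []
    for shard, node in sorted(assignment.items()):
        if node in load:
            load[node].append(shard)
        else:
            orphans.append(shard)

    # Place each orphaned shard on the currently least-loaded node.
    for shard in orphans:
        target = min(live_nodes, key=lambda n: len(load[n]))
        load[target].append(shard)

    # Move shards from the busiest node to the idlest until balanced.
    while True:
        busiest = max(live_nodes, key=lambda n: len(load[n]))
        idlest = min(live_nodes, key=lambda n: len(load[n]))
        if len(load[busiest]) - len(load[idlest]) <= 1:
            break
        load[idlest].append(load[busiest].pop())

    return {shard: node for node, shards in load.items() for shard in shards}


# Six hypothetical shards spread over three hypothetical nodes.
initial = {f"shard-{i}": f"node-{i % 3}" for i in range(6)}

# node-2 is rescheduled away; its shards move to the survivors.
after_loss = rebalance(initial, ["node-0", "node-1"])

# A replacement node joins; load spreads back out.
after_join = rebalance(after_loss, ["node-0", "node-1", "node-3"])
```

In a real distributed SQL database this kind of movement also has to preserve replication and consistency while data is in flight, which is where most of the engineering effort lies.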

As enterprises move from cloud-hosted to cloud native environments, the way DevOps teams build applications and store globally distributed data is changing. With Kubernetes as the industry’s standard container orchestration system, developers can efficiently build more complex applications and data storage environments. Pairing containerized environments with distributed SQL makes this process much easier, allowing enterprises to solve business problems and enable true digital transformation.

To learn more about containerized infrastructure and cloud native technologies, consider coming to KubeCon + CloudNativeCon EU, in Amsterdam Aug. 13 – 16.

Cloud Native Computing Foundation, which manages KubeCon + CloudNativeCon, is a sponsor of The New Stack.

Feature image via Pixabay.
