How Kubernetes and Database Operators Drive the Data Revolution

4 Mar 2022 10:00am, by
Dmitrii Chechetkin
Dmitrii Chechetkin is a senior software engineer and developer advocate at Couchbase. With 14 years of experience as a full-stack software engineer in making web and mobile applications, Dmitrii has a deep knowledge of IT technologies. Prior to Couchbase, Dmitrii was a software architect at the Media Trust, as well as an API Solutions Architect at Marriott International.

These are challenging days for IT. The amount of data our systems need to process grows exponentially. This challenge is further amplified by the growing complexity of data. Information is useless without its context, and context is established by relations between different data points, but each relation also requires logic and processing resources. As a result, the demands on our data storage and retrieval systems, and the complexity of managing them, keep increasing, rendering manual database management practices less and less viable.

Luckily, this is not the first time engineers have faced a problem like this. Our history is full of inspiring examples of how to deal with growing demand. From the first windmills through steam engines and screw-cutting lathes to Ford’s conveyor belt, the Industrial Revolution shows that appropriate and successful automation can unlock new levels of productivity and accelerate economic growth.

Kubernetes as a Lathe

Another example of successful automation comes from the early 2010s. At that time, we faced a similar problem but with software architecture: The internet changed everything about how a user application works. Our first approaches, rooted in applying well-known centralized architectures from the client/server era, did not work. Big, centralized application backends just could not provide the flexibility required to scale from thousands to millions of requests per second. Most of us can probably remember at least a couple of “monolithic” web applications that suffered severe performance issues after going viral.

The solution to this problem came from adopting an approach in which organizations split these monoliths into smaller “micro” services running in Docker containers, which can be scaled horizontally, independently of each other and much more quickly than monoliths. Each microservice adds to the demand on development operations, however, so this strategy would not have been so successful without container orchestration frameworks like Kubernetes. Introduced publicly in 2014 and inspired by Borg, Google’s internal cluster manager, Kubernetes quickly proved itself a top choice for automating deployment workflows and today is one of the industry standards for modern development operations.

Also, being an open source, cloud native component, Kubernetes continues to evolve and improve. Echoing the idea of automated software installation packages, Kubernetes not only abstracts away specific infrastructure implementations but also automates environment creation and deployment procedures. Most organizations that use Kubernetes trust it to run at least 50% of their overall workloads.

Autonomous Operators

Today, we find ourselves in the early days of the data revolution. And, just like at the dawn of the Industrial Revolution, the world is expecting us to meet the demand in data processing by automating the management functions of our data platforms.

When it comes to working with data and databases, automating management operations gives any organization that relies on data insights and decision-making both stability and agility through repeatability, and with them room to grow. Human operators, while great at solving problems and innovating, are less suited to routine tasks, where they quickly become error-prone. Scaling up and down, backups, patching and routine database maintenance are examples of such tasks. Hearing “oops, wrong command” from your database administrator can be a nerve-wracking experience.

The problem to solve is to take the best practices of these human operators and efficiently automate them in a standardized way.

Ten years ago, creating an automated management system for databases required a lot of effort, as it had to be built from scratch. This naturally caused the emergence of managed database-as-a-service (DBaaS) solutions. AWS was among the first big companies to offer such services, launching Amazon RDS in 2009 and DynamoDB in 2012. Following their success, other big players rushed to the new market as well. Using a generic DBaaS, however, has its problems, such as vendor lock-in, the requirement to use specific versions and minimal customization for specialized workloads.

The evolution of Kubernetes into an automation Swiss Army knife changed all that by providing a capable and stable software management framework. An especially important milestone in this evolution was support for StatefulSets and persistent volumes, as databases are a textbook example of a stateful application.

Employing elements of control theory, Operators work as Kubernetes extensions and use custom resource definitions (CRDs) to define and control the state of your services. Building your database environment with declarative custom resources is fairly easy: What you type is literally what you get. The operator reads the custom resources that describe the desired state of the system, creates that system for you, and then keeps monitoring the environment through internal events, correcting any drift so that the system always stays close to the desired state. No more complicated setup scripts: your whole database system is standardized, self-explanatory and described in a declarative language (YAML).
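
To make the control-loop idea concrete, here is a minimal sketch in Python. It is not how Couchbase Autonomous Operator or any other real operator is implemented. The `ClusterState` class, the `reconcile` function and the `nodes` and `version` fields are hypothetical names chosen for illustration, and the cluster is simulated in memory rather than read from the Kubernetes API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterState:
    """Hypothetical observed state of a database cluster (simulated in memory)."""
    nodes: int = 0
    version: str = ""


def reconcile(desired: dict, observed: ClusterState) -> List[str]:
    """One pass of the control loop: compare desired and observed state,
    apply the difference, and return a description of the actions taken."""
    actions = []
    if observed.version != desired["version"]:
        actions.append(f"set version to {desired['version']}")
        observed.version = desired["version"]
    diff = desired["nodes"] - observed.nodes
    if diff > 0:
        actions.append(f"add {diff} node(s)")
    elif diff < 0:
        actions.append(f"remove {-diff} node(s)")
    observed.nodes = desired["nodes"]
    return actions


# The desired state is what a YAML resource would declare (values are illustrative).
desired = {"nodes": 3, "version": "7.0"}
cluster = ClusterState()

print(reconcile(desired, cluster))  # first pass builds the cluster
cluster.nodes = 2                   # simulate drift, e.g. a failed node
print(reconcile(desired, cluster))  # next pass repairs the drift
print(reconcile(desired, cluster))  # converged: no actions needed
```

A real operator runs this comparison whenever Kubernetes reports a relevant event, so the person managing the database only ever edits the desired state and never issues the individual commands.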

Couchbase Autonomous Operator was one of the earliest products to utilize this framework extensively for database automation. Many other community-built database operators for Kubernetes have also become popular in recent years. Several communities and interest groups have also arisen around the technology, for instance, the DoK (Data on Kubernetes) community.

The New Horizons

The rise of DevOps, DBaaS, Kubernetes and Operators creates a compelling end-to-end platform for distributed applications. Developers need not worry about how their code is deployed, or how different components communicate with each other. Instead, developers can concentrate on the data and the logic that governs its evolution to provide improved insights and decision-making abilities for the business. Finally, the same consistent tool/framework can be used for managing all layers of the application stack, including the mission-critical database layer. Freeing important resources of organizations from routine labor-intensive tasks, automation creates space and time for innovation and further progress.

More widely within the industry, the future looks bright. Cloud providers are turning to fully managed services to offer a new business model. Because most of these database innovations are open source, some cloud providers added a wrapper around the open source technologies and offered them as a DBaaS. Unfortunately, this strategy hasn’t been good news for everybody. It has greatly reduced the revenue of the database vendors behind those technologies, forcing them to change their licenses. The precise approach has differed from vendor to vendor, with some opting for the Business Source License (BSL) and others for the Server Side Public License (SSPL), but the end goal has been the same: to limit how others can offer the software as a commercial service. This is an evolving landscape, however, and the hope is that it will settle into a state that benefits most participants and rewards original research and innovation.

Feature image via Pixabay.