How to Map Cloud Native Workloads to Kubernetes Controllers
CloudBees sponsored this post.
Kubernetes is more than just a container manager. It’s a platform designed to handle a variety of workloads packaged in any number of containers and combinations. There are multiple controllers built into Kubernetes that map to the layers of cloud native architecture.
DevOps engineers can think of Kubernetes controllers as the means for dictating the infrastructure needs of the various workloads your team is running. They can define the desired configuration state through a declarative approach. For example, a container/pod deployed as a part of a ReplicationController is guaranteed to be available all the time. A container packaged as a DaemonSet is guaranteed to run on every node of the cluster. The declarative approach enables DevOps teams to take advantage of paradigms such as infrastructure as code. Some of the deployment patterns discussed below follow the principles of immutable infrastructure, where each new rollout results in an atomic deployment.
Understanding Cloud Native Use Cases
The control plane of Kubernetes constantly tracks the deployments to ensure that they are adhering to the desired configuration state defined by DevOps.
The fundamental unit of deployment in Kubernetes is a pod. It is the basic building block of Kubernetes — the smallest and simplest unit in the Kubernetes object model. A pod represents a running process on the cluster. Whether a service is stateful or stateless, it is always packaged and deployed as a pod.
A controller can create and manage multiple pods within the cluster, handling replication that provides self-healing capabilities at cluster scope. For example, if a node fails, the controller might automatically replace the pod by scheduling an identical replacement on a different node.
Kubernetes comes with multiple controllers to handle the desired state of pods. ReplicationController, Deployment, DaemonSet and StatefulSet are a few examples of controllers. Kubernetes controllers use a pod template that is provided to create the pods for which it is responsible to maintain the desired state. Pods, like other Kubernetes objects, are defined in a YAML file and submitted to the control plane.
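As an illustration, a minimal pod manifest might look like the following sketch; the name, labels and image are placeholders, not part of any real deployment:

```yaml
# A minimal pod definition. The pod runs a single nginx container;
# the name, label and image tag here are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    app: web
spec:
  containers:
  - name: web
    image: nginx:1.25
    ports:
    - containerPort: 80
```

Submitting this file with `kubectl apply -f pod.yaml` asks the control plane to schedule the pod; controllers build on exactly this kind of pod template.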
When running cloud native applications in Kubernetes, operators need to understand the use cases addressed by controllers to get the most out of the platform. This helps them in defining and maintaining the desired state of configuration of the application.
Each of the deployment patterns discussed below maps to specific Kubernetes controllers, which allow more precise, fine-grained control of workloads in an automated fashion.
The declarative configuration of Kubernetes encourages an immutable infrastructure. The deployments are tracked and managed by the control plane to ensure that the desired configuration state is maintained throughout the application lifecycle. When compared to traditional deployments based on virtual machines, DevOps engineers will spend significantly less time maintaining workloads. An effective CI/CD strategy that takes advantage of Kubernetes primitives and deployment patterns frees operators from performing mundane tasks.
Scalable Layer: Stateless Workloads
Stateless workloads are packaged and deployed as a ReplicaSet in Kubernetes. A ReplicationController forms the basis of a ReplicaSet, which ensures that a specified number of pod replicas are always running at any given time. In other words, a ReplicationController makes sure that a pod or a homogeneous set of pods is always up and available.
If there are too many pods, the ReplicationController may terminate the extra pods. If there are too few, the ReplicationController proceeds to launch additional pods. Unlike manually created pods, the pods maintained by a ReplicationController are automatically replaced if they fail, are deleted or terminated. The pods are re-created on a node after disruptive maintenance such as a kernel upgrade. For this reason, it is recommended to use a ReplicationController even if the application requires only a single pod.
A simple use case is to create one ReplicationController object to reliably run one instance of a pod indefinitely. A more complex use case is to run several identical replicas of a scale-out service, such as web servers. DevOps teams and operators package stateless workloads as ReplicationControllers when deploying in Kubernetes.
In recent versions of Kubernetes, ReplicaSets replaced ReplicationControllers. Both address the same scenario, but ReplicaSets use a set-based label selector, which makes it possible to run complex queries based on labels. Additionally, Deployments in Kubernetes rely on ReplicaSets.
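A minimal ReplicaSet, sketched below with illustrative names, shows the set-based selector in action — the `In` operator matches any pod whose `app` label is in the given list:

```yaml
# Illustrative ReplicaSet keeping three identical pods running.
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: web-rs
spec:
  replicas: 3
  selector:
    # Set-based selector: matches pods whose "app" label is in the list.
    matchExpressions:
    - key: app
      operator: In
      values: [web]
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25
```

If any of the three pods fails or is deleted, the ReplicaSet controller launches a replacement to restore the declared count.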
Deployments are an abstraction of ReplicaSets. When a desired state is declared in the Deployment object, the Deployment controller changes the actual state to the desired state at a controlled rate.
Deployments are highly recommended for managing the stateless services of cloud-native applications. Though services can be deployed as pods and ReplicaSets, Deployments make upgrading and patching an application easier. DevOps teams can perform a rolling update through a Deployment, which cannot be done with a ReplicaSet alone. This makes it possible to roll out a new version of an application with minimal downtime. Deployments bring Platform as a Service (PaaS)-like capabilities to application management.
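The following sketch (names and images are placeholders) declares a rolling-update strategy so that at most one replica is unavailable during an upgrade:

```yaml
# Illustrative Deployment with an explicit rolling-update strategy.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most one pod down during a rollout
      maxSurge: 1         # at most one extra pod created during a rollout
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25
```

Changing the image (for example with `kubectl set image deployment/web web=nginx:1.26`) triggers a controlled rollout, and `kubectl rollout undo deployment/web` reverts it.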
Durable Layer: Stateful Workloads
Stateful workloads can be classified into two categories: services that need persistent storage (single instance) and services that need to run in a highly reliable and available mode (replicated multi-instance). A pod that needs access to a durable storage backend is very different from a set of pods that run a cluster for a relational database. While the former needs long-term, durable persistence, the latter needs high availability of the workload. Kubernetes addresses both scenarios.
Individual pods can be backed by volumes that expose underlying storage to the services. The volume may be mapped to an arbitrary node on which the pod is scheduled. If multiple pods are scheduled across different nodes of the cluster and need to share the backend, a distributed file system such as Network File System (NFS) or Gluster is configured manually before deploying applications. Modern storage drivers available within the cloud-native ecosystem offer container-native storage where the file system itself is exposed through containers. Use this configuration when pods just need persistence and durability.
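For the simple persistence case, a pod can claim storage through a PersistentVolumeClaim and mount it as a volume. The sketch below assumes a cluster with a default storage class; the names, image and size are illustrative:

```yaml
# Illustrative claim for 1Gi of durable storage from the default storage class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
# A single-instance pod mounting the claimed volume for its data directory.
apiVersion: v1
kind: Pod
metadata:
  name: single-instance-db
spec:
  containers:
  - name: db
    image: postgres:16
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-pvc
```

Because the data lives in the claimed volume rather than the container filesystem, it survives pod restarts and rescheduling.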
For scenarios where high availability is expected, Kubernetes offers StatefulSets — a specialized set of pods that guarantees the ordering and uniqueness of pods. This is especially useful in running primary/secondary — previously known as master/slave — configurations of database clusters.
Like a Deployment, a StatefulSet manages pods that are based on an identical container specification. Unlike a Deployment, a StatefulSet maintains a unique identity for each of its pods. These pods are created from the same spec, but are not interchangeable: Each pod has a persistent identifier that it maintains across any rescheduling.
StatefulSets are useful for workloads that require one or more of the following:
- Stable, unique network identifiers.
- Stable, persistent storage.
- Ordered, graceful deployment and scaling.
- Ordered, graceful deletion and termination.
- Ordered, automated rolling updates.
Kubernetes treats StatefulSets differently than other controllers. When pods of a StatefulSet are being scheduled with N replicas, they are created sequentially, in order from 0 to N-1. When pods of a StatefulSet are being deleted, they are terminated in reverse order, from N-1 to 0. Before a scaling operation is applied to a pod, all of its predecessors must be running and ready. Kubernetes ensures that before a pod is terminated, all of its successors are completely shut down.
StatefulSets are recommended when services need to run clusters of Cassandra, MongoDB, MySQL, PostgreSQL or any database workloads with a high availability requirement.
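A StatefulSet can be sketched as follows; the names, image and sizes are placeholders, and the `serviceName` refers to a headless service (not shown) that provides the stable network identities:

```yaml
# Illustrative StatefulSet for a three-node database cluster.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db       # headless service backing stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: postgres:16
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  # Each replica gets its own PersistentVolumeClaim, created from this template.
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```

The pods are created as `db-0`, `db-1` and `db-2`, in that order, and each keeps its own volume and identity across rescheduling — exactly the ordering and uniqueness guarantees listed above.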
Not every persistent workload needs to be a StatefulSet. Certain containers rely on a durable storage backend to store data. For adding persistence to these types of applications, pods may rely on volumes backed by either host-based storage or container-native storage backends.
Parallelizable Layer: Batch Processing
Kubernetes has built-in primitives for batch processing, which are useful for executing run-to-completion jobs or scheduled jobs.
Run-to-completion jobs are typically used for running processes that need to perform an operation and exit. A big data workload that runs until the data is processed is an example of such a job. Another example is a job that processes each message in a queue until the queue becomes empty.
A Job is a controller that creates one or more pods and ensures that a specified number of them successfully terminate. As pods successfully complete, the Job tracks the successful completions. When a specified number of successful completions is reached, the Job itself is complete. Deleting a Job will clean up the pods it created.
A Job can also be used to run multiple pods in parallel, which makes it ideal for machine learning training jobs. Jobs also support parallel processing of a set of independent but related work items.
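A parallel Job can be sketched like this; the name, image and command are illustrative stand-ins for a real batch workload:

```yaml
# Illustrative Job: five successful completions required, two pods at a time.
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-job
spec:
  completions: 5    # total successful pod completions needed
  parallelism: 2    # number of pods running concurrently
  template:
    spec:
      restartPolicy: Never   # failed pods are not restarted in place
      containers:
      - name: worker
        image: busybox:1.36
        command: ["sh", "-c", "echo processing work item"]
```

The Job controller keeps launching pods, at most two at a time, until five have exited successfully; the Job is then marked complete.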
When Kubernetes runs on hardware with GPUs, machine learning training can take advantage of Jobs. Emerging projects such as Kubeflow — a project dedicated to making deployment of machine learning on Kubernetes simple, portable and scalable — will expose primitives to package machine learning training as Jobs.
Apart from running parallelized jobs, there may be a need to run scheduled jobs. Kubernetes exposes CronJobs that can run a job once at a specified time or repeatedly on a schedule. A CronJob object in Kubernetes is similar to one line of a crontab (cron table) file in Unix. It runs a job periodically on a given schedule, written in cron format.
Cron jobs are especially useful for scheduling periodic jobs such as database backups or sending emails.
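A nightly backup could be sketched as the following CronJob; the name, image and command are illustrative, and the `apiVersion` may differ on older clusters:

```yaml
# Illustrative CronJob running every day at 02:00 (standard cron format).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 2 * * *"   # minute hour day-of-month month day-of-week
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: busybox:1.36
            command: ["sh", "-c", "echo running backup"]
```

At each scheduled time, the controller creates a regular Job from the template, which in turn runs the pod to completion.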
Event-Driven Layer: Serverless
Serverless computing refers to the concept of building and running applications that do not require server management. It describes a more fine-grained deployment model where applications, bundled as one or more functions, are uploaded to a platform and then executed, scaled and billed in response to the exact demand needed at the moment.
Functions as a Service (FaaS) runs within the context of serverless computing to provide event-driven computing. Developers run and manage application code with functions that are triggered by events or HTTP requests. Developers deploy small units of code to the FaaS, which are executed as needed as discrete actions, scaling without the need to manage servers or any other underlying infrastructure.
Though Kubernetes doesn’t have an integrated event-driven primitive that responds to alerts and events raised by other services, there are efforts to bring event-driven capabilities. The Cloud Native Computing Foundation, the custodian of Kubernetes, has a serverless working group focused on these efforts. Open source projects such as Apache OpenWhisk, Fission, Kubeless, OpenFaaS and Oracle’s Fn can be run within a Kubernetes cluster as the event-driven, serverless layer.
Code deployed in the serverless environment is fundamentally different from the code packaged as pods. It consists of autonomous functions that can be wired to one or more events that may trigger the code.
When event-driven computing — serverless computing — becomes an integral part of Kubernetes, developers will be able to deploy functions that respond both to internal events generated by the Kubernetes control plane and to custom events raised by application services.
Legacy Layer: Headless Services
Even after your organization is regularly building and deploying applications using a microservices architecture into containers on the cloud, there may be applications that continue to live outside of Kubernetes. Cloud-native applications and services will have to interact with those traditional, monolithic applications.
The legacy layer exists for interoperability: it exposes a set of headless services pointing to the monolithic applications. Headless services reduce coupling to the Kubernetes system by giving developers the freedom to do service discovery their own way. They differ from the ClusterIP, NodePort and LoadBalancer types of services: no cluster IP address is assigned to them, but they have a domain name system (DNS) entry that points to an external endpoint such as an API server, web server or database. The legacy layer is a logical interoperability layer that maintains DNS records to well-known, external endpoints.
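One way to sketch this is a headless service with no selector, paired with a manually maintained Endpoints object; the name, port and address below are illustrative stand-ins for a real external database:

```yaml
# Illustrative headless service fronting an external legacy database.
apiVersion: v1
kind: Service
metadata:
  name: legacy-db
spec:
  clusterIP: None   # headless: no cluster IP is allocated
  ports:
  - port: 5432
---
# Manually maintained endpoint; must share the Service's name.
apiVersion: v1
kind: Endpoints
metadata:
  name: legacy-db
subsets:
- addresses:
  - ip: 10.0.0.50   # placeholder address of the external system
  ports:
  - port: 5432
```

Cloud-native services can then resolve `legacy-db` through cluster DNS and reach the monolith without hard-coding its address.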
Each layer of a microservices application can be mapped to one of the controllers of Kubernetes. Depending on the pattern they wish to deploy, DevOps teams can choose the appropriate option. In our next article, we’ll discuss some best practices for deploying cloud native applications to Kubernetes.
The Cloud Native Computing Foundation is a sponsor of The New Stack.