The Linux Foundation sponsored this post.
This article is part of a series by speakers at the upcoming Open Source Summit, coming to Vancouver August 29-31. See below for more info.
Kubernetes is quickly becoming the de facto way to deploy workloads on distributed systems. In this post, I will help you develop a deeper understanding of Kubernetes by revealing some of the principles underpinning its design.
Declarative Over Imperative
As soon as you learn to deploy your first workload (a pod) on the Kubernetes open source orchestration engine, you encounter the first principle of Kubernetes: the Kubernetes API is declarative rather than imperative.
In an imperative API, you directly issue the commands that the server will carry out, e.g. “run container,” “stop container,” and so on. In a declarative API, you declare the state you want the system to be in, and the system constantly drives towards that state.
Think of it as the difference between driving a car manually and engaging an autopilot.
So in Kubernetes, you create an API object (using the CLI or REST API) to represent the state you want. All the components in the system then work to drive towards that state until the object is deleted.
For example, when you want to schedule a containerized workload, instead of issuing a “run container” command you create an API object, a pod, that describes your desired state:
simple-pod.yaml:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: internal.mycorp.com:5000/mycontainer:1.7.9
```
```shell
$ kubectl create -f simple-pod.yaml
pod "nginx" created
```
This object is persisted on the API server after creation:
```shell
$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
nginx     1/1       Running   0          17s
```
If the container crashes for some reason, the system will restart the container.
To terminate the container, you delete the pod object:
```shell
$ kubectl delete -f https://k8s.io/examples/pods/simple-pod.yaml
pod "nginx" deleted
```
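The create, crash and delete sequence above can be modeled as a reconcile loop. The following is a minimal Python sketch, not Kubernetes code. The `reconcile` and `execute` functions and the set-of-names "node" are hypothetical simplifications. The user only ever changes the desired state, and the loop works out which start/stop steps are needed to reach it.

```python
from typing import List, Set, Tuple

Action = Tuple[str, str]  # ("start" | "stop", pod name)


def reconcile(desired: Set[str], running: Set[str]) -> List[Action]:
    """Compute the steps that move `running` toward `desired`.

    The user declares only the end state; the controller derives the
    imperative "start"/"stop" steps itself.
    """
    actions = [("start", name) for name in sorted(desired - running)]
    actions += [("stop", name) for name in sorted(running - desired)]
    return actions


def execute(actions: List[Action], running: Set[str]) -> Set[str]:
    """Simulate carrying out the actions on a node."""
    result = set(running)
    for verb, name in actions:
        if verb == "start":
            result.add(name)
        else:
            result.discard(name)
    return result


desired = {"nginx"}        # the pod object exists in the API server
running: Set[str] = set()  # nothing is running on the node yet

running = execute(reconcile(desired, running), running)
print(running)             # {'nginx'}

running.discard("nginx")   # the container crashes
running = execute(reconcile(desired, running), running)
print(running)             # {'nginx'}: the loop started it again

desired = set()            # the pod object is deleted
running = execute(reconcile(desired, running), running)
print(running)             # set()
```

When nothing has changed, `reconcile` returns an empty list. That makes the loop safe to repeat at any time, which is what lets a controller simply keep re-running it.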
Why “declarative over imperative”?
A declarative API makes the system more robust.
In a distributed system, any component can fail at any time. When the component recovers, it needs to figure out what to do.
With an imperative API, a crashed component may have missed a call while it was down, and it needs some external component to “catch it up” when it comes back. With a declarative API, a component simply looks at the current state of the API server when it comes back up to determine what it should be doing (“ah, I need to ensure this container is running”).
This is also described as “level-triggered” rather than “edge-triggered”. In an edge-triggered system, if the system misses an event (“an edge”), the event must be replayed for the system to recover. In a level-triggered system, even if the system misses an event (perhaps because it was down), when it recovers it can look at the current state of the signal (“the level”) and respond accordingly.
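The difference can be shown with a toy Python model, which is not Kubernetes code. Under one simplifying assumption, the API server's state is the result of applying every change event. The edge-triggered consumer applies only the events it happened to see. The level-triggered consumer ignores history and reads the current state when it recovers. The function names and pod names are hypothetical.

```python
from typing import List, Set, Tuple

Event = Tuple[str, str]  # ("added" | "deleted", pod name)


def apply_event(state: Set[str], event: Event) -> None:
    verb, name = event
    if verb == "added":
        state.add(name)
    elif verb == "deleted":
        state.discard(name)


def edge_triggered(events: List[Event], seen: List[bool]) -> Set[str]:
    """React to individual change events; events that arrive while the
    component is down are simply lost."""
    running: Set[str] = set()
    for event, was_up in zip(events, seen):
        if was_up:
            apply_event(running, event)
    return running


def level_triggered(current_desired: Set[str]) -> Set[str]:
    """On recovery, read the current desired state and converge to it."""
    return set(current_desired)


events = [("added", "web"), ("added", "db"), ("deleted", "web")]
seen = [True, False, False]  # the component crashes after the first event

# The API server's state reflects every event, whether or not it was seen.
api_server_state: Set[str] = set()
for e in events:
    apply_event(api_server_state, e)

print(api_server_state)                   # {'db'}
print(edge_triggered(events, seen))       # {'web'}: stale, two edges missed
print(level_triggered(api_server_state))  # {'db'}: matches the API server
```

Fixing the edge-triggered consumer would require replaying the two missed events. The level-triggered consumer needs nothing beyond the current state.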
So a declarative API makes the Kubernetes system more robust to component failures.
No Hidden Internal APIs
If you look under the hood at how various Kubernetes components work, you encounter the next principle of Kubernetes: the control plane is transparent, as there are no hidden internal APIs.
This means that Kubernetes components interact with each other using the same API that you use to interact with Kubernetes. Combined with our first principle (the Kubernetes API is declarative over imperative), it means that Kubernetes components can only interact with each other by monitoring and modifying the Kubernetes API (instead of calling out to each other directly with instructions on what to do next).
Let’s walk through a simple example to illustrate this. In order to start a containerized workload, you create a pod object on the Kubernetes API server, as demonstrated above.
The Kubernetes scheduler determines the best node for the pod to run on based on available resources. It does this by monitoring the Kubernetes API server for new pod objects. When a new, unscheduled pod is created, the scheduler runs through its algorithm to find the best node for the pod. Once it has selected a node, the scheduler does not reach out to tell that node to start the pod. Remember, the Kubernetes API is declarative (not imperative), and the internal components use the same API. Instead, the scheduler updates the pod object's nodeName field (spec.nodeName) to record which node the pod has been scheduled to.
The kubelet (the Kubernetes agent running on each node) monitors the Kubernetes API, just like other Kubernetes components. When the kubelet sees a pod whose nodeName field matches its own node, it knows that the pod has been scheduled to it and must be started. Once the kubelet has started the pod, it continues to monitor the state of the pod's containers and keeps them running as long as the corresponding pod object exists in the API server.
When the pod object is deleted, the kubelet understands that the pod's containers are no longer required and terminates them.
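The walkthrough above can be sketched as a toy Python model in which the scheduler and the kubelets never call each other and share only a store of pod objects. This is an illustration under stated assumptions, not the real components:

- `ApiServer`, `scheduler_pass` and `Kubelet.sync` are hypothetical names.
- "Most free slots" stands in for the real scheduler's filtering and scoring.
- Explicit `sync` calls stand in for the watches that real components use.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Pod:
    name: str
    node_name: Optional[str] = None  # plays the role of spec.nodeName


class ApiServer:
    """Holds pod objects; the only channel the components share."""

    def __init__(self) -> None:
        self.pods: Dict[str, Pod] = {}


def scheduler_pass(api: ApiServer, free_slots: Dict[str, int]) -> None:
    """Assign each unscheduled pod to the node with the most free slots.

    The scheduler never contacts a node; it only writes node_name.
    """
    for pod in api.pods.values():
        if pod.node_name is None:
            best = max(free_slots, key=lambda n: free_slots[n])
            pod.node_name = best
            free_slots[best] -= 1


class Kubelet:
    def __init__(self, node: str) -> None:
        self.node = node
        self.running: Set[str] = set()

    def sync(self, api: ApiServer) -> None:
        """Converge local containers to the pods bound to this node."""
        mine = {p.name for p in api.pods.values() if p.node_name == self.node}
        for name in mine - self.running:
            self.running.add(name)      # "start" the pod's containers
        for name in self.running - mine:
            self.running.discard(name)  # pod object is gone: "stop" them


api = ApiServer()
kubelets = [Kubelet("node-a"), Kubelet("node-b")]

api.pods["nginx"] = Pod("nginx")          # user creates the pod object
scheduler_pass(api, {"node-a": 1, "node-b": 3})
for kubelet in kubelets:
    kubelet.sync(api)
print(api.pods["nginx"].node_name)        # node-b
print(kubelets[1].running)                # {'nginx'}

del api.pods["nginx"]                     # user deletes the pod object
for kubelet in kubelets:
    kubelet.sync(api)
print(kubelets[1].running)                # set()
```

Because the only coupling is the shared pod object, either side can be restarted or replaced without the other noticing.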
Why No Hidden Internal APIs?
Having Kubernetes components use the same external API makes Kubernetes composable and extensible.
If for some reason a default component of Kubernetes (the scheduler, for example) is insufficient for your needs, you could turn it off and replace it with your own component that uses the same APIs.
In addition, if you need functionality that Kubernetes does not yet provide, you can write your own component against the public API to extend Kubernetes.
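In Kubernetes, a pod names the scheduler that should place it through its spec.schedulerName field, which defaults to "default-scheduler". A replacement or additional scheduler therefore only has to pick up the pods addressed to it and write nodeName, using the same API as the default one. The Python sketch below models that contract with pods as plain dicts. The `spread` and `bin_pack` policies, the `run_scheduler` helper and the name "my-scheduler" are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Pods are plain dicts carrying the two spec fields that matter here:
# "schedulerName" (who should place the pod) and "nodeName" (where it went).
Pod = Dict[str, Optional[str]]
Policy = Callable[[Dict[str, int]], str]


def spread(free: Dict[str, int]) -> str:
    """Pick the node with the most free slots."""
    return max(free, key=lambda n: free[n])


def bin_pack(free: Dict[str, int]) -> str:
    """Pick the fullest node that still has room (a custom policy)."""
    return min((n for n in free if free[n] > 0), key=lambda n: free[n])


def run_scheduler(name: str, policy: Policy, pods: List[Pod],
                  free: Dict[str, int]) -> None:
    """Place pending pods addressed to this scheduler by writing nodeName."""
    for pod in pods:
        if pod["schedulerName"] != name or pod["nodeName"] is not None:
            continue  # another scheduler's pod, or already placed
        if all(slots <= 0 for slots in free.values()):
            return    # no capacity left: remaining pods stay pending
        node = policy(free)
        pod["nodeName"] = node
        free[node] -= 1


pods: List[Pod] = [
    {"name": "a", "schedulerName": "default-scheduler", "nodeName": None},
    {"name": "b", "schedulerName": "my-scheduler", "nodeName": None},
]
free = {"node-1": 1, "node-2": 4}

run_scheduler("default-scheduler", spread, pods, free)
run_scheduler("my-scheduler", bin_pack, pods, free)
print([(p["name"], p["nodeName"]) for p in pods])
# [('a', 'node-2'), ('b', 'node-1')]
```

Both schedulers read and write exactly the same fields, so swapping one policy for another requires no change to the kubelet or to the pods themselves.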
Meet the User Where They Are
The Kubernetes API can also store information that workloads may need. For example, the Kubernetes API can be used to store secrets or config maps. Secrets hold sensitive data that you wouldn’t want baked into your container images, such as passwords and certificates. Config maps hold configuration that should remain independent of container images, such as application startup parameters.
Because there are no hidden internal APIs (the second principle above), your application running on Kubernetes could be modified to fetch secrets or config maps directly from the Kubernetes API server. But that means modifying your application so that it is aware it is running in Kubernetes.
And this is where the third principle of Kubernetes comes in: meet the user where they are. That is, Kubernetes should not require an application to be rewritten to run on Kubernetes.
Many applications, for example, accept secrets and config info as files or environment variables. Therefore, Kubernetes supports injecting secrets and config maps into pods as files or environment variables. See, for example, the “Using Secrets” section of the secrets documentation.
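From the application's side, this means the code can keep reading its configuration the way it always has. The Python sketch below reads a setting from an environment variable, then falls back to a file, then to a default. It imports no Kubernetes client, so Kubernetes can populate either source from a Secret or ConfigMap. The variable name `DB_PASSWORD`, the file name `db-password` and the `read_setting` helper are hypothetical. The temporary directory merely simulates a mounted secret volume.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def read_setting(env_var: str, file_path: str,
                 default: Optional[str] = None) -> Optional[str]:
    """Read a setting from an environment variable, else a file, else a default.

    No Kubernetes client is involved, so the same code runs unchanged
    inside or outside a cluster.
    """
    value = os.environ.get(env_var)
    if value is not None:
        return value
    path = Path(file_path)
    if path.is_file():
        return path.read_text().strip()
    return default


# Simulate a secret mounted as a file inside a volume directory.
os.environ.pop("DB_PASSWORD", None)
with tempfile.TemporaryDirectory() as mount_dir:
    secret_file = Path(mount_dir) / "db-password"
    secret_file.write_text("s3cr3t\n")
    from_file = read_setting("DB_PASSWORD", str(secret_file))

# Simulate the same secret injected as an environment variable instead.
os.environ["DB_PASSWORD"] = "from-env"
from_env = read_setting("DB_PASSWORD", "/nonexistent/db-password")

print(from_file, from_env)  # s3cr3t from-env
```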
Why Meet the User Where They Are?
By making design choices that minimize the hurdles to deploying workloads, Kubernetes makes it easy to run existing workloads on Kubernetes without having to rewrite or significantly alter them.
Once stateless workloads are running on Kubernetes, the natural next step is to try to run stateful workloads on Kubernetes. Kubernetes provides a powerful volume plugin system that enables many different types of persistent storage systems to be used with Kubernetes workloads.
A user may, for example, easily request to mount a Google Cloud Persistent Disk into their pod at a specific path:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sleepypod
spec:
  volumes:
  - name: data
    gcePersistentDisk:
      pdName: panda-disk
  containers:
  - name: sleepycontainer
    image: gcr.io/google_containers/busybox
    command:
    - sleep
    - "6000"
    volumeMounts:
    - name: data
      mountPath: /data
```
When this pod is created, Kubernetes automatically takes care of attaching the specified GCE PD (Google Compute Engine Persistent Disk) to the node the pod is scheduled to, and mounting it into the specified container. The container can then write to the path where the GCE PD is mounted to persist data beyond the lifecycle of the container or pod.
The problem with this approach is that the pod definition (the pod YAML) directly references a Google Cloud Persistent Disk. If this pod were deployed on a Kubernetes cluster outside Google Cloud, it would fail to start because GCE PDs would not be available.
This is where another Kubernetes principle comes in: workload definitions should be portable across clusters. A user should be able to use the same workload definition files, such as the same pod YAML, to deploy a workload across different clusters.
Ideally the pod specified above should run even on clusters that don’t have a GCE PD. To make this possible, Kubernetes introduced the PersistentVolumeClaim (PVC) and PersistentVolume (PV) API objects. These objects decouple storage implementation from storage consumption.
The PersistentVolumeClaim object gives a user a way to request storage in an implementation-agnostic manner. For example, instead of requesting a specific GCE PD, a user may create a PVC object to request 100 GiB of ReadWriteOnce storage:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
```
The Kubernetes system either matches this request to a volume from a pool of available disks (represented as PersistentVolume objects) or automatically provisions a new volume to fulfill the request. The pod then refers to the claim by name (a persistentVolumeClaim volume with claimName: mypvc) instead of naming a specific disk. Either way, the objects used to deploy workloads against a Kubernetes cluster are portable across cluster implementations.
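The matching step can be illustrated with a simplified Python sketch. It makes two assumptions:

- A claim binds to the smallest unbound volume that offers the requested access modes and at least the requested capacity, roughly how Kubernetes chooses among existing PersistentVolumes.
- Dynamic provisioning is modeled by appending a new volume sized to the request. Real provisioning goes through a StorageClass and a volume plugin.

`parse_quantity`, `bind` and all volume names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Binary suffixes and whole numbers only, for brevity; real Kubernetes
# quantities also accept decimal suffixes ("G", "M") and fractions.
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(q: str) -> int:
    """Convert a size such as "100Gi" to bytes (simplified)."""
    for suffix, factor in _UNITS.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)


@dataclass
class PersistentVolume:
    name: str
    capacity: str
    access_modes: Set[str]
    claim: Optional[str] = None  # name of the bound claim, if any


@dataclass
class PersistentVolumeClaim:
    name: str
    request: str
    access_modes: Set[str]


def bind(pvc: PersistentVolumeClaim,
         pool: List[PersistentVolume]) -> PersistentVolume:
    """Bind the claim to the smallest free volume that satisfies it,
    or 'provision' a new one sized to the request if none fits."""
    want = parse_quantity(pvc.request)
    fits = [
        pv for pv in pool
        if pv.claim is None
        and pvc.access_modes <= pv.access_modes
        and parse_quantity(pv.capacity) >= want
    ]
    if fits:
        pv = min(fits, key=lambda v: parse_quantity(v.capacity))
    else:
        pv = PersistentVolume(f"pvc-{pvc.name}", pvc.request,
                              set(pvc.access_modes))
        pool.append(pv)  # stands in for dynamic provisioning
    pv.claim = pvc.name
    return pv


pool = [
    PersistentVolume("gce-pd-500", "500Gi", {"ReadWriteOnce"}),
    PersistentVolume("nfs-200", "200Gi", {"ReadWriteOnce", "ReadWriteMany"}),
    PersistentVolume("small-10", "10Gi", {"ReadWriteOnce"}),
]
claim = PersistentVolumeClaim("mypvc", "100Gi", {"ReadWriteOnce"})
print(bind(claim, pool).name)  # nfs-200: smallest volume that is big enough

big = PersistentVolumeClaim("bigpvc", "1Ti", {"ReadWriteOnce"})
print(bind(big, pool).name)    # pvc-bigpvc: nothing fits, so one is created
```

The claim itself never mentions GCE, NFS or any other backend. That is why the same PVC, and the pod that references it, can move between clusters.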
Why Workload Portability?
This principle of workload portability highlights the core benefit of Kubernetes: in the same way that operating systems freed application developers from worrying about the specifics of the underlying hardware, Kubernetes frees distributed system application developers from the details of the underlying cluster. With Kubernetes, distributed system application developers don’t have to be locked into a specific cluster environment. Applications deployed against Kubernetes can easily be deployed to a wide variety of clusters, both on-premises and in the cloud, without environment-specific changes to the app or deployment scripts (other than the Kubernetes endpoint).
As a result of these principles, Kubernetes is more robust, extensible, portable and easy to migrate to. And that is why Kubernetes is quickly becoming the industry standard for deploying workloads on distributed systems.
The Open Source Summit connects the open source ecosystem under one roof. It covers cornerstone open source technologies; helps ecosystem leaders to navigate open source transformation with the Diversity Empowerment Summit and tracks on business and compliance; and delves into the newest technologies and latest trends touching open source, including networking, cloud-native, edge computing, AI and much more. It is an extraordinary opportunity for cross-pollination between the developers, sysadmins, DevOps professionals and IT architects driving the future of technology.