In the first part of this article, we explored the concepts of pods and services in Kubernetes. Now let’s look at how to achieve scale and reliability with replication controllers. We will also discuss how to bring persistence to cloud-native applications deployed on Kubernetes.
Replication Controller: Scaling and Managing Microservices
If pods are the units, and deployment and services are the abstraction layers, then who tracks the health of the pods? This is where the concept of a replication controller (RC) comes into the picture.
After the pods are deployed, they need to be scaled and tracked. An RC definition has the baseline configuration of the number of pods that should be available at any given point. Kubernetes ensures that the desired configuration is maintained at all times by tracking the number of running pods. It can kill surplus pods or launch new ones to meet the baseline configuration.
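The reconciliation idea described above can be sketched in a few lines of Python. This is a toy model, not Kubernetes code: the `ReplicationController` class, its `reconcile` method and the pod names are all hypothetical, and pods are just strings in a list.

```python
from dataclasses import dataclass, field
from itertools import count

_pod_ids = count(1)  # hypothetical source of unique pod names


@dataclass
class ReplicationController:
    """Toy stand-in for an RC: a desired replica count plus the pods it tracks."""
    replicas: int
    pods: list = field(default_factory=list)

    def reconcile(self) -> None:
        """Launch or kill pods until the running count matches `replicas`."""
        while len(self.pods) < self.replicas:
            self.pods.append(f"pod-{next(_pod_ids)}")  # launch a missing pod
        while len(self.pods) > self.replicas:
            self.pods.pop()  # kill a surplus pod


rc = ReplicationController(replicas=3)
rc.reconcile()               # three pods are launched
rc.pods.remove(rc.pods[0])   # simulate one pod disappearing
rc.reconcile()               # a replacement is launched
rc.replicas = 2              # lower the baseline
rc.reconcile()               # one surplus pod is killed
print(len(rc.pods), rc.pods)
```

The point of the sketch is that the controller never records how it got to the current state; it only compares the actual count with the baseline and closes the gap.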
The RC can track the health of pods. If a pod becomes inaccessible, it gets killed, and a new pod is launched. Since an RC essentially inherits the definition of a pod, the YAML or JSON manifest may contain the attributes for the restart policy, container probes, and a health check endpoint.
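As an illustration of such a manifest, the sketch below builds an RC definition as a Python dictionary and prints it as JSON. The field names follow the Kubernetes v1 API as we understand it; the `web` name, the `nginx` image and the `/healthz` path are placeholders chosen for this example, and the image would need to actually serve that path for the probe to pass.

```python
import json

# Hypothetical RC manifest: names, image and probe path are illustrative only.
rc_manifest = {
    "apiVersion": "v1",
    "kind": "ReplicationController",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 3,                  # baseline number of pods
        "selector": {"app": "web"},     # which pods this RC tracks
        "template": {                   # the embedded pod definition
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                "restartPolicy": "Always",
                "containers": [{
                    "name": "web",
                    "image": "nginx",
                    "ports": [{"containerPort": 80}],
                    "livenessProbe": {  # health check endpoint
                        "httpGet": {"path": "/healthz", "port": 80},
                        "initialDelaySeconds": 15,
                        "timeoutSeconds": 1,
                    },
                }],
            },
        },
    },
}

print(json.dumps(rc_manifest, indent=2))
```

Note that the selector and the template labels match; that label match is how the RC recognizes the pods it is responsible for.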
Kubernetes supports the auto scaling of pods based on CPU utilization, similar to EC2 Auto Scaling or the GCE Autoscaler. At runtime, the RC can be manipulated to scale the pods automatically, based on a specific threshold of CPU utilization. The maximum and minimum number of pods may also be specified in the same command that sets the threshold.
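To make the threshold idea concrete, here is a minimal sketch of a proportional scaling rule clamped between a minimum and maximum. It is modeled on the rule described in the Kubernetes Horizontal Pod Autoscaler documentation, but it is a simplification: the function name and parameters are our own, and real autoscaling also applies tolerances and delays that are left out here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_cpu_percent: float,
                     target_cpu_percent: float,
                     min_pods: int,
                     max_pods: int) -> int:
    """Scale the pod count in proportion to observed vs. target CPU, then clamp."""
    if target_cpu_percent <= 0:
        raise ValueError("target_cpu_percent must be positive")
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_pods, min(max_pods, raw))


# Four pods averaging 90% CPU against a 60% target, bounded to 2..10 pods.
print(desired_replicas(4, 90, 60, min_pods=2, max_pods=10))
# Same pods at 20% CPU: the raw answer is clamped up to the minimum.
print(desired_replicas(4, 20, 60, min_pods=2, max_pods=10))
```

On the command line, the threshold and bounds are typically passed together, along the lines of `kubectl autoscale rc web --min=2 --max=10 --cpu-percent=60`, where `web` is a placeholder RC name.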
Flat Networking: The Secret Sauce
Networking is one of the more complex challenges of containerization. The only way to expose a container to the outside world is through port forwarding from the host, and that becomes hard to manage as the number of containers grows. Instead of leaving the network configuration and integration to administrators, Kubernetes comes with an integrated networking model that works out of the box.
Each node, service and pod gets an IP address, and the containers in a pod share that pod’s address. A node’s IP address is assigned by the physical router; combined with the assigned port, it becomes the endpoint to access the external-facing services. Though not routable, the Kubernetes service also gets an IP address. All communication happens without a network address translation (NAT) layer, making the network flat and transparent.
This model brings the following advantages:
- All containers can talk to each other without a NAT.
- All nodes can talk to all pods and containers in the cluster without a NAT.
- Each container sees itself at exactly the same IP address that other containers see it at.
The best thing about scaling pods through a replication controller is that the port mapping is handled by Kubernetes. All pods that belong to a service are exposed through the same port on each node. Even if no pod is scheduled on a specific node, a request that arrives at that node is automatically forwarded to a node that runs one.
This magic is achieved through the combination of the network proxy called kube-proxy, iptables, and the etcd key-value store. The current state of the cluster is maintained with etcd, which is queried by kube-proxy at runtime. Through the manipulation of iptables on each node, kube-proxy bounces the request to the right destination.
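The forwarding behavior can be pictured with a small simulation. This is only a model of the idea: the real kube-proxy programs iptables rules rather than running Python, and the `SERVICES` table, node names, port number and pod IPs below are all made up for illustration.

```python
import random

# Hypothetical cluster state, standing in for what the cluster store holds:
# each service maps to a node port and the pods (node, pod IP) backing it.
SERVICES = {
    "web": {
        "node_port": 30080,
        "backends": [("node-a", "10.1.0.5"), ("node-b", "10.1.1.7")],
    },
}
NODES = ["node-a", "node-b", "node-c"]  # node-c runs no "web" pod


def route(node: str, port: int, rng=random):
    """Return the (node, pod_ip) that a request arriving at node:port reaches."""
    if node not in NODES:
        raise ValueError(f"unknown node {node!r}")
    for service in SERVICES.values():
        if service["node_port"] == port:
            # Every node knows every backend, so the entry node does not
            # need to host a pod itself.
            return rng.choice(service["backends"])
    raise LookupError(f"no service is exposed on port {port}")


# A request that lands on node-c is still served by a pod on node-a or node-b.
print(route("node-c", 30080))
```

Because the same port is open on every node, a client or external load balancer can target any node without knowing where the pods were scheduled.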
Kube-proxy also handles basic load balancing of services. Service endpoints are exposed to containers through environment variables compatible with Docker links. These variables resolve to the host and port exposed by a service. Kubernetes 1.1 includes an option to use native iptables, which will bring an 80 percent reduction in latency. This design eliminates CPU overhead, thus improving efficiency and scalability.
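As a sketch of how an application might consume those variables, the helper below looks up a service by name. It relies on the `<NAME>_SERVICE_HOST` and `<NAME>_SERVICE_PORT` naming convention from the Kubernetes documentation (service name upper-cased, dashes turned into underscores); the `service_endpoint` function, the `redis-master` service and the address values are assumptions made for the example.

```python
import os


def service_endpoint(service_name: str, environ=os.environ):
    """Resolve a service's host and port from Kubernetes-style env variables."""
    prefix = service_name.upper().replace("-", "_")
    host = environ[f"{prefix}_SERVICE_HOST"]
    port = int(environ[f"{prefix}_SERVICE_PORT"])
    return host, port


# Stand-in for the environment a container would see inside a pod.
example_env = {
    "REDIS_MASTER_SERVICE_HOST": "10.0.0.11",
    "REDIS_MASTER_SERVICE_PORT": "6379",
}
print(service_endpoint("redis-master", example_env))
```

Inside a real pod the second argument would be omitted so the lookup reads the container’s actual environment; a missing service surfaces as a `KeyError`.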
Persistence: Bringing Statefulness to Containers
Containers are ephemeral. They don’t maintain state when they move from one host to another. For production workloads, persistence is a must. Any useful application has a database backing it.
By default, pods are also ephemeral. They start with a blank slate each time they are resurrected. It’s possible to set up a volume that’s shared by all containers running in the same pod. Identified by the emptyDir moniker, it is similar to Docker volumes, where the host file system is exposed as a directory within the container. emptyDir volumes follow the lifecycle of pods. When a pod is deleted, so is the volume. Since these volumes are specific to the host, they are not available on other nodes.
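A pod that shares an emptyDir volume between two containers might look like the following, again written as a Python dictionary and printed as JSON. The field names follow the Kubernetes v1 pod schema as we understand it; the pod name, container names, `busybox` image, commands and `/data` mount path are placeholders.

```python
import json

# Hypothetical pod manifest: two containers mount the same emptyDir volume.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "shared-scratch"},
    "spec": {
        # The volume lives and dies with the pod.
        "volumes": [{"name": "scratch", "emptyDir": {}}],
        "containers": [
            {
                "name": "writer",
                "image": "busybox",
                "command": ["sh", "-c", "date > /data/started; sleep 3600"],
                "volumeMounts": [{"name": "scratch", "mountPath": "/data"}],
            },
            {
                "name": "reader",
                "image": "busybox",
                "command": ["sh", "-c", "sleep 3600"],
                "volumeMounts": [{"name": "scratch", "mountPath": "/data"}],
            },
        ],
    },
}

print(json.dumps(pod_manifest, indent=2))
```

Both containers see the same files under `/data`, but nothing written there survives the deletion of the pod.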
To bring persistence across pods, irrespective of which nodes they are scheduled on, Kubernetes supports PersistentVolume (PV) and PersistentVolumeClaim (PVC) requests. A PVC relates to a PV much as a pod relates to a node: the claim consumes storage from a volume, just as a pod consumes compute resources from a node. When a pod is created, it can be bound to a specific volume through the claim. The PV can be based on a variety of plugins, such as GCE Persistent Disk, Amazon Elastic Block Store (EBS), Network File System (NFS), Internet Small Computer Systems Interface (iSCSI), GlusterFS and RBD.
The workflow to set up persistence includes configuring the underlying file system or cloud volume, creating the persistent volume, and finally, creating the claim to associate a pod with the volume. This decoupled approach brings a clean separation between pods and volumes, making them extremely portable. The application, container, or pod doesn’t need to know about the actual file system or persistence engine backing it. Some file systems, such as GlusterFS, can be containerized, making the configuration easier and more portable.
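The three objects in that workflow can be sketched as follows, assuming an NFS export has already been configured. Field names follow the Kubernetes v1 API as we understand it; the NFS server address, export path, object names, sizes and image are invented for the example.

```python
import json

# Step 1 (outside Kubernetes): an NFS export exists at this hypothetical address.
# Step 2: an administrator registers it as a persistent volume.
persistent_volume = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "pv-nfs-01"},
    "spec": {
        "capacity": {"storage": "10Gi"},
        "accessModes": ["ReadWriteMany"],
        "nfs": {"server": "nfs.example.internal", "path": "/exports/data"},
    },
}

# Step 3: a claim asks for storage without naming the backing technology.
claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data-claim"},
    "spec": {
        "accessModes": ["ReadWriteMany"],
        "resources": {"requests": {"storage": "5Gi"}},
    },
}

# The pod refers only to the claim, never to NFS.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "db"},
    "spec": {
        "volumes": [{
            "name": "data",
            "persistentVolumeClaim": {"claimName": "data-claim"},
        }],
        "containers": [{
            "name": "db",
            "image": "postgres",
            "volumeMounts": [{"name": "data", "mountPath": "/var/lib/postgresql/data"}],
        }],
    },
}

for manifest in (persistent_volume, claim, pod):
    print(json.dumps(manifest, indent=2))
```

Swapping NFS for another plugin would change only the persistent volume definition; the claim and the pod stay the same, which is the portability the decoupling is meant to provide.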
Containers are not the new concept that many seem to view them as: Google has been running most of its web-scale workloads in containers for a decade. Many of the lessons it has learned and built into Kubernetes can be translated to other orchestration platforms and even to concepts about orchestration in general. Kubernetes solves some of the hard problems Google Site Reliability Engineers faced almost a decade ago, and it’s influencing the way ahead for many orchestrators.
Most importantly, Kubernetes has become a major focus in the container orchestration ecosystem and exists as a valuable open source platform for other related services. Understanding the current role and function of Kubernetes is necessary for looking at the future of the orchestration market.
Docker is a sponsor of The New Stack.