The Kubernetes Way: Pods and Services
With containers gaining the attention of enterprises, the focus is slowly shifting to container orchestration. Complex workloads running in production need mature scheduling, orchestration, scaling and management tools. Docker made it extremely easy to manage the lifecycle of a container running within a host operating system (OS). But containerized workloads in production typically span multiple hosts, so we need tools that go beyond managing a single container on a single host.
That’s where Docker Datacenter, Mesosphere DC/OS, and Kubernetes have a significant role to play. They let developers and operators treat a cluster of machines as a single, large entity. The cluster runs multiple containers that belong to one or more applications. DevOps teams submit jobs through an application programming interface (API), a command line interface (CLI) or specialized tools to the container orchestration engine (COE), which then becomes responsible for managing the lifecycle of the application.
A hosted COE is delivered as Containers as a Service (CaaS). Examples of CaaS include Google Container Engine, Carina by Rackspace, Amazon EC2 Container Service, Azure Container Service, and Joyent Triton.
Kubernetes, the open source cluster manager and container orchestration engine, is built on lessons learned from Borg, Google’s internal cluster management system. At KubeCon 2015, the inaugural Kubernetes conference, the community celebrated the launch of version 1.1, which came with new features.
I wrote an article that compares the COE market landscape with Hadoop’s commercial implementation. Quite a few startups and established platform vendors are trying to capture enterprise market share for COE. Kubernetes stands out due to the maturity that comes from Google’s experience of running web-scale workloads. Drawing on my personal experience, I call out the features that make Kubernetes the standard for container orchestration.
Pods: The New Virtual Machine
Containers running microservices follow a distinctive convention: each runs one, and only one, process. While it’s common to see a virtual machine (VM) running an entire LAMP application, the same application should be split into at least two containers: one running Apache with PHP and the other running MySQL. If you add Memcached or Redis to the stack for caching, it needs to run in a separate container as well.
This pattern makes deployment challenging. For example, the cache container should be kept close to the web container. When the web tier is scaled out by running additional containers, the cache tier also needs to be scaled out. When a request reaches a web container, the web container first looks for the data in its corresponding cache container; if the data is not found there, it queries MySQL. This design calls for pairing the web and cache containers and co-locating them on the same host.
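The lookup described above is the cache-aside pattern. Every request pays for a trip to the cache, which is why the cache should sit next to the web container. Here is a minimal sketch, assuming a plain dict stands in for the Memcached/Redis container and a hypothetical `query_mysql()` function stands in for the database:

```python
from typing import Callable, Dict, List


class CacheAsideReader:
    """Checks the cache first and falls back to the database on a miss."""

    def __init__(self, query_db: Callable[[str], str]) -> None:
        self.cache: Dict[str, str] = {}    # stand-in for the cache container
        self.query_db = query_db
        self.db_calls: List[str] = []      # records which keys reached the database

    def get(self, key: str) -> str:
        if key in self.cache:              # cache hit: no database round trip
            return self.cache[key]
        value = self.query_db(key)         # cache miss: query the database
        self.db_calls.append(key)
        self.cache[key] = value            # populate the cache for the next request
        return value


def query_mysql(key: str) -> str:
    # Hypothetical database lookup used only for illustration.
    return f"row-for-{key}"


if __name__ == "__main__":
    reader = CacheAsideReader(query_mysql)
    print(reader.get("user:42"))  # miss: goes to the database
    print(reader.get("user:42"))  # hit: served from the cache
    print(reader.db_calls)        # ['user:42']
```

Only the first request for a key reaches MySQL; every later one is served from the cache, so the latency between web and cache containers dominates.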
The concept of a pod in Kubernetes makes it easy to group multiple containers that are treated as a single unit of deployment. They are co-located on the same host and share resources such as the network namespace and storage volumes. Each pod gets a dedicated IP address that’s shared by all the containers belonging to it, so containers in the same pod can reach each other over localhost. That’s not all: every container within a pod sees the same hostname, so the pod can be addressed as a unit.
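To make the web-plus-cache pairing concrete, here is a sketch of a two-container v1 Pod manifest, built as a Python dict and printed as JSON (kubectl accepts JSON manifests as well as YAML). The pod name, the image tags and the `CACHE_HOST` variable are illustrative assumptions; the point is that the web container reaches the cache on localhost because both share the pod’s network namespace.

```python
import json


def web_cache_pod(name: str = "web-cache") -> dict:
    """Build a v1 Pod manifest that co-locates a web and a cache container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "frontend"}},
        "spec": {
            "containers": [
                {
                    "name": "web",
                    "image": "php:7.4-apache",  # illustrative image tag
                    "ports": [{"containerPort": 80}],
                    # Containers in a pod share one network namespace, so the
                    # cache is reachable on localhost. CACHE_HOST is a
                    # hypothetical variable the application would read.
                    "env": [{"name": "CACHE_HOST", "value": "localhost:6379"}],
                },
                {
                    "name": "cache",
                    "image": "redis:7",  # illustrative image tag
                    "ports": [{"containerPort": 6379}],
                },
            ]
        },
    }


if __name__ == "__main__":
    # Save the output as pod.json and apply it with `kubectl apply -f pod.json`.
    print(json.dumps(web_cache_pod(), indent=2))
```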
When a pod is scaled out, all the containers within it are scaled as a group. This design bridges the gap between virtualized and containerized apps: while retaining the concept of running one process per container, we can easily group containers that are treated as one unit. So, a pod is the new VM in the context of microservices and Kubernetes. Even a single container has to be packaged in a pod before it can be deployed.
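Scaling happens at the pod level: a controller stamps out copies of a pod template, so every replica carries all of its containers. In version 1.1 replication controllers played this role; the sketch below instead assumes the apps/v1 Deployment API of current Kubernetes releases. The names and image tags are illustrative, and `running_containers()` is a hypothetical helper that simply counts what the cluster would run.

```python
import copy

# Pod template: both containers travel together in every replica.
POD_TEMPLATE = {
    "metadata": {"labels": {"app": "frontend"}},
    "spec": {
        "containers": [
            {"name": "web", "image": "php:7.4-apache"},  # illustrative tags
            {"name": "cache", "image": "redis:7"},
        ]
    },
}


def deployment(name: str, replicas: int) -> dict:
    """Wrap the pod template in an apps/v1 Deployment with `replicas` copies."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels in the pod template.
            "selector": {"matchLabels": dict(POD_TEMPLATE["metadata"]["labels"])},
            "template": copy.deepcopy(POD_TEMPLATE),
        },
    }


def running_containers(dep: dict) -> dict:
    """Count how many copies of each container the cluster would run."""
    counts: dict = {}
    for container in dep["spec"]["template"]["spec"]["containers"]:
        counts[container["name"]] = dep["spec"]["replicas"]
    return counts


if __name__ == "__main__":
    print(running_containers(deployment("web-cache", 3)))  # {'web': 3, 'cache': 3}
```

On a live cluster, `kubectl scale deployment web-cache --replicas=3` would have the same effect: three pods, each with its own web and cache container.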
Pods enforce a separation of concerns between development and deployment. While developers focus on their code, operators decide what goes into a pod. They assemble the relevant containers and stitch them together in a pod definition. This gives ultimate portability, as no special packaging is required for the containers. Simply put, a pod is just a manifest of multiple container images managed together.
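Because a pod is only a manifest, an operator can compose one from existing images without rebuilding any of them. Here is a minimal sketch, assuming a hypothetical `compose_pod()` helper; the image names in the usage example are illustrative. It enforces one real Kubernetes rule: container names must be unique within a pod.

```python
import json
from typing import Dict, List, Tuple


def compose_pod(name: str, containers: List[Tuple[str, str]],
                labels: Dict[str, str]) -> dict:
    """Stitch unmodified container images into a single v1 Pod manifest."""
    names = [container_name for container_name, _ in containers]
    # Container names must be unique within a pod.
    if len(names) != len(set(names)):
        raise ValueError(f"duplicate container names in pod {name!r}: {names}")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "containers": [
                {"name": container_name, "image": image}
                for container_name, image in containers
            ]
        },
    }


if __name__ == "__main__":
    # The operator picks existing images; none of them is rebuilt or repackaged.
    pod = compose_pod("web-logs",
                      [("web", "nginx:1.25"), ("log-shipper", "fluent/fluent-bit:2.2")],
                      {"app": "frontend"})
    print(json.dumps(pod, indent=2))
```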
If Kubernetes is the new operating system, then a pod is the new process. As pods become more popular, we will see DevOps teams exchanging pod manifests instead of multiple container images. Helm, from the makers of Deis, is an example of a service acting as a marketplace for Kubernetes pods.
Services: Easily Discoverable Endpoints
One of the key differences between monolithic services and microservices is the way dependencies are discovered. While a monolith can always refer to a fixed IP address or DNS entry, a microservice has to discover a dependency before calling it. That’s because containers and pods may be relocated to any node at runtime, and each time a pod is recreated, for example after a node failure, it typically gets a new IP address. This makes endpoints extremely hard to keep track of. Developers have to explicitly register services with, and query them from, discovery backends such as etcd, Consul, ZooKeeper or SkyDNS. This requires code-level changes for applications to work correctly.
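To show the kind of code-level change this implies, here is a toy in-memory registry. It is a hypothetical stand-in, not the client API of etcd, Consul, ZooKeeper or SkyDNS, and the addresses are made up. Each instance has to register itself, withdraw its old address when it moves, and every client has to look up the current address before calling.

```python
from typing import Dict, List


class ServiceRegistry:
    """Toy in-memory stand-in for a discovery backend such as etcd or Consul."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, List[str]] = {}

    def register(self, service: str, address: str) -> None:
        # Each instance must announce itself when it starts...
        self._endpoints.setdefault(service, []).append(address)

    def deregister(self, service: str, address: str) -> None:
        # ...and withdraw its old address when it stops or moves.
        addresses = self._endpoints.get(service, [])
        if address in addresses:
            addresses.remove(address)

    def lookup(self, service: str) -> List[str]:
        # Clients must query the registry before every call.
        return list(self._endpoints.get(service, []))


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.register("cache", "10.0.1.5:6379")
    print(registry.lookup("cache"))  # ['10.0.1.5:6379']
    # The cache pod is rescheduled to another node and gets a new IP address.
    registry.deregister("cache", "10.0.1.5:6379")
    registry.register("cache", "10.0.2.7:6379")
    print(registry.lookup("cache"))  # ['10.0.2.7:6379']
```

Forgetting any of these calls leaves clients with stale addresses, which is the bookkeeping that Kubernetes services take over.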
Kubernetes shines with its built-in service discovery. A Kubernetes service maintains a stable, well-defined endpoint for a set of pods. That endpoint stays the same even when the pods are relocated to other nodes or recreated.
Multiple pods running across multiple nodes of the cluster can be exposed as a single service. This is an essential building block of microservices. The service manifest uses a label selector to identify and group the pods that act as a microservice.
For example, all the Apache web server pods carrying the label “frontend”, on any node of the cluster, become part of the service. A service is an abstraction layer that brings multiple pods running across the cluster under one endpoint. It has a name along with an IP address and port combination, and consumers can refer to the service by either its IP address or its name. This flexibility makes it easier to port legacy applications to containers.
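Here is a sketch of how a selector gathers pods into a service. It includes a v1 Service manifest plus a simplified equality-based matcher. The label key `app` is an assumption, since the text names only the value “frontend”. The pod list is hypothetical and spans several nodes to show that placement does not matter, only labels do.

```python
import json
from typing import Dict, List

# Hypothetical pods spread over several nodes; only the labels matter here.
PODS: List[dict] = [
    {"name": "apache-1", "node": "node-a", "labels": {"app": "frontend"}},
    {"name": "apache-2", "node": "node-b", "labels": {"app": "frontend"}},
    {"name": "mysql-1", "node": "node-a", "labels": {"app": "database"}},
]


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Equality-based matching: every selector key/value must appear in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def frontend_service() -> dict:
    """A v1 Service manifest that groups every pod labeled app=frontend."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "frontend"},
        "spec": {
            "selector": {"app": "frontend"},
            "ports": [{"port": 80, "targetPort": 80}],
        },
    }


def members(service: dict, pods: List[dict]) -> List[str]:
    """Names of the pods whose labels satisfy the service's selector."""
    selector = service["spec"]["selector"]
    return [pod["name"] for pod in pods if matches(selector, pod["labels"])]


if __name__ == "__main__":
    print(json.dumps(frontend_service(), indent=2))
    print(members(frontend_service(), PODS))  # ['apache-1', 'apache-2']
```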
If multiple pods share the same endpoint, how is traffic distributed evenly among them? That’s where the load balancing capability of the service comes in. This feature is a key differentiator of Kubernetes compared to other COEs. Kubernetes has a lightweight internal load balancer that routes traffic to all the pods participating in a service.
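Here is a simplified sketch of spreading requests across a service’s pods. It uses round-robin, in the spirit of kube-proxy’s original userspace mode; current kube-proxy modes such as iptables and IPVS select backends differently, so treat this as an illustration of the idea rather than Kubernetes’ actual algorithm. The endpoint addresses are made up.

```python
from collections import Counter
from itertools import cycle
from typing import Iterator, List


class RoundRobinBalancer:
    """Hand out a service's pod endpoints in turn, one per request."""

    def __init__(self, endpoints: List[str]) -> None:
        self.update(endpoints)

    def update(self, endpoints: List[str]) -> None:
        # Called when pods are added, removed or rescheduled.
        if not endpoints:
            raise ValueError("a service needs at least one ready endpoint")
        self._next: Iterator[str] = cycle(list(endpoints))

    def pick(self) -> str:
        return next(self._next)


if __name__ == "__main__":
    lb = RoundRobinBalancer(["10.0.1.5:80", "10.0.2.7:80", "10.0.3.9:80"])
    print(Counter(lb.pick() for _ in range(9)))  # each endpoint picked 3 times
```

The `update()` hook matters because the set of pods behind a service changes over time, while clients keep using the same service endpoint.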
Services can be exposed in one of three forms: internal, external and load balanced.
- Internal: Certain services, such as databases and cache endpoints, don’t need to be exposed outside the cluster. They are consumed only by other pods within the application. These services get an IP address that’s reachable only within the cluster, not from the outside world (the default ClusterIP service type). By exposing sensitive services only to their internal dependents, Kubernetes adds a layer of security that hides private pods from the public.
- External: Services running web servers or other publicly accessible pods are exposed through an external endpoint, which is available on a specific port on every node (the NodePort service type).
- Load balanced: Where the cloud provider offers an external load balancer, a service can be wired to it (the LoadBalancer service type). For example, the pods might receive traffic via an AWS Elastic Load Balancer (ELB) or the HTTP load balancer of Google Compute Engine (GCE). This feature integrates an external load balancer with the Kubernetes service.
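The three forms correspond to the Kubernetes service types `ClusterIP`, `NodePort` and `LoadBalancer`. The sketch below builds a v1 Service manifest for each form; the service names, selectors and ports are illustrative. The node port check assumes the default 30000–32767 range, which a cluster administrator can change.

```python
from typing import Dict, Optional

# The three exposure forms mapped to Kubernetes service types.
SERVICE_TYPES: Dict[str, str] = {
    "internal": "ClusterIP",          # cluster-internal virtual IP (the default)
    "external": "NodePort",           # same port opened on every node
    "load_balanced": "LoadBalancer",  # asks the cloud provider for a load balancer
}


def service(name: str, form: str, selector: Dict[str, str], port: int,
            target_port: int, node_port: Optional[int] = None) -> dict:
    """Build a v1 Service manifest for one of the three exposure forms."""
    svc_type = SERVICE_TYPES[form]
    port_spec: dict = {"port": port, "targetPort": target_port}
    if node_port is not None:
        if svc_type == "ClusterIP":
            raise ValueError("nodePort only applies to NodePort/LoadBalancer services")
        # 30000-32767 is the default node port range; clusters may configure another.
        if not 30000 <= node_port <= 32767:
            raise ValueError(f"nodePort {node_port} is outside the default range")
        port_spec["nodePort"] = node_port
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {"type": svc_type, "selector": dict(selector), "ports": [port_spec]},
    }


if __name__ == "__main__":
    db = service("mysql", "internal", {"app": "database"}, 3306, 3306)
    web = service("frontend", "external", {"app": "frontend"}, 80, 80, node_port=30080)
    lb = service("frontend-lb", "load_balanced", {"app": "frontend"}, 80, 80)
    for svc in (db, web, lb):
        print(svc["metadata"]["name"], svc["spec"]["type"])
```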
Kubernetes does the heavy lifting by taking over responsibility for the discovery and load balancing of microservices. It relieves DevOps teams of the complex plumbing required at the infrastructure level. Developers can follow a standard convention of using hostnames or environment variables and focus on their code, without writing additional code to register and discover services.
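Here is a sketch of the two conventions from the consumer’s side. Kubernetes injects `<NAME>_SERVICE_HOST` and `<NAME>_SERVICE_PORT` variables, with the service name upper-cased and dashes turned into underscores, but only for services that existed when the pod started. Otherwise the service is reachable through cluster DNS, provided the DNS add-on is running. The `resolve()` helper is hypothetical, and it assumes the default `cluster.local` domain and that the caller already knows the port.

```python
import os
from typing import Mapping, Optional, Tuple


def env_prefix(service: str) -> str:
    """Kubernetes upper-cases the service name and turns dashes into underscores."""
    return service.upper().replace("-", "_")


def resolve(service: str, namespace: str = "default", port: int = 80,
            env: Optional[Mapping[str, str]] = None) -> Tuple[str, int]:
    """Find a service via injected environment variables, else via cluster DNS."""
    env = os.environ if env is None else env
    prefix = env_prefix(service)
    host = env.get(f"{prefix}_SERVICE_HOST")
    port_value = env.get(f"{prefix}_SERVICE_PORT")
    if host and port_value:
        # These variables exist only for services created before the pod started.
        return host, int(port_value)
    # Fallback: the DNS name, assuming the default "cluster.local" cluster domain
    # and that the caller already knows the service port.
    return f"{service}.{namespace}.svc.cluster.local", port


if __name__ == "__main__":
    fake_env = {"REDIS_CACHE_SERVICE_HOST": "10.0.0.11",
                "REDIS_CACHE_SERVICE_PORT": "6379"}
    print(resolve("redis-cache", env=fake_env))    # ('10.0.0.11', 6379)
    print(resolve("mysql", "shop", 3306, env={}))  # ('mysql.shop.svc.cluster.local', 3306)
```

Neither path requires the application to register itself anywhere; Kubernetes maintains both the variables and the DNS records.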