What Kubernetes Needs to Run in Production
It’s no longer always the case that a production environment must be physically, or even virtually, barricaded from a development environment. That said, the Kubernetes open source container orchestration engine will have requirements for production above and beyond what’s needed in a development or a testing environment. Here are the key factors you should evaluate, and how they differ.
To begin this discussion, we’ll focus on deploying Kubernetes in such a way that it can survive a partial subsystem failure. This quality of survivability is often referred to as resilience (and sometimes as resiliency). Remember, Kubernetes is not the whole container system, but an entity deployed in a data center. It’s subject to the same operating conditions and perennial threats as any other data center software. When we think about maintaining availability in a Kubernetes environment, we tend to divide it into three distinct aspects:
- Infrastructure availability determines whether the base infrastructure that Kubernetes runs on is highly available and distributed regardless of the environment.
- Kubernetes availability ensures that the environment never has a single point of failure, and that the components of the orchestration system are properly and proportionally distributed along the same lines as the infrastructure.
- Application availability focuses on the application and ensuring that all pods and containers are achieving the correct level of availability. Based on the needs of the application, the application components will be loosely distributed across the Kubernetes environment, which itself is distributed across the infrastructure.
Since Kubernetes may be installed on several public and private Infrastructure as a Service (IaaS) platforms, the capabilities and properties of those platforms collectively frame a Kubernetes installation’s inherent resilience. For example, what is the level of partitioning an IaaS may provide?
Your public cloud’s IaaS platform is made up of subdivided deployment locations. Amazon Web Services calls these locations “regions” and the subdivisions “availability zones,” and other major cloud providers have followed suit. Kubernetes may be deployed across those zones. But a default Kubernetes installation typically deploys a single master, regardless of how many zones you have, which leaves the control plane as a single point of failure.
The most common, and typically most recommended, approach for handling this is with a multimaster Kubernetes environment. In this scenario, the scheduler and controller manager on only one of the masters are “elected” as active at any time, so if that master goes down, a new leader is elected. The new leader then handles all scheduling and management of the Kubernetes environment.
Kubernetes clusters are essentially sets of associated worker nodes. Since the generator of your business value is running on these clusters, it makes sense that each cluster is deployed across as many logical partitions as are available.
Please note that it’s considered best practice to dedicate your master node to managing your Kubernetes environment, and use worker nodes exclusively to host and run all the pods. This allows for a full separation of concerns between Kubernetes and your applications.
With a private IaaS, you can still partition servers into zones. While each private IaaS configuration may be as unique as a snowflake, there are some concepts common to all of them. Whether your IaaS is public or private, your storage and networking partitioning should correspond to your compute partitioning. In the data center, you would have redundant storage area network (SAN) capabilities and redundant top of rack and end of rack networking capabilities that mirror the redundancy in your blade enclosures. This way, you can provision redundant network, power, compute and storage partitions. With a public IaaS, each service provider handles storage and networking capabilities differently, though most build in partition redundancy. As for networking, you should make use of your IaaS’ load balancer for exposing Kubernetes service endpoints.
Ultimately, all this low-level resilience from your IaaS allows Kubernetes to support replica sets, which ensure that specified numbers of pod replicas are consistently deployed across partitions. This way, Kubernetes can automatically handle a partition failure. Recent editions of Kubernetes have implemented a kind of super-construct for replica sets called deployments, each of which involves instantiating what’s called a Deployment object. This object instructs a deployment controller to manage replica sets automatically, refreshing them at regular intervals so that they are as close as possible to the desired configuration that you declare.
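As a rough illustration of the declarative model described above, the following sketch builds a Deployment object as a plain Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. The application name, image and replica count are hypothetical, and the `apps/v1` API version is an assumption about the cluster you are targeting.

```python
import json


def deployment_manifest(name, image, replicas):
    """Build a Deployment manifest as a plain dict (hypothetical helper)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",  # assumption: a cluster that serves apps/v1
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Desired state: the controller keeps this many pod replicas running.
            "replicas": replicas,
            # The selector must match the labels on the pod template below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical application name and image.
manifest = deployment_manifest("web", "example.com/web:1.0", replicas=3)
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and passing it to `kubectl apply -f` is one way to submit it. The Deployment controller then creates and maintains the underlying replica set.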
This way, your pods may be rebalanced across the surviving zones in an outage event, without an operator having to perform special actions. To be more direct about it, you can specify the failure domain for a node as an explicit label, when you declare the node. You can tailor the scheduler to your specific needs, and ensure that it will replicate across nodes or across your specific infrastructure.
One of the ways that we have accomplished this in the past is through nodeSelector and its more expressive successor, nodeAffinity (available as a pod spec field since Kubernetes 1.6), which tell the scheduler exactly which nodes are eligible to run a given pod. This ensures that application-level availability aligns with the infrastructure partitioning, thus removing or at least reducing downtime should the infrastructure fail.
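To make the scheduling constraint concrete, here is a minimal sketch of a pod specification that uses node affinity to restrict a pod to nodes in particular zones. The zone names, pod name and image are hypothetical. The label key `topology.kubernetes.io/zone` is the one used by current Kubernetes releases, while older clusters used `failure-domain.beta.kubernetes.io/zone`, so treat the key as an assumption to check against your cluster.

```python
import json


def zone_affinity(zones, zone_label="topology.kubernetes.io/zone"):
    """Return an affinity stanza that only admits nodes labeled with one of `zones`."""
    return {
        "nodeAffinity": {
            # A hard requirement at scheduling time; pods that are already
            # running are not evicted if node labels change later.
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {"key": zone_label, "operator": "In", "values": list(zones)}
                        ]
                    }
                ]
            }
        }
    }


# Hypothetical pod name, image and zone names.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web"},
    "spec": {
        "affinity": zone_affinity(["zone-a", "zone-b"]),
        "containers": [{"name": "web", "image": "example.com/web:1.0"}],
    },
}
print(json.dumps(pod, indent=2))
```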
(Editor’s note: This section has been temporarily removed pending a further review of the material for accuracy).
Containers are best suited for 12-factor stateless applications, where services respond to requests, then blink out of existence leaving nothing behind. This is, as you can imagine, a very new — and for many enterprises, foreign — way of presenting applications. And it simply cannot translate to the storage-heavy models of existing applications whose life cycles have yet to be fully amortized.
Kubernetes supports stateful and stateless applications simultaneously, by adding support for persistent volumes — specifically, the PersistentVolume subsystem — along with additional support for dynamic provisioning. This allows support for legacy and other stateful applications on Kubernetes clusters, thereby making Kubernetes an attractive candidate for enterprises. The orchestrator offers support for volumes that are attached to pods, as well as external persistent volumes.
A volume in Kubernetes is different from the volume concept in Docker. While a Docker volume is attached to the container, a Kubernetes volume is related to a pod. So even if a container inside the pod goes down, the volume stays on. However, a Kubernetes volume is still ephemeral, meaning that it will be terminated when the pod is terminated. In embracing Kubernetes, you have to keep these two equivalently-named concepts distinct in your mind.
Kubernetes offers support for stateful applications by abstracting away storage and giving an API for users to consume and administrators to manage storage. A volume is externalized in Kubernetes by means of the PersistentVolume subsystem, and PersistentVolumeClaim is how pods consume PersistentVolume resources in a seamless manner. Specifically, it creates a kind of tie between the containers inside pods, and a class of volume that will survive beyond the life cycles of those pods.
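The relationship between a claim and the pod that consumes it can be sketched as two manifests, again built as Python dictionaries and printed as JSON. The claim name, requested size, mount path and image are hypothetical, and the sketch assumes the cluster can satisfy the claim, either from a pre-created PersistentVolume or through dynamic provisioning.

```python
import json

# A claim for 10Gi of storage that a single node may mount read-write.
claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "orders-data"},  # hypothetical claim name
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

# A pod that refers to the claim by name. The data outlives the pod because
# the claim, not the pod, is bound to the underlying PersistentVolume.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "orders-db"},
    "spec": {
        "volumes": [
            {
                "name": "data",
                "persistentVolumeClaim": {"claimName": claim["metadata"]["name"]},
            }
        ],
        "containers": [
            {
                "name": "db",
                "image": "example.com/orders-db:1.0",  # hypothetical image
                "volumeMounts": [{"name": "data", "mountPath": "/var/lib/data"}],
            }
        ],
    },
}

print(json.dumps([claim, pod], indent=2))
```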
Kubernetes offers logical file system mounting options for various storage offerings, such as Network File System (NFS), GlusterFS, Ceph, Azure File (SMB) and Quobyte. The Kubernetes Storage Special Interest Group (SIG) is currently working on ways to make adding support for new storage services and systems easier, by externalizing the plugin support.
You should make sure the storage needs of your existing applications are supported before you begin the process of transitioning them into a distributed systems environment like Kubernetes.
Containerization changes the entire concept of security in the data center. Orchestration changes it yet again. So it’s fair to say that you are probably already contending with, and perhaps managing, your data center’s transition to a new security model. The insertion of Kubernetes in this picture affects what that model may be changing to.
Security topics pertaining to workloads running in the context of a Kubernetes environment fall into three categories:
- Application level: session modeling, data encryption, throttling.
- Platform level: key management, data storage, access restrictions, distributed denial of service (DDoS) protection.
- Environment security: Kubernetes Access, Node Access, Master Node Configuration, encrypted key/value storage in etcd.
Quite simply put, very few people should have direct access to a Kubernetes node. Every time you grant such access to someone, you are incurring risk. Instead of accessing the host directly, users can run kubectl exec to get visibility into containers and their environments without introducing a security vulnerability.
If you need fine-grained, container-level security, operators can fine-tune access via Kubernetes’ authorization plugins. These plugins enable restricting specific users’ access to certain APIs and preventing accidental changes to system properties, such as scheduler options.
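One authorization mode that supports this kind of restriction is role-based access control (RBAC). The sketch below assumes RBAC is enabled on the cluster and builds a read-only role for pods in a single namespace, plus a binding that grants it to one user. The namespace, role name and user name are all hypothetical.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"

# A namespaced role that allows reading pods but not changing them.
role = {
    "apiVersion": f"{RBAC_GROUP}/v1",
    "kind": "Role",
    "metadata": {"namespace": "orders", "name": "pod-reader"},
    "rules": [
        {
            "apiGroups": [""],  # "" is the core API group, where pods live
            "resources": ["pods"],
            "verbs": ["get", "list", "watch"],
        }
    ],
}

# Grant the role to a single (hypothetical) user within the same namespace.
binding = {
    "apiVersion": f"{RBAC_GROUP}/v1",
    "kind": "RoleBinding",
    "metadata": {"namespace": "orders", "name": "read-pods"},
    "subjects": [{"kind": "User", "name": "jane", "apiGroup": RBAC_GROUP}],
    "roleRef": {"kind": "Role", "name": role["metadata"]["name"], "apiGroup": RBAC_GROUP},
}

print(json.dumps([role, binding], indent=2))
```

Because the rule lists only read verbs, a user bound to this role can inspect pods but cannot create, modify or delete them.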
Kubernetes allows for applications to be installed in different namespaces. This creates a lightweight boundary between applications, and helps to appropriately compartmentalize application teams. Such a boundary helps prevent accidental deployment into another team’s space, and enables the establishment of firm resource constraints (quotas) on each namespace. For example, you can give each app composed of microservices its own namespace in production. This allows for a single Kubernetes environment which hosts many applications, without the risk of collision between the applications.
A namespace acts as a logical boundary for workloads: it limits the breadth of an application to just that part of the system to which the same names apply. More technically speaking, namespaces represent multiple virtual clusters backed by the same physical cluster. They should not be confused with Linux kernel namespaces, which are what make containerization possible in the first place.
A resource quota provides constraints that limit aggregate resource consumption per namespace. Your enterprise IT team should create multiple namespaces, each with its own quota policy. This way, each policy may restrict the amounts of CPU, memory, and storage resources that a workload may consume.
In addition, a resource quota can protect against a single workload scaling to the point where it eats up all of your resources. Without a resource quota, it would be pretty easy for a malicious source to denial-of-service (DoS) attack the application. A resource quota is set per namespace and can cap CPU, memory, storage space or the number of objects such as pods; limits for individual pods and containers are set separately, through resource requests and limits or a LimitRange. Your specific needs will be determined by your applications and available resources, though it is imperative that you establish resource quotas.
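A namespace with its own quota policy might look like the following sketch. The namespace name and every numeric limit here are hypothetical placeholders, to be sized against your own applications and capacity.

```python
import json

namespace = {
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {"name": "orders"},  # hypothetical namespace
}

# A ResourceQuota is itself a namespaced object; it caps the aggregate
# consumption of everything running in that namespace.
quota = {
    "apiVersion": "v1",
    "kind": "ResourceQuota",
    "metadata": {"namespace": namespace["metadata"]["name"], "name": "orders-quota"},
    "spec": {
        "hard": {
            "requests.cpu": "4",         # total CPU requested by all pods
            "requests.memory": "8Gi",    # total memory requested by all pods
            "limits.cpu": "8",           # total CPU limit across all pods
            "limits.memory": "16Gi",     # total memory limit across all pods
            "requests.storage": "100Gi", # total storage requested by all claims
            "pods": "20",                # maximum number of pods
        }
    },
}

print(json.dumps([namespace, quota], indent=2))
```

Once a quota on CPU or memory is in place, pods in that namespace must declare requests or limits for those resources, or the API server rejects them.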
Running different enterprise applications on the same Kubernetes cluster creates a risk of one compromised application “attacking” a neighboring application — not by hacking, but instead by passively interfering with the flow of traffic in the network they share. Network segmentation ensures that containers may communicate only with other containers explicitly permitted by policy. By creating subnets with firewall rules, administrators can achieve the right level of isolation for each workload running in a Kubernetes cluster.
Segmenting networks ensures that no two systems can share resources or call the wrong pods, which can be a very powerful way to secure applications. But be careful: it’s very easy to fall victim to temptation and overdo your segmentation policies, leading to an environment that’s not only more difficult to manage, but produces a false sense of security among the development team.
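Within the cluster, one way to express "only explicitly permitted traffic" is a NetworkPolicy object. The sketch below allows ingress to pods labeled `app: orders` only from pods labeled `app: frontend`, on one TCP port. The labels, namespace and port are hypothetical, and enforcement depends on the cluster running a network plugin that implements NetworkPolicy, which is an assumption here.

```python
import json

policy = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"namespace": "orders", "name": "orders-allow-frontend"},
    "spec": {
        # The pods this policy protects.
        "podSelector": {"matchLabels": {"app": "orders"}},
        "policyTypes": ["Ingress"],
        "ingress": [
            {
                # Only pods carrying this label may connect...
                "from": [{"podSelector": {"matchLabels": {"app": "frontend"}}}],
                # ...and only on this port.
                "ports": [{"protocol": "TCP", "port": 8080}],
            }
        ],
    },
}

print(json.dumps(policy, indent=2))
```

Once a pod is selected by any ingress policy, all ingress traffic that no policy explicitly allows is dropped, which is why a small number of well-scoped policies usually goes further than many overlapping ones.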
Kubernetes has built-in storage capability for secrets: discrete values that are typically pertinent to an application as it’s running but should not be shared with other applications, or with anyone else. When you define something as a secret, it is stored independently from the pod, in etcd. Note that secret values are only base64-encoded by default; encryption at rest has to be enabled explicitly in the cluster configuration. A pod is only granted access to secrets defined in its pod definition file. On container startup, secrets are provided to the container so they may be used appropriately.
Kubernetes provides a mechanism for making secrets available directly to a container by way of an environment variable. Specifically, you reference the secret through the secretKeyRef field of the environment variable’s valueFrom entry in your pod definition file.
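The following sketch shows roughly what that looks like: a Secret object, and a container definition whose environment variable points at one key of that secret. The secret name, key, value and variable name are hypothetical. The sketch also makes visible that the value in a Secret manifest is base64-encoded, which is an encoding and not encryption.

```python
import base64
import json

secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "db-credentials"},  # hypothetical secret name
    "type": "Opaque",
    # Values under "data" must be base64-encoded strings.
    "data": {"password": base64.b64encode(b"s3cr3t").decode("ascii")},
}

container = {
    "name": "orders",
    "image": "example.com/orders:1.0",  # hypothetical image
    "env": [
        {
            "name": "DB_PASSWORD",
            # The value is resolved from the secret when the container starts.
            "valueFrom": {
                "secretKeyRef": {"name": secret["metadata"]["name"], "key": "password"}
            },
        }
    ],
}

print(json.dumps([secret, container], indent=2))
```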
As a supplement to Kubernetes secrets, you may choose to use another secret management tool, such as Vault. An application dedicated to key management typically offers more robust features, such as key rotation.
Software engineering company Kenzan, for instance, has been hesitant to use the secretKeyRef approach, noting it’s not too many steps removed from writing secrets directly into a properties file whose name and location everyone knows. Secrets between applications aren’t necessarily secrets between people, or about people, but it’s never a good idea to offer directions and a map to something intended not to be seen.
With the exception of network segmentation, there is little security between two pods. For many data centers this is a serious issue because a bad actor could end up accessing the cluster, or at the very least triggering a networking error, and then all services would be exposed.
You may find it best to enact some level of internal security control to validate that pod A can call pod B, assuming these interactions are permitted by policy. Something like a JSON Web Token (JWT) claim (a JSON-based assertion used to help validate authenticity), short-lived and signed with a frequently rotated key, tends to work nicely here. Providing roles within the JWT allows security needs to be very granular for each specific endpoint, and rotating the JWT private key frequently (we recommend every minute or two) limits the damage a leaked key can do. Using this model, anyone who did manage to penetrate through defenses and gain access to the network would still be unable to place a call successfully without the signed JWT. And a signed JWT is only valid for a minute or two.
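The idea can be sketched with nothing but the Python standard library. Note one deliberate substitution: to stay self-contained, this sketch signs tokens with a shared HMAC secret (the HS256 scheme) instead of the private/public key pair the text describes, so it illustrates the short expiry and role check, not the exact key model. The function names, claim layout and role strings are hypothetical, and a production system would more likely rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(secret: bytes, caller: str, roles, ttl_seconds=120, now=None) -> str:
    """Create a signed token that expires ttl_seconds after `now`."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": caller, "roles": list(roles), "iat": now, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(secret: bytes, token: str, required_role: str, now=None) -> bool:
    """Accept only an untampered, unexpired token that carries required_role."""
    now = int(time.time()) if now is None else now
    try:
        header_b64, claims_b64, signature_b64 = token.split(".")
        expected = hmac.new(
            secret, f"{header_b64}.{claims_b64}".encode("ascii"), hashlib.sha256
        ).digest()
        # Constant-time comparison avoids leaking signature bytes via timing.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return False
        claims = json.loads(_b64url_decode(claims_b64))
    except (ValueError, TypeError):
        return False
    if not isinstance(claims, dict):
        return False
    return now < claims.get("exp", 0) and required_role in claims.get("roles", [])


# Pod A obtains a token; pod B verifies it before serving the call.
secret = b"rotate-me-every-minute-or-two"  # hypothetical shared secret
token = issue_token(secret, "pod-a", ["orders:read"])
print(verify_token(secret, token, "orders:read"))
```

Rotating the signing key then amounts to issuing new tokens with a new secret while verifiers accept the previous secret only until its last token has expired.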
You’ve just seen a number of the key factors that pertain to the everyday care and maintenance of Kubernetes, as well as the network with which it is designed to interoperate. You and your organization would probably rather not make a serious commitment to a deployment decision that affects all aspects of application staging, monitoring and operations without having had more than a glimpse of what to prepare for.