What does the expression “production-ready” mean? This is the first question you should answer if you want to minimize problems with your production workloads.
The answer to this question can be discussed from multiple perspectives: security, maintainability, testing, configurability, stability, upgradability, documentation, etc. There are even some who define production-ready solutions as “production code that is in production.”
Personally, I believe that a production-ready application should address all the elements mentioned above.
When deploying Kubernetes workloads in production, Kubernetes users have made the open source project Helm the de facto choice. I can easily understand why: Helm brings several benefits that align with the approaches suggested by the experts. It extends the adaptability and customizability of your deployments, eases the testing process, allows history and rollback management, and so on.
For more than two years, I have contributed to the project by extending the available catalog with a wide variety of infrastructure applications, as well as reviewing pull requests, adding features and attending to support cases. Based on my experience, there are five elements that developers should pay attention to if they want to create charts that are ready for deployment in production environments.
1) Ensure Security by Limiting the Container Operations: Run Non-Root Containers
To deploy containerized applications securely, you must limit the allowed operations to the minimum required. One way to ensure this is to launch containers with a random, non-root user. These are known as non-root containers. This is a mandatory requirement on some Kubernetes-based platforms, such as Red Hat’s OpenShift. Approximately half of the charts in the stable repository already use non-root containers, and that number is increasing. If the application allows it, you can go even further and use a full read-only filesystem or “scratch” containers (which do not have any underlying base OS).
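As a minimal sketch, a chart’s deployment template can enforce this through a securityContext. The application name, image, and user ID below are illustrative, not taken from any particular chart:

```yaml
# Illustrative Deployment fragment enforcing a non-root,
# read-only container; names, image, and UID are examples
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      securityContext:
        runAsNonRoot: true   # kubelet refuses to start the pod as UID 0
        runAsUser: 1001      # arbitrary unprivileged UID
        fsGroup: 1001        # so mounted volumes are writable by that UID
      containers:
        - name: my-app
          image: my-app:1.0.0
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
```

Platforms like OpenShift assign the UID themselves, which is why the application must tolerate running as an arbitrary non-root user rather than one baked into the image.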
2) Limit Access to the Cluster: Implement Role-Based Access Control Policies (RBAC)
The core of a Kubernetes cluster is its API server (kube-apiserver). By accessing it, you can obtain details about the current state of the cluster and the workloads deployed on it. Developers are adopting this approach: currently, there are many Kubernetes-aware applications that access the API server for operations like self-discovery. However, having containers with full access to the Kubernetes API server could compromise the cluster. To mitigate this risk, you must ensure that the processes inside the pods can only access the minimum necessary dataset.
That is where Role-Based Access Control policies come into play. For example, if you deploy an infrastructure application that uses kube-apiserver for self-discovery in the namespace “test”, you may only need to allow “get” and “list” operations for pod objects inside that specific namespace. In the past, users were granting cluster-admin privileges (i.e. privileges to perform all operations within the cluster) to applications like Tiller, Helm’s server-side component. This practice is a recipe for disaster in production.
Don’t forget to make sure that the applications you deploy using charts have the smallest possible set of RBAC privileges.
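For the self-discovery example above, a minimal Role and RoleBinding could look like the following sketch. The Role name and ServiceAccount name are illustrative:

```yaml
# Grants only "get" and "list" on pods, only in the "test" namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: test
rules:
  - apiGroups: [""]         # "" is the core API group, where pods live
    resources: ["pods"]
    verbs: ["get", "list"]  # nothing beyond what self-discovery needs
---
# Binds the Role to the application's ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
subjects:
  - kind: ServiceAccount
    name: my-app            # illustrative ServiceAccount name
    namespace: test
```

Because this is a namespaced Role rather than a ClusterRole, a compromised pod cannot read anything outside its own namespace.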
3) Implement a Proper Testing Process, Especially with Upgrades
Something as simple as updating a label in a StatefulSet can break the helm upgrade command, because some StatefulSet fields, such as the selector, are immutable. To deal with this, including upgrade tests in your pipeline is a priority task. Obviously, you cannot assume that upgrades between major versions will work without manual intervention – that is what major version bumps are for. However, ensuring that upgrades work between minor versions is doable.
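One way to sketch such an upgrade test as a CI job is shown below. The pipeline syntax, release name, chart path, and version are all illustrative; any CI system works the same way:

```yaml
# Illustrative CI job: install the last released chart version,
# then upgrade in place to the version under review
upgrade-test:
  script:
    - helm install my-release stable/mychart --version 1.0.0 --wait
    - helm upgrade my-release ./mychart --wait
    - helm test my-release
```

If a change to an immutable field sneaks into the chart, the helm upgrade step fails here instead of in a user’s production cluster.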
Another common issue is adding features to a chart that are disabled by default. Because they are disabled, a standard helm install test will probably not detect any problems with them.
One example of this situation is ingress rules. These parameters are disabled by default, so you can easily forget about them in your daily testing. I can foresee how several charts in the stable repository will break when the API group extensions/v1beta1 — which most Ingress API objects use — gets deprecated in Kubernetes 1.20. This potential issue can be prevented by increasing the test coverage of your charts with multiple values.yaml files. To aid with this, solutions like kubeval can come in handy.
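As a sketch, you can keep an extra values file per optional feature and render the chart with it in CI. The file path and keys below are illustrative; the exact structure depends on your chart:

```yaml
# ci/values-ingress.yaml -- illustrative override file that turns on
# a feature a default "helm install" would leave untested
ingress:
  enabled: true
  hostname: example.local
```

Rendering the chart with something like helm template mychart -f ci/values-ingress.yaml and piping the output to kubeval then exercises the otherwise-dormant templates and validates the generated manifests against the Kubernetes API schemas.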
4) Avoid Rolling Tags at Any Cost
You’re probably already familiar with container images and chances are that you have executed, at least once, a command like docker pull bitnami/redis:latest. This “latest” is an example of a rolling tag (i.e. a tag that will point to different images over time).
Imagine the following scenario: you want to deploy the “bitnami/redis” chart with the latest version of Redis. To do so, you use the “latest” tag, which at that moment points to Redis 5.0.5. Everything works seamlessly when deploying the chart. Now imagine that, one day in the future, you need to scale your Redis cluster with new pods, which will download the “bitnami/redis:latest” image again. What if the latest Redis is now, for example, 5.0.8? You will have pods of the same Redis cluster running different versions of Redis. To make matters worse, what if Redis 6.0.0 has been released? You are likely to end up with a broken Redis cluster.
If you want your deployments to be maintainable and under control, make sure that your charts use immutable images (for example: “bitnami/redis:5.0.5-debian-9-r10”). With this approach, every time you deploy or scale, you know what image you are using. Plus, you will have the guarantee that the deployed image has been tested with that specific version of the chart, something you cannot guarantee when using rolling tags.
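In a chart, this usually means pinning the tag in values.yaml. The structure below follows the common Bitnami-style convention, as a sketch:

```yaml
# Illustrative values.yaml image section with an immutable tag
image:
  registry: docker.io
  repository: bitnami/redis
  tag: 5.0.5-debian-9-r10   # immutable tag; never "latest"
  pullPolicy: IfNotPresent
```

With the tag pinned, bumping the image version becomes an explicit, reviewable change to the chart rather than a side effect of the registry moving a rolling tag.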
5) Monitor Your Deployments
This tip is simple to follow: if you want your workloads to be production-ready, you need to monitor them. Most production-ready charts include support for metrics exporters, so your application status can be observed with tools like Prometheus and Wavefront or suites like BKPR. It is also important to integrate your workloads with logging stacks like ELK to improve the observability of your containerized applications. The advantages are countless: early failure detection, auditing, trend detection, performance analysis and debugging, among others.
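As a sketch, many charts expose this as an optional exporter sidecar plus Prometheus scrape annotations in values.yaml. The exact keys vary from chart to chart, and the exporter image and tag here are illustrative:

```yaml
# Illustrative values.yaml fragment enabling a metrics exporter
metrics:
  enabled: true
  image:
    repository: bitnami/redis-exporter
    tag: 1.3.5                     # illustrative immutable tag
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9121"     # redis_exporter's default metrics port
```

With annotations like these, a Prometheus server configured for annotation-based discovery picks up the new pods automatically, so monitoring scales with the deployment.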
Kubernetes in Production Is a Reality
By following the tips above, you will cover all the basics for Kubernetes production readiness. But there are many more areas that you should explore, such as stability, performance, network, auto-scaling and more. Check out the resources listed below to move your applications forward to production deployments. And, if you want to join me in the search for the true “production-ready” definition, don’t hesitate to contact me.
To learn more about containerized infrastructure and cloud native technologies, consider coming to KubeCon + CloudNativeCon NA, November 18-21 in San Diego.
Feature image from Pixabay.