Kubernetes CI/CD Best Practices

Containerization and Kubernetes have ushered in a new paradigm of consistency in the computing world, giving engineering teams increased velocity and agility. The convergence on a common declarative language for describing both application and operational tasks makes Kubernetes a popular platform for running distributed workloads.
After you designate the desired state in declarative YAML, Kubernetes is off to the races, resolving and fulfilling what has been declared, such as the number of replicas of an application. If there is any deviation, Kubernetes works to reconcile the difference between the actual and declared state, for instance re-spinning up a pod/container that dies.
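For example, a minimal Deployment manifest declares a replica count, and Kubernetes keeps that many pods running (the image and names here are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3                  # the declared state: three pods at all times
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:1.0.0   # hypothetical image
          ports:
            - containerPort: 8080
```

If one of the three pods dies, the Deployment's controller notices the deviation and creates a replacement to get back to the declared count.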
For those deploying to Kubernetes for the first time, the experience can be pretty rapid and anticlimactic. Author a minimal deployment YAML, give kubectl the command to apply it, and you're up and running. When the time comes to make a change, Kubernetes takes advantage of one of its strengths, the rolling update, to make changes incrementally. If you are used to platforms where you had to hand-write the rolling update rules, watching Kubernetes perform one makes it seem like a breeze.
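Concretely, `kubectl apply -f deployment.yaml` gets the manifest above running, and re-applying it with a new image tag triggers the rolling update. Here is a sketch of how the increments can be tuned, as a fragment of the Deployment spec (the values are illustrative choices, not defaults):

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never dip below the declared replica count
      maxSurge: 1         # bring up at most one extra pod at a time
```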
Despite all the benefits Kubernetes brings, good CI/CD (continuous integration/continuous delivery) practices are still key. Kubernetes should not magically erase the discipline your CI/CD journey instilled before you adopted it. Instead, leverage the strengths of Kubernetes to further that journey.
Best Practices for CI and Kubernetes
Continuous integration (CI) is the process of build automation. For example, a Java application needs to be built into a JAR, then, if headed to Kubernetes, Dockerized and potentially packaged/described in a format such as a Helm chart. In the containerized world, since containers are immutable, any change that is needed results in a new image, so your CI process will be called a lot to build and package new images.
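As a sketch, a CI pipeline for such a Java application might look like the following. GitHub Actions is assumed here purely as an example CI system, and the registry/image names are hypothetical:

```yaml
name: build-and-package
on: [push]                       # every commit kicks off a build
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "17"
      - run: mvn -B package      # build the JAR
      # containers are immutable, so every change produces a new image
      # (registry authentication omitted for brevity)
      - run: docker build -t registry.example.com/myapp:${{ github.sha }} .
      - run: docker push registry.example.com/myapp:${{ github.sha }}
```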
Running your continuous integration process on Kubernetes is a prudent move, because building and packaging software can take a lot of compute resources. The modern approach of every commit kicking off a build can be really taxing on infrastructure, especially with containerized builds. Taking advantage of Kubernetes to build and package software is a great use case, and modern CI tools increasingly focus on creating ephemeral build runners/nodes in Kubernetes: as build requests come in, simply spin up a new instance to create the build artifacts, then spin the instance down when the job is complete.
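A minimal sketch of such an ephemeral runner is a Kubernetes Job that exists only for the duration of one build (the builder image and resource figures are assumptions):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: build-myapp-     # a unique Job per build request
spec:
  ttlSecondsAfterFinished: 300   # garbage-collect the runner after the build
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: builder
          image: maven:3.9-eclipse-temurin-17
          command: ["mvn", "-B", "package"]
          resources:
            requests:            # builds are compute-hungry; reserve capacity
              cpu: "2"
              memory: 4Gi
```

Because the manifest uses generateName, each build is submitted with `kubectl create -f build-job.yaml` rather than apply, yielding a fresh Job every time.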
The continuous integration confidence-building steps that can easily run in an ephemeral container are unit tests, integration tests and security scan steps. Image/container scanning steps can be especially compute-intensive, since they decompose and validate each Docker layer, much like compute-heavy build tasks. Because each build could introduce new dependencies or new versions of dependencies, running a container scan is important every time you build a new image.
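A scan step fits the same ephemeral pattern. As one example, the sketch below runs the open source Trivy scanner as a throwaway Job; the image under scan is hypothetical:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: scan-myapp-
spec:
  ttlSecondsAfterFinished: 300
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trivy
          image: aquasec/trivy:latest
          args:                          # fail the Job on serious findings
            - image
            - --exit-code=1
            - --severity=HIGH,CRITICAL
            - registry.example.com/myapp:1.0.0
```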
However, some items need to be longer-lasting than an ephemeral container and require more durable storage. An exit step of continuous integration is publishing the created artifacts/packages to an artifact repository, and/or manifests to a respective source code management/package manager solution. In the Kubernetes world, this can also mean creating the manifests Kubernetes needs to deploy, such as Helm charts or Kustomize/Jsonnet resources. A goal of CI with Kubernetes is to produce an easily deployable artifact, and package/configuration/templating managers allow for that.
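For instance, CI might emit a Kustomize entry point as that deployable artifact, with the freshly built image pinned so the deployment is reproducible (file and image names are illustrative):

```yaml
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
images:
  - name: myapp                         # matches the image name in deployment.yaml
    newName: registry.example.com/myapp
    newTag: "1.0.0"                     # pinned by CI at publish time
```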
The Achilles' heel here is that artifact repositories are storage-heavy by design. Unless highly available/durable storage is available to workloads on your Kubernetes cluster, running your artifact repository as a SaaS or off the cluster makes sense. Having a deployable artifact/manifest is only part of the equation of getting your idea into the hands of the end user; the next step is the deployment.
Best Practices for CD and Kubernetes
The goal of continuous delivery (CD) is to get your changes into production in a safe manner. Kubernetes has the ability to deploy very quickly, especially when using a recreate strategy, where all the pods are killed and replaced at once rather than incrementally with a rolling strategy. However, a recreate causes downtime. Most of us deal with workloads that are already up and running, so downtime would be a detriment. Given the immediacy Kubernetes offers, resisting the urge to deploy as rapidly as possible seems counterintuitive, but it is needed to build confidence.
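The two built-in Deployment strategies make the trade-off concrete (fragments of a Deployment spec):

```yaml
spec:
  strategy:
    type: Recreate        # all pods killed, then replaced: fast, but downtime
---
spec:
  strategy:
    type: RollingUpdate   # incremental replacement: slower, but stays up
```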
The confidence-building exercises applications went through before Kubernetes remain important after you start using it. For example, testing and coverage requirements are still required. With Kubernetes, there are possibly more concerns: for portability reasons, it's not unusual to run conformance tests to validate the Kubernetes infrastructure you are deploying to. Portability, after all, is a big draw for leveraging Kubernetes in the first place.
Similar to running continuous integration steps on Kubernetes, running certain continuous delivery steps on Kubernetes itself is prudent. Standing up and then spinning down test infrastructure is easily achievable on a Kubernetes cluster. Depending on the length of the confidence-building steps, a workflow aspect may be needed for orchestration, and that orchestrator needs to be long-living. The same design principles and decisions around running long-lived/stateful workloads on or off Kubernetes apply to the orchestration.
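A minimal sketch of that throwaway test infrastructure is a namespace per pipeline run, created before the confidence-building steps and deleted afterward (the name is illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: test-run-42          # one namespace per pipeline run
  labels:
    purpose: ci-verification
```

When the run completes, `kubectl delete namespace test-run-42` tears down everything that was deployed into it.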
Leveraging release strategies such as a blue-green or canary release is possible with Kubernetes. While you could do this by hand with several well-crafted Kubernetes manifests and timely applications of those manifests, tooling to cover these release strategies is increasing. Building in proper health checks, such as the liveness and readiness probes that allow incremental deployments to continue, is key when architecting for Kubernetes. The safety needs that existed prior to Kubernetes do not go away with its introduction. As the ecosystem and tooling continue to mature, new paradigms will appear.
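Here is a sketch of those probes in a container spec; the paths, port and timings are assumptions about the application:

```yaml
containers:
  - name: myapp
    image: registry.example.com/myapp:1.0.0
    readinessProbe:             # gates traffic; the rollout only progresses
      httpGet:                  # once new pods report ready
        path: /healthz/ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:              # restarts the container if the app wedges
      httpGet:
        path: /healthz/live
        port: 8080
      periodSeconds: 15
```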
Furthering the Journey
With Kubernetes blurring the line between infrastructure and application, a common system design paradox, "can the author be the enforcer in the system," can easily play out in Kubernetes. Prior to Kubernetes, development engineers deploying directly to production was not the norm. Usually, production was fronted by some sort of CI/CD platform with varying levels of automation and approvals on the way there.
With Kubernetes, depending on how far you take isolation versus running singular clusters, you can easily run the build, the confidence-building steps and the deployment on and into the same cluster, separated by namespace. With modern tooling, and with the GitOps movement gaining traction, authors can enforce standards such as drift detection and self-healing of the declarative state of deployments.
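As one example of that enforcement, a GitOps operator such as Argo CD (assumed here as an example; other tools work similarly) can be told to detect drift and heal back to the declared state. The repository and application names are hypothetical:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/myapp-manifests  # declared state in Git
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true      # remove resources that were deleted from Git
      selfHeal: true   # revert manual drift back to the declared state
```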
Kubernetes has the ability to react in a generic sense, for example as defined by a controller. When focusing on a deployment, understanding what is normal/acceptable enough to make the go/no-go judgment call can be difficult. A good approach is to look at monitoring/observability systems for deviation from a baseline. Orchestrating these tools into automated judgment calls, for example deciding whether a rollback is needed, is possible today on the Harness Platform.
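To make the baseline idea concrete, here is a sketch using the open source Argo Rollouts AnalysisTemplate (an illustrative tool choice, not a description of any particular vendor's implementation). It queries Prometheus and fails the rollout, triggering a rollback, when the success rate deviates below a threshold; the query, address and threshold are assumptions:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
    - name: success-rate
      interval: 1m
      successCondition: result[0] >= 0.95   # the go/no-go baseline
      failureLimit: 3                       # repeated misses trigger rollback
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090
          query: |
            sum(rate(http_requests_total{status!~"5.."}[5m]))
            /
            sum(rate(http_requests_total[5m]))
```

As more organizations further their Kubernetes journey, it is wise not to forget the discipline that was around before Kubernetes, while embracing these new paradigms.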