Setting Kubernetes Standards with Platform Engineering

Use software catalog scorecards to set Kubernetes quality and security standards, from production readiness to multiple clusters and more.
Mar 10th, 2023 7:39am by

You can’t be an expert at everything. But a lack of expertise should not stop you from getting things done, nor should it mean that everything takes ages to accomplish. Instead, you need to know what the standards are, use golden paths and see the data that matters.

This is what platform engineering is about. It creates reusable elements for developers, such as redeploying an image tag or updating an auto-scaling group to provision a new package. Those capabilities are accessed through the internal developer portal, and scorecards play an important role there.

Let’s take Kubernetes and developers as an example. It just isn’t right to say that developers aren’t Kubernetes experts and therefore should either learn Kubernetes, roam free while hoping not to break anything or wait for DevOps tickets.

We need to allow developer autonomy, and this can be done through standards. Standards free developers from infrastructure complexity and allow developers to deal with Kubernetes while staying within bounds. Scorecards are where those standards are expressed.

This isn’t only about good platform engineering; it’s also about a good developer experience. Developers deserve solutions that are as well designed as those they themselves would offer to non-technical people.

Internal Developer Portals Set Kubernetes Standards

We’ve written about how internal developer portals and their software catalog abstract away Kubernetes complexity. Internal developer portals automatically map all Kubernetes metadata and help developers immediately tell what matters through “whitelisting” data. In this post, we’ll discuss how internal developer portals can bring organizational Kubernetes standards into practice using scorecards.

Scorecards are closely tied to the idea that guardrails in internal developer portals end up defining and driving better engineering quality standards. Defining K8s production readiness or security standards not only helps the individual developer improve, but also engineering quality as a whole.

Showing Kubernetes Data in the Software Catalog

Raw unfiltered Kubernetes data is usually too much for developers. Some of the data is not relevant, and some may be relevant but presented in a way that makes little sense to developers. Internal developer portals contain software catalogs, and they abstract the data so developers can use it. Take a look at this single service view, taken from Port. Only the relevant data is presented.

Let’s examine some specific scorecard examples for Kubernetes.

Production Readiness Scorecards

A production readiness scorecard assesses the production readiness of existing Kubernetes objects, such as deployments or clusters. This helps to ensure that they meet the required standards for performance, reliability and availability in a production environment. It can also identify changes or upgrades needed to maintain or improve their readiness over time, reducing the risk of downtime and ensuring high-quality service for end users.

Scorecards should address the following categories:

  • For containers, metrics should validate container resource configurations such as memory requests and limits, and ensure that liveness and readiness probes are configured for all containers. These configurations are essential for ensuring that containers run efficiently and are able to respond quickly to any issues that may arise.
  • For namespaces, rules should ensure that workloads are not deployed in the default Kubernetes namespace, which can help prevent potential issues that may arise from interfering with Kubernetes system components.
  • For high availability, metrics should require a minimum replica count of two, which is important for ensuring redundancy in case of any node or pod failures. This redundancy is critical for ensuring high availability of the workload.
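The three categories above can be sketched as simple checks over a Deployment manifest loaded as a Python dict. This is a minimal illustration, not the rule format of any particular portal: the rule names, the `evaluate` helper and the example manifest are all assumptions made for this sketch. Only the manifest field names (`replicas`, `livenessProbe`, `readinessProbe`, `resources`) come from Kubernetes itself.

```python
# Illustrative sketch: rule names and helpers are hypothetical.

def containers(deployment: dict) -> list:
    """Return the container specs of a Deployment manifest."""
    return deployment["spec"]["template"]["spec"]["containers"]

def has_memory_requests_and_limits(deployment: dict) -> bool:
    return all(
        "memory" in c.get("resources", {}).get("requests", {})
        and "memory" in c.get("resources", {}).get("limits", {})
        for c in containers(deployment)
    )

def has_probes(deployment: dict) -> bool:
    return all(
        "livenessProbe" in c and "readinessProbe" in c
        for c in containers(deployment)
    )

def not_in_default_namespace(deployment: dict) -> bool:
    # A manifest without an explicit namespace is treated as "default" here.
    return deployment.get("metadata", {}).get("namespace", "default") != "default"

def has_min_replicas(deployment: dict, minimum: int = 2) -> bool:
    # spec.replicas defaults to 1 when omitted.
    return deployment["spec"].get("replicas", 1) >= minimum

RULES = {
    "Memory requests and limits set": has_memory_requests_and_limits,
    "Liveness and readiness probes set": has_probes,
    "Not in default namespace": not_in_default_namespace,
    "At least two replicas": has_min_replicas,
}

def evaluate(deployment: dict) -> dict:
    """Map each rule name to True (pass) or False (fail)."""
    return {name: rule(deployment) for name, rule in RULES.items()}

# Hypothetical workload: one replica and no readiness probe.
example = {
    "metadata": {"name": "checkout", "namespace": "shop"},
    "spec": {
        "replicas": 1,
        "template": {"spec": {"containers": [{
            "name": "app",
            "image": "checkout:1.4.2",
            "resources": {
                "requests": {"memory": "128Mi"},
                "limits": {"memory": "256Mi"},
            },
            "livenessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
        }]}},
    },
}

for name, passed in evaluate(example).items():
    print(f"{'PASS' if passed else 'FAIL'}  {name}")
```

Keeping each rule as a small named function makes it easy to show developers exactly which standard a workload misses, rather than a single pass/fail verdict.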

The following is a sample scorecard for Kubernetes production readiness.

Overall, the standards reflected in the scorecard are designed to ensure that Kubernetes workloads are production-ready and can operate in a reliable, scalable, and efficient manner. The scorecard system is a useful way to track compliance with these standards and ensure that both developers and operations teams are aware of the production-readiness status of their workloads. You can check a live demo version of this scorecard here.

Security Scorecards

Security scorecard standards aim to ensure that security measures are in place to protect sensitive information, by validating that secrets for external systems like GitHub, Datadog and Terraform are not exposed. Container deployment compliance standards ensure that containers are configured with read-only root filesystems, do not access the underlying host and do not escalate privileges, all of which are critical for maintaining container security. The tags and labels standards verify that workloads have valid label values and that all container images have a tag version, which is important for organizing and managing workloads efficiently.
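As a rough sketch of the container compliance and image tag checks described above, the function below inspects a pod spec given as a Python dict. The `securityContext` and `host*` field names are standard Kubernetes fields. Treating `hostNetwork`, `hostPID` and `hostIPC` as the meaning of "access the underlying host" is an assumption of this sketch, as are the function names and the example data.

```python
# Illustrative sketch: function names and finding messages are hypothetical.

def image_has_tag(image: str) -> bool:
    # Look only at the last path segment so a registry port
    # (for example "registry:5000/app") is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    _, sep, tag = last_segment.partition(":")
    return bool(sep and tag)

def security_findings(pod_spec: dict) -> list:
    """Return a list of human-readable findings; empty means compliant."""
    findings = []
    for flag in ("hostNetwork", "hostPID", "hostIPC"):
        if pod_spec.get(flag, False):
            findings.append(f"pod uses {flag}")
    for c in pod_spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        sc = c.get("securityContext", {})
        # Both settings must be set explicitly; omitting them is a finding.
        if sc.get("readOnlyRootFilesystem") is not True:
            findings.append(f"{name}: root filesystem is writable")
        if sc.get("allowPrivilegeEscalation") is not False:
            findings.append(f"{name}: privilege escalation not disabled")
        if not image_has_tag(c.get("image", "")):
            findings.append(f"{name}: image has no tag")
    return findings

# Hypothetical pod spec with three problems.
pod_spec = {
    "hostNetwork": True,
    "containers": [{
        "name": "web",
        "image": "registry:5000/web",
        "securityContext": {"readOnlyRootFilesystem": True},
    }],
}

for finding in security_findings(pod_spec):
    print(finding)
```

Secret exposure checks are left out here because they depend on how each organization scans its repositories and configuration.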

Here’s an example of such a scorecard: (a live demo version is here).

Overall, these rules help to ensure that Kubernetes workloads are deployed securely and can operate in a reliable and secure manner.

Resource Usage Scorecards

While developers may not care about resource usage as much, DevOps teams care quite a lot. You can’t scale well with resource usage problems, and problematic resource usage is bound to create more incidents. Scorecards can both alert developers to such problems, so they can fix them on their own, and set a quality standard.

Going back to the running service scorecard, we can see that it’s at the bronze level in terms of resource usage, due to issues with CPU usage and memory. This immediately becomes an action item, either through searching the software catalog or by using one of its reports.
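One way to think about levels such as bronze is as a ladder: a service reaches a level only when every rule at that level passes. The sketch below assumes this ladder model. The level names above bronze, the thresholds and the metric field names are all hypothetical; only the bronze level and the CPU and memory issues come from the example above.

```python
# Illustrative sketch: levels above Bronze, thresholds and fields are assumed.

LEVELS = ["Bronze", "Silver", "Gold"]

# (level, rule name, check)
RULES = [
    ("Silver", "CPU usage below 80% of limit",
     lambda s: s["cpu_usage_ratio"] < 0.8),
    ("Silver", "Memory usage below 80% of limit",
     lambda s: s["memory_usage_ratio"] < 0.8),
    ("Gold", "No OOM kills in the last week",
     lambda s: s["oom_kills_7d"] == 0),
]

def scorecard_level(service: dict) -> str:
    """Climb the ladder until a level has at least one failing rule."""
    achieved = LEVELS[0]
    for level in LEVELS[1:]:
        if all(check(service) for lvl, _, check in RULES if lvl == level):
            achieved = level
        else:
            break
    return achieved

def failed_rules(service: dict) -> list:
    """Names of failing rules, which become the developer's action items."""
    return [name for _, name, check in RULES if not check(service)]

# Hypothetical service running hot on both CPU and memory.
service = {"cpu_usage_ratio": 0.93, "memory_usage_ratio": 0.85, "oom_kills_7d": 0}

print(scorecard_level(service))
print(failed_rules(service))
```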

ArgoCD Scorecards

This is a scorecard for evaluating the production readiness of ArgoCD workflows and rollouts. The rules ensure that workflows are reliable and that the rollout process is compliant, with checks for error handling, configuration management, revision history and scaling.

Managing Multiple Clusters: Scorecards to the Rescue

Managing multiple Kubernetes clusters can be challenging due to the complexities involved in maintaining consistent configurations across all clusters. With multiple clusters spread across different regions and clouds, it can be difficult to ensure that all clusters are configured consistently and correctly and that all configurations are up to date. Misconfigurations can lead to issues such as service outages and security vulnerabilities, which can have serious consequences.

Additionally, most Kubernetes visualization tools do not provide a unified dashboard that can display all clusters in different regions and clouds, which can make it difficult to monitor the health and performance of the entire Kubernetes environment from a single pane of glass.

Within internal developer portals, it is easy to create one dashboard that displays all Kubernetes clusters and the most crucial data about them. You can see what it looks like in a live demo version:

Let’s create a production readiness scorecard on top of this data:

The Production Readiness scorecard is used to evaluate the readiness of Kubernetes clusters based on a set of standards. These are different rules that can be applied to ensure the stability, availability, and reliability of Kubernetes clusters.

The “K8s version stable” and “Using latest K8s version” standards focus on ensuring that the Kubernetes version used is stable and up to date. Other rules track organizational initiatives. For example, if there is an initiative in the organization to move the clusters from Azure to AWS, the rule “Cloud provider is not Azure” can help to track and push this initiative. The “Using Argo CD” rule promotes automated and standardized deployments.

“Readiness and liveness configured for all Pods” and “CPU and Mem limits configured for all Pods” help to ensure that workloads are healthy and do not exceed the resources available. Finally, “Number of cluster nodes is at least three” ensures redundancy and high availability of workloads. Applying these standards can help organizations maintain a stable, secure and scalable Kubernetes environment.

Here are additional standards that focus on monitoring and visibility into the Kubernetes environment. The “Has K8s dashboard?” standard checks if the Kubernetes dashboard is installed, which provides basic monitoring and visibility into the Kubernetes environment. The “Has Prometheus?” standard checks if Prometheus is being used for monitoring Kubernetes metrics and alerts based on those metrics, which can help to detect and respond to issues quickly.

The “Has Grafana?” standard checks if Grafana is being used for more advanced monitoring and visualization of metrics, which can help to monitor the performance and health of the Kubernetes environment. By applying these standards, organizations can ensure that their Kubernetes environment is effectively monitored and maintained to meet their performance and availability requirements.
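A few of the cluster standards named above can be sketched as rules applied to every cluster in the catalog, producing one report across regions and clouds. The rule names are taken from the text. The cluster record fields (`cloud_provider`, `node_count` and so on) and the example clusters are assumptions for illustration, not Port's data model.

```python
# Illustrative sketch: the cluster record layout is hypothetical.

CLUSTER_RULES = {
    "Cloud provider is not Azure":
        lambda c: c["cloud_provider"].lower() != "azure",
    "Using Argo CD":
        lambda c: c["uses_argocd"],
    "Number of cluster nodes is at least three":
        lambda c: c["node_count"] >= 3,
    "Has Prometheus?":
        lambda c: c["has_prometheus"],
    "Has Grafana?":
        lambda c: c["has_grafana"],
}

def failing_rules_by_cluster(clusters: list) -> dict:
    """Map each non-compliant cluster name to the rules it fails."""
    report = {}
    for cluster in clusters:
        failed = [name for name, rule in CLUSTER_RULES.items()
                  if not rule(cluster)]
        if failed:
            report[cluster["name"]] = failed
    return report

# Hypothetical clusters in two clouds.
clusters = [
    {"name": "eu-west-prod", "cloud_provider": "AWS", "uses_argocd": True,
     "node_count": 5, "has_prometheus": True, "has_grafana": True},
    {"name": "us-east-legacy", "cloud_provider": "Azure", "uses_argocd": False,
     "node_count": 2, "has_prometheus": True, "has_grafana": False},
]

for name, failed in failing_rules_by_cluster(clusters).items():
    print(name, "->", ", ".join(failed))
```

Because the same rule set runs against every cluster, configuration drift between regions or clouds shows up as a difference in the report rather than as a surprise during an incident.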

Setting Standards for Different Environments and Objects

It’s important to set different standards for different entities, reflecting different stages in the software development life cycle.

For example, in a production environment, it may be critical to ensure that Kubernetes clusters are running on the latest stable version of Kubernetes and that there are sufficient nodes to support the workload. It may also be necessary to ensure that all pods have the appropriate resource limits configured to prevent performance issues. Additionally, monitoring tools such as Prometheus and Grafana may be required to provide advanced monitoring and visualization capabilities.

On the other hand, in a staging environment, the focus may be on testing and validating new features or changes before they are deployed to production. In this case, standards may be more focused on ensuring that the Kubernetes environment is configured correctly, and that deployments are automated and standardized using tools such as Argo CD.

By defining different scorecards for different entities, organizations can tailor their rules and checks to different environments, ensuring that their Kubernetes environment is optimized for the specific needs of each environment. This can help to improve the efficiency of the development and deployment process while also ensuring that the Kubernetes environment is stable, secure, and reliable.
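The idea of different scorecards for different environments can be sketched as a lookup from environment to the subset of rules that applies there. Everything named below is hypothetical: the rule names, the record fields and especially `TARGET_VERSION`, which stands in for whatever version an organization considers its latest stable release.

```python
# Illustrative sketch: rule names, fields and TARGET_VERSION are assumptions.

TARGET_VERSION = (1, 26)  # assumed value, for illustration only

def parse_version(version: str) -> tuple:
    """Turn a string like 'v1.25.4' into a (major, minor) tuple."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)

RULES = {
    "On target K8s version":
        lambda c: parse_version(c["k8s_version"]) >= TARGET_VERSION,
    "At least three nodes": lambda c: c["node_count"] >= 3,
    "Has Prometheus": lambda c: c["has_prometheus"],
    "Has Grafana": lambda c: c["has_grafana"],
    "Using Argo CD": lambda c: c["uses_argocd"],
}

# Production is held to more rules than staging.
SCORECARDS = {
    "production": ["On target K8s version", "At least three nodes",
                   "Has Prometheus", "Has Grafana"],
    "staging": ["Using Argo CD"],
}

def evaluate(cluster: dict) -> dict:
    """Apply only the rules that belong to the cluster's environment."""
    rule_names = SCORECARDS[cluster["environment"]]
    return {name: RULES[name](cluster) for name in rule_names}

# The same hypothetical configuration, judged as staging and as production.
base = {"k8s_version": "v1.25.4", "node_count": 2, "uses_argocd": True,
        "has_prometheus": True, "has_grafana": False}

print(evaluate({**base, "environment": "staging"}))
print(evaluate({**base, "environment": "production"}))
```

The same configuration passes its staging scorecard and fails most of the production one, which is the point: the standard depends on where the entity sits in the life cycle.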

Conclusion

Scorecards do three things. They abstract K8s data in a way that works well for developers. They ensure developers follow Kubernetes best practices and guardrails to maximize application reliability. And they help DevOps drive initiatives. They also provide the tools to help developers own security, reliability and cloud spend, and they are valuable when workflows automatically run against scorecards to determine whether a build should fail. You can see for yourself in Port’s live demo or sign up for a free version of Port here.
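A workflow that fails a build based on scorecard results could look roughly like the sketch below. It assumes the results have already been fetched as a mapping from rule name to pass/fail; how they are fetched depends on the portal and is not shown. The `gate` function, the rule names and the choice to treat a missing result as a failure are all assumptions of this sketch.

```python
# Illustrative sketch: gate() and the rule names are hypothetical.

def gate(results: dict, required: set) -> int:
    """Return a process exit code: 0 if every required rule passed, else 1."""
    # A required rule with no recorded result is treated as failing.
    failed = sorted(name for name in required if not results.get(name, False))
    for name in failed:
        print(f"scorecard rule failed: {name}")
    return 1 if failed else 0

# Hypothetical scorecard results for the service being built.
results = {
    "Memory requests and limits set": True,
    "Liveness and readiness probes set": False,
    "At least two replicas": True,
}
required = {"Memory requests and limits set", "Liveness and readiness probes set"}

exit_code = gate(results, required)
print("exit code:", exit_code)
```

In a CI step, the returned code would typically be passed to `sys.exit` so that a non-zero value fails the build.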

TNS owner Insight Partners is an investor in: Pragma.