How to Manage the Hidden Complexities of Kubernetes
The use of containers across development, testing and production environments has skyrocketed in the last few years, in part because of new tools that make it easier to deploy, scale and manage cloud native applications. According to a survey by the Cloud Native Computing Foundation, developers have more than 100 of these tools to choose from, yet 89% are using some form of Kubernetes.
Google’s open source container orchestration project certainly has its advantages, automation being one of the biggest. Kubernetes takes the intensive, time-consuming manual labor out of container management by automating the deployment and distribution of application services, the allocation of resources for those services, application network configurations and even load balancing across distributed infrastructure. As a result, lean teams can efficiently deploy and manage a significant amount of infrastructure, and teams of all sizes can do so with greater operational velocity.
Kubernetes runs containers nearly anywhere because it creates abstraction at the infrastructure layer. This improves scalability and simplifies sharing and decision-making for teams working across multiple platforms and resources — from cloud to virtual machines to bare metal. DevOps teams can focus on building applications instead of managing the underlying infrastructure. It also has built-in mechanisms for resilience, including features like high availability, automated failover, and the ability to decommission, replicate and spin up new containers and services to essentially self-heal.
Because Kubernetes is open source, there are ample resources and documentation, and teams can integrate easily with a number of other tools in the ecosystem. With all of these benefits, it’s the silver bullet we’ve all been waiting for — or is it?
Kubernetes Comes with Trade-offs
While known for its features that simplify development, the orchestration platform itself can be extremely difficult to implement and manage. Many important functions and configurations require significant time and understanding to set up. Kubernetes “out of the box” is essentially a cluster with a set of nodes to run containerized applications.
Critical components (like DNS, user dashboard, and monitoring) are add-ons, and nearly all features require integrations through application programming interfaces (APIs). Managing this aspect alone can greatly increase complexity because, without appropriate prioritization, surges in API calls can block important requests or crash the API server entirely.
There are also numerous ways to implement and use Kubernetes, such as how services are discovered and accessed and how traffic is load-balanced between pods. This becomes a problem for large distributed organizations with multiple teams sharing resources. The vast array of options and approaches to managing the platform can be overwhelming, creating additional complexity and discord.
Similarly, the automation features can create problems quickly for teams that do not take the time to learn and appropriately configure their Kubernetes instance. Teams need to understand and plan for how it builds and auto-scales, how it schedules resources and how it will fail over automatically when something goes wrong. The automation only saves time when it is functioning correctly; misconfigured functions can rapidly become unmanageable and cause issues with application functionality and performance.
For instance, Kubernetes implements CPU limits with the quota mechanism of the Linux Completely Fair Scheduler (CFS), which shares CPU time among running tasks. The kernel throttles an application when it goes beyond its limit, which is expressed as a quota of CPU time within a fixed period (e.g., 100 ms). If the limit is not configured appropriately, especially in a more complex environment, the result is high throttle rates that lead to readiness probe failures, container stalls and network timeouts. In other words, misconfiguration leads to increased application errors.
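As a rough illustration of the arithmetic behind this, the Python sketch below converts a CPU limit into a per-period quota and computes how often a container was throttled. The function names are hypothetical and are not part of any Kubernetes API. The sketch assumes a 100 ms period and the nr_periods and nr_throttled counters that a cgroup exposes in its cpu.stat file.

```python
# Illustrative sketch only: helper names are hypothetical, not a Kubernetes API.

DEFAULT_PERIOD_US = 100_000  # assumed 100 ms enforcement period, in microseconds


def cfs_quota_us(cpu_limit_millicores: int, period_us: int = DEFAULT_PERIOD_US) -> int:
    """Return the CPU time (microseconds) a container may use in each period."""
    return cpu_limit_millicores * period_us // 1000


def throttle_ratio(cpu_stat_text: str) -> float:
    """Return the fraction of periods in which the container was throttled.

    Expects the text of a cgroup cpu.stat file, one "name value" pair per line.
    """
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # no enforcement periods recorded yet
    return stats.get("nr_throttled", 0) / periods


# A limit of 500 millicores allows 50 ms of CPU time in every 100 ms period.
quota = cfs_quota_us(500)

# Sample counters (made-up values) showing throttling in a quarter of all periods.
sample = "nr_periods 200\nnr_throttled 50\nthrottled_time 123456\n"
ratio = throttle_ratio(sample)
```

A container that needs a short burst of more than its quota is paused for the rest of the period, which is why a limit that looks generous on average can still produce the stalls and probe failures described above.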
One of the platform’s key benefits can also serve as its Achilles heel. Because Kubernetes is about automation, users do not write instructions or commands to tell it what to do. Instead, they describe their desired state, and Kubernetes decides what to do and how to achieve this state. If that state involves four containers running with a specific amount of allocated memory, Kubernetes will launch those containers and then monitor them. If one fails, it will spin up another to replace it. This ability to self-heal will keep applications up and running initially, but it can conceal growing problems. Health metrics might show everything is functioning normally, but the application could be throwing errors every hour. It is very easy to have components of Kubernetes be unknowingly degraded because of a lack of visibility into the code.
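The desired-state model can be sketched as a simple reconcile loop. The Python below is a toy model under stated assumptions, not how Kubernetes is implemented: the Cluster class, its launch counter and the reconcile function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a cluster; hypothetical, not a real Kubernetes client."""
    running: list = field(default_factory=list)
    launches: int = 0  # total containers ever started, including replacements

    def launch(self) -> None:
        self.running.append(f"container-{self.launches}")
        self.launches += 1


def reconcile(cluster: Cluster, desired_replicas: int) -> None:
    """Drive the actual state toward the declared desired state."""
    while len(cluster.running) < desired_replicas:
        cluster.launch()
    while len(cluster.running) > desired_replicas:
        cluster.running.pop()


cluster = Cluster()
reconcile(cluster, desired_replicas=4)  # the user only declares "four containers"

cluster.running.remove("container-1")   # simulate one container failing
reconcile(cluster, desired_replicas=4)  # the loop quietly replaces it

replica_count = len(cluster.running)    # back to the desired count
total_launches = cluster.launches       # higher than the desired count
```

After the second pass the replica count matches the desired state again, so a check that looks only at that number reports nothing wrong. Only the launch counter shows that a container failed and was replaced, which is the kind of signal that self-healing can hide.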
Here are some best practices to help you manage complexity in Kubernetes.
Consider Outsourcing Management
IT organizations should determine the amount of time and resources they are willing to devote to Kubernetes. Managing and troubleshooting it require deep domain expertise, and for organizations that do not already have this talent in-house, the learning curve can be significant. Outsourcing offers faster deployment, access to many additional tools and resources, and ongoing management of security patches, upgrades, troubleshooting and maintenance. Many cloud providers and independent vendors now offer Kubernetes management as a service, which can alleviate many of the burdens and free up development teams to focus on building better applications.
For those organizations that want (or need) to own Kubernetes management, here are a few additional recommendations:
Look under the Hood
Despite the hype, Kubernetes isn’t a magical solution. It is a connected collection of functions and features. Taking the time to understand the individual components can be extremely valuable.
Map It out
Companies can proactively reduce complexity by bringing teams together to map out all applications: where they reside, how they function, how they connect to each other and how best to approach migration. Determine who is responsible for management and, most importantly, who is responsible when something breaks.
Develop Proprietary Documentation
Complexity can be managed more easily when teams discuss and agree on how Kubernetes will be implemented and used, as well as how to manage problems that may arise.
Monitor Systems and Code
Monitor application health, as well as containers and back-end systems. A comprehensive approach to monitoring will provide greater visibility into issues and events, so that problems can be identified and remediated before there is any significant impact on users. It will also help teams understand the impact and scope of issues to identify patterns and changes over time.