What Does It Take to Manage Hundreds of Kubernetes Clusters?
- Deploying a production-ready Kubernetes cluster with all of its required dependencies takes days.
- Managing a fleet of Kubernetes clusters is very hard if you do not automate the process.
- Managing a fleet of Kubernetes clusters across several cluster providers is even harder.
- Upgrading a Kubernetes cluster requires intensive testing to make sure changes do not break anything. It is not just a matter of pushing an “upgrade” button, as is often suggested.
Managing one Kubernetes cluster is hard, but what about managing hundreds of Kubernetes clusters worldwide?
Hundreds of Kubernetes clusters with multiple nodes, services, applications and load balancers. This is what we do at Qovery; we manage hundreds of production Kubernetes clusters worldwide to help more than 16,000 developers deploy their apps on AWS.
But what does it take to run and manage hundreds of Kubernetes clusters? This is what I will share with you in this article.
With Qovery, every user can start deploying their apps on AWS in a few seconds. The goal is to turn AWS into a smooth developer experience. Typically, our users come from Heroku and want to move to AWS. They want the simplicity of Heroku and the flexibility of AWS. This is where Qovery comes in.
To achieve this, Qovery uses EKS (AWS-managed Kubernetes) to run and scale stateless applications. Every user has at least one Kubernetes cluster. The promise of Qovery is a prod-ready Kubernetes cluster, meaning that deploying, running and managing Kubernetes is on us. The piece of software responsible for managing the clusters is called the Qovery Engine and is open source.
Deploying Prod-Ready Kubernetes Clusters
To automate the deployment of Kubernetes on AWS, we have created an open source deployment engine, an application written in Rust. It initializes the virtual private cloud (VPC) for Kubernetes, the ingress, the auto-scaler, Loki, S3 to store the Kubernetes logs and lastly Kubernetes itself. The Qovery Engine uses Terraform, Helm and the AWS API. Curious to take a look? All the files are available here. It takes 30 minutes to go from zero to a prod-ready Kubernetes cluster on AWS instead of weeks.
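To make the idea of ordered provisioning concrete, here is a minimal sketch in Python. It is not the Qovery Engine's code (the engine is written in Rust and drives Terraform and Helm). The component names come from the paragraph above, but the dependency edges, the function names and the apply_step callback are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each component lists what must exist before it.
# Component names follow the article; the edges themselves are assumptions.
DEPENDENCIES = {
    "vpc": set(),
    "s3_log_bucket": set(),
    "eks_cluster": {"vpc"},
    "cluster_autoscaler": {"eks_cluster"},
    "ingress": {"eks_cluster"},
    "loki": {"eks_cluster", "s3_log_bucket"},
}


def provisioning_order(dependencies):
    """Return the components in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def provision(dependencies, apply_step):
    """Apply each component in dependency order and stop at the first failure.

    apply_step is a hypothetical callback that returns True on success.
    """
    done = []
    for component in provisioning_order(dependencies):
        if not apply_step(component):
            raise RuntimeError(
                f"provisioning failed at {component!r}; completed: {done}"
            )
        done.append(component)
    return done
```

Calling provision(DEPENDENCIES, apply_step) with a callback that wraps the real tooling would walk the components in a valid order. Declaring the dependencies as data keeps the ordering logic separate from the provisioning logic.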
The first time you use Qovery on an AWS account, a VPC and an EKS cluster are set up. Once fully set up, the Qovery Engine is installed and connects to the Qovery Control Plane to receive the application deployment instructions.
Running and managing Kubernetes is simplified (no etcd, master nodes or network overlay to manage) since Qovery relies on the managed Kubernetes offering from AWS (EKS). AWS keeps the cluster operational by taking care of the master nodes, which are responsible for the integrity of the complete cluster.
However, the added value of Qovery here is simplifying app deployment and making sure those apps are running correctly. If something goes wrong, information about those applications and the cluster itself is reported to the user in real time. This is handled by the Qovery Engine and our Qovery Agent.
Keeping Kubernetes Up to Date
A new minor version of Kubernetes is released roughly every three to four months. At this pace, staying up to date is challenging, especially when breaking changes happen. Testing the upgrade on a staging cluster before doing it in production is mandatory but requires a fair amount of time.
Even for us. At Qovery, a dedicated team is in charge of managing the upgrade of our users' Kubernetes clusters. The good news is that once we have done it once, it works (almost) the same for every cluster.
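The staging-before-production rule can be written down as a small planning function. The sketch below is illustrative only. It assumes that clusters move one minor version at a time and that each step is validated on staging before production; the function names and the environment labels are hypothetical.

```python
def parse_minor(version):
    """Split a version such as '1.22' or '1.22.3' into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def upgrade_path(current, target):
    """List each minor version to step through, assuming no minor is skipped."""
    cur_major, cur_minor = parse_minor(current)
    tgt_major, tgt_minor = parse_minor(target)
    if cur_major != tgt_major:
        raise ValueError("major version changes are out of scope for this sketch")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not supported")
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


def rollout_plan(current, target, environments=("staging", "production")):
    """Pair every version step with each environment, staging first."""
    return [
        (version, env)
        for version in upgrade_path(current, target)
        for env in environments
    ]
```

For example, rollout_plan("1.21", "1.23") yields the 1.22 step on staging, then on production, followed by the 1.23 step in the same order.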
For security reasons, the Qovery Engine connects to the Qovery Control Plane and pulls cluster update instructions. The Qovery Engine takes care of the rolling update on all the Kubernetes worker nodes with the newer version and the associated dependencies (Loki, ingresses, etc.). The Qovery Engine guarantees that the cluster is fully operational and ready to receive new app deployments.
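The pull-then-roll flow described above can be sketched as follows. This is a simplified illustration, not the Qovery Engine's implementation: fetch_instruction, upgrade_node, is_healthy and max_unavailable are hypothetical names, and nodes are modeled as plain dictionaries.

```python
def rolling_update(nodes, target_version, upgrade_node, is_healthy, max_unavailable=1):
    """Upgrade outdated nodes in small batches and stop on the first unhealthy batch.

    Returns the names of the nodes that were upgraded.
    """
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    pending = [node for node in nodes if node["version"] != target_version]
    upgraded = []
    for start in range(0, len(pending), max_unavailable):
        batch = pending[start:start + max_unavailable]
        for node in batch:
            upgrade_node(node, target_version)
        unhealthy = [node["name"] for node in batch if not is_healthy(node)]
        if unhealthy:
            # Halt here so the remaining nodes keep serving on the old version.
            raise RuntimeError(f"unhealthy after upgrade: {unhealthy}")
        upgraded.extend(node["name"] for node in batch)
    return upgraded


def poll_once(fetch_instruction, nodes, upgrade_node, is_healthy):
    """Pull one instruction from the control plane and act on it if present.

    fetch_instruction is a hypothetical callback that returns None or a dict
    with a 'target_version' key.
    """
    instruction = fetch_instruction()
    if instruction is None:
        return []
    return rolling_update(nodes, instruction["target_version"], upgrade_node, is_healthy)
```

Because the engine side initiates the request, nothing in this model requires the control plane to open a connection into the user's cluster. Upgrading in batches and checking health after each one limits how many nodes can be affected by a bad update.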
In this article, we have seen how a fleet of hundreds of Kubernetes clusters is managed by the Qovery Engine, an open source application written in Rust. Deploying, running and updating a fleet of Kubernetes clusters takes a huge amount of time and needs to be automated to guarantee their uptime.