Is Cluster API Really the Future of Kubernetes Deployment?
Here at Sidero Labs, we love Cluster API (CAPI). We’ve built a whole bunch of stuff around it, including multiple CAPI providers, and we test Talos Linux with CAPI several times a day. We’re fans. But in this post, we’ll talk about where we think some issues lie and why we chose not to use CAPI for Omni, our new SaaS for Kubernetes deployments on bare metal and edge.
First, what is Cluster API? According to the docs, “Cluster API is a Kubernetes sub-project focused on providing declarative APIs and tooling to simplify provisioning, upgrading, and operating multiple Kubernetes clusters.” This essentially means that Cluster API gives people a way to create and manage Kubernetes clusters that is similar to how they manage their application workloads in Kubernetes. That makes for a really nice experience for folks who are already Kubernetes pros, since they get to manage clusters in a way they love.
There are Cluster API providers for all the major cloud providers, for VMware, and for bare metal. Sidero Metal is our own CAPI provider for bare metal, and it does full management of servers (powering them on/off when needed, adding them to clusters, removing and wiping machines, etc.). This, according to the CAPI docs, “enables consistent and repeatable cluster deployments across a wide variety of infrastructure environments.”
Sounds dope, so what’s the issue?
The Problems with CAPI
We are lucky enough here at Sidero Labs to have a bunch of passionate users. We have a sizable number of enterprises running Talos Linux at a huge scale, with hundreds of thousands of cores in their clusters. We also have a lot of SMBs running a few small clusters of just a few nodes, and many users running home labs. These different use cases result in a bimodal distribution of people’s appetite for something like CAPI. The teams that are running hundreds of bare metal clusters as an internal service like the power that CAPI provides. The smaller teams don’t get the hype. Here are a few reasons why.
- CAPI requires a dedicated “management plane”. This means you need a Kubernetes cluster to manage your Kubernetes clusters. For someone with limited hardware, and just wanting to run a cluster or two, dedicating another cluster and nodes to this purpose is wasteful and expensive.
- It’s hard. In a lot of ways, one must deeply understand the primitives that Cluster API and the specific providers offer. These primitives differ depending on the providers selected and it can lead to confusion for the average user when trying to understand their management plane and provisioning system. This is doubly confusing for teams that are new to Kubernetes altogether. Trying to understand pods, deployments, etc. is hard enough without also bringing in extra mental load.
- CAPI makes some assumptions that don’t work well for bare-metal or edge deployments. In the CAPI world, the upgrade process is “bring up a new node with newer configs, then tear down the old one”. This doesn’t work at all for edge use cases with single-node clusters, nor does it work for cluster nodes with lots of data that need to be replicated if a node is torn down (think a bare metal node with a lot of Ceph storage in locally attached disks). Nor does it work if you don’t have a spare server to bring into your cluster for the rolling upgrade — and leaving an expensive server of each class idle, as well as the management plane servers, imposes a hefty tax (this isn’t an issue if you run your clusters in a cloud provider).
- Troubleshooting is rough. Because of the modularity of Cluster API, it can be tough to figure out where things are breaking. A lot of the orchestration of cluster creation comes down to the status of various resources and whether those resources are done provisioning so the next provider has enough information to provide its resources. Related to the point above, you have to deeply understand the integration between CAPI itself and the infrastructure/bootstrap/control plane providers to even know where to look for failure logs.
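To put a number on the spare-hardware point in the list above, here is a minimal sketch that compares the two upgrade styles. It is a toy model, not CAPI or Talos code: the function names and the `surge` parameter (how many replacement machines are brought up at a time) are hypothetical and exist only for illustration.

```python
def rolling_replace_peak(nodes: int, surge: int = 1) -> int:
    """Peak machine count when every upgrade step brings up replacements
    first and only then tears down the old nodes (toy model)."""
    if nodes < 1 or surge < 1:
        raise ValueError("nodes and surge must both be at least 1")
    old, new, peak = nodes, 0, nodes
    while old > 0:
        batch = min(surge, old)
        new += batch                  # bring up the newer nodes first
        peak = max(peak, old + new)   # old and new nodes coexist here
        old -= batch                  # then tear down the old ones
    return peak


def in_place_peak(nodes: int) -> int:
    """Peak machine count when each node is upgraded where it stands."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return nodes


if __name__ == "__main__":
    for n in (1, 3, 5):
        extra = rolling_replace_peak(n) - in_place_peak(n)
        print(f"{n} node(s): rolling replace needs {extra} spare machine(s)")
```

Under these assumptions, even a single-node edge cluster needs a second machine to get through a replace-style upgrade, while an in-place upgrade never needs more hardware than the cluster already has.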
All of this leads us to recommend that the average user not use CAPI (despite the fact that we develop a CAPI provider) unless they are deploying many clusters with hundreds of servers.
We started building Omni about nine months ago. The goal of Omni is to provide the absolute smoothest experience for creating Kubernetes clusters and managing them over time. This includes a whole slew of awesome features like easily joining nodes, handling upgrades, user management for clusters that integrates with enterprise identity providers, etc. As one might suspect, we had a lot of discussions about how to architect this system and whether we would base it on Cluster API. The eventual decision was that no, we wouldn’t. The problems mentioned above are some of the reasons why not, along with some other goals for Omni that CAPI just could not meet:
- The dedicated “management plane” was a no-go for some of our users. We have on-prem users who want a fully air-gapped and simple way to deploy, manage and upgrade a cluster, and we wanted to give them that with Omni. I’m talking “ship a rack to the desert and expect it to work” level of airgap. For these users, requiring extra hardware to run an HA management plane cluster is a big waste of resources — or in some cases not possible, as they already have a rack of servers in place.
- On top of that, the folks operating these devices often won’t even know what Kubernetes is. We need the easiest architecture possible for bringing up Omni and the clusters it will manage. Requiring a Kubernetes cluster with CAPI, a bunch of providers, Omni itself, and then trying to enable bootstrapping and troubleshooting in the field is pretty much a non-starter. Removing Cluster API from the requirements allows us to ship Omni as a single Go binary, which makes management straightforward. (Yes, Omni is a SaaS, but you can easily run it yourself.)
- Omni supports truly hybrid Kubernetes clusters via KubeSpan. We use this internally and it’s beautiful. We save about $1,500/month by hosting our control plane nodes inexpensively in Azure, while our workers are powerful bare metal nodes in Equinix Metal. This capability is powerful and allows you to add nodes from anywhere (any cloud, VMware, bare metal) to the cluster (be careful with how you label these nodes and schedule against them). In Cluster API, however, none of the providers are written in such a way that they can be mixed within a single cluster.
- One of Omni’s goals was to make Kubernetes at the edge simple, and a big portion of the folks that use Omni are using it to do exactly that. There isn’t a good way for edge deployments to happen with Cluster API. There’s no real way to allow for Preboot Execution Environment (PXE) booting at the edge: these nodes need to follow a “check-in” flow instead. The way it works in Omni is that nodes are booted from an image downloaded from a company’s Omni account. This image is preconfigured to form a point-to-point WireGuard connection back to the Omni account. So as soon as a machine boots, it shows up in Omni as an unallocated machine, and the user can then attach it to an existing cluster or create a new one with it. This just doesn’t really match the Cluster API provisioning flow, and we felt like anything we came up with to use in concert with Cluster API would be a bit hacky and wouldn’t give the simplicity we wanted.
- Finally, some Talos Linux capabilities would be limited with Cluster API. A good example of this is upgrades of both Kubernetes and Talos Linux itself. We provide really nice APIs for both of these that allow upgrades in place and provide nice checks around readiness as the upgrade is carried out. Building outside of CAPI allows us to leverage these upgrade APIs we already have and avoid the rolling-update limitations CAPI imposes.
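The “check-in” flow from the edge bullet above is easy to picture as a tiny inventory: machines announce themselves when they boot, sit in an unallocated pool, and are later attached to a cluster by a user. The sketch below is only an illustration of that idea. The `Inventory` class and its methods are hypothetical and are not Omni’s actual API or data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Inventory:
    """Toy model of a check-in flow (hypothetical, not Omni's API)."""
    # machine ID -> cluster name, or None while the machine is unallocated
    machines: Dict[str, Optional[str]] = field(default_factory=dict)

    def check_in(self, machine_id: str) -> None:
        """A booted machine connects back and announces itself.
        A repeated check-in keeps any existing allocation."""
        self.machines.setdefault(machine_id, None)

    def unallocated(self) -> List[str]:
        """Machines that have checked in but belong to no cluster yet."""
        return sorted(m for m, c in self.machines.items() if c is None)

    def allocate(self, machine_id: str, cluster: str) -> None:
        """Attach a checked-in machine to a new or existing cluster."""
        if machine_id not in self.machines:
            raise KeyError(f"{machine_id} has not checked in")
        if self.machines[machine_id] is not None:
            raise ValueError(f"{machine_id} is already allocated")
        self.machines[machine_id] = cluster

    def members(self, cluster: str) -> List[str]:
        """Machines currently attached to the given cluster."""
        return sorted(m for m, c in self.machines.items() if c == cluster)


if __name__ == "__main__":
    inv = Inventory()
    inv.check_in("edge-1")
    inv.check_in("edge-2")
    inv.allocate("edge-1", "store-42")
    print("unallocated:", inv.unallocated())
    print("store-42:", inv.members("store-42"))
```

The point of the sketch is the direction of the flow: the machine shows up first and the cluster assignment comes later, which is the reverse of a provisioning model where a cluster definition asks an infrastructure provider to create machines.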
So all of this is to say that given our design goals, Cluster API was not the right fit for Omni.
“What does this mean for Sidero Metal?” one might ask. And the answer would be a resounding NOTHING! Sidero Labs will still very much be part of the CAPI community and all of our providers will continue to be maintained and improved. While we think there are limitations involving CAPI, as you can see from the points above, CAPI is still a good choice for particular use cases: large-scale provisioning of many clusters, without the need for hybrid clusters. For those users, we will continue to recommend that they use and enjoy it.
But for Omni, well, we’re going to continue building it without CAPI, because it makes for a better experience for more users. In fact, we expect Omni will soon be a better experience than CAPI even for those users running hundreds of clusters and servers. Omni makes it simple to mix workers from any platform (unlike CAPI); is (like CAPI) declaratively driven; brings SaaS simplicity; and will soon support IPMI for bare metal, all wrapped up in an elegant UI (and API). This is a platform that is decidedly kick-ass and will simplify the life of almost all users, all the while encouraging them to stop worrying so much about operating clusters and just run their workloads!
If you want to learn more about Omni or CAPI, you are welcome to attend our free virtual user conference, TalosCon, on March 21. Nokia will be discussing “Scaling a Private Cloud Managed Kubernetes Service to 100k+ Cores,” using CAPI, Talos Linux, and Sidero Metal. We’ll also have several talks about Omni and Kubernetes at the edge.
Hope to see you there!