The Optimal Kubernetes Cluster Size? Let’s Look at the Data
TL;DR: smaller clusters clearly win, but the why matters.
As Kubernetes emerges as the de facto management platform, we are still figuring out what that means. The challenge is that it exists in a gray zone between a specialized application platform and general purpose infrastructure abstraction. I wanted to understand if one set of use-cases was more common.
A few months ago, I ran a free-form survey on Twitter about Kubernetes clusters. I was wondering if Kubernetes was emerging as a wholesale replacement for virtualization platforms or simply growing as an application framework. If the former, I’d expect to see operators using the platform as a large-scale, multitenant system that would provide a common operational baseline for the “Data Center O/S,” as Google Borg, VMware vCenter, OpenStack, and Mesosphere have attempted. If the latter, I’d expect to see developers using the platform as a single-application lifecycle management system that would provide improved automation for managing continuous deployments. Obviously, both are happening, but was one dominant, and why?
Of nearly 30 responses, over half suggested smaller, application-specific clusters were the right choice. In general, answers reflected the desire for the separation, isolation and control provided by small clusters. I also expect that small clusters are more likely to take advantage of the underlying VM-based IaaS. This aligns with my field anecdotes where users are leveraging K8s on existing IaaS while they gain operational experience.
So, are small clusters better? For most use cases, yes. They minimize the “blast radius” for issues and segregate Kubernetes version and upgrade concerns. Kubernetes management overhead is low so sprawl is tolerable for now. Further, the cloud providers have virtually eliminated the management cost for creating new clusters even if multi-cluster governance is missing. Expect this trend to change as multitenant operational patterns emerge and operational teams show up with governance requirements.
hey @kubernetesio ops… are you planning large multi-tenant clusters (like we'd have for VM managers) or will you setup dedicated clusters per application, team or use-case? I'm seeing a pattern and wanted to hear from others before I blog about it. please RT #TYVM
— Rob Hirschfeld (@zehicle) March 11, 2019
My personal goal for this survey was to predict a trend line for bare metal Kubernetes. Since bare metal servers are larger, we’re more likely to see them used for larger multitenant clusters so that users can share resources as they do in VM clouds. This use-case, while less common, is clearly emerging in the survey and suggests that bare metal clusters need more time.
It’s also worth noting an interim approach from the OpenStack community of using Kubernetes as an underlay. Since that strategy builds a single-function cluster as an installer, it’s really a small cluster approach and does nothing to advance the broader usability of Kubernetes. I’ve observed other independent software vendors (ISVs) attempting a similar strategy of adopting a single-application Kubernetes as an installer; however, this approach creates new challenges since the vendors are not ready to own the prerequisite Kubernetes distro or install problem.
My takeaway is that small clusters will ALWAYS win in a poll because they are simpler to set up and operate. That means small clusters are the gateway for Kubernetes; however, I don’t think that tells the full story. There’s enough momentum on the large, shared cluster front to predict that Kubernetes will grow into a more common infrastructure underlay. Today, namespaces may have limited adoption, but the poll points to them becoming more prevalent. Namespaces allow multiple users in a cluster to limit their scope and visibility in the cluster. It’s not a full multitenant feature but gives some of the same benefits for now.
|Preferred approach||Responses (share)|
|Cluster Based / Small||15 (53%)|
|Large multitenant (MT)||7 (25%)|
|Namespace||3 (11%)|
|Mix – Use-case||3 (11%)|
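For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes each share from the raw counts in the tally. The counts come straight from the poll results; the dictionary keys and the `shares` helper are just illustrative names. At one decimal place the shares are slightly different from the whole-number figures in the tally (the small-cluster share is 53.6%), which does not change the ranking.

```python
# Response counts copied from the poll tally; the labels are shorthand.
votes = {
    "Cluster Based / Small": 15,
    "Large MT": 7,
    "Namespace": 3,
    "Mix – Use-case": 3,
}


def shares(counts):
    """Return each option's share of all responses, as a percentage to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


total_votes = sum(votes.values())
percentages = shares(votes)

for label, pct in percentages.items():
    print(f"{label}: {votes[label]} of {total_votes} ({pct}%)")
```

With only 28 responses, each single vote moves a share by about 3.6 points, so treat the split as directional rather than precise.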
Public comments presented (in their original form) with Twitter attributions for reference:
Comments for Small Cluster Sizes
|Michael Gasch||K8s is not vCloud. Many micro services per cluster, multiple environments per cluster but currently splitting when security or team requires differences more splitting between teams than security but both play a role|
|Justin Garrison||I think small clusters is the trend|
|Scott S. Lowe||I think it’s generally trending toward more clusters (instead of fewer), but that doesn’t preclude multi-team usage of one or more clusters.|
|Dave Anderson||Define “large”. I’m planning roughly one cluster per environment*location*system (where system might be “company’s core service”, or “batch jobs”, or “peripheral IT stuff”). Two main reasons I don’t unify more are some distrust of k8s at larger scales, and insufficient security. (insufficient meaning k8s is not safe for hard multitenancy yet, so any time I want a hard boundary between systems I need a separate cluster)|
|Karthik Gaekwad||I’ve seen a trend with our OKE @OracleIaaS usage- folks who use clusters and do it for stateless end up in a multi cluster per dev or devops team generally. Federates the dev team to deploy whatever whenever and fixing issues is easier because it correlates to apps from that team Bleeding edge customers end up using a cluster for an app for a team- so some might have multiple clusters for a team based on the apps. In both these scenarios they use either an lb or gateway to route traffic and kill clusters behind the scenes to operate. #ClustersNotPets I have seen a slow transition on this front to the former model because cluster creation is simple. However the folks that only use k8s in single cluster mode generally have stateful data in a cluster- pvc’s etc. more fragile to transition that on the fly seems to be the general consensus, or just too painful to do.|
|Andrew Hatfield||We see orgs adopt many small clusters, project & lifecycle bifurcation. Avg company has ~15 clusters with ~85% of them fewer than 25 nodes. Our customers consolidate deployment, management & utilization overhead with High Density Multi #Kubernetes (HDMK) w/o need #virtualization|
|#!/sh||Smaller multiple clusters over large monolithic ones any day, this will avoid blast radius for unforseen problems|
|Greg Taylor||Reddit has a concept of clusters and cluster groups. The latter is a set of clusters with an identical workload. A cluster is single-AZ, whereas the group contains clusters in multiple AZs. This has saved our bacon a few times now.|
|Chris Ciborowski||Small clusters. Two main reasons. Separation of concerns and to limit the blast radius for unforeseen events. Why @nebulaworks recommends managed k8s while orgs determine their requirements.|
|Matt Jarvis||Small clusters is definitely the emerging pattern, average ~10+ in enterprises. Implementing harder separation through increasing complexity …|
|Bill Mulligan||More smaller clusters Loodse, treat them like cattle not pets|
|Julien Pivotto||We usually want increased security isolation so multiple clusters|
|Justin King||K8S is not a multitenant system. Beyond that, I see k8s deployments resembling More like traditional clusters (one per app) than a singlular, large platform.|
|Ble4Ch =^.^=||At the moment, we use one cluster per team, and one namespace per app. More ops (not that much with openstack actually), but more secure, and we avoid critical scenarios.|
Comments in Favor of Large Cluster Sizes
|Michael Francis||At my company we run two “big” clusters – one for “development & staging” and another for “staging and production” – envs are split by namespace and we use taints to pin certain workloads to certain nodes.|
|Jonathan Tronson||Big clusters, but enough of them to be able to rotate out for maintenace and still maintain availability for all the workloads. The rotation runbook needs to be fine-tuned and easy to follow. Thats the magic sauce.|
|Michael Goodness||Today we’re exclusively multi-tenant, but looking to allow dedicated clusters for large-scale, independent workloads. Multi-tenancy will remain the default.|
|Ricardo Katz, Christopher Adigun||We do have a large cluster, with 10 big baremetal nodes but almost 4k PODs, 1.5k ingresses and so. We’re now moving to clusters per SLA Q: “How are you dealing with the 110 pod limit per node?” (@Futuredon)|
|Max, Michael Gasch||multi-tenant. Better resource usage and no vm(ware) overhead. Q: “Max can you please elaborate on the “vmware overhead”? Do you mean technical or commercial overhead? Disclosure: I work at vmw and focus on K8s/ vSphere resource management so would like to understand your comment better. Thx” A: More the commercial overhead. We use a setup with vms mainly for the controll plane and hardware nodes for real workloads.|
|Micheal Benedict||Large multi-tenant clusters — 1. Works well for predictable workloads types, follow 80/20 and provide best e2e experience for 80% customers 2. reduce ops overheard of managing the clusters + services/sidecars that make up the compute platform|
|Alex Lovell-Troy||One large cluster to provide a set of multitenant applications. Namespaces and cnis represent convenient management abstractions for application tenants, but the cluster isn’t inherently multitenant.|
Comments for Namespace-Based Clusters
|David Betz||Sounds like namespaces with rbac.|
|yuriy brodskiy||Cluster per environment with namespaces|
|Ax.l||We’re using namespaces to separate teams. But running multiple clusters to separate tenants.|
Comments for Mixed-Sized Clusters
|Alexis Richardson||Kubernetes is a funky new kind of app server, not a private cloud|
|Raunak Jhawar||Define large in the context of features to be packed and bundled in one single cluster. Network policies and namespace isolation works best to have a multitenant setup and both have their merits and demerits.|
|Mark DeNeve||Currently multi-tenant clusters, but we are starting to question if this is the right idea long term due to some apps specific requirements and scaling concerns. No firm change in direction yet, but starting to wonder …|