Avoiding the Pitfalls of Multitenancy in Kubernetes
In DevOps, we talk about treating clusters like cattle, not pets. That advice matters even more when the alternative is a single hydra-amalgamation of all your pets, which is what you get when you create a multitenant cluster. Every group, each with its own disparate needs and requirements, is packed into one piece of infrastructure. Multitenancy creates an operational monolith, and there are plenty of other resources that will tell you why monoliths are bad for complexity and availability in both development and operations. Instead, I will give you a building analogy and share a personal story from my experience running, and in this case failing at, hosting a couple of highly visible websites in a multitenant environment.
Let’s take the analogy of modern skyscrapers. They are strong but not indestructible, much like our clusters and the applications built on them to take advantage of high availability. Some skyscrapers have one tenant, some have a few, and others have many. That difference creates different needs for management and coordination. In a multitenant building, security, doors, walls, and square footage limits must all be coordinated by the building. In a single-tenant building, the tenant can handle these internally, and the building only needs to protect external access and maybe the top-floor penthouse.
Multitenant buildings must invest in more internal security and in coordination between tenants for events that affect the whole building. The cost of these enhanced measures can be quite high in both manpower and equipment. The situation is much the same with multitenant Kubernetes clusters. They require internal traffic monitoring, resource limits, and admission controllers, all of which must be configured to ensure restricted access as well as non-interference. Want to change the building security system or the cluster ingress controller? In a multitenant environment, this must all be coordinated. Having worked in such buildings, I know it can be done well, but only with a lot of effort.
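To make the per-tenant configuration burden concrete, here is a minimal sketch in Python that generates three standard Kubernetes objects for one tenant: a Namespace, a ResourceQuota, and a NetworkPolicy that only admits traffic from pods in the same namespace. The function name tenant_manifests, the tenant name store-a, and the quota values are hypothetical and chosen only for illustration. The sketch does not cover admission controllers or RBAC, and a NetworkPolicy only takes effect if the cluster's network plugin enforces it.

```python
import json


def tenant_manifests(tenant, cpu, memory, max_pods):
    """Build Namespace, ResourceQuota, and NetworkPolicy manifests for one tenant.

    All names and values are illustrative assumptions, not a recommended baseline.
    """
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": tenant, "labels": {"tenant": tenant}},
    }
    # Cap the total CPU/memory requests and the pod count inside the namespace.
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{tenant}-quota", "namespace": tenant},
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "pods": str(max_pods),
            }
        },
    }
    # Select every pod in the namespace and allow ingress only from pods
    # in that same namespace; traffic from other namespaces is dropped.
    same_namespace_only = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "same-namespace-only", "namespace": tenant},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
    return [namespace, quota, same_namespace_only]


if __name__ == "__main__":
    # Hypothetical tenant and limits; print a JSON List of the manifests.
    items = tenant_manifests("store-a", cpu="4", memory="8Gi", max_pods=20)
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Even this small sketch shows the coordination problem: the same-namespace policy would also block a shared ingress controller running in another namespace, so every tenant's policy has to be adjusted whenever that shared component changes.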
If you’re building single-tenant clusters for each one of your easily divisible organizational units or projects, then you are at a natural level of division and freedom. This simplifies the management burden on your operations and DevOps teams. While Rancher can manage either scenario well, when you start deviating from natural organizational divisions, you are forced to accept trade-offs and restrictions. More than one tenant in a cluster means more security and isolation work, all of which costs more time to test and implement. This costs money and leads to delays. The question to ask is: can we manage that complexity effectively, either alone or with outside help? Stand by, the shameless plug is coming later… If the answer is still yes, then continue.
I have gotten this wrong before. In my agency development days, before Kubernetes even existed, in the first decade of the millennium, I managed the hosting for two of our biggest A-list music clients at a rapidly growing music agency, BubbleUp. There, we had been running version-separated PHP auto-scaling clusters in Amazon Web Services since 2009. The first artist, who was tech-savvy, shared a store link on Twitter with what were then his 5 million fans. The incoming traffic brought our e-commerce cluster, which was shared by multiple stores, to a crawl, and the second artist’s brand along with it, for the approximately 10 minutes it took our application servers to spin up and become available. Both stores went down briefly.
The stores weren’t down for long, but it was the longest 10 minutes of my life. It was also long enough for our office to get a call from the artists’ management companies. In the post mortem, my CEO and mentor, Coleman Sisson, asked me a question I still use to this day when making these decisions. I paraphrase, but it was essentially this: “Is it OK to tell one of our biggest clients that his stores went down because another artist shared a link?” I think we started to split off a new cluster for the second artist later that day. We failed because I had not taken the time to ensure enough isolation. This same scenario could apply to any two Kubernetes-hosted applications. Oh, and by the way, the answer was “no.”
In conclusion, even though Kubernetes does support multitenancy, securing a cluster to support it safely requires experienced engineers and lots of time, and not taking the time to do it right will have disastrous consequences. Include those costs when you evaluate your multitenancy strategy. Then, if this is the path you want to take, make sure to take the time and develop the expertise and processes required to do it robustly and securely. If you would like assistance, SUSE Cloud Consulting is here to help with your most challenging use cases.