Private vs. Public Cloud: How Kubernetes Shifts the Balance
One thing is certain: If you’re an enterprise, you either are using the cloud or soon will be in some capacity. Odds are that your organization is well on its way to being a hybrid cloud user. A 2021 Statista survey of 750 enterprises found that 82% had adopted a hybrid cloud computing strategy.
This hybrid adoption statistic does not mean, however, that the private cloud vs. public cloud question is neatly decided simply by adopting a mix of both cloud flavors. Rather, it means that deployment architectures are complex and evolve quickly, and that it makes no sense to decide cloud architecture in a “one-and-done” fashion.
In fact, we’d argue that changes now underway in the cloud technology stack are opening up new possibilities. After years of a global trend toward greater public cloud adoption, these new dynamics are changing cloud economics, revitalizing the idea of private cloud, offering new strategic options for managing public cloud and enabling new takes on bare-metal models such as edge, IoT and “bare-metal cloud” offerings.
Cloud Economics: Version 1.0
The cloud journey started because on-premises data centers are costly to build and maintain. Servers are expensive and become obsolete. They need facilities, power, cooling, internal networking, WAN connections and backbone connectivity from multiple service providers. Data centers need architects to segregate failure domains and distribute redundant resources, enabling resilience. Staff members maintain, operate, secure and protect them. Full-time minders analyze their performance, utilization efficiency and other characteristics before optimizing them, usually slowly and expensively, to reflect technical and business needs. Commit to this, and you’re still locked into specific locations, carriers and labor pools, and likely trapped in an elasticity paradox: paying for unused capacity, but unable to scale quickly to meet unanticipated demand.
That’s only the start. To achieve low-level operational efficiency, many users go on to implement a “standard” data center stack with a licensed OS (I’m looking at you, Red Hat), add tooling for server provisioning and license management, and end up locking themselves into that solution set. Then, to achieve speed and enable developer and business-unit self-service, they add a similarly contrived and cost-based private cloud infrastructure-as-a-service solution (I’m looking at you, VMware). This gains some efficiencies, but it also brings new requirements for skilling up, new lock-in factors and lots of new costs.
Compared with this, of course, public cloud looks great. Everything is OpEx. There’s no physical plant and none of the direct overheads or headcount required to run a physical data center. You never have to deal with carriers. You can start small and grow big fast, or change scale as often as needed. You can put workloads anywhere your provider has a region, and you can provision and configure whatever you want using one set of web UIs and/or APIs. It’s like skipping the whole “physical infra” step — all that complexity and limitation — and going right to the “elastic compute cloud-as-a-service” part.
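The elasticity paradox of fixed on-premises capacity, and why pay-as-you-go scaling looks so attractive next to it, can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name, the demand figures and the unit cost are all hypothetical, and real capacity planning involves far more variables.

```python
def elasticity_gap(capacity, hourly_demand, cost_per_unit_hour):
    """Compare fixed capacity against hourly demand (all inputs hypothetical).

    capacity           -- compute units you have built out and pay for every hour
    hourly_demand      -- units actually requested in each hour
    cost_per_unit_hour -- what one unit costs you per hour, used or not
    """
    # Capacity you paid for but nobody used.
    idle_unit_hours = sum(max(capacity - d, 0) for d in hourly_demand)
    # Demand you could not serve because you could not scale in time.
    unmet_unit_hours = sum(max(d - capacity, 0) for d in hourly_demand)
    # Fraction of paid-for capacity that did useful work.
    served = sum(min(d, capacity) for d in hourly_demand)
    utilization = served / (capacity * len(hourly_demand))
    return {
        "idle_cost": idle_unit_hours * cost_per_unit_hour,
        "unmet_unit_hours": unmet_unit_hours,
        "utilization": utilization,
    }


if __name__ == "__main__":
    # Four hypothetical hours: three quiet ones and one unexpected spike.
    print(elasticity_gap(capacity=100,
                         hourly_demand=[40, 60, 150, 50],
                         cost_per_unit_hour=0.5))
```

With these made-up numbers, the fixed-capacity site pays for idle units in three of the four hours and still falls short during the spike, which is the paradox in miniature.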
Cue the Cloud Complexity
Except nothing comes for free. Everyone who uses public clouds knows the shock of receiving a monthly bill that feels outlandish. And, adding insult to injury, the bill is often impossible to understand or justify in detail, certainly not in terms that map to normal data center expectations, such as, “OK, I get why I pay for north/south traffic, but east/west traffic?” After the initial shock, cloud pricing complexity can also make it hard to develop and implement cost-mitigation strategies. Even picking low-hanging fruit, such as turning off all the VMs you aren’t using, can feel difficult.
Meanwhile, few organizations move into public cloud without taking some legacy thinking along. To gain agility and elasticity, many just sacrifice consistency. They’ve spent years building VMware-centric infra-as-code in-house, and now they hire CloudFormation-certified folks to create and maintain a new codebase for the AWS environment. The cost of doing this is massive and ongoing. It can introduce security holes, provide new opportunities for human error and make everything harder by a factor of N, where N is the number of platforms you need to support.
In other cases, organizations may try to preserve consistency by doubling down, taking their “this is our standard OS” and “this is our standard IaaS” operating models with them when they venture into public cloud, maintaining a legacy cost structure, though metered by instance-hours. Or, they may veer in the opposite direction, buying into private cloud solutions driven from public cloud Web UIs and APIs, gaining consistency and reducing the need for diverse skills, but at the cost of greater lock-in.
Cloud Economics: Version 2.0
Put Kubernetes into this mix, however, and it changes things:
- Kubernetes doesn’t, or shouldn’t, really care about much beyond the Linux kernel. Adapting any real or virtual box and Linux variant to run a well-designed Kubernetes, these days, usually requires no effort beyond appropriate resource provisioning (the box needs enough cores/RAM/storage/network+security to get the job done safely). Perhaps, but less and less often, it also requires a little Linux-level fiddling (for example, to create a memory cgroup on startup, required when running Ubuntu 20.04+ on ARM64).
- Consistent Kubernetes enables radical workload and configuration portability. Unless workloads have specific hardware requirements (most simply don’t), you’d expect a workload developed on Linux Spin X to run on any Linux X box, anywhere. The same is, or should be, true of both containers and configurations deployed on logically self-similar Kubernetes clusters. When it isn’t, that is usually because people build or acquire divergent clusters and then try to weave them into larger architectures to accelerate software delivery, using CI automation and other tools, or manual labor, to paper over the deltas.
- Except in extreme situations, Kubernetes keeps complicated and tricky applications running like gangbusters, without urgent need for human attention. That is, after all, what it was engineered to do. As long as your properly designed workload isn’t narrowly locked to some scarce resource, you can kick plugs out of walls and let SSDs fill up and customers won’t even notice. You can also run rolling updates on everything in your stack without taking applications or operations offline.
- Kubernetes lets you add custom functionality to observe and maintain complex system states, and converge systems on new states in response to changes in declarative configurations. These abilities can work both upward and downward from where Kubernetes resides in the stack. So, Kubernetes operators and similar constructs can be used to manage applications (upwards) and underlying infrastructure (downwards) such as physical and virtual hosts, virtual networks and cloud services.
- Kubernetes can be configured and extended to provide applications with a host of services. Storage, DNS, whatever applications need, Kubernetes can provide and orchestrate services on behalf of workloads. Again, that’s what it was engineered to do.
- Kubernetes scales like crazy, very fast. Not counting time required to spin up nodes, a modern Kubernetes distro can be scaled up to basically any number of nodes in minutes.
- Kubernetes can be very granular. A modern Kubernetes distro should let a developer start a controller and worker in a single desktop container or on a pair of Raspberry Pis. The same distro should be equally capable of running a 50/100/500-node cluster. As most long-term users have discovered, the organic “lots of clusters” model, as opposed to one huge cluster, works best for many reasons (security, limited blast radius, etc.). Real benefits, though, accrue only when “lots of little clusters” can all be self-consistent and centrally lifecycle-managed.
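The idea in the list above of converging a system on a new state in response to a changed declarative configuration can be sketched as a toy reconcile step. This is not the Kubernetes API or any real operator framework: the `Action` type, the `reconcile` and `apply_actions` functions and the resource names are hypothetical, and plain dictionaries stand in for cluster state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str  # "create", "update" or "delete"
    name: str  # hypothetical resource name


def reconcile(desired, observed):
    """Return the actions needed to make `observed` match `desired`.

    Both arguments map a resource name to its spec (a plain dict here).
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in observed:
            actions.append(Action("create", name))
        elif observed[name] != spec:
            actions.append(Action("update", name))
    for name in sorted(observed):
        if name not in desired:
            actions.append(Action("delete", name))
    return actions


def apply_actions(observed, desired, actions):
    """Apply actions to a copy of the observed state.

    A toy stand-in for the real API calls an operator would make.
    """
    state = dict(observed)
    for action in actions:
        if action.verb == "delete":
            state.pop(action.name, None)
        else:
            state[action.name] = desired[action.name]
    return state


if __name__ == "__main__":
    desired = {"web": {"replicas": 3}, "db": {"replicas": 1}}
    observed = {"web": {"replicas": 2}, "cache": {"replicas": 1}}
    plan = reconcile(desired, observed)
    print(plan)
    print(apply_actions(observed, desired, plan))
```

The point of the design is that the caller only ever edits `desired`; the loop works out what to do. Running `reconcile` again after the actions are applied yields an empty plan, which is what "converged" means in this sketch.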
Kubernetes’ superpowers change the game in important ways. If you can deploy, scale and manage the lifecycle of Kubernetes, you can use it to pave over public and private cloud infrastructures, optimize costs and overheads aggressively, and treat everything underneath Kubernetes as a commodity.
Reduced cloud operations costs. Early experiments by users show that this approach quickly reduces TCO for private cloud operations. If you can keep the power and A/C running, a Kubernetes-centric infrastructure largely minds itself, with operators looking “up” to optimize workloads and “down” to manage hosts, operating systems and other support services and layers. This makes running private clouds, particularly as they get bigger, much less expensive and risky.
Reduced host/guest OS licensing and support costs. Kubernetes cares a bit about the Linux kernel and CPU, but not much else. A Kubernetes-centric infrastructure doesn’t really benefit from a costly “enterprise” Linux underneath. Features such as specialized encryption (FIPS, for instance) provided by enterprise Linux spins are swiftly moving up into container runtimes and Kubernetes distributions. Kubernetes itself can be enabled to manage raw bare-metal infrastructure, making server pool management solutions — while these can certainly be helpful in context — matter much less.
Reduced hardware costs. The global shift to cheaper ARM64 CPUs, for example, is accelerating partly because containers and Kubernetes are making it easier to shift and rebuild workloads. There’s even talk in some quarters that performance-oriented (vs. cost-, energy-, or otherwise-oriented) hardware evolution may stall, particularly in public clouds, because the economics of “dumbing down” hardware are so enticing to the largest-scale providers. What arguably matters more, meanwhile, are location (for regulatory, jurisdictional and sovereignty reasons) and connectivity (e.g., low ping times to customers).
This is good news for private cloud operators. It’s cheaper than ever to build out centralized compute/storage/network these days. Data centers themselves are changing: getting smaller, using less power and cooling, distributing across more locations and moving toward the edge and into constellations of tiny devices.
New and cheaper ways of running private IaaS clouds. A Kubernetes-centric infrastructure is ideal for running complex, critical software robustly and elastically with low operating costs. For example, a bare-metal server pool can run Kubernetes as a substrate under open source infrastructure-as-a-service OpenStack with no performance penalty (the KVM hypervisor runs directly on hosts and is managed by sidecars), delivering improved resilience, seamless updates and easy scaling while providing a robust home for classic virtual machines, networks and storage.
Reduced costs for automation. Crafting a secure, dependable, performant toolchain for developing, testing, staging and delivering applications to production is essential for keeping customers happy and reducing business risks. If you target a consistent Kubernetes cluster model (vs. divergent private- and public-cloud infrastructures), you only need to do this once, then spend time improving what’s relevant to your bottom line.
Consistent Kubernetes Everywhere
The bottom-line requirement for using Kubernetes as “the infrastructure” is that you need to be able to deploy, observe and manage the lifecycle of one consistent Kubernetes cluster model anywhere. This is what enables application, configuration, CI/CD and operations automation targeting the Kubernetes layer to work the same, whether it’s aimed at a cluster running on blades in your server room or at a cluster running on AWS instances. This implies that someone (hopefully not you) is providing this functionality and enabling Kubernetes to actively manage diverse underlying infrastructures gracefully.
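One way to picture what “one consistent cluster model” means in practice is a drift check that compares each cluster’s settings against a shared baseline. The sketch below is an assumption-laden illustration: the `find_drift` function, the setting keys, the version strings and the cluster names are all hypothetical, and a real fleet would gather this data from its own management tooling.

```python
def find_drift(baseline, clusters):
    """Report where each cluster's settings differ from the baseline.

    baseline -- dict of setting name -> expected value
    clusters -- dict of cluster name -> dict of setting name -> actual value
    Returns a dict of cluster name -> {setting: (expected, actual)},
    listing only clusters that diverge.
    """
    drift = {}
    for cluster_name, settings in clusters.items():
        diffs = {}
        # Check every key seen on either side so extra settings count as drift too.
        for key in sorted(set(baseline) | set(settings)):
            expected, actual = baseline.get(key), settings.get(key)
            if expected != actual:
                diffs[key] = (expected, actual)
        if diffs:
            drift[cluster_name] = diffs
    return drift


if __name__ == "__main__":
    # Hypothetical baseline and two hypothetical clusters.
    baseline = {"kubernetes_version": "v1.29", "cni": "calico"}
    clusters = {
        "server-room": {"kubernetes_version": "v1.29", "cni": "calico"},
        "aws": {"kubernetes_version": "v1.28", "cni": "calico", "runtime": "containerd"},
    }
    print(find_drift(baseline, clusters))
```

An empty result is the goal: it means automation written against the baseline should behave the same on every cluster in the fleet.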