Can You Live without Kubernetes?
Containers provide a means to wrap an application into its own isolated package with everything required for it to run successfully, such as libraries and runtimes. You can, of course, run containers on individual hosts, using a tool such as Ansible or Chef to deploy them, but this approach is increasingly rare, because the key to running containers at scale is orchestration.
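As an illustration of that packaging, a minimal Dockerfile bundles the runtime and libraries alongside the application code so the image runs the same on any container host (the file names and app layout here are hypothetical):

```dockerfile
# Base image supplies the language runtime.
FROM python:3.12-slim
WORKDIR /app
# Install the application's libraries into the image itself.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Add the application code and define how the container starts.
COPY . .
CMD ["python", "app.py"]
```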
At a minimum, an orchestrator places containers, migrates them when a host fails, and provides additional services such as configuration management, network configuration and storage.
Kubernetes (K8s) is obviously the mass-market leader and a natural default choice, but it requires significant time and deep understanding to deploy, operate and troubleshoot. Given its complexity, it is at least worth considering whether it is the right choice for your organization.
One other option, particularly if you are looking to deploy onto a public cloud, is simply to choose a managed Kubernetes service from one of the major cloud vendors, such as Microsoft’s Azure Kubernetes Service (AKS), Amazon Elastic Kubernetes Service (EKS) or Google Kubernetes Engine (GKE).
For enterprises that are using a hybrid cloud approach or on-prem, a Kubernetes distribution such as Red Hat’s OpenShift makes a lot of sense and adds a number of additional capabilities on top of what Kubernetes already provides, including routing traffic to your web service from the outside world via the OpenShift router.
“With Kubernetes, you need a curated experience,” Sam Newman, an independent tech consultant and author of O’Reilly’s “Building Microservices,” told The New Stack. “So what you are seeing is that enterprise organizations are either building their own platforms, or they are buying OpenShift.”
It seems likely that this trend will continue, and that ultimately we will find ourselves using perhaps a Function as a Service (FaaS) or a similar abstraction and being largely unaware that it is, in turn, using Kubernetes under the hood. What other orchestration options, though, are worth considering?
Nomad: More Flexibility Than K8s
Of the direct competitors to Kubernetes, Hashicorp’s Nomad has a very flexible model for running different sorts of application workloads — including Java applications, virtual machines (VMs), Hadoop jobs and so on — and allows for a great deal of customization.
It also works well with the other members of the Hashicorp stack — Vault and Consul — so whilst it may not be a mass-market competitor, it may well be worth a look if you want this kind of flexibility.
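Nomad expresses each workload as a declarative job specification, with the task driver determining the workload type. As an illustrative sketch (the job name, image and resource figures are invented), a minimal Docker-based job looks like this in HCL:

```hcl
job "web" {
  datacenters = ["dc1"]

  group "app" {
    count = 2                  # two instances, placed by the scheduler

    task "server" {
      driver = "docker"        # swap the driver to run JVM apps, QEMU VMs, etc.

      config {
        image = "nginx:1.25"
      }

      resources {
        cpu    = 200           # MHz
        memory = 128           # MB
      }
    }
  }
}
```

Changing the `driver` line is what lets the same scheduler run non-containerized workloads, which is the flexibility Newman describes.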
“I worked recently with a very large-scale fintech client who is moving to Nomad,” Newman told us. “The reason they are doing it is because they’d already created their application stack around an in-house abstraction, and found that because Nomad allows for a lot more customization than Kubernetes — kind of like how Mesos does in terms of the workflows you can run — it was a better fit for them to run Nomad on their public cloud provider, and then adapt the Nomad stuff to run their existing stacks on top.”
Another Nomad case study involves the gaming company Roblox. As part of the large-scale migration of the company’s gaming servers from Windows to Linux, it chose Nomad as its orchestrator with Docker as the container runtime and Terraform for provisioning.
Roblox runs the majority of its services on its own bare-metal infrastructure, whilst also making strategic use of multiple cloud vendors, including Azure, Amazon Web Services and Google Cloud Platform, so that it can run services as close to game players as possible, regardless of where they are located.
Nomad’s flexibility was key to Roblox’s decision. The company stated that it “was able to remain in place as the single orchestrator, seamlessly deploying both Windows and Linux workloads in-place before, during, and after the migration.”
The migration gave the gaming company immediate savings of over $5 million a year in reduced licensing costs. At the same time, by switching to 64-bit, Roblox was able to increase available memory and support larger game instances.
The company has since deployed Nomad on more than 11,000 nodes in 20 clusters across bare metal and cloud, serving 100 million monthly active users in over 200 countries with 99.995% uptime.
Can You Just Skip Container Orchestration?
A second option worth considering is to leapfrog straight to one of the FaaS platforms, such as Azure Functions, AWS Lambda or Google Cloud Functions. These offer a delightfully simple means of building distributed systems.
You deploy some code — a function — which is dormant until something happens, such as a file arriving in a particular location or a message landing on a queue, and then your function runs. When it finishes, it shuts down.
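That lifecycle can be sketched as a minimal AWS Lambda-style handler in Python. The SQS-like event shape and the `name` field are illustrative assumptions, not part of any real system described here:

```python
import json


def handler(event, context):
    """Entry point the platform invokes when messages land on a queue.

    The platform spins up an execution environment on demand, calls this
    function with the triggering event, and shuts the environment down
    once the invocation finishes.
    """
    # Parse each queued message body (hypothetical SQS-style records).
    names = [json.loads(record["body"])["name"]
             for record in event.get("Records", [])]
    # The return value is handed back to the platform when the function ends.
    return {"statusCode": 200, "body": json.dumps({"greeted": names})}
```

Locally, you can call `handler` with a sample event to unit-test it; in production the platform supplies both `event` and `context` and manages the function's entire lifetime.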
The underlying platform handles spinning these functions up and down on demand, and you can have multiple copies running where appropriate. Some very advanced engineering keeps both cold and warm start-up times to manageable levels in a way that would be hard to replicate on-prem.
You gain some level of robustness from the platform without needing to do any work yourself, and you only pay for code that is running, making FaaS a particularly good choice where you have either low or unpredictable load.
The BBC took this route, making use of Lambda functions as part of the core technology stack which makes up the BBC website. The overall system makes use of a mixture of Lambda and EC2 instances, with the latter used where Lambda function invocation would be too expensive.
Perhaps a more surprising FaaS case study is the observability vendor Honeycomb.io. The core of its product is a custom database called Retriever, inspired by Facebook’s Scuba. Retriever has no fixed schema, no predefined indexes apart from the timestamp, and is multitenant.
It ingests data from Kafka topics, storing it either on local SSD or in Amazon S3, with the data reading and writing layer implemented in AWS Lambda. It was challenging for the vendor to implement, but doing so improved Honeycomb’s response times by an order of magnitude.
The disadvantage of FaaS platforms, however, is that they still feel like they are in the early stages of development, and they do have limitations.
You have limited control over the resources given to each runtime invocation; functions are typically capped in how long they can run; and most function invocations are stateless, although Azure Durable Functions can suspend a function’s state and restart it where the invocation left off.
In addition, FaaS’s dynamic scaling nature can cause problems if other parts of your infrastructure don’t scale as well.
A third option would be a Platform as a Service (PaaS), such as Heroku, Platform.sh, or Railway. Heroku set the benchmark for developer productivity but unfortunately hasn’t evolved much since Salesforce acquired it. That said, if your application can fit within a given platform’s constraints, a PaaS might be a productive option.
Founded in 2015, Cycle.io is an interesting, comparatively new vendor in this space, sitting somewhere between a PaaS and an orchestrator. It isn’t built on top of Kubernetes and doesn’t use Docker. It also isn’t compatible with the Kubernetes API, but it is OCI-compliant, meaning the underlying containers are cross-compatible.
“We started building Cycle,” Jake Warner, CEO and founder of the company told The New Stack, “because I’ve been through this before with OpenStack, and I’m making a long-term bet that the same thing is going to happen with Kubernetes: that is, at some point, people are going to say ‘Hey, I’m not interested in being able to customize everything. I want it to just work.’”
Cycle takes a vendor-agnostic approach to container orchestration and currently supports AWS, Equinix Metal, and Vultr, with others on the way. During our conversation Warner emphasized that the firm delivers automatic updates frequently — about every 10 to 14 days — something which even managed Kubernetes services like EKS and GKE have struggled with. By way of contrast, a 2021 Datadog study reported that the most popular Kubernetes version in use was 17 months old.
Alongside automatic updates, the focus for Cycle is to try to be developer friendly, as opposed to being DevOps-focused like the majority of platforms. Additionally, the team is aiming to find a sweet spot where it can keep complexity to a minimum whilst still supporting the majority of use cases, and thereby reduce the dependency on a large Ops team.
Currently, the majority of Cycle’s customers are Seed and Series A start-ups, across a variety of industries. What they have in common, Warner said, is that they’ve grown their overall engineering teams but haven’t started to specifically build out their Ops teams. With Cycle, Warner hopes these startups may not ever need a traditional operations team.
Reassuringly for an infrastructure company, the firm is run conservatively. “We’re building an actual business here, and our customers want to know we’re going to be around,” Warner said. “So we have a rule in the company that we have a minimum two-year runway at all times. That way, if something bad happens, we have plenty of time to adapt.”
Cycle is perhaps best thought of as the other end of the continuum from something like Nomad; whilst the platform can support deep technical use cases, it isn’t the most suitable fit for organizations that demand air-gapped deployments, need control of memory encryption, or have applications with very specific hardware requirements.
It also doesn’t currently have GPU support, although Warner told us that this is coming in the next month or so.
However, if you are looking for a solution that enables you to have the features of a PaaS, but on your own infrastructure, Cycle may be worth looking at.