Azure Kubernetes Service Replaces Docker with containerd
The news that Kubernetes 1.20 would deprecate Docker as a container runtime, and eventually remove it, caused a certain amount of confusion, even though the Docker runtime won’t go away until at least 1.23. But containerd is already the default runtime for Azure Kubernetes Service (AKS) clusters using Kubernetes 1.19, and it will be the default for all new clusters once 1.19 is generally available.
“This has been in development for a long time across the community so I hope most everybody is ready,” Microsoft Corporate Vice President Brendan Burns told The New Stack. “We’re definitely excited to get it done and available to users.”
The Docker Disturbance
For a long time, Docker was the most popular container runtime in Kubernetes and it remains widely used, but containerd was designed (by Docker) to offer the minimum set of functionality for executing containers and managing images on a node, with versioned and stable APIs for container lifecycle and snapshot management.
In fact, the Docker engine is already built on top of containerd, so using Docker in Kubernetes means running the dockershim Container Runtime Interface implementation (because Docker doesn’t have a way to interact with the CRI), as well as Docker itself — and containerd inside Docker. A CRI plugin is built directly into containerd (from version 1.1 onwards), taking out two layers as well as running a smaller codebase (because Docker has code for things like networking, logging and volume management that Kubernetes already does itself).
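As a rough illustration of the layering described above, the two call paths can be written out and compared. This is only a sketch: the layer names are informal labels taken from the description, not identifiers from any real API.

```python
# Informal labels for the layers between the kubelet and a running container.
# These strings are illustrative names, not real API identifiers.
DOCKER_PATH = ["kubelet", "dockershim", "Docker engine", "containerd"]

# With the CRI plugin built into containerd, the kubelet talks to it directly.
CONTAINERD_PATH = ["kubelet", "containerd"]

# Layers that disappear when a cluster moves from Docker to containerd.
removed_layers = [layer for layer in DOCKER_PATH if layer not in CONTAINERD_PATH]

print(removed_layers)  # ['dockershim', 'Docker engine']
```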
The confusion is between Docker as the container runtime and Docker as an entire development stack, complete with a user interface for developers, Burns explained.
“Docker is really two things. Docker is the way that you run containers, but it’s also like a whole toolkit around building, managing and interacting with containers. The long-ago split between containerd and Docker was effectively splitting the runtime and the user interface. In a Kubernetes cluster, Kubernetes is really intended to be the user interface.”
Switching to containerd makes pod creation faster, lowers resource usage and promises better speed and stability in operation. That’s not just because it takes two hops out of the path, but also because, with less that users can control manually outside Kubernetes, Kubernetes can properly manage all the containers on a node.
“Leaving around the vestigial [Docker] user interface has always been a little bit problematic,” Burns noted. “It’s like a side-channel.”
“It makes it harder for the scheduler to work right. For example, if you go in and you use Docker to run a container on that machine on the side, the scheduler doesn’t really know anything about it. It can’t terminate it. It doesn’t want to break anything that you might have done that it doesn’t understand, and it doesn’t have all of the metadata that it would normally have if you created it through the orchestrator, and supplied all the metadata like ‘these are the resources I need, this is part of this replica set’. It doesn’t have any of that information about some random container that you created and yet that container is sitting there and consuming resources.”
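The accounting gap Burns describes can be shown with a toy model. The sketch below is not how the Kubernetes scheduler is implemented; the `Node` class, its field names and the CPU figures are all hypothetical, chosen only to show how a container started on the side consumes resources the scheduler never sees.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """Toy model of one node's CPU accounting (hypothetical, for illustration)."""
    cpu_capacity_millicores: int
    # Containers created through the orchestrator: their requests are known.
    scheduled_requests: Dict[str, int] = field(default_factory=dict)
    # Containers started on the side (e.g. with the Docker CLI): unknown to the scheduler.
    side_channel_usage: Dict[str, int] = field(default_factory=dict)

    def scheduler_view_free(self) -> int:
        """CPU the scheduler believes is free: capacity minus known requests."""
        return self.cpu_capacity_millicores - sum(self.scheduled_requests.values())

    def actual_free(self) -> int:
        """CPU that is really free once side-channel containers are counted."""
        return self.scheduler_view_free() - sum(self.side_channel_usage.values())


node = Node(cpu_capacity_millicores=4000)
node.scheduled_requests["web-replica"] = 1500   # created via Kubernetes
node.side_channel_usage["manual-debug"] = 2000  # started directly on the host

print(node.scheduler_view_free())  # 2500
print(node.actual_free())          # 500
```

In this made-up case the scheduler would happily place another 2,000-millicore pod on a node that really has only 500 millicores to spare.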
Most people use the Kubernetes interface but with the Docker interface exposed, there’s a risk someone can accidentally make orchestration harder. “It’s definitely nice to know that that surface area has been removed,” Burns added.
When developers expressed concern about the removal of dockershim, Mirantis announced that it would work with Docker to maintain the shim code as a standalone project outside Kubernetes. That might require a lot of work, since Burns noted that dockershim hasn’t been under active development and it’s made life more complicated for developers in the Kubernetes project.
Easy for Most
Users running earlier Kubernetes releases in their clusters will be supported for the lifetime of those versions, so they don’t have to make the change yet.
But containerd has been generally available and supported as a Kubernetes container runtime since 2018, and Burns believes that the ecosystem is well prepared. The Xbox Game Pass streaming service is already using containerd and ephemeral disks (which are also now GA in AKS) to reduce latency.
So how disruptive might it be to switch?
“One of the places where occasionally people have run into problems, is people who were mounting in the Docker socket directly. It’s not recommended but some people want to run Docker inside of their Docker container. And to do so they have taken the Docker socket that’s running on the host machine and mounted it directly into their container, and then they can run Docker in Docker. Or there were people who would do a Docker push inside a container for pushing an image up to the registry. That was always unsupported: it was possible but we really don’t recommend that you do it!”
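One way to find out whether a workload depends on the pattern Burns describes is to look for hostPath volumes that point at the Docker socket. The sketch below is a minimal check, assuming the pod manifest has already been loaded into a Python dict (for example from YAML or JSON). The helper name `docker_socket_volumes` and the sample manifest are hypothetical; `spec.volumes[].hostPath.path` is the standard Pod field, and `/var/run/docker.sock` is the usual socket location.

```python
# Common locations of the Docker socket on a Linux host.
DOCKER_SOCKET_PATHS = {"/var/run/docker.sock", "/run/docker.sock"}


def docker_socket_volumes(pod_manifest: dict) -> list:
    """Return the names of hostPath volumes that point at the Docker socket.

    Only handles a bare Pod manifest; for a Deployment the pod spec lives
    under spec.template instead.
    """
    volumes = pod_manifest.get("spec", {}).get("volumes") or []
    found = []
    for volume in volumes:
        host_path = (volume.get("hostPath") or {}).get("path", "")
        if host_path.rstrip("/") in DOCKER_SOCKET_PATHS:
            found.append(volume.get("name", "<unnamed>"))
    return found


# Hypothetical manifest for a CI pod that builds images with the host's Docker.
ci_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "image-builder"},
    "spec": {
        "containers": [{"name": "builder", "image": "example/builder"}],
        "volumes": [
            {"name": "docker-sock", "hostPath": {"path": "/var/run/docker.sock"}},
            {"name": "scratch", "emptyDir": {}},
        ],
    },
}

print(docker_socket_volumes(ci_pod))  # ['docker-sock']
```

Any pod this flags is a candidate for breaking on a containerd node, since there is no Docker daemon behind that socket anymore.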
With the Docker user interface gone, those kinds of “off-label” uses won’t be possible anymore, but there are better approaches for all of them, he suggested.
“It’s a sign of the maturity of the platform that it took as long as it did because we couldn’t just break people: if nobody was using it, I think we probably would have yanked this four years ago but there are a lot of things to work through and get right and make sure are stable before you’re willing to make a change like this.”
The switch to containerd is tied to the 1.19 release because moving to a new Kubernetes release is an explicit upgrade that customers choose to make.
“Every Kubernetes upgrade has the possibility of interrupting a workflow for a customer and so by aligning containerd with a Kubernetes upgrade, people are expecting that they need to be careful and they need to upgrade a development cluster first and make sure their workflows continue to work and go from there. But for most people, it should be pretty much a no op.”
There has been some anxiety about containerd in Kubernetes for Windows, simply because there have been fewer Windows releases of Kubernetes for it to be tested in. But the containerd interface has always been the interface for Windows containers, even though Docker was the only supported container runtime for Windows before Kubernetes 1.18 and the stable release target for containerd Windows support is 1.20.
“The Docker binary that they were using in Windows had already been split up into the Docker frontend and the containerd backend, in the code, even though they were distributed together,” Burns explained.
“This was the cutline when we talked to the Windows team about where does their responsibility build-up to and where does our responsibility take over between like people running the operating system and people building orchestration on top of the operating system.”
“I’m pretty confident about stability with containerd there, because it’s the thing they’ve been focused on. It’s not like in Linux where there was a long time where there wasn’t containerd and there’s a legacy built into the code that was split apart; in the Windows case, this was the world from the beginning. They got to take advantage of talking to the people working on Docker about ‘where is the roadmap going,’ not just ‘where the roadmap is at currently’.”
As Fast as You Can
A number of recent improvements to AKS have been about operational efficiency and adding flexibility for the wider range of users Kubernetes now attracts. The maxSurge feature that’s now generally available in AKS falls into the same category. By choosing how many VMs in the cluster are brought down and upgraded at the same time, rather than one after another, maxSurge lets people upgrade clusters in the way that best matches their workload, Burns told us.
“Traditionally our upgrade had been quite slow, with the assumption that everybody was primarily interested in stability rather than speed of upgrade: move very slowly, minimize disruption to the applications in the cluster. That’s good for a typical user, but for some users and some workloads they’re actually happier if you run through it as fast as possible. Maybe it’s a development cluster where you just don’t care. You just want to get it done as fast as possible and you don’t want to sit around waiting for that development cluster to finish upgrading. Maybe it’s a really stateless web application or a batch job where the goal is really to get the upgrade done as quickly as possible, as opposed to focus on stability.”
For workloads where you can’t avoid disruption during an upgrade, it’s better to have the disruption be as short as possible. “Jupyter Notebooks are not particularly great for cloud native because they’re single-sourced: effectively if you’re a science person, you have one notebook, it’s running in one process, you can’t have replication. For someone who is running a cluster that has a bunch of Jupyter notebooks in it, the upgrade is disruption, and they want to run through it as fast as possible. Setting maxSurge really high enables them to do an upgrade in a few minutes instead of in an hour; it’s better for them to tell their users ‘we’re going to be unavailable for five minutes’ than ‘we’re going to be unavailable for an hour.’”
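The arithmetic behind that trade-off is simple enough to sketch. The code below is a back-of-the-envelope model, not AKS’s actual upgrade logic: the function names, the 20-node cluster and the three minutes per batch are all hypothetical, and it assumes that a percentage setting rounds fractional nodes up and that every batch of nodes takes the same time to upgrade.

```python
import math


def surge_nodes(node_count: int, max_surge: str) -> int:
    """Resolve a max surge setting such as '1', '5' or '33%' to a node count.

    Assumption: percentages are taken of the current node count, fractional
    nodes round up, and at least one node is always upgraded at a time.
    """
    if max_surge.endswith("%"):
        percent = int(max_surge[:-1])
        return max(1, math.ceil(node_count * percent / 100))
    return max(1, int(max_surge))


def upgrade_minutes(node_count: int, max_surge: str, minutes_per_batch: int) -> int:
    """Estimate total upgrade time if nodes are upgraded in equal-cost batches."""
    batches = math.ceil(node_count / surge_nodes(node_count, max_surge))
    return batches * minutes_per_batch


# Hypothetical 20-node cluster where each batch takes about 3 minutes.
print(upgrade_minutes(20, "1", 3))     # 60: one node at a time
print(upgrade_minutes(20, "33%", 3))   # 9: seven nodes at a time, three batches
print(upgrade_minutes(20, "100%", 3))  # 3: every node in a single batch
```

With these made-up numbers, the cautious setting takes an hour while the most aggressive one finishes in minutes, which is the shape of the difference Burns describes.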
Kubernetes already has a similar capability in deployments for single applications, Burns noted.
“When you’re upgrading a single application, you have the same choices; you can say ‘I want to move very slowly, very stably, no disruptions,’ or ‘I want to move very quickly; just get it done and minimize the length of time [it takes].’ This mirrors that capability for the cluster. It just gives the cluster administrator more flexibility about how they manage their upgrades.”
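For a single application, those choices are expressed through the `maxSurge` and `maxUnavailable` fields of a Deployment’s rolling update strategy. The sketch below shows how percentage values translate into pod counts, following the documented rounding (surge rounds up, unavailable rounds down). The function name and the replica counts are hypothetical, and this is an illustration of the arithmetic rather than the Deployment controller’s code.

```python
import math


def rolling_update_bounds(replicas: int, max_surge_percent: int,
                          max_unavailable_percent: int) -> dict:
    """Translate percentage settings into pod-count limits during a rollout."""
    surge = math.ceil(replicas * max_surge_percent / 100)               # rounds up
    unavailable = math.floor(replicas * max_unavailable_percent / 100)  # rounds down
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


# "Move very slowly, very stably": one extra pod at a time, never below 10 available.
print(rolling_update_bounds(10, max_surge_percent=10, max_unavailable_percent=0))
# {'max_total_pods': 11, 'min_available_pods': 10}

# "Just get it done": double the pods and allow half to be unavailable.
print(rolling_update_bounds(10, max_surge_percent=100, max_unavailable_percent=50))
# {'max_total_pods': 20, 'min_available_pods': 5}
```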
These small features that add flexibility may become more important as the overall experience of Kubernetes on Azure becomes more integrated and more suitable for a mainstream audience.