The Quest to Build an Unprivileged Container

One aspect of Docker containers that has long bothered security professionals is that they run as root, even if the processes running inside them are unprivileged.
Now Jessie Frazelle, a Docker maintainer formerly of Docker and now with Google, is looking to remedy this issue, along with fellow Docker users from the community, including many from the GNOME open source desktop project.
To be exact, the root status the container enjoys is far more limited than the full root privileges of the machine owner. Through the Linux kernel, Docker drops more than half of the root capabilities by default. Additionally, Docker blocks dangerous system calls and confines containers by a number of other means.
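To make the capability point concrete, here is a minimal sketch that decodes a Linux capability bitmask into names. The name table follows the bit order in the kernel's `linux/capability.h` for the first 38 capabilities (newer kernels define a few more). The sample mask is an assumption: it is the effective set commonly seen inside a default Docker container, not a value taken from this article. The helper name `decode_caps` is hypothetical.

```python
# Capability names in kernel bit order (bits 0-37 of linux/capability.h).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
]

# Assumed example: effective capability mask of a default Docker container.
DOCKER_DEFAULT_MASK = 0x00000000A80425FB


def decode_caps(mask):
    """Return the names of the capabilities whose bits are set in mask."""
    return [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]


kept = decode_caps(DOCKER_DEFAULT_MASK)
dropped = [name for name in CAP_NAMES if name not in kept]
print(f"kept {len(kept)} of {len(CAP_NAMES)} capabilities")
print("dropped examples:", ", ".join(dropped[:4]))
```

On a Linux machine, the same decoder can be fed the `CapEff` value from `/proc/self/status` to compare a shell on the host with a shell inside a container.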
But to run any Docker command, end users have to prefix it with sudo, the command that grants administrative privileges to users. To eliminate all this extra typing, administrators might feel the temptation to add their users to the system's Docker user group, which effectively gives those users root access on the system, thereby making systems vulnerable to privilege escalation attacks.
Frazelle and company are working on something called unprivileged containers, which remove the sudo requirement altogether, replacing it with a finer set of controls. She explained the work at the recent QCon New York developer conference.
In addition to allaying security worries, unprivileged containers would be better suited to multi-tenant environments, in that many shops bar their developers from shipping apps that run as root, as a basic security measure.
Even with the default settings, containers are still more secure than running apps natively on a server, Frazelle said. Frazelle herself runs all her desktop apps in containers (an idea of interest to the GNOME folks).
Docker had good reasons for running containers as root. “There are a lot of operations you need to be root user for, to get a typically networked application up and running,” said Docker security engineer David Lawrence, during a short interview at DockerCon 2016 last month.
Chromium Inspiration
The idea for unprivileged containers is inspired by Google's open source Chromium browser, Frazelle explained in a technical session. The Chromium browser restricts each browser tab to its own sandboxed process, using namespaces and seccomp from the Linux kernel.
Guess what? Docker containers also use Linux seccomp and namespaces. Seccomp restricts the system calls a process can make, and namespaces limit what resources a process (i.e., a container) can see. Advances in the Linux kernel (as of version 3.8) grant each container the ability to create its own set of namespaces.
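One namespace type matters most for unprivileged containers: the user namespace, which lets a process be root inside the container while mapping to an ordinary user on the host. The kernel exposes that mapping in `/proc/<pid>/uid_map`, one range per line, as "ID inside, ID outside, length". The sketch below shows how such a map translates IDs. The sample map and the helper names `parse_uid_map` and `to_host_uid` are illustrative assumptions, not anything from Frazelle's code.

```python
def parse_uid_map(text):
    """Parse uid_map text into (inside_start, outside_start, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside, outside, length = (int(field) for field in fields)
        entries.append((inside, outside, length))
    return entries


def to_host_uid(container_uid, entries):
    """Translate a UID inside the namespace to the host UID, or None if unmapped."""
    for inside, outside, length in entries:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# Hypothetical map: container root is host user 1000; the next 65536 IDs
# map into a subordinate range starting at 100000.
SAMPLE_MAP = "0 1000 1\n1 100000 65536\n"

entries = parse_uid_map(SAMPLE_MAP)
print("container uid 0 is host uid", to_host_uid(0, entries))
```

With a map like this, a process that looks like root inside the container holds no special privileges on the host, which is the property unprivileged containers rely on.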
The difficult part of the project is dealing with control groups, known as cgroups. Cgroups can be used to limit the resources, such as CPU time, system memory and network bandwidth, that a user can consume. Tighter control of cgroups would, for instance, prevent container fork bombs from cratering a server.
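As a rough illustration of the fork bomb case, the cgroup "pids" controller caps the number of processes in a group through two files in the group's directory: `pids.max` holds the limit and `cgroup.procs` lists member processes. The sketch below writes those files. The helper names and the example path are assumptions, and on a real system the directory must live on a mounted cgroup filesystem that the caller is allowed to write to, which is exactly the sticking point for unprivileged users.

```python
import os


def limit_pids(cgroup_dir, max_pids):
    """Cap the number of processes allowed in the cgroup at cgroup_dir."""
    with open(os.path.join(cgroup_dir, "pids.max"), "w") as f:
        f.write(f"{max_pids}\n")


def add_process(cgroup_dir, pid):
    """Move the process with the given PID into the cgroup at cgroup_dir."""
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(f"{pid}\n")


# Example usage (hypothetical path; typically requires root today):
#   limit_pids("/sys/fs/cgroup/pids/mycontainer", 100)
#   add_process("/sys/fs/cgroup/pids/mycontainer", os.getpid())
```

Once a process is in the group, a fork beyond the limit fails inside the container instead of exhausting the host's process table.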
To date, creating cgroup controls as an unprivileged user is tricky at best, Frazelle pointed out, likening the job to a “huge tire fire” in a recent blog post.
Nonetheless, Frazelle worked up a proof-of-concept unprivileged container, called binctr, sans cgroup controls. And SUSE software developer Aleksa Sarai has submitted patches to the Linux kernel team to make cgroups more amenable to supporting unprivileged containers. Work is also being done to add support for unprivileged containers to runc, the universal container runtime on which Docker containers can now be run.
Least Privileged Microservices
Frazelle is not alone in attacking this issue. Docker’s security team is working on an implementation of a concept called “least privileged microservices,” Docker’s Lawrence said. “The idea is that your microservice should run the fewest permissions as possible,” he said.
Ideally, containers should work a bit like mobile apps, Lawrence said, in that they come pre-configured with security settings and when they are downloaded they let the administrator know exactly what permissions they require.
Bubblewrap, from the people building the Project Atomic next-generation container-based operating system, is another tool that strives for something similar. It creates a new namespace, allowing the user to “run an application in a sandbox, where it has restricted access to parts of the operating system or user data such as the home directory,” according to the project’s GitHub page.
Docker is a sponsor of The New Stack.
Feature image: A “Box Truck Guardian,” painted by @sinned_nyc.