Interview: Google gVisor and the Challenge of Securing Multitenant Containers
Last week Google unveiled a new open source project, gVisor, a sandbox for securely running containers in multitenant environments. This approach, should it prove viable, could eliminate the cost of using virtual machines (VMs) to isolate containers. It could also be a building block for the growing set of cloud-native technologies now being developed alongside Kubernetes, another project Google released as open source several years back.
We caught up with Google Product Manager Yoshi Tamura at KubeCon + CloudNativeCon Europe 2018 to learn more about gVisor.
What is the need for gVisor? Containers are more performant and more flexible than VMs. But they also have their own set of security constraints?
Absolutely. So gVisor is a new type of sandbox that is lightweight yet provides strong isolation between containers at the VM level. It boots up in about 150ms, and the footprint is small, about 15MB. We’re also seeing the emerging use case of multitenancy. People have more and more demand to run multiple user groups in one environment. That has exposed the need to improve the isolation between these containers.
So just to clarify, the initial concern for the sandbox is not from people or from malicious users getting in through the container from the outside but rather getting into the container through the shared operating system?
Exactly. So when you have this shared environment, you typically do not have full visibility into, or control over, what kind of software is running inside a container. Assume that there’s malicious software running inside a container, and the containers are exposed to the Linux system interface provided by the host kernel. If one of these malicious apps successfully tampers with that interface and compromises the kernel, basically all the containers on that node could be compromised or at least affected. So what we’re trying to do with gVisor is to keep that malicious software from attacking the rest of the system.
So you can’t just seal off the container from the rest of the world?
Technically, there are a lot of existing technologies [to do this], such as seccomp filters to filter system calls and application access. Also SELinux and AppArmor. For these, you need experts to configure them to meet those security or isolation requirements. The great part of gVisor is that this isolation is embedded in the engine, so even if you are not familiar with those concepts, we can provide a strong isolation boundary very quickly, with a very lightweight footprint.
We’ve heard horror stories about configuring SELinux. It’s very strong, but it doesn’t scale well, I guess you could say, as it requires lots of expertise to configure.
I totally agree. If you’re familiar with those technologies, sure, you can still use them. So we believe that gVisor is a complementary technology in the existing world. We believe that this is a very new approach, and therefore we really wanted to open source it as soon as we could so that we can advance the container isolation, or container security, field together with the rest of the open source communities.
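To illustrate why filter-based approaches such as seccomp need careful configuration, here is a minimal conceptual sketch of an allowlist policy. It is a toy simulation in Python, not the real seccomp API; the `SyscallPolicy` class, the action names, and the syscall lists are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field

# Hypothetical action names, loosely modelled on the allow / errno / kill
# outcomes a system call filter can produce.
ALLOW = "allow"
DENY_EPERM = "errno(EPERM)"
KILL = "kill"


@dataclass
class SyscallPolicy:
    """A toy allowlist: any syscall not listed gets the default action."""
    allowed: set = field(default_factory=set)
    killed: set = field(default_factory=set)
    default: str = DENY_EPERM

    def decide(self, syscall: str) -> str:
        # Explicitly dangerous calls are checked first.
        if syscall in self.killed:
            return KILL
        if syscall in self.allowed:
            return ALLOW
        return self.default


# Illustrative policy for an imaginary web worker. The lists are made up:
# someone has to know which calls the application really needs.
policy = SyscallPolicy(
    allowed={"read", "write", "close", "accept", "epoll_wait", "exit_group"},
    killed={"ptrace", "mount"},
)

for name in ("read", "openat", "mount"):
    print(f"{name:10s} -> {policy.decide(name)}")
```

The point of the sketch is the maintenance burden: a forgotten entry in `allowed` breaks the application, while an overly generous one weakens the isolation.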
Was gVisor originally an internal project? Or how did it come about?
Exactly. So Google has been using gVisor for various services, and it has actually been in production for a while. That actually gave us confidence that our approach is definitely valuable.
Could you describe the sandbox? What is it and how’s it different from, like say, using a virtual machine?
Yes. Maybe I should start from the virtual machine approach, because this is the approach used now. This is where you put a container in a virtual machine. It is definitely a very reasonable approach. The virtual machine has a very strong isolation boundary. The downside of this approach is the size of the footprint.
So the approach that we’re taking is that the container issues system calls as usual, and gVisor intercepts those system calls, yet without using a guest kernel inside a VM. There is a process called Sentry, which emulates the system calls made by the application and then propagates them to the host kernel or to some I/O services as necessary. So technically this is not using a VM as a boundary; it is using the operating system layer to provide that isolation.
There are two ways to capture the system calls in the current gVisor. One is ptrace, a native operating system construct, and the other is KVM.
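The interception model described here can be sketched as a small simulation: the application's calls go to a user-space process (Sentry, in gVisor's terms) that answers most of them from its own state and passes only a narrow set on to the host. This is a conceptual toy, not gVisor's implementation; `ToySentry`, `HostKernel`, and the three handled calls are assumptions made up for illustration.

```python
import errno


class HostKernel:
    """Stand-in for the host; records which requests actually reach it."""

    def __init__(self):
        self.seen = []

    def handle(self, name, *args):
        self.seen.append(name)
        return 0


class ToySentry:
    """Emulates a few syscalls in user space, forwarding only writes."""

    def __init__(self, host):
        self.host = host
        self.pid = 1        # the sandboxed app sees its own process IDs
        self.files = {}     # file table kept inside the sandbox
        self.next_fd = 3
        self.handlers = {
            "getpid": self._getpid,
            "open": self._open,
            "write": self._write,
        }

    def syscall(self, name, *args):
        handler = self.handlers.get(name)
        if handler is None:
            # Not emulated in this toy: report "not implemented".
            return -errno.ENOSYS
        return handler(*args)

    def _getpid(self):
        # Answered entirely from sandbox state; the host never sees it.
        return self.pid

    def _open(self, path):
        fd = self.next_fd
        self.next_fd += 1
        self.files[fd] = path
        return fd

    def _write(self, fd, data):
        if fd not in self.files:
            return -errno.EBADF
        # Only this narrow, validated request is propagated to the host.
        self.host.handle("write", self.files[fd], data)
        return len(data)


host = HostKernel()
sentry = ToySentry(host)
fd = sentry.syscall("open", "/tmp/log")
written = sentry.syscall("write", fd, b"hello")
blocked = sentry.syscall("mount", "/dev/sda1", "/mnt")
print(fd, written, blocked, host.seen)
```

In this toy, `getpid`, `open`, and the rejected `mount` never reach `HostKernel`, which is the property the approach relies on: the host sees a much smaller interface than the application does.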
Why is intercepting the system call better than using a new kernel? How is that better?
Let’s start from ptrace. The ptrace part is indeed provided by the host kernel. The trick is that instead of the host kernel processing the system calls directly, which is the case for the usual container, the request is forwarded to Sentry, which emulates the Linux kernel’s system call behavior in userland. Even if Sentry gets compromised, it does not mean that the attacker owns the whole system yet, because it is just one process on the host.
Even if the attacker compromises Sentry, it does not necessarily mean that the attacker owns the whole system. For example, in the case of a regular container, if the attacker successfully compromises the kernel and gains ownership, it’s fair to consider that the attacker owns the system at that point. But in the case of gVisor, even if the attacker were able to somehow compromise Sentry, the attacker would still remain there. You know, that compromises just one of the processes.
So that stops the attacker from getting outside of the container?
But what stops the attacker from getting inside a container?
That is actually a very important point. I think there is a lot of new technology coming to protect that part. But going back to gVisor specifically, the typical deployment that we would expect is not for all of the containers that you run regularly. What kind of software do you want to sandbox? The software that you’re not so sure about. Software that you don’t have any control over. Those are the cases where gVisor should be able to contain the risk within that container.
The blog post did mention that most system calls are covered, but not all of them. So what are the system calls that wouldn’t work and what should the developer do in those cases?
There are some system calls that are not fully implemented; I actually don’t have that data. However, we’re already making a lot of progress. And by open sourcing, if we can get contributions, I believe that the system call coverage will be solved. It is definitely a solvable problem.
Tell us about your hopes in open sourcing gVisor. It sounds like, of course, you want the community to take a look at it, add to it, maybe even assume control over it.
We wanted to discuss this technology with the other open source communities, to really figure out what would be the best way to move forward. And we’re so excited to work with those existing open source communities to really advance the container field.
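One common way for application code to cope with a missing system call is to catch the "not implemented" error and fall back to a more portable path. The sketch below assumes the missing call surfaces as `ENOSYS`, which is the usual Linux convention; the interview does not say how gVisor reports unimplemented calls, and `fast_copy` and `portable_copy` are hypothetical stand-ins.

```python
import errno


def call_with_fallback(primary, fallback):
    """Run primary(); if it fails with ENOSYS, run fallback() instead."""
    try:
        return primary()
    except OSError as exc:
        if exc.errno != errno.ENOSYS:
            raise  # a different failure: do not mask it
        return fallback()


def fast_copy():
    # Hypothetical stand-in for a call the sandbox does not implement.
    raise OSError(errno.ENOSYS, "Function not implemented")


def portable_copy():
    # Hypothetical slower path built only on widely supported calls.
    return "copied with read/write loop"


result = call_with_fallback(fast_copy, portable_copy)
print(result)
```

Errors other than `ENOSYS` are re-raised on purpose, so a real failure such as a permission problem is not silently hidden by the fallback.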
Google is a sponsor of The New Stack.