Easily the most astonishing result from ClusterHQ’s most recent State of Container Usage survey [PDF] was that nearly three-fourths of 229 IT professional respondents said their data centers are running containers in a hypervisor-virtualized environment — that is to say, a container environment inside the safety of a separate virtualization layer. That figure was bolstered by about 61 percent of respondents saying that security remained a barrier to their data centers’ adoption of containers in production.
Businesses are not really adopting containers yet — not the way they were meant to be. They’re being tested inside a virtual environment like VMware’s vSphere. And in an interview with The New Stack, VMware’s Chief Technology Strategy Officer Guido Appenzeller — the co-founder and former CEO of SDN producer Big Switch Networks, and co-founder and former CTO of Voltage Security — says they’re doing it because they’re concerned about security.
“Half a year ago, we started seeing the first customers in our NSX customer base start using containers on top of NSX,” Appenzeller tells us. “The way all of them are deploying it today, they effectively have a container host like Docker or Cloud Foundry, and they’re running it inside a virtual machine on a hypervisor. The first time I saw this, I said, ‘Clearly these guys are doing it wrong. After all, it should be virtual machines or hypervisors.’
“But it turns out they have a fairly good argument for why they’re doing it,” he continues.
“It’s basically security. They’re saying, ‘If I run a container host directly on my physical server, that creates several problems.’”
The Isolation Dilemma
One major problem, contends Appenzeller, is that it is impossible to enforce strong isolation between containers. Since containers are all managed by the same kernel, a kernel-level exploit can compromise the applications running inside them, he believes.
Appenzeller’s views are shared by Red Hat Security Engineer Trevor Jay, who in a company blog post last December wrote:
“Containers all share the same kernel. If a contained application is hijacked with a privilege escalation vulnerability, all running containers and the host are compromised. Similarly, it isn’t possible for two containers to use different versions of the same kernel module.”
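The shared-kernel point is easy to verify firsthand. Linux namespaces, the mechanism underlying containers, virtualize what a process can see (PIDs, mounts, network interfaces) but not the kernel itself. A minimal sketch in Python, assuming a Linux host; run it on the host and inside any container on that host and it reports the same kernel release:

```python
import os

def kernel_release() -> str:
    """Return the release string of the running kernel.

    Namespaces virtualize process IDs, mounts, and network interfaces,
    but every container still issues system calls against the one kernel
    the host is booted into, so this value is identical everywhere.
    """
    return os.uname().release

print(kernel_release())
```

This is why a single kernel vulnerability spans every container on the machine, whereas a hypervisor gives each guest its own kernel.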
One could conclude that virtualizing containerization is merely a temporary state of affairs, lasting only until the security question has been adequately addressed, says Appenzeller. But organizations have already been trained to think of relative security in terms of "attack surface." There may be tens of thousands of ways to address the system kernel, which, after all, is designed to be addressable because it is the operating system. A hypervisor, by contrast, is designed specifically to be non-addressable.
For this reason, he says, the number of known kernel-level exploits is at least two orders of magnitude greater than the number of hypervisor-level exploits. What's more, suppose a developer were to build a stateful service that binds to a storage volume on the back end, and in doing so introduces a new vulnerability in how requests are parsed. That could enable a malicious actor to execute arbitrary code inside the application.
Put another way, a malicious actor would not need to specifically craft a “container exploit” to effectively produce something that exploits a container. If someone were to demonstrate a common kernel exploit — say, at a public conference — he would be demonstrating a container exploit. If that exploit targets the application directly, then it is not only conceivable but likely, says Appenzeller, for a sophisticated attacker to gain root-level privileges on the container’s host system.
And put yet another way, for the benefit of everyone reading between the lines: A kernel exploit is a container exploit. Full stop.
One More Layer of Abstraction
Appenzeller’s remarks are actually not a complaint, but an argument in favor of a solution: a new type of security model for containers that would introduce a new level of abstraction, and perhaps pave the way for a familiar administrative environment to oversee container management.
VMware calls this model ‘microsegmentation,’ and in one respect, it’s like a stateful cluster for either stateful or stateless services running in containers. This compartmentalization would be provided by way of a stateful firewall running on (you guessed it) a hypervisor. As the CTSO explains, the firewall manages connections in and out of the segment. When it sees a questionable connection request, it forwards that request to a management server at the upper level, where at the very least it can be logged. Preferably, the management server can apply conditional rules to determine whether the connection should proceed.
In a worst-case scenario, where a container or a node (the exact terminology may yet be determined) exhibits questionable behavior, the management server can simply reboot it. If the services inside the container are explicitly stateless, then any stateful connections they may have had to, say, an external data volume would be maintained by the firewall. So restarting the service from a trusted image should pose no danger, and introduce only negligible latency … in theory, at least.
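The flow Appenzeller describes (a per-segment stateful firewall that passes known connections, escalates unknown ones to a management server for logging and conditional policy, and leaves quarantine decisions to that upper layer) can be sketched as follows. All names here are hypothetical; NSX's actual interfaces are not public in this form:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    src: str    # originating container or segment
    dst: str    # destination, e.g. an external data volume
    port: int

class ManagementServer:
    """Upper-level policy point: logs questionable flows, applies rules."""
    def __init__(self, allow_rule):
        self.log = []                # at the very least, every escalation is logged
        self.allow_rule = allow_rule # conditional rule: Flow -> bool

    def evaluate(self, flow: Flow) -> bool:
        self.log.append(flow)
        return self.allow_rule(flow)

class SegmentFirewall:
    """Stateful firewall guarding one microsegment on the hypervisor."""
    def __init__(self, mgmt: ManagementServer):
        self.mgmt = mgmt
        self.established = set()     # connection state lives here, outside the
                                     # container, so it survives a service restart

    def connect(self, flow: Flow) -> bool:
        if flow in self.established:
            return True              # known flow: pass through
        if self.mgmt.evaluate(flow): # unknown flow: escalate for a decision
            self.established.add(flow)
            return True
        return False                 # denied; management may reboot the source

mgmt = ManagementServer(allow_rule=lambda f: f.port == 5432)
fw = SegmentFirewall(mgmt)
print(fw.connect(Flow("app-1", "db", 5432)))  # allowed by policy
print(fw.connect(Flow("app-1", "db", 23)))    # suspicious: denied and logged
```

Because the firewall, not the container, holds the connection table, restarting a stateless service from a trusted image does not tear down its established flows, which is the crux of the reboot-on-suspicion recovery path described above.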
“We’re providing you with complete visibility of what’s going on at the network level,” Appenzeller explains (the “we” quite obviously referring to VMware), “not only what the container host thinks should go on, but what’s actually going on. An independent perspective of all your traffic. So if something is suspicious, you will see it.”
Appenzeller tells The New Stack that some VMware customers, including one major bank, are actually running microsegmentation in trials today.
“This allows them to take very complicated apps, where otherwise they wouldn’t be comfortable with the level of security if they were just put into a normal container framework,” he says. “There’s an additional layer of defense from NSX underneath.”
In order for banks and other financial institutions to maintain compliance, he notes, they must maintain particular security controls. Institutions that execute real-time trades, for example, are forbidden from using the same databases for “buy” signals as for “sell” signals; and payment processing centers maintain controls that forbid them from binding to the same databases throughout the lifecycle of payment transactions. These institutions maintain what are unfortunately still referred to as “Chinese walls” between them.
Perhaps the only way these institutional applications would be able to work inside containers beyond the experimental phase is if they continued to utilize NSX, argues Appenzeller, but in a different capacity. Microsegmentation would simplify the coordination of things, by enabling containers to pierce the boundaries of a single VM, while still relying on the hypervisor as the agent of orchestration. There isn’t much talk of Mesosphere or Kubernetes in this scenario.
Yet VMware is the key partner with Docker in the creation of libnetwork, the new native connectivity framework for Docker containers. And VMware is an equal partner with Docker in the Open Container Project, to which libnetwork may be contributed. The libnetwork framework will enable Docker's new plugins model, which was demonstrated at DockerCon last week, and which Cisco endorsed.
In a company blog post last week, VMware spokesperson Roger T. Fortier summarized the optimum relationship between all these components thus:
“VMware NSX enables microsegmentation. Microsegmentation enables strong, granular security controls. Docker enables streamlined microservice architectures. And libnetwork enables VMware NSX’s strong microsegmentation to integrate with Docker’s microservice architectures.”
“Libnetwork is a big step forward, going to a richer networking model that allows customers to implement the kinds of connectivity and security policies that they need,” Guido Appenzeller tells us, “in order to move more sophisticated apps to containers. I think we’re very happy with the progress; that being said, there’s always improvement that can be made. It looks a little bit more like OpenStack, which is something we certainly like to see.”
Yet with OCP now a reality, Cisco, VMware, IBM, Microsoft, CoreOS, and Docker are now equal players in the development of a baseline definition for “containers” in general. Containers are no longer “Docker” and “other,” and we should get accustomed to that.
In this world of equal partners, VMware is introducing a new way to consider the arrangement of containers in a system — one which clearly fulfills the purpose of keeping VMware in the picture as containers take root. VMware is in the business of building layers of abstraction between components in a system, and selling the management consoles that bridge these layers.
VMware’s NSX could potentially compete against Ubuntu’s LXD, which has also been designed from a less Docker-centric point of view. If such a competition is joined, it could render moot the whole argument over whether systemd or a separate daemon should initialize the contents of containers. Indeed, containers themselves could become much more generic entities, and the differences that Docker, CoreOS, Microsoft, or anyone else endows them with could become superfluous.
Which could render the whole container market more palatable for VMware as a company. Don’t think for a moment that open source markets are not capitalist enterprises.
Cisco, CoreOS, Docker, IBM and Red Hat are sponsors of The New Stack.