If you’re reading this site regularly, you’d probably agree that containers are starting to change the way we think about the application stack. However, major questions about container security still linger.
At least one company aware of this issue is Intel, which earlier this year launched Clear Containers, a technology designed to combine the security benefits of full virtual machines (VMs) with the deployment ease of containers. Version 0.8 of CoreOS’s rkt container runtime (pronounced “rocket”) incorporates Clear Containers to provide hardware-assisted security.
Whereas VMs inherently provide strong isolation, because the attack surface a hypervisor exposes is very small, containers still rely on borrowed security. They depend on the underlying technologies they are built on and, of course, on careful configuration by humans. You can see where the potential issues might be.
For example, the Docker daemon runs as root, so the onus is on the system architect to stay on the lookout. Want to expose its API to fancy dashboards for automatic container provisioning? Be careful. Namespaces, control groups (cgroups) and a per-container network stack are a blessing and close off major avenues of attack: each container is isolated from the others and from the host. Even so, it is generally recommended to harden the host as well, for example with carefully chosen AppArmor or SELinux policies.
Intel’s approach with Clear Containers is a mashup of proven VM security applied to container infrastructure. Put simply, these containers are highly optimized, stripped-down VMs that boot and tear down quickly and can run rkt containers inside them. The goal is VM-grade isolation at container-like speed.
At the heart of this system is a moderately old project called Native Linux KVM Tool, or kvmtool, which aims to provide a lightweight, kernel-based virtual machine as an alternative to the hefty QEMU-KVM approach. The kvmtool mini-hypervisor lets Clear Containers boot directly into the Linux kernel without going through BIOS or UEFI firmware. In addition, the guest kernel is stripped down to little more than Virtio support, and userspace runs a tweaked systemd.
To reduce the memory footprint, the developers used the DAX (direct access) support in Linux 4.x kernels. A DAX-capable block device allows zero-copy data sharing between the host and the guest; with DAX, the page cache and the virtual memory subsystem are bypassed entirely. Clear Containers also rely on kernel same-page merging (KSM) on the host, which collapses memory pages with identical contents across VMs into a single shared copy, to further reduce memory overhead.
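To make the KSM idea concrete, here is a toy Python model of what same-page merging achieves: memory images from several guests are cut into fixed-size pages, and pages with identical contents are counted only once. It is an illustration, not the kernel’s implementation (real KSM scans memory regions a process has marked with madvise(MADV_MERGEABLE) and shares matching pages copy-on-write). The 4KB page size, the helper names and the sample “kernel image” are assumptions made for the example.

```python
from __future__ import annotations

PAGE_SIZE = 4096  # assumed 4KB pages, typical on x86-64


def split_pages(image: bytes, page_size: int = PAGE_SIZE) -> list[bytes]:
    """Cut a guest memory image into fixed-size pages (hypothetical helper).

    The last page is zero-padded so every page has the same length.
    """
    return [
        image[off:off + page_size].ljust(page_size, b"\0")
        for off in range(0, len(image), page_size)
    ]


def merged_page_count(guests: list[bytes]) -> tuple[int, int]:
    """Return (pages before merging, pages after merging identical ones).

    Toy model only: the set compares full page contents, standing in for
    KSM's byte-for-byte check before it shares a page copy-on-write.
    """
    total = 0
    unique: set[bytes] = set()
    for image in guests:
        for page in split_pages(image):
            total += 1
            unique.add(page)
    return total, len(unique)


# Two guests booting the same (hypothetical) 3-page kernel image,
# each with one page of private data.
kernel = b"".join(bytes([i]) * PAGE_SIZE for i in range(3))
guest_a = kernel + b"A" * PAGE_SIZE
guest_b = kernel + b"B" * PAGE_SIZE

before, after = merged_page_count([guest_a, guest_b])
print(f"{before} pages before merging, {after} after")  # 8 pages before merging, 5 after
```

In this toy case the three kernel pages are shared between the guests, so eight pages of guest memory shrink to five; the more guests run the same image, the larger the saving.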
How Fast Can It Go?
With all these optimizations in place, setting up the kvmtool-based hypervisor takes 30ms, booting the kernel takes 32ms and userspace comes up in less than 75ms, Intel Linux kernel engineer Arjan van de Ven reported in Linux Weekly News. A complete container system can be brought up in around 150ms.
That is a seriously good number, comparable to ordinary container bring-up times and leagues ahead of a traditional QEMU-KVM boot. The memory footprint of roughly 18 to 20MB is equally impressive, considering this is still a VM acting like a container, with all of a VM’s security at its disposal.
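As a quick sanity check on the boot figures, the per-stage timings van de Ven reported can simply be added up. Because userspace is quoted as “less than 75ms,” treating 75ms as an upper bound gives a worst-case sum; the dictionary and function names are illustrative only.

```python
from __future__ import annotations

# Stage timings reported by Arjan van de Ven, in milliseconds.
# The 75ms userspace figure is an upper bound ("less than 75ms"),
# so the sum below is a worst case. Names are illustrative only.
STAGES_MS = {
    "kvmtool hypervisor setup": 30,
    "kernel boot": 32,
    "userspace bring-up (upper bound)": 75,
}


def total_boot_ms(stages: dict[str, int]) -> int:
    """Add up the per-stage boot times."""
    return sum(stages.values())


total = total_boot_ms(STAGES_MS)
print(f"Sum of quoted stage timings: {total} ms")  # Sum of quoted stage timings: 137 ms
```

Even in the worst case the three stages add up to 137ms, which sits under the roughly 150ms quoted for bringing up a whole container system; how the remaining time is spent isn’t broken down in the figures cited here.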
Unikernels and “library OSes” such as MirageOS, OSv and Rumprun take similar approaches: they trade features for more speed, lower resource use and a smaller attack surface, which probably results in somewhat better security.
Let’s see what creative approaches developers come up with to mitigate security issues in containers!
CoreOS, Docker and Intel are sponsors of The New Stack.
Feature Image: Intel.