Any rational organization that wishes to run mission-critical services on containers will at some point ask the question: “But is it secure? Can we really trust containers with our data and applications?”
Amongst tech folks, this often leads to a containers versus virtual machines (VMs) debate and a discussion of the protection provided by the hypervisor layer in VMs. While this can be an interesting and informative discussion, containers versus VMs is a false dichotomy; concerned parties should simply run their containers inside VMs, as currently happens on most cloud providers. A notable exception is Triton from Joyent, which uses SmartOS Zones to ensure isolation of tenants. There is also a growing community that believes container security and isolation on Linux have improved to the point that one can run container services on bare metal, without VMs for isolation; for example, IBM has built a managed container service on the public Bluemix cloud that runs without VM isolation between tenants.
To retain the agility advantage of containers, multiple containers are run within each VM. Security-conscious organizations may use VMs to separate containers running at different security levels; for example, containers processing billing information may be scheduled on separate nodes from those reserved for user-facing websites. Interestingly, several companies, including Hyper, Intel and VMware, are working on building lightning-fast VM-based frameworks that implement the Docker API in an attempt to marry the speed of container workflows and the security of the hypervisor.
Once we accept that moving to containers does not imply surrendering the established and verified security provided by hypervisors, the next step is to investigate the security gains that can be achieved through the use of containers and a container-based workflow.
In a typical workflow, once the developer has completed a feature, they will push to the continuous integration (CI) system, which will build and test the images. The image will then be pushed to the registry. It is now ready for deployment to production, which will typically involve an orchestration system such as Docker’s built-in orchestration, Kubernetes, Mesos, etc. Some organizations may instead push to a staging environment before production.
In a system following security best practices, the following features and properties will be present:
- Image Provenance: A secure labeling system is in place that identifies exactly and incontrovertibly where containers running in the production environment came from.
- Security Scanning: An image scanner automatically checks all images for known vulnerabilities.
- Auditing: The production environment is regularly audited to ensure all containers are based on up-to-date images, and both hosts and containers are securely configured.
- Isolation and Least Privilege: Containers run with the minimum resources and privileges needed to function effectively. They are not able to unduly interfere with the host or other containers.
- Runtime Threat Detection and Response: A capability is in place that detects active threats against containerized applications at runtime and automatically responds to them.
- Access Controls: Linux security modules, such as AppArmor or SELinux, are used to enforce access controls.
Image Provenance
Organizations need to be careful about the software they are running, especially in production environments. It is essential to avoid running out-of-date, vulnerable software, or software that has been compromised or tampered with in some way. For this reason, it is important to be able to identify and verify the source of any container, including who built it and exactly which version of the code it is running.
The gold standard for image provenance is Docker Content Trust. With Docker Content Trust enabled, a digital signature is added to images before they are pushed to the registry. When the image is pulled, Docker Content Trust will verify the signature, thereby ensuring the image comes from the correct organization and the contents of the image exactly match the image that was pushed. This ensures attackers did not tamper with the image, either in transit or when it was stored at the registry. Other, more advanced, attacks — such as rollback attacks and freeze attacks — are also prevented by Docker Content Trust, through its implementation of The Update Framework (TUF).
At the time of writing, Docker Content Trust is supported by Docker Hub, Artifactory and the Docker Trusted Registry (currently experimental). It is possible to set up the open source private registry with Docker Content Trust, but this also requires standing up a Notary server.
In the absence of Docker Content Trust, it is still possible to verify image provenance using digests, which are cryptographic hashes of the contents of an image. When an image is pushed, the Docker client will return a string (such as sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2) that represents the digest of the image. This digest can then be used to pull the image. Whenever an image is pulled in this manner, Docker will verify the digest matches the image. Unlike tags, a digest will always point to the same image; any update to the image will result in the generation of a new digest. The problem with using digests is organizations need to set up a proprietary system for automatically extracting and distributing them.
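The principle behind digests can be illustrated with a few lines of Python. This is only a sketch of content addressing: the function names and the sample content are made up for illustration, and a real registry computes the digest over the image manifest rather than over an arbitrary byte string.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return a digest string in the 'sha256:<hex>' form shown in the text."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def matches_digest(data: bytes, expected: str) -> bool:
    """Check that the content hashes to the digest it was requested by."""
    return content_digest(data) == expected


# Hypothetical stand-in for image content.
original = b"FROM scratch\nCOPY app /app\n"
pinned = content_digest(original)

print(matches_digest(original, pinned))                  # True
print(matches_digest(original + b"# tampered", pinned))  # False
```

Because any change to the content produces a different hash, a digest recorded at push time keeps identifying exactly one image, which is the property tags lack.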
The provenance of images from third parties, whether they are used directly or as base images, also needs to be established. When using Docker Hub, all “official” images have been vetted by Docker, have content trust information attached and should be considered the safest Hub images. Discretion should be applied when using other images, but note that “automated builds” are linked to the source code they are built from, and should be considered more trustworthy than regular user images. Organizations should consider building images from source themselves rather than pulling from untrusted repositories. This situation is currently changing somewhat with the emergence of Docker Store, which will provide a trusted store for publishers along the same lines as Apple’s App Store.
Security Scanning
Security scanning of Docker images is a new service being offered by several companies. The basic idea is simple: take a Docker image and cross-reference the software it contains against a list of known vulnerabilities to produce a “bill of health” for the image. Based on this information, organizations can then take action to mitigate vulnerabilities.
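The cross-referencing idea can be sketched as follows. Everything here is hypothetical: the advisory identifiers are placeholders rather than real CVEs, the package inventory is invented, and matching on exact version strings is a simplification of the version-range logic a real scanner would need.

```python
from typing import NamedTuple


class Vulnerability(NamedTuple):
    identifier: str              # placeholder advisory ID, not a real CVE
    package: str
    affected_versions: frozenset


def scan(installed: dict, known: list) -> list:
    """Return the advisories whose package and version appear in the image."""
    findings = []
    for vuln in known:
        version = installed.get(vuln.package)
        if version is not None and version in vuln.affected_versions:
            findings.append(vuln.identifier)
    return findings


# Hypothetical package inventory extracted from an image.
installed = {"openssl": "1.0.2g", "perl": "5.22.1", "bash": "4.3"}
known = [
    Vulnerability("EXAMPLE-0001", "perl", frozenset({"5.22.0", "5.22.1"})),
    Vulnerability("EXAMPLE-0002", "zlib", frozenset({"1.2.8"})),
]

print(scan(installed, known))  # ['EXAMPLE-0001']
```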
The current offerings include Atomic Scan from Red Hat, Bluemix Vulnerability Advisor from IBM, Clair from CoreOS, Docker Security Scanning from Docker Inc., Peekr from Aqua Security, and Twistlock Trust. They vary widely in how they work, how they are accessed and how much they cost. One crucial difference is the way in which the scanners identify software installed in images.
Some scanners, including Clair, will just interrogate the package manager (e.g., Apt on Debian and Ubuntu) to find the installed software, but this won’t work for software installed through tarballs, or with package managers the scanner doesn’t recognize. In contrast, Docker Security Scanning performs a binary-level analysis of images that works regardless of the package manager and can also identify versions of statically-linked libraries. Twistlock is also interesting in that it scans software installed through tarballs, features zero-day feeds in its vulnerability scanning, and works in air-gapped environments.
It is essential to consider how security scanning can be integrated into your systems. Docker Security Scanning is available as an integrated part of Docker Cloud and Docker Datacenter, but not as a stand-alone service. Other providers offer an application program interface (API), allowing integration into existing CI systems and bespoke workflows. Some scanners can be installed on-premises, which will be important to organizations with a requirement to keep all software within their boundaries.
Once you’ve integrated a security scanning service, your first thought may be to have a blanket ban on running any images with vulnerabilities in production. Unfortunately, you are likely to find that most of your images have some vulnerabilities, and this isn’t a realistic option. For example, Ubuntu has one of the best records for quickly updating images, but — at the time of writing — the 16.04 base image ships with a single major vulnerability due to the version of Perl used (most other images have considerably more issues). Therefore, you are likely to find that you need to investigate discovered vulnerabilities individually to ascertain whether or not they represent a real risk to your system.
This situation can be alleviated significantly by using lightweight containers that have unnecessary software stripped out. The simplest way to do this is to use a very small base image, such as Alpine, which comes in at only 5MB. Another, somewhat extreme, possibility is to build a statically linked binary and copy it on top of the empty “scratch” image. That way there are no OS-level vulnerabilities at all. The major disadvantage of this approach is that building and debugging become significantly more complex — there won’t even be a shell available.
Automated scanning is a huge move forward for security in our industry. It quickly surfaces potential risks and places pressure on vendors to patch vulnerable base images promptly. By paying attention to the results of scans and reacting quickly, organizations can stay one step ahead of many would-be attackers.
Auditing
Auditing directly follows security scanning and image provenance. At any point in time, we would like to be able to see which images are running in production and which version of the code they are running. In particular, it is important to identify containers running out-of-date, potentially vulnerable images.
When working with containers, it is strongly recommended to follow what is sometimes called a “golden image” approach: do not patch running containers, but instead replace them with a new container running the updated code (blue-green deployments and rolling upgrades can be used to avoid downtime). With this approach, it is possible to audit large numbers of running containers by looking at the image they were built from. Tools such as docker diff can be used to verify that container filesystems have not diverged from the underlying image.
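Auditing by image can be sketched as a simple comparison between what is running and what is currently approved. The container names, registry host and digests below are invented, and in practice the list of running containers would come from the orchestrator or the container runtime rather than a hard-coded dictionary.

```python
def outdated_containers(running: dict, approved_images: set) -> list:
    """Return the names of containers whose image is not in the approved set."""
    return sorted(
        name for name, image in running.items() if image not in approved_images
    )


# Hypothetical snapshot of production: container name -> image reference.
running = {
    "web-1": "registry.example.com/web@sha256:aaaa",
    "web-2": "registry.example.com/web@sha256:bbbb",
    "billing-1": "registry.example.com/billing@sha256:cccc",
}
approved_images = {
    "registry.example.com/web@sha256:bbbb",
    "registry.example.com/billing@sha256:cccc",
}

print(outdated_containers(running, approved_images))  # ['web-1']
```

Under the golden image approach, the remedy for each name in the result is to replace the container, not to patch it.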
Note that it isn’t enough to scan images before they are deployed. As new vulnerabilities are reported, images with a previous clean bill of health will become known-vulnerable. Therefore, it is important to keep scanning all images that are running in production. Depending on the scanning solution used, this doesn’t necessarily involve an in-depth rescan; scanners can store the list of software from scanned images and quickly reference this against new vulnerabilities.
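The point about avoiding a full rescan can be illustrated with a stored inventory. This sketch assumes the scanner saved a package list per image at scan time; the image names, packages and the affected version are all hypothetical.

```python
def images_affected_by(inventories: dict, package: str, bad_versions: set) -> list:
    """Look up a newly reported vulnerability against saved package lists."""
    return sorted(
        image
        for image, packages in inventories.items()
        if packages.get(package) in bad_versions
    )


# Hypothetical inventories recorded when each image was first scanned.
inventories = {
    "web:1.4": {"openssl": "1.0.2g", "nginx": "1.10.0"},
    "billing:2.0": {"openssl": "1.0.2h", "python": "3.5.1"},
    "worker:0.9": {"python": "3.5.1"},
}

# A new advisory arrives for one openssl version; no image is unpacked again.
print(images_affected_by(inventories, "openssl", {"1.0.2g"}))  # ['web:1.4']
```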
It is still important to audit the hosts in a container-based system, but this can be made easier by running a minimal distribution, such as CoreOS, Red Hat Atomic or Ubuntu Snappy, which are designed to run containers and simply contain less software to audit. Also, tools, such as Docker Bench for Security, can be used to check configurations, and both Aqua Security and Twistlock offer solutions that audit hosts and configurations.
Isolation and Least Privilege
A major security benefit of containers is the extra tooling around isolation. Containers work by creating a system with a separate view of the world — separate namespaces — with regards to the filesystem, networking and processes. Also, cgroups are used to control the level of access to resources such as CPU and RAM. Further, the Linux kernel calls that a container can make can be controlled through Linux capabilities and seccomp.
One of the fundamental concepts in information security is the principle of least privilege, first articulated as:
“Every program and every privileged user of the system should operate using the least amount of privilege necessary to complete the job.” — Jerome Saltzer
With reference to containers, this means that each container should run with the minimal set of privileges possible for its effective operation. Applying this principle makes an attacker’s life much harder; even if a vulnerability is found in a container, it will be difficult for the attacker to exploit the weakness effectively. And if a container cannot access a vulnerable feature, that feature cannot be exploited through the container.
A large and easy win for security is to run containers with read-only filesystems. In Docker, this is achieved by simply passing the --read-only flag to docker run. With this in place, any attacker that exploits a vulnerability will find it much harder to manipulate the system; they will be unable to write malicious scripts to the filesystem or to modify the contents of files. Many applications will want to write out to file, but this can be accommodated by using tmpfs or volumes for specific files or directories.
Constraining access to other resources can also be effective. Limiting the amount of memory available to a container will prevent attackers from consuming all the memory on the host and starving out other running services. Limiting CPU and network bandwidth can prevent attackers from running resource-heavy processes such as Bitcoin mining or torrent peers.
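The read-only filesystem and resource limits described above can be combined in a single docker run invocation. The sketch below only assembles the argument list; the image name and the default limits are placeholders, and the --cpus flag assumes a Docker release newer than the ones current when this article was written (older releases expose CPU limits through other flags).

```python
def hardened_run_args(image, memory="256m", cpus="0.5", writable_dirs=("/tmp",)):
    """Build a docker run command line with a read-only root and resource caps."""
    args = ["docker", "run", "--read-only", "--memory", memory, "--cpus", cpus]
    for directory in writable_dirs:
        # tmpfs mounts give the application scratch space without making
        # the rest of the filesystem writable.
        args += ["--tmpfs", directory]
    args.append(image)
    return args


# "example/app:1.0" is a placeholder image name.
print(" ".join(hardened_run_args("example/app:1.0")))
# docker run --read-only --memory 256m --cpus 0.5 --tmpfs /tmp example/app:1.0
```

The resulting list could be handed to subprocess.run on a host with Docker installed; appropriate limits depend entirely on the application.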
Perhaps the most common mistake when running containers in production is having containers which run as the root user. While building an image, root privileges are typically required to install software and configure the image. However, the main process that is executed when the container starts should not run as root. If it does, any attacker who compromises the process will have root-level privileges inside the container. Much worse, as users are not namespaced by default, should the attacker manage to break out of the container and onto the host, they might be able to get full root-level privileges on the host.
To prevent this, always ensure Dockerfiles declare a non-privileged user and switch to it before executing the main process. Since Docker 1.10, there has been optional support for enabling user namespacing, which automatically maps the user in a container to a high-numbered user on the host. This works, but currently has several drawbacks, including problems using read-only filesystems and volumes. Many of these problems are being resolved upstream in the Linux community at publication time, so expect that user namespace support will become more viable for a larger percentage of use cases in the near future.
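A check for the first recommendation could be automated in CI with something like the sketch below. It is deliberately naive: it ignores line continuations and build arguments, and it assumes every base image defaults to root, which is not always true. The Dockerfile contents are made-up examples.

```python
def final_user(dockerfile: str) -> str:
    """Return the user in effect at the end of a Dockerfile."""
    user = "root"
    for raw_line in dockerfile.splitlines():
        parts = raw_line.strip().split(None, 1)
        if len(parts) < 2:
            continue
        instruction, argument = parts[0].upper(), parts[1].strip()
        if instruction == "FROM":
            user = "root"  # assumption: each base image starts as root
        elif instruction == "USER":
            user = argument
    return user


def runs_as_root(dockerfile: str) -> bool:
    """True if the main process would start as root (by name or UID 0)."""
    name = final_user(dockerfile).split(":", 1)[0]
    return name in ("root", "0")


good = 'FROM alpine\nRUN adduser -D app\nUSER app\nCMD ["/app"]\n'
bad = 'FROM alpine\nCMD ["/app"]\n'

print(runs_as_root(good))  # False
print(runs_as_root(bad))   # True
```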
Limiting the kernel calls a container can make also significantly reduces the attack surface, both by constraining what an attacker can do and by reducing exposure to vulnerabilities in the kernel code. The primary mechanism for limiting privileges is using Linux capabilities. Linux defines around 40 capabilities, which map onto sets of kernel calls. Container runtimes, including rkt and Docker, allow the user to select which privileges a container should run with. These capabilities map onto around 330 system calls, which means several capabilities, notably SYS_ADMIN, map onto a large number of calls. For even finer control over which kernel calls are allowed, Docker now has seccomp support for specifying exactly which calls can be used, and ships with a default seccomp policy that has already been shown to be effective at mitigating problems in the Linux kernel. The main problem with both approaches is figuring out the minimal set of kernel calls your application needs to make. Simply running with different levels of capabilities and checking for failures is effective but time-consuming, and may miss problems in untested parts of code.
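To give a feel for what a seccomp allowlist looks like, the sketch below generates a minimal profile in the JSON layout Docker accepts: a default action that rejects calls, plus a list of allowed names. The three syscalls are illustrative only; a profile this small would not be enough to start a real container, and a production profile typically also declares architectures.

```python
import json


def allowlist_profile(syscalls) -> dict:
    """Build a deny-by-default seccomp profile that allows only the given calls."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",  # anything not listed fails with an error
        "syscalls": [
            {"names": sorted(set(syscalls)), "action": "SCMP_ACT_ALLOW"}
        ],
    }


# Illustrative syscall list; duplicates are collapsed.
profile = allowlist_profile(["read", "write", "exit_group", "read"])
print(json.dumps(profile, indent=2))
```

A profile written to a file can be applied with docker run --security-opt seccomp=<file>, and capabilities can be adjusted with --cap-drop and --cap-add.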
Potentially, existing tools can be helpful to determine your application’s use of syscalls without resorting to trial and error. If you can fully exercise your application’s code paths with tracing capabilities, like strace2elastic, this will provide a report of used syscalls within your application during the container’s runtime.
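As a rough illustration of the tracing approach, the sketch below tallies syscall names from plain strace text output. It does not reproduce what strace2elastic does; the sample trace is invented, and the regular expression only handles the common line shapes (optionally prefixed with a process ID), skipping signal and exit lines.

```python
import re
from collections import Counter

# Matches lines such as: openat(AT_FDCWD, ...) = 3   or   [pid 42] read(...) = 0
SYSCALL_LINE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_used(trace: str) -> Counter:
    """Count how often each syscall name appears in strace output."""
    counts = Counter()
    for line in trace.splitlines():
        match = SYSCALL_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented trace for illustration.
sample = """\
execve("/usr/bin/app", ["app"], 0x7ffd0 /* 12 vars */) = 0
openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3
read(3, "127.0.0.1 localhost\\n", 4096) = 20
[pid  4242] read(3, "", 4096) = 0
--- SIGCHLD {si_signo=SIGCHLD} ---
+++ exited with 0 +++
"""

print(sorted(syscalls_used(sample)))  # ['execve', 'openat', 'read']
```

The resulting set of names is only as complete as the code paths exercised while tracing, so it is a starting point for an allowlist, not a guarantee.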
While OS-level isolation and the enforcement of least privilege is critical, isolation also needs to be tied to application logic. Without an understanding of the application that is running on the host, OS-level isolation may not be in itself effective.
Runtime Threat Detection and Response
No matter how good a job you do with vulnerability scanning and container hardening, there are always unknown bugs and vulnerabilities that may manifest at runtime and cause intrusions or compromises. That is why it’s important to outfit your system with real-time threat detection and incident response capabilities.
Containerized applications, compared to their monolithic counterparts, are distinctly more minimal and immutable. This makes it possible to derive a baseline for your application that is of a higher fidelity than with traditional, monolithic applications. Using this baseline, you should be able to detect real-time threats, anomalies, and active compromises, with a lower false-positive rate than what was seen with traditional anomaly detection.
Behavior baselining, where a security mechanism focuses on understanding an application or system’s typical behavior to identify anomalies, was one of the hottest trends at Black Hat 2016. The key to behavior baselining is to automate, as much as you can, the derivation of the baseline, the continuous monitoring, and the detection and response. Today, most organizations accomplish behavior baselining with a combination of manual labor and data science. However, due to the transient nature of containers, it is especially important that the whole process is automated.
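In its simplest form, a baseline is just the set of behaviors seen during a learning period, and anything outside that set is flagged. The sketch below is a toy version of that idea; the class, the event strings and the image name are hypothetical, and commercial products use far richer models than a set lookup.

```python
class BehaviorBaseline:
    """Learn the events an image normally produces, then flag unseen ones."""

    def __init__(self):
        self._known = set()
        self._learning = True

    def observe(self, image: str, event: str) -> bool:
        """Record the event while learning; afterwards return True if anomalous."""
        key = (image, event)
        if self._learning:
            self._known.add(key)
            return False
        return key not in self._known

    def freeze(self):
        """End the learning period and start detecting."""
        self._learning = False


baseline = BehaviorBaseline()
for event in ("exec:nginx", "listen:80", "open:/etc/nginx/nginx.conf"):
    baseline.observe("web:1.4", event)
baseline.freeze()

print(baseline.observe("web:1.4", "listen:80"))     # False
print(baseline.observe("web:1.4", "exec:/bin/sh"))  # True
```

Because containers built from the same image should behave alike, the baseline is keyed by image, not by individual container, which suits short-lived containers.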
Active response goes hand-in-hand with baselining. Active response means reacting to an attack, a compromise or an anomaly as soon as it is detected. The response can come in many different forms, such as alerting responsible personnel, communicating with enterprise ticketing systems, or applying some pre-determined corrective actions to the system and the application.
In the container environment, an active response could mean performing additional logging, applying additional isolation rules, disabling a user dynamically, or even actively deleting the container. Again, automation is key here, and the actions performed must not interfere with application logic in a negative way, such as leaving the system in an inconsistent state or interfering with non-idempotent operations.
Some of the products currently offering this level of runtime threat detection and response include Aqua Security, Joyent Triton SmartOS, Twistlock and Red Hat OpenShift. As more mission-critical applications move to containers, automating runtime threat detection and response will be increasingly important to container security. The ability to correlate information, analyze indicators of compromise, and manage forensics and response actions, in an automated fashion, will be the only way to scale up runtime security for a containerized world.
Access Controls
The Linux kernel has support for security modules that can apply policies before the execution of kernel calls. The two most common security modules are AppArmor and SELinux, both of which implement what is known as mandatory access control (MAC). MAC will check that a user or process has the rights to perform various actions, such as reading and writing, on an object such as a file, socket or process. The access policy is defined centrally and cannot be changed by users. This contrasts with the standard Unix model of files and permissions, which can be changed by users with sufficient privileges at any time, sometimes known as discretionary access control or DAC.
SELinux was originally developed by the National Security Agency (NSA) but is now largely developed by Red Hat and found in their distributions. While using SELinux does add a significant layer of extra security, it can be somewhat difficult to use. Upon enabling SELinux, the first thing you will notice is that volumes don’t work as expected and extra flags are needed to control their labels. AppArmor is similar, but less comprehensive than SELinux, and doesn’t have the same control over volumes. It is enabled by default on Debian and Ubuntu distributions.
In both cases, it is possible to create special policies for running particular containers; e.g., a web server policy for running Apache or NGINX that allows certain network operations but disallows various other calls. Ideally, all images would have their own specially crafted policy, but creating such policies tends to be a frustrating exercise, eased slightly by third-party utilities such as bane. In the future, we can expect to see an integrated security profile that travels with containers, specifying settings for kernel calls, SELinux/AppArmor profiles and resource requirements.
Further to the topic of access control, it’s important to note that anyone with the rights to run Docker containers effectively has root privileges on that host — they can mount and edit any file, or create setuid (set user ID upon execution) binaries that can be copied back to the host. In most situations, this is just something to be aware of, but some organizations will want finer-grained control over user rights. To this end, organizations may want to look at using higher-level platforms such as Docker Datacenter and OpenShift, or tooling, such as Aqua Security and Twistlock, to add such controls.
It is essential for organizations to consider security when implementing a container-based workflow or running containers in production. Security affects the entire workflow and needs to be considered from the start. Image provenance starts with the developers building, pulling and pushing images on their laptops, continues through the CI and testing phases, and ends with the containers running in production.
Containers and the golden image approach enable new ways of working and tooling, especially around image scanning and auditing. Organizations are better able to keep track of the software running in production and can much more easily and quickly react to vulnerabilities. Updated base images can be tested, integrated and deployed in minutes. Image signing validates the authenticity of containers and ensures that attackers have not tampered with their contents.
The future will bring more important features, as verified and signed images become common and features, such as integrated security profiles, are added. In the coming months, the security benefits alone will be a strong reason for organizations to make the move to containers.
Docker, IBM, CoreOS, Red Hat, Twistlock and Joyent are sponsors of The New Stack.
The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Docker, Aqua Security.