Managing Persistence for Docker Containers
A few years ago, when virtualization was introduced to IT administrators, there was an attempt to standardize the virtual machine (VM) as the unit of deployment. Each new build and version resulted in a new VM template. Eventually, developers and administrators started creating new images and provisioning VMs, which resulted in a phenomenon called “VM sprawl.” Each template and the provisioned VM occupied a few gigabytes of storage, which led to inefficient storage management. Given the large size of VM images, organizations realized that it wasn’t practical to create a new VM template for each version.
One of the goals for Docker was to avoid the pitfall illustrated by “VM sprawl.” The only way Docker could avoid the trap of fragmented images was to adopt a different storage mechanism for its images and containers.
Another goal that was critical to Docker was the separation of filesystems to create isolation between the host system and containers. This isolation was core to the security of containerized applications. To meet these requirements, Docker adopted a union filesystem architecture for the images and containers.
Union filesystems represent a logical filesystem by grouping different directories and filesystems together. Each filesystem is made available as a branch, which becomes a separate layer. Docker images are based on a union filesystem, where each branch represents a new layer. This layering allows images to be constructed and deconstructed as needed instead of creating a large, monolithic image.
Docker’s use of existing Linux union filesystems is ideal for running applications that can rapidly scale. Since they are self-contained, Docker containers can be launched on any host with no dependencies or affinity to a specific host. While this is certainly an advantage for web-scale workloads, it becomes a challenge to run stateful applications that deal with persistent data. Workloads, such as relational databases, NoSQL databases, content management systems, and big data stacks, demand persistence and durability of data. Some workloads, like content management systems, also require shared data access across multiple application instances.
When a Docker image is pulled from the registry, the engine downloads all the dependent layers to the host. When a container is launched from a downloaded image composed of many layers, Docker uses the copy-on-write capabilities of the available union filesystem to add a writeable “working directory” (a temporary filesystem) on top of the existing read-only layers. When Docker first starts a container, this initial read-write layer is empty until changes are made to the filesystem by the running container process. When a Docker image is created from an existing container, only the changes made, all of which have been “copied up” to this writeable working directory, are added into the new layer. This approach enables reuse of images without duplication or fragmentation.
When a process attempts to write to an existing file, the filesystem implementing the copy-on-write feature creates a copy of the file in the topmost working layer. All other processes using the original image’s layers will continue to access the read-only, original version of the layer. This technique optimizes both image disk space usage and the performance of container start times.
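The layering and copy-on-write behavior described above can be illustrated with a toy model. This is a minimal sketch, not Docker’s implementation: the class name LayeredFS, its methods and the file paths are all hypothetical, layers are modeled as plain dictionaries, and file deletion is not modeled at all.

```python
class LayeredFS:
    """Toy model of a union filesystem: shared read-only layers plus one writable layer."""

    def __init__(self, image_layers):
        # image_layers: list of dicts (path -> content), lowest layer first.
        # The same list can back many containers; it is never modified here.
        self.image_layers = image_layers
        self.working = {}  # the container's writable layer, empty at start

    def read(self, path):
        # Look in the writable layer first, then in image layers from newest to oldest.
        if path in self.working:
            return self.working[path]
        for layer in reversed(self.image_layers):
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, content):
        # Copy-on-write: the change lands only in this container's writable layer.
        self.working[path] = content

    def commit(self):
        # Creating an image from a container adds only the changed files as a new layer.
        return self.image_layers + [dict(self.working)]


base = {"/etc/os-release": "debian", "/app/config": "debug=false"}
app = {"/app/main.py": "print('v1')"}
image = [base, app]

c1 = LayeredFS(image)
c2 = LayeredFS(image)
c1.write("/app/config", "debug=true")

print(c1.read("/app/config"))  # c1 sees its own copy
print(c2.read("/app/config"))  # c2 still sees the version stored in the image
print(len(c1.commit()))        # the committed image has one extra layer
```

Both containers share the same two image layers, and only the single changed file ends up in the layer produced by commit.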
Strategies to Manage Persistent Data
Docker’s layered storage implementation is designed for portability, efficiency and performance. It is optimized for storing, retrieving, and transferring images across different environments. When a container is deleted, all of the data written to the container is deleted along with it.
As a best practice, it is recommended to isolate the data from a container to retain the benefits of adopting containerization. Data management should be distinctly separate from the container lifecycle. There are multiple strategies to add persistence to containers. We will evaluate the options that are available out-of-the-box with Docker, followed by the scenarios that are enabled by the ecosystem.
Host-based persistence is one of the early implementations of data durability in containers, and it has matured to support multiple use cases. In this architecture, containers depend on the underlying host for persistence and storage. This option bypasses the specific union filesystem backends to expose the native filesystem of the host. Data stored in the designated host directory is visible inside the container’s mount namespace. The data is persisted outside of the container, which means it remains available after the container is removed.
In host-based persistence, multiple containers can share one or more volumes. When multiple containers write to a single shared volume without coordination, the result can be data corruption. Developers need to ensure that their applications are designed to write to shared data stores.
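One common way to make writers cooperate on a shared file is advisory locking. The following is a minimal sketch, assuming a Linux host and a local filesystem behind the shared volume (locking semantics on network filesystems can differ). The function name append_record and the file name events.log are hypothetical, and a temporary directory stands in for the shared volume.

```python
import fcntl
import tempfile
from pathlib import Path


def append_record(path, record):
    """Append one line to a shared file while holding an exclusive advisory lock."""
    with open(path, "a") as shared:
        fcntl.flock(shared, fcntl.LOCK_EX)  # blocks until no other writer holds the lock
        try:
            shared.write(record + "\n")
            shared.flush()  # push the data out before the lock is released
        finally:
            fcntl.flock(shared, fcntl.LOCK_UN)


# Hypothetical stand-in for a directory mounted into several containers.
with tempfile.TemporaryDirectory() as shared_dir:
    log_path = Path(shared_dir, "events.log")
    append_record(log_path, "container-a: started")
    append_record(log_path, "container-b: started")
    lines = log_path.read_text().splitlines()

print(lines)
```

Advisory locks only help if every writer uses them, which is why this has to be part of the application design.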
Data volumes are directly accessible from the Docker host. This means you can read and write to them with normal Linux tools. In most cases, you should not do this, as it can cause data corruption if your containers and applications are unaware of your direct access.
There are three ways of using host-based persistence, with subtle differences in the way they are implemented.
Implicit Per-Container Storage
The first mechanism will create an implicit storage sandbox for the container that requested host-based persistence. The directory is created by default at /var/lib/docker/volumes on the host during the creation of the container. When the running container is removed, the directory is automatically deleted on the host by the Docker Engine. The directory may also become unavailable if the Docker Engine crashes on the host. The key thing to understand is that the data stored in the sandbox is available only to the container that requested it.
Explicit Shared Storage (Data Volumes)
We can choose the second technique if there is a need to share data across multiple containers running on the same host. In this scenario, an explicit location on the host filesystem is exposed as a mount within one or more containers. This becomes especially useful when multiple containers need read-write access to the same directory. For example, containers running an Apache web server can centrally store logs to the same directory, making it easier to process the logs.
Since the directory on the host is created outside of Docker Engine’s context, it is available even after removing every container or even stopping Docker Engine. Since this shared mount point is fully outside the control of Docker Engine’s storage backend, it is not part of the layered, union filesystem approach.
This technique is the most popular one used by DevOps teams. Referred to as data volumes in Docker, it offers the following benefits:
- Data volumes can be shared and reused across multiple containers.
- Changes made to a data volume are made directly, bypassing the layered image implementation of the engine’s storage backend.
- Changes applied to a data volume will not be included when the image gets updated.
- Data volumes are available even if the container itself is deleted.
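On the command line, the difference between implicit per-container storage and an explicit data volume comes down to the argument passed to the -v flag of docker run. The sketch below only builds the commands and does not execute them. The helper docker_run is hypothetical, as are the host path /srv/apache-logs and the container name web1; the httpd image and its log directory are used purely for illustration.

```python
import shlex


def docker_run(image, container_path, host_path=None, name=None):
    """Build a `docker run` argument list with one volume."""
    args = ["docker", "run", "-d"]
    if name:
        args += ["--name", name]
    # With only a container path, Docker creates the backing storage itself (implicit).
    # With host_path:container_path, an explicit host directory is mounted instead.
    volume = container_path if host_path is None else f"{host_path}:{container_path}"
    args += ["-v", volume, image]
    return args


implicit = docker_run("httpd", "/usr/local/apache2/logs")
explicit = docker_run(
    "httpd",
    "/usr/local/apache2/logs",
    host_path="/srv/apache-logs",  # hypothetical shared directory on the host
    name="web1",
)

print(shlex.join(implicit))
print(shlex.join(explicit))
```

Starting a second container with the same host_path would give both web servers read-write access to the same log directory, as in the Apache example above.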
Shared Multi-Host Storage
While both techniques discussed above offer varying levels of persistence and durability, there is one major drawback with them — they make the containers non-portable. The data residing on the host will not move along with the container, which creates a tight bond between the host and container.
Customers deploying containerized workloads in production often run them in a clustered environment, where multiple hosts participate to deliver required compute, network and storage capabilities. This scenario demands distributed storage that is made available to all hosts and is then exposed to the containers through a consistent namespace.
Shared filesystems, such as Ceph, GlusterFS, Network File System (NFS) and others, can be used to configure a distributed filesystem on each host running Docker containers. By creating a consistent naming convention and unified namespace, all running containers will have access to the underlying durable storage backend, irrespective of the host from which they are deployed.
Shared multi-host storage takes advantage of a distributed filesystem combined with the explicit storage technique. Since the mount point is available on all nodes, it can be leveraged to create a shared mount point among containers.
Containerized workloads running on orchestration engines deployed in production environments can configure a distributed filesystem on a subset of cluster nodes. These nodes will be designated for scheduling containers that need long-term durability and persistence.
Orchestration engines provide a mechanism to specify hosts during the scheduling of containers. Docker Swarm comes with container configuration filters, which define the nodes to use when creating and running containers. In Kubernetes, labels can be used to target a set of nodes when deploying pods. Kubernetes also offers Pet Sets, groups of stateful pods that have a stronger requirement for identity.
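As an illustration of label-based targeting in Kubernetes, the sketch below builds a pod manifest that uses nodeSelector to land on nodes carrying a given label and mounts a directory from the shared filesystem through a hostPath volume. The helper stateful_pod, the label storage=shared, the pod name and the path /mnt/shared/orders-db are assumptions made for this example; the nodes would need to be labeled accordingly beforehand.

```python
import json


def stateful_pod(name, image, node_labels, host_path, mount_path):
    """Return a pod manifest pinned to labeled nodes, with one host-backed volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Only nodes carrying all of these labels are eligible for scheduling.
            "nodeSelector": node_labels,
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                }
            ],
            # The host path is assumed to sit on the distributed filesystem mount.
            "volumes": [{"name": "data", "hostPath": {"path": host_path}}],
        },
    }


manifest = stateful_pod(
    name="orders-db",
    image="postgres",
    node_labels={"storage": "shared"},   # hypothetical node label
    host_path="/mnt/shared/orders-db",   # hypothetical shared mount point
    mount_path="/var/lib/postgresql/data",
)

print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with kubectl, which accepts JSON as well as YAML manifests.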
Typical Operations Supported By Host-Based Persistence
Thanks to tight integration with the core container engine, host-based persistence is the simplest to configure. Development or operations teams perform the following tasks to enable host-based persistence. This workflow considers the explicit shared storage and shared multi-host storage scenarios:
- Creating volumes: This is the first step for enabling persistence in containers. It results in the Docker Engine creating a designated volume that points to the host filesystem.
- Launching stateful containers: The persistent volumes are then associated with one or more containers during launch time.
- Backing up data: Data stored in volumes can be easily backed up to a tar or zip file. Refer to Docker documentation for guidance on this process.
- Migrating and restoring data: The backed up data can then be migrated or restored on a different host by creating a new data volume and decompressing the file.
- Deleting volumes: Data volumes are not automatically deleted after removing associated containers; they will need to be manually deleted by the operations team.
- Configuring a distributed filesystem (optional): In multi-host scenarios, IT may have to configure a shared filesystem, spanning multiple physical or virtual servers.
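The backup and restore steps in this workflow amount to archiving a directory and unpacking it somewhere else. The following is a minimal sketch using Python’s tarfile module, with temporary directories standing in for the data volumes on the source and target hosts. The function names and file names are hypothetical, and this is not the procedure from the Docker documentation, which runs tar inside a helper container.

```python
import tarfile
import tempfile
from pathlib import Path


def backup_volume(volume_dir, archive_path):
    """Archive the contents of a volume directory into a gzip-compressed tar file."""
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname="." stores paths relative to the volume root.
        tar.add(str(volume_dir), arcname=".")


def restore_volume(archive_path, volume_dir):
    """Unpack a backup archive into a (possibly new) volume directory."""
    Path(volume_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as tar:
        tar.extractall(str(volume_dir))


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "volume-a")  # stands in for the original data volume
    source.mkdir()
    (source / "records.db").write_text("row1\nrow2\n")

    archive = Path(tmp, "backup.tar.gz")
    backup_volume(source, archive)

    target = Path(tmp, "volume-b")  # stands in for a new volume on another host
    restore_volume(archive, target)
    restored = (target / "records.db").read_text()

print(restored == "row1\nrow2\n")
```

In practice the archive would be copied to the target host between the two calls, and writers should be stopped during the backup so the archive is consistent.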
Top Use Cases for Using Host-based Persistence
Host-based persistence may be considered for the following scenarios:
- Databases: It can be faster to write to a volume than to the copy-on-write layer. This is applicable when running relational and NoSQL databases.
- Hot-mounting source code: In a development environment where source code needs to be shared between the host and containers, host-based persistence comes in handy. Since the container accesses the same version as the host, it’s easy to debug and test in a container environment. Developers work in their normal IDE, editing files on their local Docker host, and those changes are reflected immediately inside the container.
- Master-Worker: In a scenario where data needs to be shared between two containers acting as master and worker, host-based persistence should be used. For example, the data aggregated by the master container is processed by a worker container.
Although host-based persistence is a valuable addition to Docker for specific use cases, it has the significant drawback of limiting the portability of containers to a specific host. It also doesn’t take advantage of specialized storage backends optimized for data-intensive workloads. To solve these limitations, volume plugins have been added to Docker to extend the capabilities of containers to a variety of storage backends, without forcing changes to the application design or deployment architecture.
Starting with version 1.8, Docker introduced support for third-party volume plugins. Existing tools, including Docker command-line interface (CLI), Compose and Swarm, work seamlessly with plugins. Developers can even create custom plugins based on Docker’s specifications and guidance.
According to Docker, volume plugins enable engine deployments to be integrated with external storage systems and data volumes to persist beyond the lifetime of a single engine host. Customers can start with the default local driver that ships along with Docker, and move to a third-party plugin to meet specific storage requirements. Volume plugins also enable containerized applications to interface with filesystems, object storage, block storage, and software-defined storage.
As of June 2016, Docker supports over a dozen third-party volume plugins for use with Azure File Storage, Google Compute Engine persistent disks, NetApp Storage and vSphere. In addition, projects like Rancher Convoy can provide access to multiple backends at the same time.
Basics of Volume Plugin Architecture
Docker ships with a default driver that supports local, host-based volumes. When additional plugins are available, the same workflow can be extended to support new backends. This architecture is based on Docker’s philosophy of “batteries included, but replaceable.” Third-party volume plugins need to be installed separately, and they typically ship with their own command line tools to manage the lifecycle of storage volumes.
Docker’s volume plugins can support multiple backend drivers that interface with popular filesystems, block storage devices, object storage services and distributed filesystems.
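To give a feel for what a volume plugin does, the sketch below models the request handling of a very small driver. It assumes the JSON-over-HTTP protocol described in Docker’s plugin documentation, in which the engine posts JSON bodies to endpoints such as /Plugin.Activate, /VolumeDriver.Create and /VolumeDriver.Mount. The class ToyVolumeDriver, the root path and the volume names are hypothetical, no HTTP server or real storage is involved, and only a subset of the endpoints is handled.

```python
import json


class ToyVolumeDriver:
    """In-memory stand-in for a volume plugin; it tracks names and paths only."""

    def __init__(self, root):
        self.root = root
        self.volumes = {}  # volume name -> path the engine should mount

    def handle(self, endpoint, body):
        request = json.loads(body) if body else {}
        if endpoint == "/Plugin.Activate":
            # Handshake: announce which plugin interface is implemented.
            return {"Implements": ["VolumeDriver"]}
        name = request.get("Name", "")
        if endpoint == "/VolumeDriver.Create":
            # A real driver would provision storage on its backend here.
            self.volumes[name] = f"{self.root}/{name}"
            return {"Err": ""}
        if endpoint in ("/VolumeDriver.Mount", "/VolumeDriver.Path"):
            if name not in self.volumes:
                return {"Err": f"no such volume: {name}"}
            return {"Mountpoint": self.volumes[name], "Err": ""}
        if endpoint == "/VolumeDriver.Unmount":
            return {"Err": ""}
        if endpoint == "/VolumeDriver.Remove":
            self.volumes.pop(name, None)
            return {"Err": ""}
        return {"Err": f"unsupported endpoint: {endpoint}"}


driver = ToyVolumeDriver("/mnt/toy-volumes")  # hypothetical mount root
activated = driver.handle("/Plugin.Activate", "")
created = driver.handle("/VolumeDriver.Create", json.dumps({"Name": "pgdata", "Opts": {}}))
mounted = driver.handle("/VolumeDriver.Mount", json.dumps({"Name": "pgdata"}))
missing = driver.handle("/VolumeDriver.Mount", json.dumps({"Name": "unknown"}))

print(activated, created, mounted, missing, sep="\n")
```

Because the engine only sees these requests and responses, the backend behind the driver can be swapped without changing how containers ask for volumes.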
Typical Operations Supported By Volume Plugins
Volume plugins typically install a daemon responsible for managing the interaction with storage backends. A client in the form of a command line interface (CLI) talks to the daemon to perform storage-specific tasks on the volume. The operations supported by the CLI go beyond the standard tasks that the Docker CLI can perform.
The volume plugin clients enable the following tasks as part of the lifecycle management:
- Creating provisioned volumes: This step involves creating devices that can be accessed while creating a volume from a standard Docker CLI.
- Taking snapshots of volumes: Many plugins support creating point-in-time snapshots of volumes. These incremental snapshots only contain the delta of changes made since the last snapshot, thus maintaining a small size.
- Backing up snapshots to external sources: Optionally, volume plugin tools support backing up snapshots to sources such as Amazon S3 and Azure Storage.
- Restoring volumes on any supported host: Plugins enable easy migration of data from one host to another by restoring the backups and snapshots.
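The idea behind incremental snapshots can be shown with a small model in which a volume is a fixed-length list of blocks and each snapshot stores only the blocks that changed since the previous one. This is a sketch of the general technique, not of any particular plugin; the class SnapshotChain and the block contents are hypothetical.

```python
class SnapshotChain:
    """Keeps incremental snapshots of a volume modeled as a fixed-length list of blocks."""

    def __init__(self):
        self.deltas = []  # one dict of changed blocks (index -> data) per snapshot
        self._last = {}   # full block map as of the latest snapshot

    def snapshot(self, blocks):
        # Store only the blocks that differ from the previous snapshot.
        delta = {i: b for i, b in enumerate(blocks) if self._last.get(i) != b}
        self.deltas.append(delta)
        self._last.update(delta)
        return delta

    def restore(self, upto=None):
        # Replay the deltas in order; `upto` limits how many snapshots are applied.
        state = {}
        for delta in self.deltas[:upto]:
            state.update(delta)
        return [state[i] for i in sorted(state)]


volume = ["aaaa", "bbbb", "cccc", "dddd"]
chain = SnapshotChain()

first = chain.snapshot(volume)   # the first snapshot contains every block
volume[2] = "CCCC"
second = chain.snapshot(volume)  # the second contains only the changed block

print(len(first), len(second))
print(chain.restore())           # latest state
print(chain.restore(upto=1))     # state as of the first snapshot
```

Restoring on another host then means shipping the chain of deltas and replaying it, which is far less data than repeated full copies of the volume.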
Flocker from ClusterHQ is one of the first volume plugins to integrate with Docker. The Flocker data volume, called a dataset, is portable and can be used with any container within the cluster. It manages Docker containers and data volumes together, enabling the volumes to follow the containers when they move between different hosts in the cluster.
Flocker works with mainstream orchestration engines such as Docker Swarm, Kubernetes and Mesos. It supports storage environments including Amazon Elastic Block Store (EBS), GCE persistent disk, OpenStack Cinder, vSphere, vSAN and more.
There are many open source volume plugins to support a variety of storage backends. Please refer to Docker’s plugin page for the latest list of available plugins.
Top Use Cases for Volume Plugins
Volume plugins target scenarios typically used in production environments. The following list highlights different use cases:
- Data-intensive applications: Since volume plugins have drivers for specialized storage backends, they can deliver the required performance demanded by data-intensive workloads such as big data processing and video transcoding.
- Database migration: Volume plugins make it easy to move data across hosts in the form of snapshots, which enable migration of production databases from one host to another with minimum downtime. Through this, containers in production environments can be migrated to powerful hosts or virtual machines.
- Stateful application failover: Using volume plugins with a supported shared storage backend like Amazon EBS, customers can manually fail over containers to a new machine and re-attach an existing data volume. This enables transparent failover of stateful applications.
- Reduced Mean Time To Recovery (MTTR): With volume plugins connected via a shared storage backend, operations teams can speed up cluster time-to-recovery by attaching a new database container to an existing data volume. This results in faster recovery of failed systems.
In the next section, we will take a closer look at different container storage choices made available by this vibrant ecosystem.
Container Storage Ecosystem
Since storage is a key building block of the container infrastructure, many ecosystem players have started to focus on building container-specific storage offerings.
The container storage ecosystem can be broadly classified into software-defined storage providers, specialized appliances providers and block storage providers. While there are a few dozen entities delivering storage solutions from the container ecosystem, we will explore some of the prominent players from each category.
Software-Defined Storage Providers
The rise of containers in the enterprise has led to the creation of a new class of storage optimized for containerized workloads. Existing storage technologies, such as network-attached storage (NAS) and storage area network (SAN), are not designed to run containerized applications. Software-defined storage abstracts these traditional types of storage to expose the virtual disks to the more modern applications.
Container-defined storage, a new breed of storage, is a logical evolution of software-defined storage that is purpose-built to match the simplicity, performance and speed of containers. Container-defined storage runs on commodity hardware and features scale-out block storage, which is itself deployed as a container. It provides per-container storage, distributed file access, a unified global namespace, fine-grained access control, and tight integration with the cluster management software. Many providers make money by selling these services on top of commodity hardware or bundled with a cloud provider’s offering.
One of the key advantages of using software-defined storage for containers is the ability to virtualize storage, which may be based on faster solid-state drives (SSD) or magnetic disks. Aggregating disparate storage enables IT to utilize existing storage investments. Some flavors of container-defined storage can automatically place I/O-intensive datasets on faster SSDs while moving the archival data to magnetic disks. This delivers the right level of performance for workloads, such as online transaction processing (OLTP), which demand high input/output operations per second (IOPS).
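Automatic placement of this kind can be thought of as a policy that ranks datasets by I/O demand and fills the faster tier first. The sketch below is one simple, greedy way to express that idea and does not describe how any specific product works; the function assign_tiers, the dataset names, sizes and IOPS figures are all made up for illustration.

```python
def assign_tiers(datasets, ssd_capacity_gb):
    """Place the most I/O-intensive datasets on SSD while capacity remains.

    datasets: list of (name, size_gb, iops) tuples.
    Returns a dict mapping each dataset name to "ssd" or "hdd".
    """
    placement = {}
    free_gb = ssd_capacity_gb
    # Consider the hottest datasets first.
    for name, size_gb, iops in sorted(datasets, key=lambda d: d[2], reverse=True):
        if size_gb <= free_gb:
            placement[name] = "ssd"
            free_gb -= size_gb
        else:
            placement[name] = "hdd"
    return placement


datasets = [
    ("oltp-db", 200, 15000),      # hypothetical transactional database
    ("archive", 2000, 50),        # hypothetical cold archive
    ("search-index", 300, 4000),  # hypothetical search index
]
placement = assign_tiers(datasets, ssd_capacity_gb=600)

print(placement)
```

A production system would also move data between tiers as access patterns change, which this one-shot policy does not attempt.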
Many companies are working on integrating software-defined storage with containers, with many of them selling it as appliances or Storage-as-a-Service. Portworx, Hedvig, CoreOS Torus, EMC libStorage, Joyent Manta and Blockbridge all provide developers with access to their software without requiring them to buy something else. StorageOS, Robin Systems and Quobyte are examples of companies that do not provide unbundled access to their software.
Storage Appliance Providers
The virtualization of compute, storage and networking led to the evolution of software-defined infrastructure. Vendors like Dell, VCE and Nutanix started to ship appliances that delivered data centers in a box. These appliances came with bundled hypervisor, storage and networking capabilities, along with management software to orchestrate the virtual infrastructure.
With containers becoming a popular choice, some startups are building appliances that deliver end-to-end infrastructure for containerized workloads. They have purpose-built converged infrastructure for containers that comes with network interoperability and persistent storage.
Robin Systems and Portworx are two of several software-defined storage providers that sell software appliances. Diamanti is an early mover in the space of container-based converged infrastructure. Its appliance comes loaded with industry-standard software — Linux, Docker and choice of orchestration engine — but it’s unique because it provides container networking at the hardware level. Other companies, such as Datera, offer storage appliances as solutions for container use cases.
As containers become mainstream in the enterprise, we can expect to see the rise of converged infrastructure for containerized workloads. Containers solve the problem of assembling the right technology stack for cloud-native applications and microservices.
Object and Block Storage Providers
Key benefits of running containers are realized when they are deployed in web-scale environments. Customers leveraging public clouds often rely on object storage services, such as Amazon S3, IBM Bluemix Object Storage and Joyent Manta, as well as block storage devices such as Amazon Elastic Block Store (EBS) or Google Compute Engine (GCE) persistent disks. To enable easy integration with the infrastructure, these cloud providers are investing in storage drivers and plugins that bring persistence to containers. DevOps teams can host image registries in the public cloud backed by object storage. Block storage devices deliver performance and durability to workloads.
These investments in storage drivers and plugins by cloud providers are primarily meant to run hosted container management services or Containers as a Service (CaaS) offerings. Recently, at DockerCon 2016, Docker announced native support for AWS and Azure. This will accelerate the development and optimization of storage drivers for object and block storage.
Storage is one of the key building blocks of a viable enterprise container infrastructure. Though Docker made it easy to add persistence based on data volumes, the ecosystem is taking it to the next level. Volume plugins are a major step towards integrating containers with some of the latest innovations in the storage industry. Providers are making it possible to tap into the power of enterprise storage platforms.
After revolutionizing the virtualization market, software-defined storage is poised to witness huge growth with containers. The concept of mixing and matching low-cost magnetic disks with advanced flash and SSD will benefit enterprises running containerized workloads in production; one example is the use of IBM FlashSystem arrays to power cognitive computing services. This combination can use existing commodity hardware to build storage pools consumed by cloud-native applications.
It’s only a matter of time before converged infrastructure embraces containers to deliver turn-key infrastructure platforms optimized to run complex workloads. The compute building block that relies on hypervisors and VMs is gradually shifting to containers. Vendors, like Diamanti, are gearing up to ride the new wave of converged infrastructure powered by containers.
Public cloud providers with robust storage infrastructure are getting ready for containers. Object storage, block storage and shared filesystem services will get dedicated drivers and plugins to maintain images in private registries and run I/O-intensive containerized workloads in the cloud. Containers as a Service will also drive the demand for native drivers for public cloud storage.