EMC wants to make storage a first-class citizen in the world of containers.
When Docker Inc. last year launched its plug-in ecosystem, one of the immediate benefits it touted for developers was a mechanism for enabling persistent storage containers — in order to host active databases and other services that had to stay put indefinitely at some location. It was a solution to a problem that was introduced by a solution itself: Stateless services, which by definition leave nothing behind, give no active context to the services that take their place in the queue. It was keeping developers from building something as common, and as necessary, as a container-based content management system.
Persistent volumes addressed that problem and, to the delight of Docker partners like ClusterHQ, appeared to solve the stateful problem completely. And for a while, it looked like the storage systems market was on board with the idea. As soon as ClusterHQ had a working model for persistent storage made available through Docker plug-ins, the company and storage leader EMC announced a deal that would enable ClusterHQ’s Flocker data volume manager (misspelled in that report) to integrate with EMC devices. What would appear to Docker containers as persistent memory could actually be persistent storage.
But in what appears to be a dramatic course change, EMC released into the community Wednesday its storage provisioning framework designed exclusively for containers. It involves the inclusion of a new client library, libStorage, inside containers — a library that would eliminate the need for plug-ins altogether.
Trimming the Tree
In a blog post published late Wednesday, EMC Vice President for Emerging Technologies Josh Bernstein argued that a more centralized approach to accessing storage volumes is needed for containers, one that would eliminate “architectural dependencies” and would “enable container runtimes to communicate directly with storage platforms.”
Put another way, Bernstein is reviving the old case that an architecture that requires plug-ins cannot, by design, be considered truly “open.”
“Container and open source platforms are each addressing storage challenges independently, and building a healthy ecosystem around storage is critical,” wrote Bernstein. “As a result, sustainability is in question as additional methods of bringing storage into these platforms emerge. EMC has introduced libStorage to move the needle.”
A scan of libStorage’s preliminary documentation reveals an undeniably simple and straightforward operation. Its main purpose is to identify, and then connect to and manage, an available storage volume on the network. It is by no means some kind of EMC storage system driver. Rather, it expects whatever volume is being connected to, to be self-maintaining. The role of the storage volume from the container’s perspective, therefore, does not become some kind of slave to the container, because the connections and exchanges of data are all passive. Conceivably, under this model, a volume could be anything that looks like a volume — and at least at first glance, there’s nothing that would appear to prohibit some kind of persistent storage container, or virtual volume, from qualifying.
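The notion that a volume could be anything that looks like a volume can be sketched in a few lines of Python. This is only an illustration of the concept, not libStorage’s actual API: the Volume protocol, the InMemoryVolume class, the find_and_attach helper, and the device path format are all hypothetical names invented for this example.

```python
from typing import Iterable, Optional, Protocol


class Volume(Protocol):
    """Hypothetical: the minimum a backend must expose to count as a volume."""

    name: str

    def attach(self, host: str) -> str: ...

    def detach(self) -> None: ...


class InMemoryVolume:
    """Hypothetical stand-in for any backend: an array, a cloud disk, a container."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.attached_to: Optional[str] = None

    def attach(self, host: str) -> str:
        # The backend manages itself; the client only records the connection.
        self.attached_to = host
        return f"/dev/{self.name}"  # illustrative device path, not a real one

    def detach(self) -> None:
        self.attached_to = None


def find_and_attach(volumes: Iterable[Volume], name: str, host: str) -> str:
    """Identify a volume by name, then connect to it.

    The client never asks which vendor or technology sits behind the volume.
    """
    for vol in volumes:
        if vol.name == name:
            return vol.attach(host)
    raise LookupError(f"no volume named {name!r}")


pool = [InMemoryVolume("db-data"), InMemoryVolume("cms-uploads")]
device = find_and_attach(pool, "db-data", host="node-1")
```

Because find_and_attach depends only on the shape of the object, any class with a name, an attach method, and a detach method would qualify, which is the point the documentation appears to be making.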
Data volume containers are now a permanent part of Docker and have been since last year. As the current release of Docker’s documentation explains, “If you have some persistent data that you want to share between containers, or want to use from non-persistent containers, it’s best to create a named Data Volume Container, and then to mount the data from it.”
Meanwhile, CoreOS has been maintaining its own method for mounting attached storage volumes from within Docker or rkt containers — a method it describes as enabling expanded capacity for container images. Just Wednesday, the company released a prototype edition of what it calls Torus, which it describes as a distributed, pooled block storage system that may, in future releases, be adapted for object storage. Torus relies upon Kubernetes to attach new persistent volumes to its pods (clusters of containers) on an as-needed basis, by way of a library called flex volume. But another word for such a library is a “plug-in.”
There are straightforward API directives in libStorage for obtaining a snapshot of a volume. It’s a compelling use case, particularly if it’s true that any variety of storage volume can make use of it. Last year, Rancher Labs released a storage driver called Convoy, which enables storage volumes to be created and discontinued directly by an API command from within a container. That API also contained a command for creating volume snapshots and storing them within the container itself. But by “storage volume driver” here, Rancher really meant “plug-in.”
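To make the snapshot use case concrete, here is a minimal sketch of what a vendor-neutral snapshot call might look like from the caller’s side. It is an assumption-laden toy, not the libStorage or Convoy API: the Volume and SnapshotStore classes, their method names, and the snapshot ID format are all hypothetical.

```python
import copy
import itertools
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Volume:
    """Hypothetical volume: a name plus a map of block number to contents."""

    name: str
    blocks: Dict[int, bytes] = field(default_factory=dict)


class SnapshotStore:
    """Hypothetical snapshot service that works on any Volume-shaped object."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, Dict[int, bytes]] = {}
        self._ids = itertools.count(1)

    def create(self, volume: Volume) -> str:
        # A snapshot is a point-in-time copy, so later writes must not leak in.
        snap_id = f"{volume.name}-snap-{next(self._ids)}"
        self._snapshots[snap_id] = copy.deepcopy(volume.blocks)
        return snap_id

    def restore(self, snap_id: str, volume: Volume) -> None:
        # Copy again so the stored snapshot stays immutable after a restore.
        volume.blocks = copy.deepcopy(self._snapshots[snap_id])


vol = Volume("db-data", {0: b"v1"})
store = SnapshotStore()
snap = store.create(vol)   # capture the state while block 0 holds b"v1"
vol.blocks[0] = b"v2"      # the volume keeps changing after the snapshot
store.restore(snap, vol)   # roll the volume back to the captured state
```

A real storage platform would typically implement snapshots with copy-on-write rather than a full copy; the deep copy here only stands in for the point-in-time guarantee.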
It’s that part of the storage architecture that EMC is contesting. The preliminary documentation for libStorage on GitHub makes the case, albeit diplomatically, that anything plugged into an infrastructure system creates a dependency. And because other things could conceivably be plugged into that system as well, the promise of portability and uniform code from system to system is blown to smithereens.
EMC made certain this week to get Rancher Labs on the record as a supporter. In a statement evidently intended to appear in a press release, Rancher Labs CEO Sheng Liang is quoted by the EMC blog as saying, “libStorage solves one of the most critical issues of containers in the context of storage: communication, knocking down a major hurdle for users, and enabling them to more efficiently extract more value, more quickly, from multiple container platforms.”
Connecting to EMC, and Others
Strengthening EMC’s case is the revelation that it has created a kind of vendor-agnostic interface for connecting runtimes — including Docker containers and Mesos packages — to storage provider services — not only EMC’s own software-defined block storage system, ScaleIO, and its all-flash storage array XtremIO, but cloud services such as Amazon’s as well.
The libStorage library will use this interface, called REX-Ray, as a way to provide a kind of double-abstraction: hiding the details of the storage volume from containers, and masking the specifications of containers from the storage volume. This way, connections are between X and Y, and anyone can fill in the blanks. It’s a convincing model for openness, and it stands in stark contrast to any alternative model that relies upon someone’s branded plug-in.
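The double abstraction described here resembles a broker sitting between two sets of interchangeable parts. The sketch below illustrates that pattern under stated assumptions; it does not reproduce REX-Ray’s real interfaces. StorageDriver, BlockArrayDriver, CloudDiskDriver, Broker, and container_runtime_request are hypothetical names, and the returned volume IDs are made up.

```python
from abc import ABC, abstractmethod
from typing import Dict


class StorageDriver(ABC):
    """Hypothetical contract every storage platform implements."""

    @abstractmethod
    def create_volume(self, name: str, size_gb: int) -> str:
        """Create a volume and return an opaque volume ID."""


class BlockArrayDriver(StorageDriver):
    """Hypothetical driver for an on-premises block storage array."""

    def create_volume(self, name: str, size_gb: int) -> str:
        return f"array:{name}:{size_gb}"


class CloudDiskDriver(StorageDriver):
    """Hypothetical driver for a cloud provider's disk service."""

    def create_volume(self, name: str, size_gb: int) -> str:
        return f"cloud:{name}:{size_gb}"


class Broker:
    """Hides storage details from runtimes and runtime details from storage."""

    def __init__(self) -> None:
        self._drivers: Dict[str, StorageDriver] = {}

    def register(self, backend: str, driver: StorageDriver) -> None:
        self._drivers[backend] = driver

    def provision(self, backend: str, name: str, size_gb: int) -> str:
        # The caller names a backend; the driver never learns who is asking.
        return self._drivers[backend].create_volume(name, size_gb)


def container_runtime_request(broker: Broker, backend: str) -> str:
    """Stand-in for any runtime; it only ever talks to the broker."""
    return broker.provision(backend, "cms-data", 10)


broker = Broker()
broker.register("array", BlockArrayDriver())
broker.register("cloud", CloudDiskDriver())
array_id = container_runtime_request(broker, "array")
cloud_id = container_runtime_request(broker, "cloud")
```

Swapping the backend changes nothing in container_runtime_request, and adding a new runtime changes nothing in either driver, which is the X-and-Y property the paragraph above describes.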
Just last month, EMC introduced what it touts as a next-generation storage hardware platform [PDF] with the umbrella brand Unity (a change from “VNX3”), which looks less like a virtual storage appliance than anything you’ve ever seen attached to a rack. It’s a set of all-flash and hybrid flash storage appliances, each line of which is available in four form factors including a 2U unit.
But the driver for these devices is something EMC calls UnityVSA. The “VSA” part stands for “virtual storage appliance,” which the company defines as, “A storage node that runs as a virtual machine instead of on purpose-built hardware.”
“EMC UnityVSA decouples the software stack from the physical hardware,” reads another EMC white paper devoted to its new driver software [PDF]. “This enables the software stack to be deployed on commodity, off the shelf hardware. UnityVSA increases agility and flexibility by enabling faster deployments while also reducing hardware dependencies, allowing for hardware consolidation, and providing effective use of repurposed arrays.”
The Long, Decoupled Chain
Essentially, EMC is taking the unusual step of producing a driver that abstracts any EMC distinction from the image of the storage volume it provides to the rest of the world. One benefit this may add is the ability for enterprise storage managers to repurpose existing hardware, of whatever variety, and include it in the same pool with Unity hardware. Of course, you have to have Unity hardware to make this happen.
The UnityVSA file and block storage driver, according to the documentation, can run on any server that supports VMware ESXi. Separately from the Professional Edition of the driver, a Community Edition limited to 4 TB of capacity connects up to 19 virtual disks. But each of these disks can be a decoupled form of physical hardware, such as EMC Unity or EMC ScaleIO.
This driver would produce a virtual pool of storage. It wouldn’t have to be all EMC-branded, assuming of course that any other storage appliance would be addressable through the VSA. But it would appear that EMC is advancing the REX-Ray interface as a storage engine that could, conceivably, link directly to the VSA that represents the abstract pool of Unity appliances. Presently, the REX-Ray documentation on GitHub shows, among its list of supported storage platforms, “Others.”
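This appears to open the door to a chain of interfaces linking Docker containers to Unity storage devices, without the use of Docker plug-ins. Based on the pieces described above, the chain would presumably run from the Docker container to libStorage, then to REX-Ray, then to UnityVSA, and finally to the Unity storage devices.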
And while it may be feasible for other storage appliance vendors to leverage this same chain as a bridge to their products, the unanswered question at this point is whether the rest of the storage market is prepared to allow EMC to decide what “open” should look like.
Docker is a sponsor of The New Stack.
Title image of chain links “On the foreshore at Manningtree, Essex,” by Howard Lake, licensed via Creative Commons.