Containerization Leaders Explore Possible Standardized Data Storage Interface

A group of engineers from every leading container orchestrator maker has gathered, virtually, around an initiative to explore a common lexicon for container-based data storage. Initially proposed by Mesosphere’s Benjamin Hindman, the Container Storage Interface initiative — which, for now, is essentially a GitHub document — explores whether the community at large, and its users, would benefit from a standardized API for addressing and managing storage volumes.
“The goal of this standard is to have a single, cluster-level volumes plugin API that is shared by all orchestrators,” writes Docker’s Mike Goelzer in the group’s preamble. “So, for example, conformant storage plugins written for Docker would run unmodified in Kubernetes (and vice-versa).”
Goelzer went on to explain the origins of the initiative — which, for the time being, continues to bear the unfortunate abbreviation “CSI”: “A group of about eight of us from Docker, Kubernetes, Mesosphere and Cloud Foundry basically just sat down in a room and drew up this proposal. Now we are seeking feedback from our respective open source communities to improve the proposal.”
The Debate Over Step One
If the group’s goals come to fruition, any storage vendor or maker of persistent storage containers would be able to build a single plug-in that operates identically with any orchestrator. As things stand today, contributors note, storage vendors reluctant to spread their support across multiple implementations are typically forced to choose one — Kubernetes, Swarm, DC/OS, Diego — and confine their testing to that platform alone. Otherwise, they may “sit out” the early market skirmishes, waiting for a single champion to emerge so they can support the one platform the market has anointed.
Like most Google Docs, the CSI proposal draft is designed to be marked up by any and all members, with their comments attached in the margins. Using gRPC syntax, the document presents prototypes for the four commands that Goelzer and his co-contributors consider the principal verbs of any standard container storage system: create, delete, attach, and detach.
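For illustration only, here is a minimal sketch of what such a four-verb plugin service might look like in gRPC’s protocol buffer syntax. The service, RPC, and message names below are our own assumptions, not the draft’s actual definitions:

```proto
// A hypothetical sketch of a four-verb volume plugin service, in
// protocol buffer (gRPC) syntax. Service, RPC, and message names
// are illustrative assumptions, not the CSI draft's definitions.
syntax = "proto3";

service VolumePlugin {
  // Provision a new volume and return a handle for it.
  rpc CreateVolume (CreateVolumeRequest) returns (CreateVolumeResponse);
  // Destroy a volume that is no longer needed.
  rpc DeleteVolume (DeleteVolumeRequest) returns (DeleteVolumeResponse);
  // Make an existing volume available on a given node.
  rpc AttachVolume (AttachVolumeRequest) returns (AttachVolumeResponse);
  // Remove the volume from the node it is attached to.
  rpc DetachVolume (DetachVolumeRequest) returns (DetachVolumeResponse);
}

message CreateVolumeRequest {
  string name = 1;            // caller-chosen volume name
  uint64 capacity_bytes = 2;  // requested size
}

message CreateVolumeResponse {
  string volume_id = 1;       // provider-assigned handle
}

message DeleteVolumeRequest  { string volume_id = 1; }
message DeleteVolumeResponse {}

message AttachVolumeRequest {
  string volume_id = 1;
  string node_id = 2;         // where to attach the volume
}
message AttachVolumeResponse {}

message DetachVolumeRequest {
  string volume_id = 1;
  string node_id = 2;
}
message DetachVolumeResponse {}
```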
But that most fundamental of decisions has sparked the unofficial organization’s first great debate. At issue is whether an orchestrator or scheduler should assume responsibility for managing the lifecycle of data containers, or data volumes. Leading the charge in the negative is Venkat Ramakrishnan, vice president of engineering for Portworx, which produces a container-based data services platform.
Under the heading created for the initiative’s “Non-Goals,” Ramakrishnan inserted two bullet points: one for data placement, the other for data lifecycle management. Data placement, he wrote, “is really not the job of the scheduler. [A] scheduler can give hints, but data placement is best left to data path software as data has a lifecycle that is tied to the application data which is not governed by the scheduler.”
“We really think that the storage decisions are driven more by the application’s needs than by a scheduler’s,” said Murli Thirumale, Portworx’ CEO, speaking with The New Stack. “So not only are we, as Portworx, the storage manager, we are also directly able to communicate, in a programmatic way, with the app’s needs so that we actually make decisions about which nodes serve which apps, and which nodes serve which containers.”
Thirumale offered this example: Suppose one application is based around MySQL, and the other around WordPress. A content management-oriented application typically requires large blocks of memory and storage, he pointed out, but not speed. The MySQL application’s profile would be exactly the opposite. Portworx’ management of the underlying data fabric for the data center running both applications, he said, would have already carved out storage volumes that are best suited for both profiles, based on their respective configurations and their data usage histories. A scheduler may not have access to such profiles.
“So we are making those decisions, not the scheduler,” said Thirumale.
Also speaking with us, Ramakrishnan suggested that it is the role of modern, cloud-native applications to consume services. With converged, or hyperconverged, architectures, compute, storage, memory, and network fabric are all presented to applications as services to be consumed — and once automation is boiled down to that simple economy, it becomes easier for an orchestrator to scale those services up and down.
“Storage is just one part of what we are,” said Ramakrishnan. “What we’ve done is take the old, underlying storage, carved them out, and packaged them as services to individual containers. A database container, like MySQL, might want a high level of availability and data protection. They might create a high number of replicas, and require failure tolerance. So they may subscribe to an HA [high availability] service at a high level. If you’re a database, you may want to back up every few hours. Whereas, if you’re a website, you might want to back up every few days.”
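Taken together, Thirumale’s and Ramakrishnan’s examples treat storage behavior as a per-application service profile. As a rough sketch, assuming hypothetical field names rather than anything from Portworx’ actual API or the CSI draft, such a profile might look like this in protocol buffer syntax:

```proto
// A hypothetical application storage profile, sketched in protocol
// buffer syntax. These fields are illustrative assumptions only;
// they are not drawn from Portworx' API or from the CSI draft.
syntax = "proto3";

enum IoClass {
  IO_CLASS_UNSPECIFIED = 0;
  IO_CLASS_HIGH_SPEED = 1;       // e.g., a transactional MySQL database
  IO_CLASS_LARGE_CAPACITY = 2;   // e.g., a WordPress content store
}

message StorageProfile {
  IoClass io_class = 1;              // speed-versus-capacity trade-off
  uint32 replicas = 2;               // e.g., 3 for a highly available database
  uint32 backup_interval_hours = 3;  // a few hours for a database; days (e.g., 72) for a website
}
```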
From Portworx’ perspective, it makes little sense for a scheduler or orchestrator to assume it should have the authority to create storage nodes for applications if it doesn’t intend to assume the entire role of data lifecycle management. Storage space, after all, is never entirely homogeneous, like a carton of vanilla ice cream. If storage systems end up reliant upon the more basic view of data management that a “create/delete/attach/detach” system would entail, Portworx believes, container data providers would have to limit their configurations to a kind of lowest common denominator.
Ramakrishnan expressed this opinion in several comments embedded in the CSI online document. In one, he added: “Honestly, this is driving backwards and I am not sure why this spec is based on what storage vendors want. This should be based on what applications want to see in their infrastructure (tops-down) instead of bottoms-up.”
Who Should Reference What?
That drew this comment from Google senior staff engineer and Kubernetes co-creator Tim Hockin: “I don’t understand this sentiment. We have some baseline functionality that storage providers must provide to operate in any of the orchestration systems. This is top-down and convergent evolution — all of the major orchestrators have evolved very similar APIs.
“What we have now,” Hockin continued, “is a case where storage providers have to individually decide which platforms matter enough to be supported, which means coverage is inconsistent and testing can be spotty. What we want is a single plugin that providers can build and test against a reference ‘host’ and believe that all of the major COs [container orchestrators] will work.”
In a comment Wednesday, Hockin added that a successful implementation of CSI could conceivably become Flex version 2 for Kubernetes. The current version of Flex, he noted, “is incomplete and should not be used to justify design decisions, which is why we’re looking at this anew.”
From Hockin’s perspective, Kubernetes (and other container orchestrators) need to be more considerate of the needs of the storage vendors seeking to implement them. Without a baseline reference implementation, vendors may be left to experiment by themselves, with no telling what their results will be. A final CSI, he told Ramakrishnan, will need to address the needs of “relatively dumb backend systems” as well as more granular, particular storage fabric providers like Portworx. So a basic “create/delete/attach/detach” vocabulary may be not only necessary, but unavoidable.
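The tension Hockin describes, one API serving both dumb and smart backends, has a common resolution in plugin design: let the create request carry an opaque map of vendor-specific parameters that the orchestrator passes through without interpreting. Whether CSI will adopt anything of the sort is an open question; the sketch below, with hypothetical field names, simply illustrates the pattern:

```proto
// Sketch: an opaque parameter map lets a minimal "create" verb carry
// vendor-specific settings that the orchestrator never interprets.
// This is a general plugin-design pattern, not necessarily what the
// CSI draft specifies; the field names here are hypothetical.
syntax = "proto3";

message CreateVolumeRequest {
  string name = 1;
  uint64 capacity_bytes = 2;
  // Passed through untouched to the storage provider: for example,
  // {"io_priority": "high", "replicas": "3"} for a profile-aware
  // provider, and simply ignored by a dumber backend.
  map<string, string> parameters = 3;
}
```

Under such a scheme the orchestrator stays generic, a profile-aware provider like Portworx can still receive richer settings, and a simpler backend can ignore them.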
In Portworx’ discussion with us, Ramakrishnan argued that container-oriented engineers should perhaps avoid considering the requirements of dumber backend systems, particularly from legacy storage vendors. After all, he argued, their objectives are not to build scale-out systems, or even to completely containerize their applications.
“Being in the position where we are at Portworx, with the kinds of customers we have,” he told us, “we believe strongly that containers don’t play well with legacy storage architectures.”
“We are the container data fabric for some of the largest container deployments in the industry today, and we wanted to make sure the community understands how applications that get containerized, are really looking at storage and storage services,” Ramakrishnan added. “In that aspect, I think the Container Storage Interface initiative actually needs a little more help, and that’s why we’re trying to help the community out.”
The Cloud Foundry Foundation, the Cloud Native Computing Foundation, and Mesosphere are sponsors of The New Stack.
Feature image: A church built inside the arch of a viaduct in Runcorn, Cheshire, UK, by Sue Adair, licensed under Creative Commons 2.0.