As the move to containers matures, previously back-burner issues such as storage are coming to the fore – and will be among the topics discussed this week at DockerCon in San Francisco. Docker has news on that front, as does Portworx, a new startup. Several others are making plays as well, including Datawise.io, Joyent and EMC.
“It’s not just storage, everything around infrastructure, including networking, has been an afterthought and they’re treated as bolt-on features,” says Portworx CTO Gou Rao.
“This really isn’t different from the evolution from when virtualization took place. … Every time you have disruptive technology around how you deploy your applications, infrastructure needs to cope with it.”
Docker will be announcing a new plugin model for storage volumes as well as networking.
“Storage in the world of distributed apps is, by necessity, something that needs to be distributed, as well. The first step to enabling distributed storage for containers is actually creating distributed multi-host networking, which we announce on Monday,” said David Messina, vice president of enterprise marketing at Docker. “Then what you need is pluggability for storage options.”
He said Flocker from ClusterHQ will be its first storage partner, but others are expected to quickly take advantage of its storage volume plugins.
Here’s how some companies are tackling containers’ storage issues.
Silicon Valley startup Portworx is emerging from stealth by unveiling its first product with a developer preview of PWX, next-generation storage software for hosting stateful applications in Linux containers.
Portworx PWX provides elastic scale-out block storage natively to Docker containers, allowing containerized applications to execute directly on the storage infrastructure and containers to be persisted and scheduled fluidly across machines and clouds.
Its founders, CEO Murli Thirumale, CTO Gou Rao, and chief architect Vinod Jayaraman, worked together at Ocarina Networks, which Dell acquired in 2010. Portworx, founded last November, is backed by $8.5 million from the Mayfield Fund and an undisclosed amount from Michael Dell.
Containers and data are not inherently portable or persistent across multiple nodes; storage performance is not tuned for containers; and storage features, such as snapshots and replication, are not container-specific, according to the company.
Rao explains that stateless applications require a lot of compute, but have fewer requirements for storage. They don’t care about persistence, but rely on a stateful service, such as an SQL database, key-value store or message queue. Stateful applications’ data needs to be persisted, and snapshots and replication matter for them. The two have different packaging and deployment requirements.
“Containers have more ways to describe this environment and package them in ways, that it makes more sense to do so than a virtual machine-centric or Chef-centric deployment,” he said.
The company’s vision is for a software-defined way of provisioning infrastructure. That requires two things: Linux containers and a software-driven approach to defining infrastructure for the containers’ applications.
“We don’t really see complicated scripts like Chef and Puppet that are very machine-centric as being the way to provision applications and infrastructure,” Rao said. “We see applications such as an SQL database being provisioned with its infrastructure in one cohesive way.
“Today, they are two distinct steps. To provision an application, I would have to provision the storage first, I would have to know what the application’s usage would be, then separately, I would install and provision the application.
“In a container-centric world, a container and the resources it needs would be deployed and provisioned at the same time, as long as the infrastructure knows how to act on these software intents. This gives the operational guys the agility to be able to manage and scale the data center,” he said.
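To make the contrast Rao draws concrete, here is a minimal Python sketch of a single description that carries both the application and its storage intent, so the two are provisioned in one pass. It is not Portworx code: the names `VolumeIntent`, `ContainerSpec` and `provision`, and all of their fields, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VolumeIntent:
    """Storage the application declares it needs (hypothetical fields)."""
    size_gb: int
    replicas: int = 1
    snapshots: bool = False


@dataclass
class ContainerSpec:
    """An application and its storage intent, described together."""
    name: str
    image: str
    volume: Optional[VolumeIntent] = None  # None marks a stateless service


def provision(spec: ContainerSpec) -> List[str]:
    """Return the actions a hypothetical infrastructure layer would take.

    Storage and application are handled in one pass, driven by the spec,
    rather than as two separate manual steps.
    """
    actions = []
    if spec.volume is not None:
        actions.append(
            f"create volume for {spec.name}: {spec.volume.size_gb} GB, "
            f"{spec.volume.replicas} replica(s)"
        )
        if spec.volume.snapshots:
            actions.append(f"enable snapshots for {spec.name}")
    actions.append(f"start container {spec.name} from image {spec.image}")
    return actions


if __name__ == "__main__":
    db = ContainerSpec("db", "postgres", VolumeIntent(100, replicas=3, snapshots=True))
    for step in provision(db):
        print(step)
```

A stateless service would simply omit the `volume` field, and the same call would start the container without touching storage.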
Its product, PWX, runs on commodity x86 servers. It consists of a scheduler, called PRX orchestrator, that is aware of your data center environment and can schedule the application on the appropriate machine with the right resources; and PXC scale-out block storage that attaches to each containerized application.
“To each stateful application, we attach its own virtual, logical storage device. Wherever the application runs, its storage is always persisted. All of your storage features, like replication and snapshots, are virtually done in software for each container,” he said.
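The two pieces described here, a resource-aware scheduler and a per-application logical volume that follows the application, can be sketched roughly as follows. This is a toy model under stated assumptions, not PWX's actual design: `Node`, `Cluster`, `pick_node` and the `avoid` parameter are all hypothetical names, and the sketch does not track resource consumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    free_cpus: int
    free_storage_gb: int


def pick_node(nodes: List[Node], cpus: int, storage_gb: int) -> Optional[Node]:
    """Return the first node with enough free CPU and storage, else None."""
    for node in nodes:
        if node.free_cpus >= cpus and node.free_storage_gb >= storage_gb:
            return node
    return None


class Cluster:
    """Toy cluster: places apps on nodes and keeps one volume per app."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes
        self.volumes: Dict[str, str] = {}    # app name -> logical volume id
        self.placement: Dict[str, str] = {}  # app name -> node name

    def run(self, app: str, cpus: int, storage_gb: int,
            avoid: Optional[str] = None) -> str:
        """Place (or re-place) an app; `avoid` simulates a node to skip."""
        candidates = [n for n in self.nodes if n.name != avoid]
        node = pick_node(candidates, cpus, storage_gb)
        if node is None:
            raise RuntimeError(f"no node can host {app}")
        # The volume id is created once and reused on every later placement,
        # so the app sees the same logical device wherever it runs.
        self.volumes.setdefault(app, f"vol-{app}")
        self.placement[app] = node.name
        return node.name


if __name__ == "__main__":
    cluster = Cluster([Node("a", 2, 50), Node("b", 8, 500), Node("c", 8, 500)])
    print(cluster.run("db", cpus=4, storage_gb=200), cluster.volumes["db"])
    print(cluster.run("db", cpus=4, storage_gb=200, avoid="b"), cluster.volumes["db"])
```

The point of the sketch is the last step in `run`: placement changes between calls, while the volume identifier does not.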
It uses CoreOS to provide the Linux distribution and Docker as the runtime.
The orchestration layer is compatible with any other orchestration tool anybody would use, Rao said.
“We’re taking a very enterprise production-quality standpoint in delivering our product to customers. What they’d expect is a VMware vSphere-style experience, not an OpenStack, do-it-yourself sort of experience,” he said.
Meanwhile, Datawise.io, another company founded in 2014 and still in stealth, will preview its Project 6 at DockerCon this week. CEO and founder Jeff Chou and his team are veterans of Cisco, Veritas, and VMware.
Chou says the company is focused on gaps in the container market around networking and storage, but Project 6 is just a subset of what the company’s doing. He’s keeping most of the details close to the vest. A preview of Project 6, however, can be found at http://www.datawise.io/project-6.html.
“Networking and storage, we consider them one and the same, they’re both IO. It’s difficult to address these problems separately; we feel they need to be addressed holistically,” he said.
“Docker and containers are great for application mobility, but what’s missing is data mobility. Data mobility involves both the network and storage.
“With Project 6, we wanted to demonstrate how you could manage containers in a cluster, while taking into account resources, such as networking and storage. Some of the things we’d like to do are make some enhancements to Docker Swarm or Google Kubernetes in order to allow plug-ins that take into account applications’ network and storage requirements. We feel this is an area ripe for innovation,” he said.
Project 6 aims to simplify network and storage management for bare metal containers in on-premises environments, which he believes will be the coming trend.
Chou said the company will be talking more about its plans later this year.
Joyent CTO Bryan Cantrill, meanwhile, maintains its Manta storage service has been well ahead of the market on container storage.
“Docker has the most traction on stateless services, where it’s entirely transient – and that’s fine. But there are lots of applications that you should be able to run in a container that require persistent storage,” he said.
“If you want to run a database, that requires persistent storage. The idea that you can’t have that in a container is ludicrous. We’ve been doing that in containers for a decade.
“In Triton and in container-optimized Linux, you have direct-attached storage. You have a ZFS file system that your container has that’s persistent. So if you want to run Postgres in a container, you can. The performance is terrific.”
The problem, he says, is that “the Linux file system story is a mess,” and the Linux community hasn’t embraced ZFS, which Sun Microsystems unveiled with OpenSolaris in 2005. “I think ZFS is the best open source storage substrate out there, full stop,” he says.
“Right now, when you’ve got a lot of data, people are sloshing that data around – they’re putting it into an object store like [Amazon] S3 on the public cloud, or on-prem they’re putting it into a SAN or a NAS appliance. They’re putting it into something whose sole purpose is to reliably store and retrieve that data,” he says.
“What containers allow you to do in the abstract, is spin up compute where storage lives. So if you want to operate on data, if you want to query data, if you want to scan your data effectively, you want to pull your data out. It needs to move to some other element that can compute upon it, which is an absurd waste of resources, because your data is sitting on a computer somewhere. It has a CPU, which can do work.
“A container allows a very light abstraction that allows you to spin up compute where the data is, but it requires you to rethink your storage substrate a lot. You can’t just do that to S3 for a lot of reasons, you can’t do that to NetApp filer or an EMC SAN. You have to rethink the way you do authoritative storage.”
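The distinction Cantrill draws, pulling data out to compute on it versus spinning up compute where the data already sits, can be illustrated with a small Python sketch. It is not Manta's API: `StorageNode`, `fetch`, `run` and `count_lines` are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class StorageNode:
    """Toy storage node that can either hand data out or run a job on it."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = objects

    def fetch(self, key: str) -> bytes:
        """Data-to-compute: the whole object leaves the storage node."""
        return self._objects[key]

    def run(self, key: str, job: Callable[[bytes], T]) -> T:
        """Compute-to-data: the job runs here and only its result leaves."""
        return job(self._objects[key])


def count_lines(data: bytes) -> int:
    """Example job: count newline-terminated lines in an object."""
    return data.count(b"\n")


if __name__ == "__main__":
    node = StorageNode({"log": b"GET /\nGET /about\nPOST /login\n"})
    pulled = node.fetch("log")            # every byte crosses the network
    print(count_lines(pulled))
    print(node.run("log", count_lines))   # only a small integer crosses
```

Both calls produce the same answer; what differs is how much data has to move to get it.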
That’s what Joyent did with Manta, Cantrill says. He argues that the market just hasn’t understood Manta – he contends very few people actually understand both Docker and enterprise storage.
And he says the major enterprise storage vendors aren’t focused on container-centric storage – and, in fact, have a vested interest in maintaining the status quo.
EMC and ClusterHQ
Meanwhile, EMC has partnered with ClusterHQ to integrate two of EMC’s fastest-growing product lines with Flocker: its XtremIO series of all-flash arrays, and the ScaleIO software platform, which consolidates direct-attached storage in servers into a unified pool of capacity.
Flocker provides a more reliable way to harness data for stateful applications, something EMC had been working on; EMC released an open source toolkit earlier this year.
In its new 1.0 release, Flocker also supports other storage options, including OpenStack Cinder and Amazon EBS.
Flocker is expected to be most useful in helping organizations to divert spare capacity from their XtremIO systems and ScaleIO installations for development projects, SiliconANGLE reports.
Cisco, CoreOS and Docker are sponsors of The New Stack.
The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Docker.