ClusterHQ: Let’s Move Docker Like Cattle but Make the Data Special
Last August, a newly formed company called ClusterHQ very publicly launched a brand-new and obscure development tool, at a version number few other companies would dare use for a launch: 0.1. The concept was called Flocker, a name chosen in the hope that it would be associated with Docker by more than just the rhyme.
At first, Flocker’s mission was to resolve the disparity between the application model with which most developers were already familiar, and the model Docker was foisting upon the world. It would use something called an ‘app manifest’ to bridge together the various Docker container images that collectively comprise an application, with the intent of making them portable as a group.
Portability is one problem that Docker has always claimed it solves, but it does so at the container level, not the application level. If we change our model of constructing applications to one that adheres to the principles of stateless microservices, Docker’s advocates advise, the only problem is with our choice of context. If we stop thinking of applications as contiguous amalgams, we can build a world more in Docker’s image.
That might not be what everyone wants. On Wednesday, ClusterHQ formally unveiled Flocker 1.0. In the intervening year, its mission has evolved dramatically: its aim now is to move persistent data volumes along with containers as they are relocated.
Or, put another way, so much for the stateless container.
“Container data management is, as far as I know, not an industry term. This is a phrase that we made up,” admits Mark Davis, ClusterHQ’s CEO, in an interview with The New Stack. “We know what problem we’re solving, but we basically didn’t have a handle to describe it. But what we’re really dealing with is, how do you deal with data within a container, and how do you manage it effectively within a production environment?”
Perhaps the simplest definition of a stateless service to date, devoid of all its variously extrapolated social, political, and economic implications, comes from OpenStack. Once an interaction with a stateless service is complete, it says, the service should remember nothing about it. It seems like a simple enough concept, until you recognize that most applications in an enterprise produce documents of some form or format. Maybe everything can be stateless on the server side, some have theorized, if you concede that the document gets produced on the client side.
Then, of course, there are databases. The model of implementing database management using stateless services is mainly a mental one. Back in November 2013, Docker CTO Solomon Hykes declared the problem of implementing stateless services with Docker already solved, in a comment on open source developer Kris Buytaert’s personal blog.
“Container engines are not just for stateless applications. Docker has primitives for persistent data volumes,” Hykes wrote. “These primitives are typically used to designate persistent parts of the container like database files. Volumes continue to exist separately from the lifecycle of an individual container, and can be shared or transferred across containers.”
Uh-huh. Just a few weeks ago, The New Stack published the story of one Achim Weiss, who runs the IaaS provider ProfitBricks. Weiss’ service offers persistent data volumes, as one way to resolve the key underlying issue of databases needing not just to persist, but to use a database developer phrase, to be persisted. “Docker is ephemeral in nature, but applications, such as databases, often require data to persist across system restart,” wrote Weiss. “Sure, let’s lose state, but I don’t want to lose my PostgreSQL data.”
Obviously it’s a problem to somebody.
“The elephant in the room is that the way containers are designed, and particularly the way Docker is designed, the assumption is that the container is stateless,” explained ClusterHQ’s Mark Davis. “All of the infrastructure, all of the way to the top level, assumes that a container can die at any moment, and that none of its state matters — that it can be gotten from some other place.”
The way Davis perceives it, this forces developers who are moving their applications into a microservices architecture to containerize only one part of the application: the stateless side. “Of course, that’s all wonderful,” he said. “But data is in the center of applications, and customers we have spoken to over the last year-and-a-half have expressed a desire to containerize all of their apps, not just the stateless part.”
If you ask Docker Inc., this isn’t really a big problem. Docker already provides data volumes today, in such a way that they’re housed within separate containers but grafted onto the file systems of the containers that reference them. “If you have some persistent data that you want to share between containers, or want to use from non-persistent containers,” reads Docker’s documentation, “it’s best to create a named Data Volume Container, and then to mount the data from it.”
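A minimal sketch of that pattern using Docker’s own CLI — the container names and the choice of PostgreSQL here are illustrative, not taken from Docker’s documentation:

```shell
# Create a named data volume container. It runs /bin/true and exits
# immediately, but its /var/lib/postgresql/data volume persists.
docker create -v /var/lib/postgresql/data --name dbstore postgres /bin/true

# Start a database container that mounts the volume from dbstore.
docker run -d --volumes-from dbstore --name db1 postgres

# The database container itself is disposable: remove it, and the
# data survives inside dbstore's volume for the next container.
docker rm -f db1
docker run -d --volumes-from dbstore --name db2 postgres
```

Note what this does and does not give you: the volume outlives any one container, but it is still anchored to the host where `dbstore` lives — which is precisely the gap Flocker aims at.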
Fourth and Long
In a statement to reporters last August during his company’s official formation, Davis attempted to portray the problem not as one of Docker trying to solve data containment and failing, but of Docker kicking the can down the road for someone else to solve.
“In particular, the current approach to both PaaS and containers,” he wrote, “is to punt the hard problem of data management out to ‘services,’ which either ends up tying us in to a specific cloud provider, or forces the data services to get managed orthogonally to the stateless application tier in an old-fashioned way.”
In Davis’ recent discussion with us, he tried breaking the problem down more simply: Customers accustomed to the world of virtual machines expect high availability, flexible storage, and the ability to seamlessly migrate data. People expect container data management, in other words, because they expect data management. To the extent that the new environment does not provide these features, it will be rejected.
I reminded Davis of the oft-cited “pets vs. cattle” analogy, which suggests that a proper microservices management environment treats containers more like cattle than pets. People may expect management services in the new world to present behaviors and features like those in the familiar old world, I said, but maybe that’s not what these people actually need.
Given the distinct architectural differences between these two worlds, I asked Davis, what is the real value proposition of Flocker 1.0 catering to customers’ old-world expectations?
“The cattle analogy is absolutely where the world wants to go,” Davis responded. “And that’s actually independent of whether it’s a container or a virtual machine or any other technology. People need to be able to treat their underlying infrastructure like cattle, not pets, for everything — for compute, for networking, everything. We want to get to the point where we literally don’t care about any individual member of our herd, and that our systems will operate well no matter what happens to any individual member.
“We do that well today in containers for the stateless part of our applications,” the CEO continued. “As long as you don’t see any benefits to containerizing your stateless microservices, then what you said is a true statement about the architecture. The assumption built into the way that Docker first built their containers was that they were absolutely all cattle, and that no given container is special. As people are wanting to go put these container environments into production, one of the things we hear from DevOps teams is, ‘Yeah, that’s great, but I still have to deal with the data.’
“The data is going to exist whether I containerize it or not. And that data is special. I cannot lose it; that is a pet to me.”
Davis now characterizes Flocker 1.0 as the first stage of a completely new journey toward a world where things look more like the Docker ideal of a cattle drive than a pet store. In the meantime, version 1.0 enables a type of portability where the data volume moves along with its container, rather than separately. He argues this added portability will make it feasible to produce dev/test environments that more accurately reproduce the conditions of production environments, where the disconnection and remounting procedure wouldn’t and shouldn’t be happening every day.
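Flocker’s early releases expressed this co-location of container and volume declaratively, through a pair of YAML files fed to a `flocker-deploy` command. The sketch below captures the shape of that workflow; the file contents, node address, and mountpoint are illustrative rather than a verbatim schema:

```shell
# application.yml — which containers make up the app, and which
# data volume each one owns.
cat > application.yml <<'EOF'
"version": 1
"applications":
  "postgres":
    "image": "postgres"
    "volume":
      "mountpoint": "/var/lib/postgresql/data"
EOF

# deployment.yml — which node each container (and its data) lives on.
# Editing the node address and re-running the deploy relocates the
# container together with its volume, rather than separately.
cat > deployment.yml <<'EOF'
"version": 1
"nodes":
  "192.0.2.10": ["postgres"]
EOF

flocker-deploy deployment.yml application.yml
```

The design point is that the desired placement lives in configuration: moving stateful workloads becomes a diff to `deployment.yml` rather than a manual detach-and-remount procedure.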
One possible next step in Docker container evolution from that point, he went on, involves live migration. Until the stateful and stateless sides of the application can be effectively merged, he argued, live migration will be impossible.
But would such a merger result in the “holy grail” of perfectly stateless architecture? Would Flocker merely represent an interim transitional state, on the way to this kind of container nirvana?
“There actually is no such thing as a stateless app,” Davis finally said. “When people say their app is stateless, they’re lying. Every app has state; the question is, where do we put it, and how do we interact with it? By doing the pure, 12-factor, ‘My app is 100 percent stateless,’ you really haven’t built a stateless app. What you have done is built a very pure wall, or an API, between the stateful and the stateless side. And you still haven’t solved the problem; you’ve just pushed it away and said, ‘I’m not going to deal with it.’
“I think that there’s a good reason for wanting to containerize, and deal with, state,” the ClusterHQ CEO went on, “rather than just punting the problem to somebody else, and letting them figure out how to move stateful things around. No matter what anybody says — you or me or the data architects — you’re going to want to move your stateful services around. I can promise you. It’s just real life. We have to be able to do this stuff.”
Docker and ProfitBricks are sponsors of The New Stack.
Feature image: “Hood of Dirty Mitts’ Bus” by runran is licensed under CC BY-SA 2.0.