
ClusterHQ Sets the Stage for Docker Stateful Apps

19 Nov 2015 10:57am, by

Among the many topics of discussion this week at DockerCon EU has been the rise of stateful containerized applications: services that maintain continuous links to data volumes. So it was a timely move by ClusterHQ to introduce, the week before the Barcelona event, the first public betas of dvol, its version control system for Flocker data containers, and of Volume Hub, a GitHub-like mechanism for hosting those containers.

Like other ClusterHQ products, including Flocker itself (generally released at the end of October), dvol is a Docker plug-in, extending the functionality of Docker through a simple command-line installation. From a command line, developers can commit database states to a registry, then check out versions of those states for experimentation. Once those checked-out states are committed in turn, they become branches of the original state. States may then be compared against one another, especially after changes to the code have been made, to determine how those changes affect databases differently.
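To make that commit, check out, branch, and compare cycle concrete, here is a minimal sketch in Python. It is a toy in-memory model, not dvol's implementation or command set: the `VolumeRepo` class, its method names, and the use of a plain dictionary to stand in for a database volume are all assumptions made for illustration.

```python
import copy


class VolumeRepo:
    """Toy model of a versioned data volume (hypothetical; not dvol's API)."""

    def __init__(self, initial_state):
        self.state = copy.deepcopy(initial_state)  # the live, mutable "volume"
        self.commits = {}                 # commit id -> (parent id, snapshot, message)
        self.branches = {"master": None}  # branch name -> tip commit id
        self.current = "master"
        self._next_id = 1

    def commit(self, message):
        """Snapshot the live state onto the current branch."""
        commit_id = self._next_id
        self._next_id += 1
        parent = self.branches[self.current]
        self.commits[commit_id] = (parent, copy.deepcopy(self.state), message)
        self.branches[self.current] = commit_id
        return commit_id

    def checkout(self, branch, new=False):
        """Switch branches; with new=True, fork a branch from the current tip."""
        if new:
            if branch in self.branches:
                raise ValueError(f"branch {branch!r} already exists")
            self.branches[branch] = self.branches[self.current]
        elif branch not in self.branches:
            raise KeyError(branch)
        self.current = branch
        self.reset()

    def reset(self):
        """Discard uncommitted changes by restoring the branch tip's snapshot."""
        tip = self.branches[self.current]
        if tip is not None:  # a branch with no commits leaves the state as-is
            self.state = copy.deepcopy(self.commits[tip][1])

    def _snapshot(self, branch):
        tip = self.branches[branch]
        return {} if tip is None else self.commits[tip][1]

    def diff(self, branch_a, branch_b):
        """Return the keys whose values differ between two branch tips."""
        a, b = self._snapshot(branch_a), self._snapshot(branch_b)
        return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


repo = VolumeRepo({"users": 2})
repo.commit("baseline")
repo.checkout("experiment", new=True)   # branch off the committed state
repo.state["users"] = 3                 # simulate a code change touching the data
repo.state["orders"] = 1
repo.commit("after code change")
print(repo.diff("master", "experiment"))
repo.checkout("master")                 # return to the original state
print(repo.state)
```

A real tool would snapshot a filesystem rather than deep-copy a dictionary, but the bookkeeping is the same idea: each commit records its parent, and a branch is just a named pointer to a commit.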

Git for data

In an interview with The New Stack, ClusterHQ Vice President of Marketing Michael Ferranti pointed the way toward a future direction for dvol and the rest of its Flocker product line — a direction that appears to lead to a complete communications ecosystem equivalent in stature to GitHub, although for data volumes in the development stages of their applications.

“The way that git gives you version control for your code, dvol gives you control for your data volumes,” Ferranti explained. With dvol in conjunction with Flocker, he said, a developer can take snapshots of containers that serve as data volumes, clone them, run integration tests against them, and compare multiple snapshots over time against one another to ensure against the reappearance of identified bugs.

A developer in the act of producing application code, perhaps on her PC, can commit, reset, and branch databases running as data volumes within Docker containers, enabling her to revisit a prior state for any number of tests, according to a video from ClusterHQ chief technology officer Luke Marsden.

Does ClusterHQ intend for dvol to eventually take on more characteristics of GitHub, perhaps bringing more of the developer community into the process through the sharing and exchange of data volume snapshots?

“Yeah, that’s where we’re going,” Ferranti admitted. “If you think about, what are the major things that you do with git: you branch, you tag, you push, you pull. We don’t see push and pull yet through dvol. But eventually, I think you will see that. It’s the idea that a single developer can get a lot of value from dvol today by working on their own laptop, creating copies of their database, committing that state locally. But eventually when you want to share it with someone else on your team, you’re going to need to be able to push that onto a central repository.”

The ecosystem down the road

One of the features that dvol may need to pick up along the way, Ferranti admitted, is encryption. Docker Inc. itself has implemented its own container encryption mechanism with Content Trust, which has already been in use for some three months (an eon, in the evolution of containers).

But as Ferranti said, with respect to encrypting data volumes, “we haven’t had to cross that bridge. We realize that we have a special responsibility to our customers and our users, because we deal with their data, and that’s the heart of the application. And there are a lot of compliance and regulatory concerns, not to mention that their customers are trusting them with their data. So we need to live up to our promise.

“You can expect that, when we roll out features that would potentially compromise users’ data if the engineering practices weren’t top-notch,” he continued, “we will take great care, and we will do all industry best practices to protect that data. Part of our future announcement will not only be the cool, new features, but also, here’s how we ensure that your data is safe, and you can trust us with it.”

ClusterHQ’s new Volume Hub would serve as Flocker’s de facto registry, providing a cloud service-like front end for Flocker data clusters. It’s a browser-based console that collects all the Flocker clusters residing on the same platform, including all the versions of data volumes snapshotted through dvol. It also serves as a provisioning tool for the platform space needed to host Flocker volumes, either on bare metal or cloud-based infrastructure.

Volume Hub “is not infrastructure-dependent. It will work on any infrastructure where you’re running your application,” Ferranti said.

The day before ClusterHQ announced Volume Hub, it announced a deal with Amazon that paves the way for Flocker clusters to be hosted on Amazon ECS using Elastic Block Store. Does this make Amazon the preferred public cloud host for Volume Hub, or does it mean Volume Hub instances will be appearing everywhere on Amazon infrastructure?

“Right now, they don’t overlap,” responded Ferranti, “except to the extent that, if a customer’s environment is on Amazon, then if they have been running their Flocker cluster on Amazon, those volumes will show up in Volume Hub. They’re separate, in terms of infrastructure. Where they overlap is, lots of our customers are using Amazon, and they’re using EBS volumes as their persistent storage. So if you hook up your Flocker deployment running on Amazon EBS into the Volume Hub, you can manage it via Volume Hub.”

Ferranti went on to say that ClusterHQ perceives Elastic Container Service as an emerging orchestrator unto itself, on a par with Kubernetes, Docker Swarm, and Mesosphere’s Marathon. So ClusterHQ is working with Amazon to enable users of the ECS orchestrator to also manage their Flocker volumes there, with the assistance of Volume Hub. However, there’s nothing about Volume Hub that ties Flocker volumes to Amazon infrastructure, he emphasized.

Victory for state?

Does the rise in developer support for Flocker volumes, coupled with the impending arrival of an ecosystem around them, signal the end of the stateless-vs.-stateful application argument, with statefulness emerging as the ultimate victor?

“I don’t think I would frame it quite like that,” ClusterHQ’s Ferranti responded. “What has happened is that developers and ops people have realized how enormously valuable containerization is for their workflows and their time-to-market. And they’ve seen, as we’ve moved our development environments to Docker containers, we save a ton of time and we solve a lot of complex problems. It’s easier to go into staging, and then to go into production. They’re hooked on that, and saying, ‘What else can we do?’ And the what-else — the thing that’s been held back, out of those environments — has been stateful services.

“It’s not so much that there was philosophical objection to running databases in containers,” he continued, “so much as it wasn’t practical, and there was so much other low-hanging fruit that teams could implement without having to solve this problem. They did those first. But it’s a natural evolution, now that they’ve gotten value out of containerizing their development workflow, moving into CI/CD [continuous integration/continuous delivery]. They’re saying, ‘What next?’ And that storage piece is the what-next.”

Docker is a sponsor of The New Stack.

Feature image: Grand Harbor, Malta, circa 1890, from Life of Vice Admiral Sir George Tryon KCB by Rear Admiral C. C. Penrose Fitzgerald, licensed through Wikimedia Commons.

