After Six Years, Mesos Hits Version 1.0. Now the Real Fun Begins
Last year at this time, Verizon — by many accounts the leading voice communications service provider in North America — selected Mesosphere DC/OS (which didn’t have the slash back then) as the orchestration platform for its data center services. So it should come as some comfort to Verizon, and to the other Mesosphere customers that cohabit the rarefied air of the upper reaches of the Fortune 100, that Apache Mesos — the platform upon which the commercial DC/OS is constructed — has officially issued production-ready version 1.0 into general release.
So does the declaration of Mesos 1.0 actually mean anything, besides a ding of the proverbial egg timer? After all, Mesos and the commercial products based on Mesos from Mesosphere have already made their way into many of the world’s major organizations, and not just on the experimental side of the fence.
“The introduction of the 1.0 API is really going to be beneficial for all the frameworks that run on top,” declared Ben Hindman, Apache Mesos’ co-creator and Mesosphere’s founder and chief architect, in an interview with The New Stack. “Because we’ve introduced the 1.0 API, Mesosphere is actually going to be able to iterate much, much faster on the frameworks.”
The challenge with versioning in the past, with Mesos as well as many other software platforms throughout history, has revolved around the synchronization of dependencies, including libraries. With new versions of platforms often come new libraries, which overwrite and supersede the old libraries. But that supersession can have a detrimental impact on older versions of the platform which must co-exist alongside new versions, especially in multi-tenant environments.
So the big deal with Mesos 1.0 is that it rolls everything up into a nicer, tighter package.
“We recognize that, in the world of open source projects these days, people don’t really adhere to versioning very well,” said Hindman, “especially around API versioning. And we just really wanted to do that. We thought it was really important to put out an API, and version it appropriately. We’ve seen a lot of other projects in the open source space, both in the container space as well as the big data space, that have abused that, and it’s really a giant pain in the butt for users and developers.”
Mesos’ new HTTP API — which Hindman promises, from this point forward, will be properly versioned — is intended to enable developers to write frameworks (Mesos’ term for an application running on its platform) using any language. Previously, a developer usually had to make use of one of a handful of Mesos libraries, a few of which were written in Go, but others in more classic languages like Java and C++.
Ironically, this made a version change to one of the greatest catalysts for agility ever to emerge in the data center painfully slow.
So Hindman and his fellow Mesos engineers developed a remote procedure call protocol around HTTP, similar to the gRPC protocol built for HTTP/2. The syntax of the call itself is either Google’s protobuf or simple JSON. Now, evolving the API can be more of a sensible, iterative affair. Hindman says Mesos does plan to support gRPC directly in a future release, now that “future” can be carved into more near-term intervals.
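To make the shape of that protocol concrete, here is a minimal sketch of what a framework’s first call to the v1 scheduler HTTP API looks like when the JSON encoding is chosen. The master address, user, and framework name are illustrative placeholders; a real framework would POST this body to the master and hold the connection open for the event stream.

```python
import json

# Hypothetical master address -- substitute your own cluster's.
MESOS_MASTER = "http://mesos-master.example.com:5050"

def subscribe_call(user, framework_name):
    """Build the v1 SUBSCRIBE call body as plain JSON.

    The same message could be serialized as protobuf instead; the
    Content-Type and Accept headers tell Mesos which encoding is in use.
    """
    return {
        "type": "SUBSCRIBE",
        "subscribe": {
            "framework_info": {
                "user": user,
                "name": framework_name,
            }
        },
    }

body = subscribe_call("alice", "example-framework")
headers = {"Content-Type": "application/json", "Accept": "application/json"}
# A real framework would POST json.dumps(body) with these headers to
# MESOS_MASTER + "/api/v1/scheduler" and then read the streamed events.
print(json.dumps(body))
```

Because the call is just HTTP plus a serialized message, any language with an HTTP client can implement a framework — no Mesos-specific library required.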
“I think at the end of the day, the idea is that it’s a much more consistent API across all the different endpoints that we currently have,” said Hindman, “and it’s just better for our users. We just wanted to get that out before we called it ‘1.0,’ not because 1.0 is about stability and maturity for us, but because it’s really about the users, and giving them an API that they would feel comfortable with, through the 1.0-to-2.0 cycle.”
The Unified Containerizer
Only in an industry like ours can something be named a “containerizer,” and a good number of its practitioners know exactly what it means. Since the beginning of Mesos — which preceded Docker by many, many weeks on the evolutionary scale — the platform has accomplished partitioning of processes by way of a component called the “containerizer.” Although Mesos provided its own, beginning with version 0.20.0 it allowed operators to substitute an external containerizer — which was necessary for Mesos to support Docker images.
“Back in 2009 when we first started, containerization was a big premise of what we were doing,” Hindman told us. “The idea was, we could do lightweight resource isolation through the containerization technologies that existed in things like Solaris, and which were coming into maturity in Linux at the time. So we had to do containerization ‘ourselves,’ in quotes, where we were directly using the underlying technologies from Linux — control groups and namespaces — to containerize applications that we were running.”
When Docker arrived on the scene, it introduced its own daemon for containerization, followed soon afterward by its own container image format.
“So we clearly weren’t going to throw away all of the existing containerization runtime that we had,” he continued, “because we had some users who were going to keep using it no matter what — the Apples, the PayPals, the Netflixes, and the Twitters. We couldn’t just say, ‘Hey, guys, you all have to move to this Docker daemon now, because we’re going to stop doing containerization.’ They would have been up in arms.”
Thus Mesos added support for Docker’s format as well as CoreOS’ appc, and it plans to support the final OCI standard as soon as it is published. In each of these cases, Mesos does not require the native daemon of the container engine associated with the format — which, Hindman explained, can behave non-deterministically and is therefore difficult to automate. Mesos 1.0 makes support for Docker images official without invoking the external containerizer.
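In practice, running Docker images through the unified containerizer comes down to agent configuration rather than a separate daemon. The sketch below shows an illustrative set of agent flags under that assumption (the ZooKeeper address is a placeholder; consult the documentation for your Mesos version for the exact flag names and values).

```shell
# Illustrative only: run the unified ("mesos") containerizer and let it
# provision Docker images itself, with no Docker daemon on the host.
mesos-agent \
  --master=zk://zk.example.com:2181/mesos \
  --containerizers=mesos \
  --image_providers=docker \
  --isolation=filesystem/linux,docker/runtime
```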
Another new feature of the unified containerizer is the ability to shut down the Mesos agent without shutting down the containers it has already launched. This will prove valuable, he said, when Mesos or DC/OS is being upgraded, as the upgrade can now happen without stopping currently running tasks.
Nested container support has also been added — meaning, the ability for the application within a container to serve as a platform in itself that spins up its own container, under a subordinate level of control. Before 1.0, Hindman pointed out, a container under Mesos could indeed spin up another container, but as a sibling. But that process was beginning to pose serious security concerns.
“The minute that you start bringing sibling containers, you run into all sorts of architectural issues,” he explained, “that include things like, what happens if your container dies? How do you properly know to clean up the sibling containers? How many resources should be restricted in that sibling container? If you can just arbitrarily launch sibling containers, can you overwhelm the machine?”
A nested container, by contrast, will only have access to the same resources granted to the parent. This, we’re told, will make the use case of running Jenkins in containers much more feasible, as well as more secure.
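The security difference Hindman describes can be captured in a short conceptual sketch — this is not Mesos source code, just an illustration of the invariant that a nested container may only draw from its parent’s allocation, whereas an arbitrarily launched sibling would tap the machine directly.

```python
# Conceptual sketch (not Mesos code): a nested container must fit
# inside its parent's resource grant, so it cannot overwhelm the host.
class Container:
    def __init__(self, cpus, mem_mb):
        self.cpus = cpus
        self.mem_mb = mem_mb
        self.children = []

    def launch_nested(self, cpus, mem_mb):
        used_cpus = sum(c.cpus for c in self.children)
        used_mem = sum(c.mem_mb for c in self.children)
        # Reject the child unless it fits in the parent's remaining grant.
        if used_cpus + cpus > self.cpus or used_mem + mem_mb > self.mem_mb:
            raise ValueError("nested container exceeds parent's allocation")
        child = Container(cpus, mem_mb)
        self.children.append(child)
        return child

parent = Container(cpus=2.0, mem_mb=4096)
build = parent.launch_nested(cpus=1.0, mem_mb=2048)  # fits: allowed
```

Cleanup also falls out naturally: when the parent dies, its children are enumerable through it, answering the “how do you clean up siblings?” question Hindman raises above.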
Hindman then told us the story of one major Mesosphere customer that adopted the unified containerizer and soon discovered it could not deploy one of its apps on the Mesos platform — an app that had run just fine using the external containerizer and the Docker daemon. After an extraordinarily long round of debugging, Hindman and his colleagues traced the bug to the kernel of the specific Linux distribution this customer was using. The bug prevented the container from being properly created, and while the kernel would report that failure back to Docker, the daemon would ignore the report and attempt to execute the container anyway.
Mesos would fail the same container that Docker would happily run on its own. Though this bug is now resolved, Hindman said the experience revealed to him the differences in methodology between the two platforms.
“We’ve been very focused on stability, correctness, resource isolation — the components that we really cared about when we run this at big scale,” he remarked. “I think the Docker daemon still is getting some of that maturity, so you’ll see it in examples like this. You might say you’d prefer the Docker daemon experience, which is just, ‘Run my application anyway.’ But it’s one of those things where, maybe you get the app up and running, but what happens later down the line when things start really breaking, and then you’re like, ‘Why is this breaking? I don’t understand!’ And then you realize, oh, it’s because you weren’t being properly isolated and so you ran into this issue, whereas we were catching this thing ahead of time.
“I think it funnels into, generally, the way that we really think in the Mesos community about what we’re building. We really think about the operators, and what’s going to give the operators the best experience?” he said.
Hindman calls this methodology “Day Two Operations,” and explained it to us as the difference between purely configuration-driven automation, such as the original style of Chef and Puppet, and the broader concept of real-time cluster management accomplished through DC/OS, as well as through Kubernetes and Docker Swarm.
“It would be nice if we didn’t just move the deployment responsibility, but we also enabled new opportunities and new value-adds for the operators,” he continued. “To me, the specific ones we want are around Day Two Operations. It’s actually operating these things, not the very first time we deploy them, but when you want to upgrade them, to do maintenance in your cluster, and these other things. That’s something that we’ve really thought carefully about as we built in containerization features.”
Mesosphere is a sponsor of The New Stack.