Mesos and Docker Swarm Race to Improve ‘Health Checks’

Earlier this month, Mesosphere engineer Gastón Kleiman revealed the Apache Mesos project’s intention to implement a native health checking mechanism for the upcoming release of version 1.2.0. This way, wrote Kleiman, scheduler frameworks that rely on Mesos, such as Apache Aurora and Mesosphere’s own Marathon (now part of DC/OS), no longer have to “roll their own” methodologies for keeping track of distributed services.
“One of our intentions,” wrote Kleiman, “was to free framework authors from having to design their own health checking APIs. To accomplish this, we updated the Mesos API, making it possible to express the definition and results of COMMAND, TCP, and HTTP(S) health checks consistently across all schedulers and executors.”
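For a sense of what that consistency looks like in practice, here is a rough sketch, expressed as Python dictionaries, of how a framework might define each of the three flavors of check. The field names approximate those described in the Mesos health check documentation rather than quoting the API verbatim, and the commands, paths, ports, and timing values are invented for illustration.

```python
# Illustrative approximations of Mesos-style health check definitions, one per
# flavor (COMMAND, HTTP, TCP). Consult the Mesos 1.2.0 documentation for the
# authoritative schema; these names and values are examples, not the verbatim API.

command_check = {
    "type": "COMMAND",
    "command": {"value": "curl -f http://localhost:8080/health"},
    "delay_seconds": 15,         # wait before the first probe
    "interval_seconds": 10,      # how often to probe
    "timeout_seconds": 5,        # how long a single probe may take
    "consecutive_failures": 3,   # failures tolerated before the task is deemed unhealthy
}

http_check = {
    "type": "HTTP",
    "http": {"port": 8080, "path": "/health"},   # healthy on a successful response
    "interval_seconds": 10,
    "timeout_seconds": 5,
    "consecutive_failures": 3,
}

tcp_check = {
    "type": "TCP",
    "tcp": {"port": 5432},       # healthy if the port accepts a connection
    "interval_seconds": 10,
    "timeout_seconds": 5,
    "consecutive_failures": 3,
}
```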
Distributed applications often use heartbeats or health checks to enable their performance monitors, schedulers, and orchestrators to keep track of multitudes of services. Without services being able to send back some sort of signal that “I’m alive,” either by request or by regular pinging, their managers could easily lose track of them. Distributed architectures of all types, including for the Internet of Things, rely upon one of these mechanisms.
If Mesos’ health check engineers have time to catch a glimpse over their shoulder, they may just be seeing a face full of whale.
Last November, contributors to the Docker project revealed that version 1.13, which appears to be on track for release on January 18, will update the health check mechanism introduced in 1.12. That update will enable Swarm to actually put the system to work, in this case by amending the service records of running applications based on data received from health checks. This way, it becomes easier for Swarm to remove an unhealthy container without having to fiddle with the load balancer.
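Docker’s mechanism, introduced in 1.12, periodically runs a probe command inside the container and treats an exit code of 0 as healthy and 1 as unhealthy. Below is a minimal sketch of such a probe in Python; the /health endpoint and port are hypothetical application details, not anything Docker itself defines.

```python
#!/usr/bin/env python3
"""Minimal probe a Docker health check might invoke inside a container.

Docker's convention since 1.12: exit 0 means healthy, exit 1 means unhealthy.
The URL below is a hypothetical application endpoint, invented for illustration.
"""
import sys
import urllib.request

HEALTH_URL = "http://localhost:8080/health"  # assumed application endpoint

try:
    with urllib.request.urlopen(HEALTH_URL, timeout=3) as resp:
        sys.exit(0 if resp.status == 200 else 1)
except Exception:
    sys.exit(1)  # refused connections, timeouts, and HTTP errors all count as unhealthy
```

In a Dockerfile, a probe like this would typically be wired up with the HEALTHCHECK instruction, along with interval, timeout, and retry settings.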
Mesos: Governing by Delegation
The way Mesos has traditionally handled health checks is by way of its scheduler. In a high availability (HA) scenario, there may be as many as three schedulers running in a network simultaneously, with one “elected” as the leader. A scheduler maintains a connection with the master of a Mesos cluster, while one or more executors act as the components that launch tasks on agent (historically, “slave”) nodes.
Each node in a Mesos cluster includes an agent that carries out the functions designated by the master. It’s this connection between master and agent that has acted as Mesos’ “lifeline,” if you will. It takes advantage of the fact that, by default, a TCP connection can be kept open and reused for multiple HTTP requests.
On this persistent TCP connection, the two components exchange what’s literally described as “ping” and “pong.” In the case of Mesos, the information being ping-ponged is meaningful: It typically contains what Mesos calls checkpointing data, in which the agent describes what it’s in the middle of, and the master compares that data against what it expects to see. If the pongs stop arriving, then like an editor awaiting a long-overdue article, a master may choose to take steps to discontinue its relationship with the agent.
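To make the lifeline concrete, here is a toy sketch of the bookkeeping a master might perform; this is not actual Mesos code, and the interval and tolerance values are invented. Every pong refreshes a timestamp for the agent that sent it, and an agent that stays silent past its allowance becomes a candidate for removal.

```python
import time

# Toy illustration of the master's side of the ping/pong lifeline -- not Mesos code.
PING_INTERVAL = 15           # seconds between pings (illustrative)
MAX_MISSED_PONGS = 5         # silence tolerated before giving up on an agent

last_pong = {}               # agent_id -> time the agent last answered

def reconcile(agent_id, checkpoint):
    """Placeholder: compare the agent's checkpoint data against what the master expects."""
    pass

def record_pong(agent_id, checkpoint):
    """Called whenever an agent answers a ping, carrying its checkpointing data."""
    last_pong[agent_id] = time.time()
    reconcile(agent_id, checkpoint)

def lost_agents(now=None):
    """Agents silent for longer than the allowed window."""
    if now is None:
        now = time.time()
    cutoff = PING_INTERVAL * MAX_MISSED_PONGS
    return [agent for agent, last in last_pong.items() if now - last > cutoff]
```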
There may not be anything particularly deficient about this protocol, in some folks’ minds. But as Mesosphere’s Kleiman pointed out, when a scheduler and a task it launched reside on separate nodes, the ping-pong game may consume an inordinate amount of network traffic. What’s more, other frameworks that run on Mesos (and that may, let’s face it, be running concurrently with one another) each have their own interpretation of how the game is played.
The Mesos public documentation on GitHub currently offers details on the API changes being introduced in Mesos version 1.2.0. It explains how Mesos engineers are introducing a common API for all health checks, whether they are performed by an explicit command or over the HTTP(S) or TCP protocols.
But perhaps even more notably, the responsibility for performing these health checks shifts from the scheduler to the executor. This way, when nodes are widely partitioned, including across domains, the Mesos agent is capable of managing the process locally.
Quantitatively, the documentation explains, this should reduce the traffic among components of distributed applications, rendering them much more scalable. There’s a price to pay for this, though: It will be up to these remote agents to send explicit health status messages for their tasks back to their master nodes. What’s more, there’s no protocol at the moment for an outside source to manually designate a task as unhealthy, though conceivably that may yet be resolved.
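A rough sketch of what that shift implies, with invented function names rather than Mesos internals: the probe runs right next to the task, and only the resulting health status travels back toward the master as an explicit status update.

```python
import subprocess
import time

def run_command_check(command, timeout):
    """True if the command exits 0 within the timeout (the COMMAND-check contract)."""
    try:
        return subprocess.run(command, shell=True, timeout=timeout).returncode == 0
    except subprocess.TimeoutExpired:
        return False

def health_check_loop(task_id, command, interval=10, timeout=5, max_failures=3,
                      send_status_update=print):
    """Probe locally on the agent; report unhealthiness only after repeated failures.

    Illustrative only -- send_status_update stands in for whatever channel carries
    status updates from the executor back toward the master.
    """
    failures = 0
    while True:
        if run_command_check(command, timeout):
            failures = 0
            send_status_update({"task_id": task_id, "healthy": True})
        else:
            failures += 1
            if failures >= max_failures:
                send_status_update({"task_id": task_id, "healthy": False})
        time.sleep(interval)
```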
Docker Improves Its Service Record, Literally
Demonstrating Docker’s new health checks last month, Docker engineer Nishant Totla (with a bit of effort) showed off the ability to scale a distributed application up in such a way that an instance of a service doesn’t receive requests until it’s deemed healthy. This capability applies in what’s now being called “Swarm mode” (Docker with the embedded Swarm orchestration turned on).
When newly added service instances are deemed healthy, a notation is automatically added to the service record for their class. This way, Swarm’s load balancer won’t route HTTP requests to new containers until they’re fully functional, reducing instances where requests fail to receive responses and eventually time out.
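The routing consequence can be pictured with a small, hypothetical sketch (this is not Swarm code): the load balancer consults the service record and rotates only through the instances marked healthy, so a container that is still starting up receives no traffic.

```python
import itertools

# Hypothetical service record entries; addresses and fields are invented.
service_record = [
    {"addr": "10.0.0.2:8080", "healthy": True},
    {"addr": "10.0.0.3:8080", "healthy": False},   # still starting up
    {"addr": "10.0.0.4:8080", "healthy": True},
]

# Round-robin over healthy instances only; unhealthy ones simply never appear.
healthy = [entry["addr"] for entry in service_record if entry["healthy"]]
rotation = itertools.cycle(healthy)

next_backend = next(rotation)   # "10.0.0.2:8080", then "10.0.0.4:8080", and so on
```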
Totla then went on to show a service being updated to a new version while its old version remained live. Swarm could accomplish this before, but not in the same way. This time, the load balancer will more evenly distribute concurrent requests between the new and old versions of a task until the old versions are removed from service. Again, the service record is amended in the background with health check data, making the transition smoother.
According to Docker engineer Dongluo Chen, who joined Totla for the demonstration, any command issued to containerd that declares a container healthy (or otherwise) pushes that declaration into a network database. That declaration is then propagated to all the other nodes in the Swarm cluster by way of a gossip protocol (Docker’s choice being HashiCorp’s Serf).
Guarantees vs. Gossip
As long-time New Stack readers will attest, Apache Mesos and Docker have not been generally considered competitive against one another. Indeed, Mesosphere implements Mesos in the form of Docker containers, and Mesos itself has supported launching Docker container images since version 0.20.0.
But last year, Mesosphere began taking clear steps to set its Data Center Operating System (DC/OS) and the Mesos framework apart, declaring them part of a kind of next-generation packaging system it calls “Container 2.0.”
So as more businesses acquaint themselves with this new stack, to coin a phrase, and compare the merits of Docker and Swarm against those of Mesos and Mesosphere’s Marathon scheduler, they will at some point investigate the virtues of the respective messaging systems.
Mesos’ messaging architecture is, strange as this may appear in print, unreliable by design. Components such as masters and agents message each other directly. Although there are no guarantees that a recipient will receive messages, at least those it does receive will be in order. This is a tradeoff for the sake of expedience. However, an exception was made early on for task status updates, which do carry a guarantee of being delivered at least once.
The health check system being implemented for Mesos 1.2.0 means that health messages will be exchanged between master and agent, much closer to the components to which these messages refer. It means the scheduler delegates responsibility for managing failed tasks to a lower-order component, which may also mean that any central orchestrator must be comfortable with being more of an overseer than an overlord.
By comparison, Docker’s use of a gossip protocol ensures high availability, and a higher probability that health status messages will be received. Messages may arrive out of order, but a logical timestamp called a Lamport clock helps recipients reassemble them in sequence. The tradeoff there is that message traffic may scale exponentially as an application’s breadth scales linearly.
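A Lamport clock is simply a counter that every message carries and every recipient advances, which gives gossiping nodes a consistent way to order updates even when they arrive out of sequence. Here is a generic, textbook-style sketch of the idea, not Docker’s implementation:

```python
class LamportClock:
    """Minimal logical clock: advanced on every local event, fast-forwarded
    whenever a message arrives bearing a larger value."""

    def __init__(self):
        self.time = 0

    def tick(self):                  # call before sending a message
        self.time += 1
        return self.time

    def update(self, received):      # call when a message stamped `received` arrives
        self.time = max(self.time, received) + 1
        return self.time

# Reassembling out-of-order gossip: sorting by (lamport_time, sender) lets every
# node apply the same health updates in the same sequence. Values are invented.
updates = [
    {"lamport": 7, "sender": "node-b", "container": "web.2", "healthy": False},
    {"lamport": 3, "sender": "node-a", "container": "web.2", "healthy": True},
]
for update in sorted(updates, key=lambda u: (u["lamport"], u["sender"])):
    pass  # apply the update to the local copy of the service record
```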
The best place to test the virtue of diverging architectures is in a competitive market, preferably with tens of thousands of judges who have yet to make up their minds.
Docker and Mesosphere are sponsors of The New Stack.
Feature image: A professional juggler by Usien, licensed under Creative Commons 3.0.