Ricon15: How Mesos Changes Cluster Management
Not all distributed workloads are created equal.
Last Friday at Basho’s Ricon15 conference in San Francisco, Mesos creator Ben Hindman talked about how the workloads of distributed systems are anything but uniform. The Map portion of a Hadoop job has a far different set of requirements than the Reduce portion. Yet both are scheduled as if they required the exact same amount of resources.
Mesos was created, Hindman explained, to make resource assignment much more dynamic — the program itself should have a little dialogue with the infrastructure, requesting what it needs at any given time. In turn, the infrastructure should be able to take back any resources for higher-priority jobs, letting the app know ahead of time. It’s all about communication between the abstractions.
Mesos was meant to share “a data center’s resources between distributed systems,” Hindman said. If a server goes down, a data center experiences an outage, or weekly maintenance runs late, the remaining resources can be spread too thinly to handle the loads placed on them. Clusters that fail or operate inefficiently can mean wasted time along with wasted resources.
With traditional cluster management software, the user has to write a specification beforehand that maps out all the system resources the job will require. This is a rather coarse-grained approach to resource allocation, Hindman said, referring to the process as “static partitioning.”
“Static partitioning was the status quo in the data center,” Hindman said. Hindman could tell this approach was inefficient by viewing the resource utilization of a number of different workloads across a data center. At any given time, a large number of resources were going unused.
“We could do better by dynamically moving resources back and forth between things,” Hindman said. “You could run jobs faster and you could get higher utilization across your entire data center.”
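The utilization gap Hindman describes can be illustrated with a toy calculation. The sketch below is not taken from the talk: the cluster size, the two workload names and their per-step CPU demands are all invented, and both functions are hypothetical helpers. It only shows why fixed slices leave capacity idle when demand shifts between workloads.

```python
# Toy model: two workloads on a 100-CPU cluster over four time steps.
# All numbers are invented for illustration.
CLUSTER_CPUS = 100
demand = {
    "hadoop": [80, 20, 10, 70],
    "web":    [10, 60, 70, 20],
}

def static_utilization(demand, cluster_cpus):
    """Static partitioning: each workload gets a fixed, equal slice
    and cannot borrow idle capacity from the other slices."""
    share = cluster_cpus // len(demand)
    steps = len(next(iter(demand.values())))
    used = sum(min(d[t], share) for d in demand.values() for t in range(steps))
    return used / (cluster_cpus * steps)

def dynamic_utilization(demand, cluster_cpus):
    """Dynamic sharing: workloads draw from one pool, capped only by
    the size of the whole cluster."""
    steps = len(next(iter(demand.values())))
    used = sum(
        min(sum(d[t] for d in demand.values()), cluster_cpus)
        for t in range(steps)
    )
    return used / (cluster_cpus * steps)

print(f"static:  {static_utilization(demand, CLUSTER_CPUS):.0%}")
print(f"dynamic: {dynamic_utilization(demand, CLUSTER_CPUS):.0%}")
```

With these made-up demands, the statically partitioned cluster does 65 percent of the work it could, while the shared pool reaches 85 percent, because one workload's peak lands in the other's trough.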
Mesos makes resource allocation more dynamic by letting programs change their resource needs on the fly, so a service can be scaled up or down as needed. This communication works both ways: Mesos can also take resources away from a program if they are needed for a higher-priority job.
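The two-way exchange can be sketched as a small priority-aware pool. This is a hypothetical model, not the Mesos API: the `Pool` class, its `request` method and the job names are assumptions made for illustration. A request is granted from free capacity first, then from lower-priority jobs, and the caller gets back a list of notices saying whose resources were reclaimed.

```python
class Pool:
    """Hypothetical CPU pool that reclaims from lower-priority jobs."""

    def __init__(self, total_cpus):
        self.total = total_cpus
        self.grants = {}  # job name -> (priority, cpus held)

    def free(self):
        return self.total - sum(cpus for _, cpus in self.grants.values())

    def request(self, job, priority, cpus):
        """Grant `cpus` to `job`. Returns a list of (victim, cpus_revoked)
        notices, or None if the request cannot be met even after
        reclaiming everything held by lower-priority jobs."""
        # Candidates for reclamation, lowest priority first.
        victims = sorted(
            (p, name) for name, (p, _) in self.grants.items() if p < priority
        )
        reclaimable = sum(self.grants[name][1] for _, name in victims)
        if cpus > self.free() + reclaimable:
            return None
        notices = []
        shortfall = max(0, cpus - self.free())
        for p, name in victims:
            if shortfall == 0:
                break
            held = self.grants[name][1]
            taken = min(held, shortfall)
            self.grants[name] = (p, held - taken)
            notices.append((name, taken))  # tell the job what it lost
            shortfall -= taken
        self.grants[job] = (priority, cpus)
        return notices

pool = Pool(total_cpus=10)
first = pool.request("analytics", priority=1, cpus=8)
second = pool.request("checkout", priority=5, cpus=6)
third = pool.request("batch", priority=0, cpus=1)
```

Here the low-priority analytics job first takes 8 CPUs, then gives 4 of them back when the higher-priority checkout job asks for 6. The final request is refused because nothing of lower priority is left to reclaim.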
Hindman called this approach “multi-level scheduling.” The distributed application asks for resources from the underlying infrastructure, through Mesos, and then decides how to use those resources to best run its workload.
“Multi-level scheduling provides a more robust example of cluster management, allocating system resources based on what is available at the current moment a request is called up,” Hindman said. Distributed systems then use this allocation to decide which tasks to run on which server resources, with Mesos launching the tasks as assigned.
Mesos lets a data center schedule on multiple levels, directing resources to where they are most needed at any given moment and using the available resources to their fullest extent.
Mesos is very similar to a kernel for an operating system. Much as the Linux kernel parcels out CPU resources to applications, Mesos parcels out data center resources to distributed applications. To extend this analogy, Hindman’s Mesosphere, still in development, can be seen as an entire operating system for the data center, one using Mesos as its kernel. The idea with Mesosphere is to add functionality on top of Mesos, with additional frameworks and applications. Marathon, for instance, can be seen as a framework for running containers on a Mesos-based infrastructure.
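The split between the two scheduling levels can be sketched in a few lines. This is a simplified model under stated assumptions, not the real Mesos scheduler API: the `Offer`, `Task`, `Framework` and `Allocator` names, the agent names and the resource figures are all hypothetical. The allocator decides who is offered each machine's free resources, and each framework decides which of its own tasks to place on an offer.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int

@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int

class Framework:
    """Second level: picks which of its pending tasks fit an offer."""

    def __init__(self, name, pending):
        self.name = name
        self.pending = list(pending)

    def on_offer(self, offer):
        accepted = []
        cpus, mem = offer.cpus, offer.mem_mb
        for task in list(self.pending):
            if task.cpus <= cpus and task.mem_mb <= mem:
                accepted.append(task)
                self.pending.remove(task)
                cpus -= task.cpus
                mem -= task.mem_mb
        return accepted  # anything not accepted is implicitly declined

class Allocator:
    """First level: decides which framework is offered an agent's
    free resources. This sketch simply goes in list order."""

    def __init__(self, free):
        self.free = dict(free)  # agent -> (cpus, mem_mb)
        self.running = []       # (framework, agent, task) triples

    def offer_round(self, frameworks):
        for agent in self.free:
            for fw in frameworks:
                cpus, mem = self.free[agent]
                if cpus <= 0 or mem <= 0:
                    break
                for task in fw.on_offer(Offer(agent, cpus, mem)):
                    cpus -= task.cpus
                    mem -= task.mem_mb
                    self.running.append((fw.name, agent, task.name))
                self.free[agent] = (cpus, mem)

allocator = Allocator({"agent-1": (4.0, 8192), "agent-2": (2.0, 4096)})
batch = Framework("batch", [Task("map-0", 2.0, 2048), Task("reduce-0", 3.0, 6144)])
web = Framework("web", [Task("nginx", 1.0, 1024)])
allocator.offer_round([batch, web])
```

After one round, `map-0` and `nginx` are both placed on `agent-1`, while `reduce-0` stays pending because no single offer has the 3 CPUs it needs. A production allocator would apply a fairness policy rather than fixed list order; that part is deliberately left out here.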
“Wouldn’t it be really cool if you could build a distributed system in your organization, and some other organization could just run that distributed system on [its infrastructure] just as easily as you could build a Linux application and have that application run on a different machine?” Hindman said.
Mesos and Mesosphere represent an opportunity to manage clusters within a data center fluidly, with the ability to control, modify, and adapt resource management to suit one’s particular needs. Mesos has partnered with a number of cloud-based, VM, and container solutions across the software development ecosystem to position itself as a crucial piece of data center management at scale.
Basho is a sponsor of The New Stack.