Managing Containers Across Distributed Resources
This is a chapter from our ebook: The Docker and Container Ecosystem. It’s one of several chapters from the ebook that we will post here over the next several weeks.
Container Solutions is a consulting group out of Amsterdam. They frame the current discussions about programmable infrastructure in terms of machines.
Virtual machines can require large amounts of resources, especially when developing on older machines. Running hypervisors can strain a system if resources are not allocated correctly, or if the user has not specified how much should be reserved for compiling logs or completing other tasks the project demands.
Efficiency is a primary issue with scaling clusters. Application latency is one outcome of high-intensity workloads that do not scale appropriately. Inefficient architectures are costly to fix with traditional solutions, which might mean adding more storage and networking boxes that lack the elasticity of more modern architectures.
Phil Winder for Container Solutions writes that scaling in distributed systems depends on the service that is getting scaled. A stateless app can scale effectively. It needs compute resources and a record in a load balancer. A database, on the other hand, is much more difficult to scale horizontally. Winder writes that the “application must decide what has responsibility for synchronization of the data; is that the database’s responsibility, or some function of a distributed file system?”
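To make the stateless case concrete, here is a minimal sketch of what "a record in a load balancer" amounts to. The class name, method names and addresses are our own illustrative assumptions, not anything taken from Winder or from a real load balancer.

```python
class RoundRobinBalancer:
    """Toy load balancer: scaling out is just registering another backend."""

    def __init__(self):
        self.backends = []
        self._next = 0

    def register(self, address):
        # Adding a record is all the balancer needs, because the app holds no state.
        self.backends.append(address)

    def route(self):
        # Any replica can serve any request, so rotate through them in turn.
        if not self.backends:
            raise RuntimeError("no backends registered")
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        return backend


balancer = RoundRobinBalancer()
for address in ("10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"):
    balancer.register(address)

# Six requests are spread evenly over the three replicas.
routed = [balancer.route() for _ in range(6)]
```

Nothing comparable works for the database: a new replica would first have to agree with the others about the data, which is exactly the synchronization question Winder raises.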
There are a host of other factors that come into building out distributed architectures. And not surprisingly, there are all sorts of orchestration environments to consider.
Orchestration aids in running apps across multiple containers, instead of just one. In The New Stack’s survey of container-related vendors, we asked about orchestration, including it in a category with scheduling, management and monitoring tools. In this context, we found that almost two-thirds of the vendors had offerings in this area for the second quarter of 2015. While 71 percent also plan orchestration-related offerings in the future, several indicated they were working on revising their product or developing a partnership. Later in the survey, companies also told us how they are managing containers internally. There was no dominant tool, with “no product” being cited by more than four percent of respondents. The New Stack Container Directory has 55 products and projects associated with orchestration management, with half of those representing an open source effort.
A Brief History of Automated Resources
In 2006, Amazon launched Amazon Web Services (AWS), establishing itself as a pioneer in offering virtual resources. AWS created a sudden availability of almost unending compute capacity, and customers were billed only for the resources they used. Almost overnight, apps could be deployed without the worry of getting a data bill with a comma. It was relatively simple and the costs were lower: two core factors that led to behavioral changes and market shifts.
AWS made a powerful argument for moving away from the traditional models of IT. Amazon could operate at levels of efficiency that even the most well-managed IT shop would be hard-pressed to match.
The next evolution involved some new thinking about how to get the most out of unlimited resources in a self-serve manner, fully automated, with resources more oriented toward developers and their application requirements.
This concept of programmable infrastructure is part of a much bigger trend that Google calls “warehouse-scale computing,” a term that came from a Google paper in 2009. That trend is at the core of why Google is so interested in containers.
Here are some examples of components that reflect warehouse-scale computing, as described by CoreOS CEO Alex Polvi in an interview earlier this year:
- Commodity hardware and underlying compute resources.
- Commodity switching and networking fabric.
- Application packaging (containers).
- Resource scheduling and orchestration.
- Linux for running orchestration and packaging on commodity hardware.
What is actually different this time is that we are building systems that have intelligent software and simple hardware — what Google is known for, Polvi said. It means more compute resources can be added to get more capacity in applications. It means any individual server is meaningless. We will think about everything in terms of applications, not individual servers.
“Dynamic scheduling” is a key component of Google’s definition of “cloud-native computing,” as Google’s Craig McLuckie described it on The New Stack.
The line-of-business manager should be able to run an application on some on-demand infrastructure without the help of the system administrator. The manager shouldn’t have to worry about servers or any other physical infrastructure. Instead, they should think about deployment only in terms of logical computing resources. There is a sea of compute available where any job can be scheduled to run.
“It turns out there are some things that computers do better than people,” McLuckie said. “One of those things is really thinking about, in real time, where your application should be deployed, how many resources your application should have access to, whether your application is healthy or unhealthy, whether some level of remediation needs to happen.”
“By moving away from a world where it’s an operator-driven paradigm — where you’re creating these static, dead things — to a world where your application is alive, and being actively managed, and dynamically, reactively, observed and watched by a very smart system … [that] changes the game,” McLuckie said.
In other words, software algorithms can schedule and allocate jobs against the available resources so efficiently that the result is significant cost savings. No more calling the IT department to requisition a server that may show up six weeks later. It also makes the enterprise much more agile by lending it the ability to quickly spin up new applications, move resources to optimal usage and stay ahead of competitors.
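As a rough illustration of scheduling jobs against available resources, the sketch below places each job on the first server with enough free CPU and memory. It is a deliberately simple first-fit heuristic under our own assumptions. The server names, job names and sizes are hypothetical, and production schedulers weigh far more signals, such as health and priorities.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cpus: float
    mem_gb: float


@dataclass
class Server:
    name: str
    cpus: float
    mem_gb: float
    jobs: list = field(default_factory=list)

    def fits(self, job):
        return job.cpus <= self.cpus and job.mem_gb <= self.mem_gb

    def assign(self, job):
        # Subtract what the job consumes so later decisions see what is left.
        self.cpus -= job.cpus
        self.mem_gb -= job.mem_gb
        self.jobs.append(job.name)


def schedule(jobs, servers):
    """Place each job on the first server with enough free capacity."""
    placements, pending = {}, []
    # Placing the largest jobs first tends to leave fewer unusable gaps.
    for job in sorted(jobs, key=lambda j: (j.cpus, j.mem_gb), reverse=True):
        target = next((s for s in servers if s.fits(job)), None)
        if target is None:
            pending.append(job.name)  # nothing fits now; wait for capacity
        else:
            target.assign(job)
            placements[job.name] = target.name
    return placements, pending


servers = [Server("node-a", cpus=4, mem_gb=8), Server("node-b", cpus=4, mem_gb=8)]
jobs = [Job("web", 1, 1), Job("batch", 4, 6), Job("cache", 2, 4), Job("etl", 2, 2)]
placements, pending = schedule(jobs, servers)
```

Note that nobody names a server when submitting a job. The job states what it needs and the software decides where it runs, which is the shift the passage above describes.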
So how do you build a smart system from a data center filled with dumb servers? This is where tools like Google Kubernetes and the open source Apache Mesos data center operating system come in. Also of note is Docker’s platform, using its Machine, Swarm and Compose tools.
Google developed Kubernetes for managing large numbers of containers. Instead of assigning each container to a host machine, Kubernetes groups containers into pods. For instance, a multi-tier application, with a database in one container and the application logic in another container, can be grouped into a single pod. The administrator only needs to move a single pod from one compute resource to another, rather than worrying about dozens of individual containers.
Google itself has made the process even easier on its own Google Cloud Platform. It offers a production-ready version of Kubernetes called the Google Container Engine. The cost is low as well: managing the first five clusters is free, and the service costs a minuscule 15¢ per hour for each cluster after the fifth one.
Between pods, labels and services, Kubernetes offers different ways to interact with clusters:
- Pods are small groups of Docker containers that Kubernetes maintains as a unit. They are easily deployable, resulting in less downtime when testing a build or debugging in QA.
- Labels are what they sound like: key-value pairs attached to objects, used to organize those objects into groups.
- Services are used for load balancing, providing a centralized name and address for a set of pods.
- Clusters on Kubernetes eliminate the need for developers to worry about physical machines; the clusters act as lightweight VMs in their own right, each capable of handling tasks which require scalability.
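The way pods, labels and services fit together can be sketched in a few lines. This is a simplified model of the idea, not the Kubernetes API. The `Pod` and `Service` classes, the label keys and the IP addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict = field(default_factory=dict)  # key-value pairs


class Service:
    """One stable name in front of whichever pods currently match the selector."""

    def __init__(self, name, selector):
        self.name = name
        self.selector = selector
        self._count = 0

    def endpoints(self, pods):
        # A pod matches when it carries every key-value pair in the selector.
        return [p.ip for p in pods if self.selector.items() <= p.labels.items()]

    def route(self, pods):
        # Balance requests across the matching pods in turn.
        ips = self.endpoints(pods)
        if not ips:
            raise LookupError(f"no pods match service {self.name!r}")
        ip = ips[self._count % len(ips)]
        self._count += 1
        return ip


pods = [
    Pod("web-1", "10.1.0.1", {"app": "web", "env": "prod"}),
    Pod("web-2", "10.1.0.2", {"app": "web", "env": "prod"}),
    Pod("db-1", "10.1.0.3", {"app": "db", "env": "prod"}),
]
web = Service("web", selector={"app": "web"})
targets = [web.route(pods) for _ in range(3)]
```

Because the service finds its pods by label rather than by address, pods can be added, removed or moved without callers noticing.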
Apache Mesos is a cluster manager that can help the administrator schedule workloads on a cluster of servers. Mesos excels at handling very large workloads, such as an implementation of the Spark or Hadoop data processing platforms.
Mesos had its own container image format and runtime built similarly to Docker. The project started by building the orchestration first, with the container being the side effect of needing something to actually package and contain an application. Applications were packaged in this format to be able to be run by Mesos.
Mesos is supported by Mesosphere, which offers the Mesosphere Datacenter Operating System (DCOS). As the name implies, DCOS promises the ability to pool resources and then dynamically schedule jobs against them, as if all the servers worked together as a single entity.
Mesos is open source software, originally developed at the University of California at Berkeley. It sits between the application layer and the operating system and makes it easier to deploy and manage applications in large-scale clustered environments. It can run many applications on a dynamically shared pool of nodes. Prominent users of Mesos include Twitter, Airbnb, Netflix, PayPal, Squarespace, Uber and more.
The distributed systems kernel was born out of UC Berkeley’s AMPLab about five years ago. Benjamin Hindman was a PhD student at Berkeley at the time; he went on to work at Twitter for four years before joining Mesosphere. He is one of the original creators of the Apache Mesos project.
For years, an IT executive would build out the hardware and then run the software across it, treating all those servers as pets. In today’s world, it’s the applications that come first. The hardware gets abstracted, treated more like cattle than anything else. This pets-versus-cattle comparison is used often these days to explain how modern data centers must be treated in today’s application-centric and data-intensive world. The bottom line is that apps come first.
The data center is a virtual corral. Resources are pooled and apps are launched in much the same way as operating systems work on computers. Mesos has the capability to scale the number and size of apps as well as the compute, storage and other resources that are needed for different types of workloads. Its core is in the kernel, which performs the main functions, such as allocating resources to apps.
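One way to picture a kernel that pools resources and allocates them to apps is the sketch below. It is loosely modeled on the resource-offer idea Mesos is known for, where the allocator offers free resources and each app decides what to take. The class names, node names and CPU counts are our own assumptions and do not reflect the Mesos API.

```python
class Allocator:
    """Tracks free CPUs per node and presents them to apps as one pool."""

    def __init__(self, free_cpus):
        self.free_cpus = dict(free_cpus)

    def total_free(self):
        return sum(self.free_cpus.values())

    def offers(self):
        # Offer whatever is currently unused on each node.
        return {node: cpus for node, cpus in self.free_cpus.items() if cpus > 0}

    def launch(self, node, cpus):
        if cpus > self.free_cpus[node]:
            raise ValueError(f"{node} has only {self.free_cpus[node]} CPUs free")
        self.free_cpus[node] -= cpus


class App:
    """An app decides for itself which offered resources to use."""

    def __init__(self, name, cpus_per_task, tasks_wanted):
        self.name = name
        self.cpus_per_task = cpus_per_task
        self.tasks_wanted = tasks_wanted
        self.running = []  # nodes where this app's tasks were launched

    def accept(self, offers, allocator):
        for node, cpus in offers.items():
            # Launch as many tasks as this node's offer can hold.
            while self.tasks_wanted > 0 and cpus >= self.cpus_per_task:
                allocator.launch(node, self.cpus_per_task)
                cpus -= self.cpus_per_task
                self.tasks_wanted -= 1
                self.running.append(node)


allocator = Allocator({"node-1": 4, "node-2": 4, "node-3": 2})
apps = [
    App("spark", cpus_per_task=2, tasks_wanted=3),
    App("web", cpus_per_task=1, tasks_wanted=5),
]
for app in apps:
    app.accept(allocator.offers(), allocator)
```

The apps never see individual servers as machines to manage. They only see capacity on offer, much as a process on a laptop sees CPU time rather than cores.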
Docker Machine, Docker Swarm and Docker Compose are designed to work as an orchestration system. Docker also works closely with the Mesos community.
According to Docker, Docker Machine enables one-command automation to provision a host infrastructure and install Docker Engine. Before Docker Machine, a developer would need to log into the host and follow installation and configuration instructions specifically for that host and its OS. With Docker Machine, whether provisioning the Docker daemon on a new laptop, on virtual machines in the data center or on a public cloud instance, you only need a single command.
The pluggable backend of Docker Machine allows users to take full advantage of ecosystem partners providing Docker-ready infrastructure, while still accessing everything through the same interface. This driver API works for provisioning Docker on a local machine, on a virtual machine in the data center, or on a public cloud instance. In its current alpha release, Docker Machine ships with drivers for provisioning Docker locally with VirtualBox, as well as remotely on DigitalOcean instances; more drivers are in the works for AWS, Azure, VMware and other infrastructures.
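The pluggable-backend pattern itself is straightforward to sketch. The code below is not Docker Machine's driver API. The `Driver` interface, the two driver classes, the `provision` function and the region value are hypothetical stand-ins that only show how one entry point can sit in front of interchangeable backends.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical backend interface: every driver provisions a host the same way."""

    @abstractmethod
    def create_host(self, name):
        """Provision a machine and return a dict describing it."""


class LocalVMDriver(Driver):
    def create_host(self, name):
        # A real driver would boot a local virtual machine here.
        return {"name": name, "backend": "local-vm"}


class CloudDriver(Driver):
    def __init__(self, region="example-region"):
        self.region = region

    def create_host(self, name):
        # A real driver would call a cloud provider's API here.
        return {"name": name, "backend": "cloud", "region": self.region}


DRIVERS = {"local-vm": LocalVMDriver, "cloud": CloudDriver}


def provision(name, driver, **options):
    """Single entry point: pick a backend, create the host, install the engine."""
    if driver not in DRIVERS:
        raise ValueError(f"unknown driver: {driver}")
    host = DRIVERS[driver](**options).create_host(name)
    # The post-provisioning step is identical no matter which backend ran.
    host["engine_installed"] = True
    return host


laptop = provision("dev", driver="local-vm")
staging = provision("staging", driver="cloud", region="example-region")
```

Supporting a new infrastructure provider then means adding one class and one registry entry, while the caller's command stays the same.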
Docker Swarm is a clustering and scheduling tool that automatically optimizes a distributed application’s infrastructure based on the application’s lifecycle stage, container usage and performance needs.
Swarm has multiple models for determining scheduling, including understanding how specific containers will have specific resource requirements. Working with a scheduling algorithm, Swarm determines which engine and host each container should run on. The core idea of Swarm is that as developers move to multi-host, distributed applications, they want to keep the same experience and portability. For example, a developer needs the ability to use a specific cluster solution for the application at hand. This would ensure cluster capabilities are portable all the way from the laptop to the production environment.
The Swarm API is for ecosystem partners to create alternative or additional orchestration tools that override Docker’s Swarm optimization algorithm for something more nuanced to particular use cases.
This is what Docker has been calling their “batteries-included-but-swappable” approach. Some users may be comfortable with using Docker Swarm to identify optimized clustering of a multi-container, distributed application’s architecture. Others will want to use the clustering and scheduling part of Swarm to set their own parameters, while still others will look to an ecosystem partner’s alternative orchestration optimization product to recommend the best cluster mix.
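A swappable scheduler can be sketched as a function passed into the cluster. The `spread` and `binpack` functions below are illustrative approximations of two common placement strategies, and the `Cluster` class, node names and slot counts are hypothetical. None of this is Swarm's actual API.

```python
def spread(nodes):
    """Prefer the node running the fewest containers."""
    return min(nodes, key=lambda n: len(n["containers"]))


def binpack(nodes):
    """Prefer the fullest node that still has room, keeping others free."""
    return max(nodes, key=lambda n: len(n["containers"]))


class Cluster:
    def __init__(self, node_names, slots, strategy=spread):
        self.nodes = [{"name": n, "containers": []} for n in node_names]
        self.slots = slots        # containers each node can hold
        self.strategy = strategy  # the "battery" that can be swapped out

    def run(self, container):
        candidates = [n for n in self.nodes if len(n["containers"]) < self.slots]
        if not candidates:
            raise RuntimeError("cluster is full")
        node = self.strategy(candidates)
        node["containers"].append(container)
        return node["name"]


default = Cluster(["n1", "n2"], slots=2)
spread_result = [default.run(c) for c in ("a", "b", "c")]

packed = Cluster(["n1", "n2"], slots=2, strategy=binpack)
binpack_result = [packed.run(c) for c in ("a", "b", "c")]

# A partner-supplied strategy is just another function with the same shape.
pinned = Cluster(["n1", "n2"], slots=2, strategy=lambda nodes: nodes[-1])
custom_result = [pinned.run(c) for c in ("a", "b", "c")]
```

The cluster code never changes when the strategy does, which is the point of keeping the default included but replaceable.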
Multi-container applications running on Swarm can also be built using Docker’s Compose tool. The Compose tool uses a declarative YAML file to maintain a logical definition of all application containers and the links between them. Compose-built distributed applications can then be dynamically updated without impacting other services in the orchestration chain.
Docker Compose enables orchestration across multiple containers. Database, web and load balance containers, for example, can all be assembled into a distributed application across multiple hosts. The orchestration is composed by expressing container dependencies in a YAML file and, again, managing via the Docker user interface.
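Expressing dependencies declaratively lets a tool work out the start order on its own. The sketch below assumes a hypothetical three-service application and uses Python's standard `graphlib` module (Python 3.9 or later) to derive that order. It illustrates the idea only and is not how Compose itself is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical application: each service lists the services it depends on,
# mirroring what a declarative file would state.
services = {
    "db": [],
    "web": ["db"],
    "lb": ["web"],
}


def start_order(services):
    """Return an order in which every service starts after its dependencies."""
    # TopologicalSorter raises graphlib.CycleError if dependencies are circular.
    return list(TopologicalSorter(services).static_order())


order = start_order(services)
```

Here the database comes up first, then the web tier, then the load balancer. Adding a service means adding one entry to the definition rather than editing a startup script.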
Orchestration is still a topic that few people know much about, but it will be crucial for companies building microservices environments. There are questions to consider about virtualized infrastructure and how to deal with issues such as stateless and stateful services. There are the schedulers, the service discovery engines and other components that make up these new kinds of management systems. What these orchestration platforms do more specifically is a question we will answer later in the ebook series.
Docker, IBM, and VMware are sponsors of The New Stack.