What Google Learned From the Borg About Container Management
Google has been managing Linux containers for more than ten years. In that time, the company has built three different container management systems: Borg, Omega, and the most recent, Kubernetes, which was released as open source.
In building these systems, Google engineers learned many hard truths about managing container-based architectures, and they applied the lessons from building and running each system to the next. Now the engineers have shared some of these lessons in a paper in Queue, the flagship magazine of the Association for Computing Machinery.
The article argues that application deployment and monitoring can be dramatically improved by shifting APIs (application programming interfaces) from being machine-oriented to being application-oriented.
Containers abstract many details of machines and operating systems away from the application environment. Decoupling the container image from the OS improves deployment reliability and speeds up development.
“The key to making this abstraction work is having a hermetic container image that can encapsulate almost all of an application’s dependencies into a package that can be deployed into the container,” the paper noted.
Application-oriented container management APIs not only ease and speed up development; they also tie management metrics such as memory and CPU usage to applications rather than to machines. As a result, application monitoring has improved dramatically.
The shift to an application-oriented architecture means that application failures are now detected and identified more quickly, which makes it easier to build, manage, and debug applications.
Google uses Kubernetes for nested containers that are co-scheduled on the same machine. The key to the success of this approach is consistency.
A uniform API at the base creates simplicity across the system. Generic tools that work across all objects are simpler to write, and the system is in turn easier to learn, which frees engineers to spend more time coding. The decoupling of images and OS creates a container environment where multiple components share a stable look and feel.
“The design of Kubernetes as a combination of microservices and small control loops is an example of control through choreography — achieving a desired emergent behavior by combining the effects of separate, autonomous entities that collaborate,” the engineers state.
They caution against a centralized orchestration system. Although a decentralized system takes longer to build, it is more stable over time, especially when unanticipated errors or state changes crop up, and they always do.
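The "small control loops" in the quote above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the Cluster class, the reconcile function, and the action strings are hypothetical names invented here, not Kubernetes APIs. Each pass compares the desired state with the observed state and acts only on the difference.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for observed cluster state (hypothetical, not a real API)."""
    running: dict = field(default_factory=dict)  # app name -> replica count


def reconcile(desired: dict, cluster: Cluster) -> list:
    """Run one pass of a control loop: compare desired vs. observed, then act."""
    actions = []
    for app, want in desired.items():
        have = cluster.running.get(app, 0)
        if have < want:
            actions.append(f"start {want - have} x {app}")
        elif have > want:
            actions.append(f"stop {have - want} x {app}")
        # Pretend the action succeeded so the observed state converges.
        cluster.running[app] = want
    return actions


cluster = Cluster(running={"frontend": 1})
desired = {"frontend": 3}

first_pass = reconcile(desired, cluster)   # has work to do
second_pass = reconcile(desired, cluster)  # state already matches

print(first_pass)
print(second_pass)
```

Because the loop only looks at the gap between desired and observed state, running it again after an unexpected change simply produces the next set of corrective actions, which is one way to read the stability argument.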
What Not To Do
In addition to finding tactics that worked, the engineers found several that did not. Here are some recommendations from the paper:
Don’t Make the Container System Manage Port Numbers
Borg assigns containers unique port numbers. Google engineers discovered that, as a result, networking services like DNS had to be replaced by local versions that could handle IP:port pairs.
Kubernetes allocates IP addresses per pod, thus aligning the network identity with application identity, allowing greater flexibility in using third-party tools for things like bandwidth throttling and network segmentation.
Don’t Just Number Containers: Give Them Labels
Identifying containers by labels rather than by numbers gives operators much more flexibility in how containers are grouped and managed.
“A label is a key/value pair that contains information that helps identify the object. A pod might have the labels role=frontend and stage=production, indicating that this container is serving as a production front-end instance,” the paper stated.
Labels can be used by either automated tools or users, and different teams can manage their own sets of labels. In Kubernetes, label selectors have become the grouping mechanism for management operations that span multiple entities.
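To make the grouping idea concrete, here is a small Python sketch of equality-based label selection. The pod names and the matches and select helpers are hypothetical, chosen for illustration. Only the role=frontend and stage=production labels come from the paper's example.

```python
def matches(labels: dict, selector: dict) -> bool:
    """Return True when every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(pods: dict, selector: dict) -> list:
    """Return the sorted names of pods whose labels satisfy the selector."""
    return sorted(name for name, labels in pods.items() if matches(labels, selector))


# Hypothetical pods, each described only by its labels.
pods = {
    "pod-a": {"role": "frontend", "stage": "production"},
    "pod-b": {"role": "frontend", "stage": "staging"},
    "pod-c": {"role": "backend", "stage": "production"},
}

all_frontends = select(pods, {"role": "frontend"})
prod_frontends = select(pods, {"role": "frontend", "stage": "production"})

print(all_frontends)
print(prod_frontends)
```

The same set of pods can be sliced in different ways by different teams or tools, simply by changing the selector rather than renumbering anything.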
Be Careful with Ownership
While labels are valuable for many management tasks, including load balancing and debugging, it is important to pay careful attention to configuration choices to prevent conflicts. For example, the article asserted that “pod-lifecycle management components such as replication controllers determine which pods they are responsible for using label selectors, so multiple controllers might think they have jurisdiction over a single pod.”
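The conflict the paper describes can be shown with a short Python sketch. The controller names and the claimants helper are hypothetical and exist only for illustration. The point is that two selectors, each reasonable on its own, can both match the same pod.

```python
def matches(labels: dict, selector: dict) -> bool:
    """Return True when every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def claimants(pod_labels: dict, controllers: dict) -> list:
    """Return the sorted names of controllers whose selector matches the pod."""
    return sorted(
        name for name, selector in controllers.items()
        if matches(pod_labels, selector)
    )


# Two hypothetical controllers with overlapping selectors.
controllers = {
    "frontend-rc": {"role": "frontend"},
    "production-rc": {"stage": "production"},
}
pod_labels = {"role": "frontend", "stage": "production"}

owners = claimants(pod_labels, controllers)
if len(owners) > 1:
    print(f"conflict: pod is claimed by {owners}")
```

A check like this, run before a new controller is configured, is one possible way to catch overlapping selectors early. It is an assumption of this sketch, not a mechanism the paper describes.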
Don’t Expose Raw State
Kubernetes forces all store accesses through a centralized API server, making it easy to enforce common semantics and management. This hides the raw state data of the store implementation. It also “provides services for object validation, defaulting and versioning,” the article stated.
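As a rough illustration of why a single access path helps, the Python sketch below puts a private in-memory store behind a tiny facade that validates, applies defaults, and tracks versions. Everything here is hypothetical: the ApiServer class, the required image field, the replicas default, and the per-object version counter are assumptions made for this example, and the paper does not spell out the actual mechanisms.

```python
import copy


class ApiServer:
    """Toy facade: callers never touch the underlying store directly."""

    def __init__(self):
        self._store = {}  # private; stands in for the real store implementation

    def put(self, name: str, obj: dict, expected_version=None) -> dict:
        # Validation: reject objects missing a required field (assumed rule).
        if "image" not in obj:
            raise ValueError("object must specify an image")
        current = self._store.get(name)
        current_version = current["version"] if current else 0
        # Versioning (illustrative): refuse writes based on a stale read.
        if expected_version is not None and expected_version != current_version:
            raise RuntimeError("version conflict")
        # Defaulting: fill in replicas when the caller omits it (assumed default).
        stored = {"replicas": 1, **obj, "version": current_version + 1}
        self._store[name] = stored
        return copy.deepcopy(stored)

    def get(self, name: str) -> dict:
        # Return a copy so callers cannot mutate stored state behind the API.
        return copy.deepcopy(self._store[name])


api = ApiServer()
created = api.put("web", {"image": "example/web"})
updated = api.put("web", {"image": "example/web", "replicas": 3}, expected_version=1)

print(created)
print(updated)
```

Because every client goes through put and get, the rules live in one place, and the store behind them could be swapped out without clients noticing.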
Containers look set to be the wave of the future, and taking a few moments to review these lessons can save a lot of trouble down the road.