How Our Bare Metal Cloud Keeps up with All the New OS Releases
When you run a bare metal cloud service, you spend a lot of time ensuring your hardware supports all the popular operating systems in their latest versions. Each new OS release has to be validated on each server configuration in your fleet, and each OS that’s already on the list has to be tested on each new server config you add.
New OS releases come out all the time, so keeping the list of operating systems validated on our product Equinix Metal up to date can easily eat up a lot of engineering hours. Still, this is a crucial capability for our business, so until recently, we would dedicate resources to manually add each new operating system. A few months ago, however, we replaced that time-consuming and repetitive manual process with an automated CI pipeline we created using Buildkite. We dubbed it, informally, “Bob the Builder.” Bob has already saved us tons of engineering hours — hours that can be spent doing creative, more impactful work — and I’m here to share our experience.
For perspective, validating a new OS on Equinix Metal using the old process could take months. A simple update, say, from Ubuntu 18.04 to 20.04, was usually a weeks-long project. The problem was twofold. First, we didn’t have a simple and automated CI pipeline for adding packages to OS images, customizing the configurations or running the builds. We did all that manually.
Second, each new image needed to be deployed and tested on multiple hardware platforms to ensure that it worked on each server configuration available in our bare-metal cloud. We would verify that each image worked as required by manually installing it to various servers, then poking around to validate functionality.
Not having a CI pipeline for OS images and having to test manually were the main reasons it took so long to prepare each new OS image. The process got the job done, but needless to say, it wasn’t exactly an ideal way to spend valuable engineering resources.
Bob the Builder Takes OS Pipelines to the Next Level
Bob the Builder addresses exactly those two issues. Buildkite is a platform for running flexible and scalable CI pipelines. Running Bob inside Buildkite automates our OS image builds and tests, which speeds up the whole process dramatically and frees up our engineers.
Here’s how the new process works:
- Find an upstream cloud image for whichever OS we want to add. For instance, if we want an Ubuntu image, we pull it from the cloud image sources Canonical publishes.
- Configure a YAML template that defines how the OS needs to be customized to meet Metal’s requirements. We can specify which packages to add to the image, which networking configurations to apply, and so on. We also identify which hardware plans the image should support.
- Automation takes over. We set up a pipeline in Buildkite to generate a custom OS image based on the YAML configuration. Bob the Builder modifies the OS image using virt-customize, pushes it to an S3 bucket, deploys it to the server configurations we want it to support, and runs all the tests necessary to validate the new image.
- If the image passes all the tests, it is published automatically, and customers can then deploy it to Metal server configs.
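To make the customization step more concrete, here is a minimal sketch of how a parsed template could be turned into a virt-customize command line. The template fields (image, packages, run_commands, plans), the image file name and the plan names are all hypothetical stand-ins, since Bob's actual schema isn't shown here. The only real pieces are the virt-customize flags -a, --install and --run-command.

```python
import shlex

# Hypothetical, already-parsed YAML template. Field names and values are
# illustrative assumptions, not Bob the Builder's real schema.
TEMPLATE = {
    "image": "ubuntu-20.04.img",
    "packages": ["cloud-init", "curl"],
    "run_commands": ["systemctl enable cloud-init"],
    "plans": ["plan-x86-small", "plan-arm-large"],
}


def virt_customize_argv(template):
    """Build the argument list for a virt-customize run from a template."""
    argv = ["virt-customize", "-a", template["image"]]
    packages = template.get("packages", [])
    if packages:
        # --install takes a comma-separated package list.
        argv += ["--install", ",".join(packages)]
    for command in template.get("run_commands", []):
        # Each --run-command runs one shell command inside the guest image.
        argv += ["--run-command", command]
    return argv


if __name__ == "__main__":
    # Print a shell-safe command line instead of executing it.
    print(shlex.join(virt_customize_argv(TEMPLATE)))
```

Returning a list rather than a string means the result can be handed straight to subprocess.run without any shell quoting concerns.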
Buildkite: Bob’s Trusty Sidekick
Bob the Builder can run on a laptop and trigger one-off builds — indeed, being a Go-based CLI tool that can run basically anywhere is one of Bob’s benefits — but we mostly use it as part of Buildkite pipelines.
We chose Buildkite to orchestrate OS image pipelines because several of its features address our needs particularly well.
One is that it supports dynamic pipelines. Instead of defining a pipeline as a single set of steps and then running it, Buildkite allows us to set up conditions and stages. This means we can reuse the same pipeline for multiple image builds, which beats having to create a separate pipeline for each type of image.
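As a rough illustration of a dynamic pipeline, the sketch below generates the steps for one image from its list of hardware plans. The script names (build-image.sh, test-image.sh), the image name and the plan names are placeholders invented for this example. The step attributes label, key, command and depends_on are standard Buildkite step fields.

```python
import json


def build_pipeline(image_name, plans):
    """Return a Buildkite pipeline definition for one OS image."""
    steps = [{
        "label": f"Build {image_name}",
        "key": "build",
        # Placeholder script; the real build command is not shown in this post.
        "command": f"./build-image.sh {image_name}",
    }]
    for plan in plans:
        steps.append({
            "label": f"Test {image_name} on {plan}",
            # Placeholder script, one test step per hardware plan.
            "command": f"./test-image.sh {image_name} {plan}",
            # Tests start only after the build step has finished.
            "depends_on": "build",
        })
    return {"steps": steps}


if __name__ == "__main__":
    pipeline = build_pipeline("ubuntu_20_04", ["plan-x86-small", "plan-arm-large"])
    print(json.dumps(pipeline, indent=2))
```

A first pipeline step could run this script and pipe its output into `buildkite-agent pipeline upload`, which accepts JSON as well as YAML. The same generator then serves every image, and only the inputs change.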
It also lets us collect user input at any point in the pipeline. This makes our pipelines interactive, allowing for a lot of flexibility and control over complex OS build processes.
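For instance, a pipeline might pause before release and ask which channel to publish to. The sketch below builds such a pair of steps using Buildkite's block step with a select field. The field key, the channel names and publish-image.sh are assumptions made up for this example.

```python
import json


def release_gate_steps(image_name):
    """Return a block step that collects input, plus the step that follows it."""
    return [
        {
            # A block step pauses the build until someone fills in the form.
            "block": f"Release {image_name}?",
            "fields": [{
                "select": "Release channel",
                "key": "release-channel",  # hypothetical meta-data key
                "options": [
                    {"label": "Staging", "value": "staging"},
                    {"label": "Production", "value": "production"},
                ],
            }],
        },
        {
            "label": "Publish image",
            # Placeholder script. It could read the chosen value with
            # `buildkite-agent meta-data get release-channel`.
            "command": "./publish-image.sh",
        },
    ]


if __name__ == "__main__":
    print(json.dumps({"steps": release_gate_steps("ubuntu_20_04")}, indent=2))
```

The value picked in the form is stored as build meta-data under the field's key, so any later step can act on it.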
Most important of all is that it gives us total control over where builds happen. We can run them inside a container or directly on the hardware of our choosing. That’s a big deal for Metal because we often need to run builds on specific types of hardware, like a bare-metal Arm server, for instance.
In the end, we can easily publish an image across the various x86 and Arm server configurations that we want to support without having to set up a different pipeline for each. That’s not something your average CI server can manage.
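Buildkite routes a step to particular machines through the step's agents attribute, which is matched against the tags each agent was started with. Here is a small sketch of picking a queue per CPU architecture. The queue names and build-image.sh are hypothetical, not our actual setup.

```python
# Hypothetical mapping from CPU architecture to a Buildkite agent queue.
QUEUE_BY_ARCH = {
    "x86_64": "metal-x86",
    "aarch64": "metal-arm",
}


def build_step(image_name, arch):
    """Return a build step pinned to agents running on the given architecture."""
    return {
        "label": f"Build {image_name} ({arch})",
        # Placeholder script standing in for the real build command.
        "command": f"./build-image.sh {image_name} {arch}",
        # Only agents whose tags match this queue will pick up the step.
        "agents": {"queue": QUEUE_BY_ARCH[arch]},
    }


def build_steps(image_name, arches):
    """One build step per architecture, all generated by the same code."""
    return [build_step(image_name, arch) for arch in arches]
```

Calling build_steps("ubuntu_20_04", ["x86_64", "aarch64"]) yields two steps that differ only in their target queue, which is how a single pipeline definition can cover both x86 and Arm hardware.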
Of course, Buildkite’s advanced feature set came with a learning curve. We didn’t get our first Bob the Builder-powered pipeline up and running in a day. It was a months-long process, but once we learned to pair Buildkite and Bob the Builder, rolling out a new OS image became a breeze.