Over the last few months, we significantly restructured our Digital Rebar orchestration platform into a microservices architecture without changing much of the core code. While the process is ongoing, we’ve reached a point where it’s useful to share our experience.
The results exceeded our expectations. Not only did we get better segmentation between areas of the code; we also saw a dramatic reduction in start-up time because we eliminated a lot of unneeded configuration and downloading. On the downside, we had to significantly rewrite our networking and startup sequencing. Running real network services (like DHCP and PXE) behind containers is much more complex.
Our starting point was the OpenCrowbar code base. It was already using Consul for some operations; however, we wanted to fully migrate to a Docker Compose-based system so that we could decouple the services we were already using from the Rebar core.
Digital Rebar does cross-platform DevOps orchestration. Basically, it deploys software on clouds and metal. In addition to deploying across multiple cloud platforms, Digital Rebar expanded upon Crowbar’s very deep bare metal provisioning engine. One of the critical design requirements of the system is to intermix DevOps service and configuration steps because many steps in server or cloud provisioning are actually service calls like DNS, DHCP / IP allocation, PXE, disk assignment or network assignment.
Since service management is required, we already had to deal with services as a central part of the architecture. However, there’s a major leap between hardwired service interfaces and the dynamic environment created with stateless containers. Here is the path we followed:
Step 1: Embrace Service Registration (with Consul)
We added Consul while our application was still monolithic, and then registered our various services in Consul. Gradually, we replaced all the places we looked up those services with references to the registry. Specifically, instead of getting service information from shared configuration, we relied on published service data. Initially, this felt like using the telephone to talk between rooms of our house, but it allowed us to safely and gradually increase the separation between services and consumers.
Step 2: Container Coordination (Compose)
This may seem backward, but we started using Compose before we’d done any real containerization. Compose coordinates the launch of a set of interconnected containers. It also provides useful container configuration plumbing like external path mapping and port exposure.
It’s important to note that we’d already containerized the monolithic application so wrapping that into Compose was a small step. If you have not containerized your application into a single container, then start there.
By starting with single-container orchestration, we had a working reference point. It also let us get used to the Compose control model. One unexpected benefit was consistent logging: there’s very high return from pumping your logs to the console so they can be easily monitored from Compose.
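For illustration, a Compose file at this stage might look something like the sketch below. The image names, ports and volume mappings are hypothetical stand-ins, not our actual configuration; the point is that even a monolith benefits from Compose's plumbing:

```yaml
# Hypothetical single-container stage: the whole app plus Consul.
version: '2'
services:
  consul:
    image: consul
    ports:
      - "8500:8500"
  rebar:
    image: example/rebar-monolith   # placeholder image name
    depends_on:
      - consul
    ports:
      - "3000:3000"                 # port exposure
    volumes:
      - ./tftpboot:/tftpboot        # external path mapping
```

Because each service logs to stdout, `docker-compose logs` gives one consistent, interleaved view of everything.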
Step 3: Containerization (Docker)
After we had the framework, we began pulling services apart into containers. We generally followed the services we’d identified in the first step. In some cases, we kept tightly coupled services together when we broke them from the parent application. Others, like our database, were pretty easy to split out. The third class of containers turned out to be service API wrappers.
The API wrappers are small dedicated web services (usually in Golang) that we added to existing services when we containerized. Those new APIs created an easy abstraction point for code that had previously been called directly. The APIs are essential since the containers can’t call code directly but add minimal overhead since they are a dedicated function for our application.
Step 4: Service Container Registration (Consul Again)
As we learned our internal patterns, our registration process normalized. We were able to create a consistent container initialization pattern that included service registration for each container and also a wait process for containers with dependencies. This, thankfully, meant much less reliance on Compose for sequencing and more use of service registration as our coordination authority. Ultimately, this made the containers much more independent and stateless.
If you find yourself relying on Compose to sequence your bring-up exactly right, then review the process and find ways to eliminate or weaken the dependency. Otherwise, your application will be even more fragile than before.
Step 5: Networking (You Get What You Get)
Networking is a challenge in containers because the containers have so little control over it. The key is to never build networking assumptions or static information into your services. Assume that the container management will handle inbound rules and never assume that any address is static. If that sounds like you want service listeners to be dumb and outbound requests to be dynamic then you’ve got the idea.
For Digital Rebar, we had to create a container that routed traffic (many Ops services like PXE and DHCP are picky beasts that care about the source of packets). Hopefully, you’ll be able to avoid that type of mess. Reach out to us if you’re having similar issues.
Step 6: Data Locality (Files, Database and Consul)
Once we had all the services happily containerized, the real challenge began: we needed our containers to become stateless. That means that the containers could not store any internal data or configuration. For containers like the Digital Rebar API server, the choice was simple: use the database. It became much trickier for support services like DNS, DHCP and our Provisioner (which stores PXE images). For each service, we had to evaluate how much data was stored, how frequently it was updated, and if it could tolerate a distributed lock. There was no single pattern or even single answer for each container.
Our rule of thumb goes something like this: use the application database (via APIs) if it’s normalized application data or managed configuration. Use shared file locations if it’s big and/or static enough to be replicated slowly via rsync. Use Consul if it’s a limited amount of data with limited search requirements. For Consul data, we also evaluate the risk of multi-master synchronization before we commit to it.
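Encoded as code, the rule of thumb looks roughly like the sketch below. The thresholds are illustrative values invented for the example, not numbers from the real system:

```go
package main

import "fmt"

// Backend names where a container's data should live.
type Backend string

const (
	Database   Backend = "application database (via APIs)"
	SharedFile Backend = "shared file location (rsync)"
	ConsulKV   Backend = "consul key/value"
)

// chooseBackend encodes the rule of thumb; thresholds are illustrative.
func chooseBackend(normalized bool, sizeMB, updatesPerHour int) Backend {
	switch {
	case normalized:
		return Database // managed application data and configuration
	case sizeMB > 100 && updatesPerHour < 1:
		return SharedFile // big and static: replicate slowly
	default:
		return ConsulKV // small data, limited search needs
	}
}

func main() {
	fmt.Println(chooseBackend(true, 1, 100))   // e.g. node records
	fmt.Println(chooseBackend(false, 4000, 0)) // e.g. PXE boot images
	fmt.Println(chooseBackend(false, 1, 10))   // e.g. small shared state
}
```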
Overall, we found that keeping our data storage options open was the right approach.
In short, prepare for refactoring and fragility during migration.
Migrating to services in containers made our platform more robust, easier to troubleshoot and faster to deploy. It also drove a lot of refactoring that, while ultimately helpful, we would not otherwise have planned. The most frustrating part of the migration was our discoveries about the fragility of container state and persistence leading up to step six.
Overall, Digital Rebar is much stronger and more maintainable with these architectural changes. We hope that hearing about our journey helps you with yours.
Docker is a sponsor of The New Stack.