Cloud Native Masterless Puppet, with Bolt and PuppetDB

10 Sep 2019 8:10am, by

Puppet sponsored this post.

Craig Watson
Craig is a senior systems engineer at ForgeRock, with systems administration experience spanning consultancy, enterprise and start-up organizations, including public cloud environments on Google Cloud and Amazon Web Services at global scale. He has eight years of experience within the Puppet community, and has developed and contributed to a number of Forge modules.

The shift to cloud computing has undeniably changed the way IT thinks about infrastructure. Servers can be treated as disposable, immutable instances with lifecycles measured in minutes, hours or days. By comparison, monoliths can stick around for weeks, months or years before being decommissioned.

For the most part, though, the standard approach to configuration management with Puppet hasn’t changed since the days of racking physical infrastructure. It has simply been transplanted into the cloud, with some additional workarounds to automate tasks such as certificate signing and revocation.

As an IT team supporting a global company developing cutting-edge software across three major cloud providers, our aim is to redeploy our core Atlassian software stack of Jira, Confluence and Bitbucket along with a smorgasbord of other services into the cloud.

This requires a new way of thinking about how we manage configuration, and poses many questions — do we create a Puppet master in each cloud? In each region? Do we run a single master for all clouds? If so, where do we put it?

When combined with some simple orchestration using Puppet Bolt and a dynamic PuppetDB inventory, masterless Puppet sidesteps these issues altogether, allowing cross-cloud compatibility, near-limitless scalability and much-reduced complexity.

A Master of Puppets: Somewhere Back in Time

When managing physical infrastructure, commissioning a server is generally an infrequent, long-winded process. With Puppet, a new certificate request is generated by the client and signed manually via SSH on the master. Puppet is then run manually on the new server.

Without conscious planning, monitoring and maintenance, masters can become overloaded. In the event of a Puppet master outage, new servers cannot provision and existing servers cannot be updated with new configuration.

Project Planning

We decided to set high-level goals and refine the project as time progressed. The key goal was and continues to be:

Don’t future-proof — make the future possible.

This focussed us on making sure that our solution was appropriate for the future, without getting distracted by planning for every eventuality. Around that core goal, more general requirements were that the solution be:

Simple — To increase engineering efficiency, the new infrastructure should be as simple as possible while conforming to the other requirements below.

Scalable — We required the infrastructure to scale with the business, from a few to hundreds of instances, without periodic ground-up re-platforming projects.

Cloud native — As we were performing a migration of services into the public cloud, we should take advantage of as much cloud functionality as possible — where physical environments are needed, we should treat them as cloud environments.

Testable — Using testing frameworks such as Vagrant, rspec-puppet and puppet-lint, we should be able to catch any bugs without committing to version control. Once a commit has been pushed, it can be independently verified using automated continuous integration testing pipelines.

Cloud-agnostic — Perhaps the most challenging of all, this was to ensure that the project as a whole remained flexible enough to be deployed onto Amazon Web Services or Google Cloud Platform, depending on business needs.

Why Cloud Native?

The Cloud Native Computing Foundation defines the term as:

[Cloud native] techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.

Something that is often overlooked in cloud migrations and solution design is the platform’s ephemeral nature. When designing a cloud native solution to add and remove compute resources on-demand (aka auto-scaling), servers can theoretically live for minutes before being destroyed.

In this scenario, adding overhead for certificate signing on top of the time taken to run Puppet and configure the system greatly reduces the effectiveness of auto-scaling — where servers are typically expected to be serving traffic within two or three minutes of being launched.

To be truly cloud native, an application should be able to tolerate ungraceful failure in a graceful, stateless way.

Masterless Puppet: A Brave New World

In a masterless deployment, rather than deploying Puppet code to a central master, the codebase is deployed to servers themselves, allowing a single instance to provision in complete isolation of any other.

Masterless Puppet also throws up some interesting problems. Without a centralized master server controlling which server gets what configuration:

  • How do you define which role a server receives?
  • What mechanism is used to store Puppet code?
  • How are changes rolled out?

Instance Metadata: The Cloud ENC

Using the Puppet “roles and profiles” pattern, we can easily modularize our configuration into generic profiles, which are pulled together into a single role. For example, the “web” role may include the Nginx and PHP-FPM profiles, as well as more generic base or security profiles.

Cloud instances have access to a variety of metadata, which we can harness to build a Hiera data structure to act as an External Node Classifier (ENC) within the Puppet codebase.

As cloud platforms use different Facter schemas for metadata, our implementation uses a bootstrap script to store each piece of metadata in a static text file in /etc/facter/facts.d, allowing each variable to use a consistent name. As long as the script outputs the metadata to the correct location, our Hiera codebase is fully cloud-agnostic.
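A minimal sketch of such a bootstrap script, assuming a GCE instance whose custom metadata attributes are named role and environment (hypothetical names; the source does not list its facts). The write_fact helper writes one key=value text file per fact, which Facter reads as an external fact. The metadata URL is GCE's documented endpoint, and an EC2 variant would need to replace only fetch_gce_attribute.

```shell
#!/bin/sh
# Sketch only: the fact names and the FACTS_DIR default are assumptions.
FACTS_DIR="${FACTS_DIR:-/etc/facter/facts.d}"

# Write one external fact as a key=value text file that Facter picks up.
write_fact() {
  name="$1"
  value="$2"
  mkdir -p "$FACTS_DIR"
  printf '%s=%s\n' "$name" "$value" > "$FACTS_DIR/$name.txt"
}

# Fetch a custom metadata attribute from the GCE metadata server.
fetch_gce_attribute() {
  curl -sf -H 'Metadata-Flavor: Google' \
    "http://metadata.google.internal/computeMetadata/v1/instance/attributes/$1"
}

# Store each attribute under a provider-neutral fact name.
bootstrap_facts() {
  for key in role environment; do
    attr="$(fetch_gce_attribute "$key")" || return 1
    write_fact "$key" "$attr"
  done
}
```

Calling bootstrap_facts as root at first boot, before Puppet runs, would leave role.txt and environment.txt in the facts directory for Hiera to interpolate.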

Where variations exist between cloud providers (for example, GCE and EC2 cluster discovery mechanisms for Elasticsearch), we can add a Hiera layer using the virtual fact.
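One way to express this classification is a Hiera 5 hierarchy keyed on those facts. The sketch below writes a hypothetical hiera.yaml from a shell function purely for illustration. The role and environment fact names and the layer layout are assumptions rather than the author's actual hierarchy, while virtual is a core Facter fact.

```shell
#!/bin/sh
# Sketch only: the hierarchy layout and custom fact names are assumptions.
# Usage: write_hiera_config /path/to/hiera.yaml
write_hiera_config() {
  cat > "$1" <<'EOF'
---
version: 5
defaults:
  datadir: data
  data_hash: yaml_data
hierarchy:
  - name: "Per-role data"
    path: "role/%{facts.role}.yaml"
  - name: "Per-environment data"
    path: "environment/%{facts.environment}.yaml"
  - name: "Per-platform data, keyed on the virtual fact"
    path: "virtual/%{facts.virtual}.yaml"
  - name: "Common defaults"
    path: "common.yaml"
EOF
}
```

With a layout like this, provider-specific settings such as the Elasticsearch discovery mechanism could live in a file under data/virtual/, named after whatever value the virtual fact reports on that platform.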

Code Distribution with OS Repositories

To distribute our Puppet codebase, we harness our cloud providers’ object storage, namely Amazon S3 and Google Cloud Storage. These systems are provided as a service and have no operational overhead for us to manage: we simply upload our files and access them.

As we are using CentOS, we have used a Yum plug-in to enable instances on both AWS and GCP to address these repositories natively using gs:// and s3:// URIs rather than https://. A transport for the APT package manager also exists for Debian-based systems.

The job of packaging the Puppet codebase has been delegated to Jenkins, which also acts as our continuous integration platform alongside GitLab CI runners. After running tests via rspec-puppet and puppet-lint, Jenkins uses fpm to compile the codebase into an RPM package, which is then synchronized to GCS and S3 Yum repositories.
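The packaging and publishing steps might look roughly like the following in a Jenkins shell step. This is a sketch under stated assumptions: the package name, directory layout and bucket names are hypothetical, and the source does not show its actual fpm invocation. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: package name, paths and bucket names are hypothetical.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Build a noarch RPM that installs the codebase under Puppet's code directory.
package_codebase() {
  version="$1"
  run fpm -s dir -t rpm -n puppet-codebase -v "$version" -a noarch \
    --prefix /etc/puppetlabs/code/environments/production \
    -C ./puppet -p ./repo/ .
}

# Regenerate Yum metadata and mirror the repository to both object stores.
publish_repo() {
  run createrepo --update ./repo
  run aws s3 sync ./repo s3://example-yum-repo/el7
  run gsutil -m rsync -r ./repo gs://example-yum-repo/el7
}
```

A pipeline would call package_codebase with the build's version string and then publish_repo, only after the rspec-puppet and puppet-lint stages have passed.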

Puppet Orchestration with Bolt

The task of triggering a Yum update or executing the Puppet binary on-demand falls to a relatively new tool in Puppet’s arsenal: Bolt. Bolt connects to remote systems via SSH, allowing ad-hoc tasks to be run in a clean, automatable and shareable way.

The tool can read “plans”: documents which describe actions, for example upgrading an OS package or restarting a system service. As a plan can be written in Puppet code, it enjoys the same features that Puppet itself provides: full data typing, rich logic and, of course, the Puppet community, as many modules on the Puppet Forge are adding Bolt tasks for the resources they manage.

PuppetDB: Bolt’s Dynamic Inventory

Bolt connects to machines via SSH, but it requires an inventory to be defined in its configuration. This works for static infrastructure, but in a cloud environment we require a dynamic, queryable data source to determine where Bolt should connect.

For this, we use PuppetDB. As its name suggests, PuppetDB is the core underlying database backing a Puppet master. It holds several pieces of data on every node, but most importantly it records each node’s facts as it checks into the master.

It is possible (and in high-traffic mastered Puppet installations, strongly advisable) to set PuppetDB up as an independent application, using a PostgreSQL database backend. As the application is stateless, it lends itself extremely well to deployment in a cloud environment as a standard multitier web application.

Crucially for Bolt, PuppetDB provides a rich query language, PQL, for retrieving data. For example, the database can be searched for nodes containing the httpd package, or for nodes where the operating system fact is equal to CentOS.
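The two example searches can be sketched as PQL strings sent to PuppetDB's v4 query endpoint. The hostname is hypothetical, and the package search here assumes the package is managed as a Puppet Package resource titled httpd, which is one of several ways to phrase that query.

```shell
#!/bin/sh
# Sketch only: the PuppetDB hostname is hypothetical.
PUPPETDB_URL="${PUPPETDB_URL:-https://puppetdb.example.com}"

# PQL: nodes whose catalog contains a given Package resource.
pql_nodes_with_package() {
  printf 'resources[certname] { type = "Package" and title = "%s" }' "$1"
}

# PQL: nodes whose operating system name fact matches.
pql_nodes_with_os() {
  printf 'inventory[certname] { facts.os.name = "%s" }' "$1"
}

# Send a PQL string to PuppetDB's v4 query endpoint and print the JSON reply.
run_pql() {
  curl -sf -G "$PUPPETDB_URL/pdb/query/v4" --data-urlencode "query=$1"
}
```

For example, run_pql "$(pql_nodes_with_os CentOS)" would return a JSON array of certnames for matching nodes.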

For our use case, instead of using Bolt’s native package management module, we wrap the Bolt command utility with a Bash script. This makes it easier to pass explicit commands such as yum update or puppet apply. It also wraps the PQL query with user-friendly switches to query for specific machine roles and environments, and can limit the returned node set to just one machine for canary dry-run testing.
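A stripped-down sketch of what such a wrapper could look like. The switch names, the role and environment fact names, and the deploy SSH user are all assumptions; the source does not publish its script. The limit clause uses PQL's paging syntax and is worth verifying against your PuppetDB version.

```shell
#!/bin/sh
# Sketch only: switch names, fact names and the SSH user are assumptions.

# Build a PQL query from a role, an environment and an optional canary flag.
build_query() {
  role="$1"
  environment="$2"
  canary="${3:-}"
  query="facts.role = \"$role\" and facts.environment = \"$environment\""
  # Canary mode narrows the node set to a single machine.
  if [ -n "$canary" ]; then
    query="$query limit 1"
  fi
  printf 'inventory[certname] { %s }' "$query"
}

# Parse the switches, then hand the remaining arguments to Bolt as the command.
run_wrapper() {
  role=""; environment=""; canary=""
  OPTIND=1
  while getopts "r:e:c" opt; do
    case "$opt" in
      r) role="$OPTARG" ;;
      e) environment="$OPTARG" ;;
      c) canary=1 ;;
      *) echo "usage: run_wrapper -r ROLE -e ENV [-c] COMMAND" >&2; return 2 ;;
    esac
  done
  shift $((OPTIND - 1))
  bolt command run "$*" \
    --query "$(build_query "$role" "$environment" "$canary")" \
    --user deploy --run-as root
}
```

A canary dry run against one web node might then be run_wrapper -r web -e production -c 'puppet apply --noop /etc/puppetlabs/code/environments/production/manifests/site.pp', with the manifest path again being illustrative.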

In a future project iteration, this will be replaced with a fully parameterized Bolt plan written in the Puppet language, but the current Bash wrapper allows us to get the project rolling.

Connecting It All Together, Part One: Puppet and PuppetDB

Enabling a masterless Puppet node to send facts, compiled catalogs and post-run reports to a standalone PuppetDB installation requires a little Puppet configuration.

First, we use Puppet’s routes.yaml file to tell the Puppet binary where to send its resources. The following file is the result of a fair amount of trial and error, mainly due to masterless Puppet being less readily documented. Also, all code snippets to follow are verified to work with Puppet 6.4.2 and PuppetDB terminus 6.4.0:
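A minimal sketch of such a file, following the layout in PuppetDB's documentation for puppet apply nodes rather than the author's exact listing. It is written from a shell function so it can be dropped into a bootstrap script; the target path is passed in by the caller.

```shell
#!/bin/sh
# Sketch only: follows PuppetDB's documented layout for puppet apply nodes.
# Usage: write_routes /etc/puppetlabs/puppet/routes.yaml
write_routes() {
  cat > "$1" <<'EOF'
---
apply:
  catalog:
    terminus: compiler
    cache: puppetdb
  resource:
    terminus: ral
    cache: puppetdb
  facts:
    terminus: facter
    cache: puppetdb_apply
EOF
}
```

The apply key scopes these routes to puppet apply runs, so each locally compiled catalog and fact set is also cached to PuppetDB.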

Puppet also needs to be configured to send reports to PuppetDB by editing puppet.conf. This is entirely optional; however, there are various dashboards available for PuppetDB that give a top-down UI view of your nodes’ Puppet state:
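One way to apply the report settings without hand-editing the file is puppet config set, sketched below. The two settings follow PuppetDB's documentation for standalone nodes and may differ from the author's listing. The PUPPET variable is a hypothetical override used here only to preview the commands.

```shell
#!/bin/sh
# Sketch only: enables report submission to PuppetDB in puppet.conf's [main].
# PUPPET can be overridden (for example PUPPET=echo) to preview the commands.
configure_reports() {
  "${PUPPET:-puppet}" config set report true --section main
  "${PUPPET:-puppet}" config set reports puppetdb --section main
}
```

After this runs, each puppet apply submits its report to PuppetDB, where a dashboard can pick it up.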

Finally, Puppet can be configured with the location of the PuppetDB server with puppetdb.conf. As with the routes.yaml file, the below is the result of a little experimentation:
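A hedged sketch of how a bootstrap script might write this file. PUPPETDB_HOST is a hypothetical variable standing in for whatever the script reads from instance metadata. The source also sets submit_only_server_urls, but its value is not shown, so it is left out here rather than guessed.

```shell
#!/bin/sh
# Sketch only: PUPPETDB_HOST is a hypothetical variable that the bootstrap
# script would populate from instance metadata.
# Usage: write_puppetdb_conf /etc/puppetlabs/puppet/puppetdb.conf
write_puppetdb_conf() {
  target="$1"
  cat > "$target" <<EOF
[main]
server_urls = https://${PUPPETDB_HOST:?PUPPETDB_HOST must be set}
soft_write_failure = true
EOF
}
```

With soft_write_failure enabled, a failed submission to PuppetDB is treated as non-fatal, so the Puppet run can still complete.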

The above example uses variables within a Bash script to pull in the server’s location from metadata. The submit_only_server_urls and soft_write_failure options have been used to ensure that if the PuppetDB server is unavailable, Puppet will still continue to operate.

Connecting It All Together, Part Two: Bolt and PuppetDB

Setting up Bolt to connect to PuppetDB is a simple case of adding some settings in Bolt’s bolt.yaml configuration file:
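The relevant settings can be sketched as follows, using the puppetdb section of a Bolt 1.x bolt.yaml. The file is written from a shell function here purely for illustration. PUPPETDB_HOST is a hypothetical variable, and the CA bundle path is the CentOS default, which is an assumption about the Jenkins host.

```shell
#!/bin/sh
# Sketch only: PUPPETDB_HOST is hypothetical; the cacert path assumes CentOS.
# Usage: write_bolt_config ~/.puppetlabs/bolt/bolt.yaml
write_bolt_config() {
  target="$1"
  cat > "$target" <<EOF
---
puppetdb:
  server_urls:
    - https://${PUPPETDB_HOST:?PUPPETDB_HOST must be set}
  cacert: /etc/pki/tls/certs/ca-bundle.crt
EOF
}
```

With this in place, a bolt command given a PQL string through its --query option resolves its targets from PuppetDB instead of a static inventory file.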

This file is deployed via Puppet into the Jenkins user’s ~/.puppetlabs/bolt directory, and uses Puppet’s EPP templating syntax to expose the same metadata as the bootstrap script.

As communication with PuppetDB is non-negotiably HTTPS, we expose our PuppetDB instances via a public (but whitelisted) load balancer, which can handle SSL termination.

To remove the need to manage and circulate a self-signed certificate, we use a managed SSL certificate from each cloud provider which is signed by the Let’s Encrypt certificate authority. The CA is included within most operating systems’ trust stores, so we simply point Bolt to the local CA bundle.

Conclusion: Full Pipeline and Final Thoughts

This project has successfully enabled ForgeRock’s IT infrastructure team to drastically reduce the lead time to onboard new systems and streamline our configuration management processes, and in the long term to increase team throughput and decrease toil.

We have implemented a solution that saves engineering overhead and allows for near-horizontal scalability without any major design alterations. It is both cloud native, allowing us to harness the rich metadata available to us and to leverage services from cloud providers, and cloud-agnostic, allowing for portability without vendor lock-in.

Utilizing Bolt and PuppetDB, we can deploy our changes at scale, to one instance or many, and we are able to fully test our codebase both locally with Vagrant and continuously via automated pipelines.

Although the project has been largely successful, there has been a fair amount of trial and error involved in getting the configuration right, in particular connecting a standalone, masterless Puppet node to PuppetDB.

We still see a number of warnings in Puppet output, as a few components of the Puppet binary’s reporting and caching layers are still attempting to connect to a master; however, these haven’t affected functionality or performance.

It can also be argued that PuppetDB itself is now our single point of failure, as an outage of the PuppetDB stack would stop Bolt from being able to populate its inventory. However, this would only affect changes to existing servers; any new instance will still be able to provision itself correctly without contacting PuppetDB.

As for Bolt, it’s less than ideal having to provision a user and SSH key to allow remote commands to be run. Having been a heavy user of Bolt’s quasi-predecessor MCollective, I would find the ability to use an RPC interface an exceptionally welcome addition, though Bolt itself is still young and continues to undergo significant development.

If you have questions on anything covered above, or anything not covered, please reach out to me on Twitter and check out my talk at Puppetize PDX 2019, where you’ll see all of this in action, either in person or via the video recordings which will be made available by Puppet soon after the event.

Puppetize PDX takes place in Portland, Ore. on Oct. 9-10. This is a two-day, multi-track conference that focuses on the broader community of Puppet users, featuring user-focused DevOps and infrastructure delivery talks and hands-on workshops.

Puppetize is a fantastic event, and although I am a first-time speaker in Portland, I am a serial alumnus, having attended previous events in 2014, 2016 and 2018. The Puppet community is an inspiring place to be, and I’m also excited to meet community members and hear war stories on cloud adoption with Puppet, including what didn’t work.

CNCF is a sponsor of The New Stack.

Feature image via Pixabay.
