Delivering High-Quality Chef Cookbooks

In the DevOps world, engineers can be much like cowboys in the Wild West. When it comes to testing, they often rush to get fixes out as quickly as possible in order to push their changes into production. As a result, it’s almost inevitable that something breaks. This article offers some practical advice on implementing best practices for testing Chef cookbooks.
The Culture of Testing (Warning: May Cause Sudden Changes in Code)

First off, it’s worth noting that I am a big proponent of automated testing as a way to actively develop, maintain and improve an organization’s test framework, particularly for Chef cookbooks, which is a lot of what I do for my own organization at Okta. As brief background, our business focuses primarily on Identity-as-a-Service (IDaaS), with adaptive multifactor authentication, single sign-on and Universal Directory as some of the main applications. We help customers connect their organizations and users through IDaaS applications so that they can be more productive and more secure in their digital transformation process. Keeping our customers securely connected is therefore a critical piece of our business. If our system goes down, our customers will no longer be able to authenticate against it, resulting in downtime and lost productivity. You can see why testing becomes a critical piece of ensuring that your core products are more reliable and less likely to fail.
Overall, though we may not always be able to prevent every bug and issue introduced by changes to the cookbooks, having a strong test culture within your DevOps teams can considerably reduce failures. The earlier in the development lifecycle teams are able to test their changes, the sooner they will catch bugs and the more money they will save through simpler repairs. Otherwise, if those changes are pushed straight into a production environment and later found to be causing issues, more often than not they will be more expensive to fix that late in the lifecycle, and the rollback could be even more complicated. By having a strong test culture and process in place, you can establish trust among DevOps teams while also verifying and validating cookbooks. With this in mind, here are three fundamental principles to consider when implementing a testing system for your organization: the culture of testing Chef cookbooks, the test process and accompanying tools, and the deployment choreography.
When it comes to the culture of testing, the typical DevOps model involves writing the code, deploying it and then releasing it to production. However, can you imagine if every change a team made to its code were deployed straight into production using this model? Things could easily go awry if these changes are not well tested. Therefore, the question to ask yourself (in terms of level of confidence) is: if my team were to push out their work to production, how confident are they that they are not breaking someone else’s code and, likewise, that someone else is not breaking theirs? This reminds me of those medication commercials on TV where they list all of the possible side effects and risks of taking the drug. Is it really worth the risk and possible side effects?
With this in mind, it is recommended that organizations think about a more mature development and deployment model that involves peer reviews or, better yet, has cookbooks tested by a quality assurance team within the DevOps organization. This involves testing against a checklist of questions that teams should be asking themselves and cross-referencing, so that they have a higher level of confidence before releasing their changes. This new model includes coding, testing, deploying and monitoring.
Test Process and Accompanying Tools
As a test engineer myself, the five key test processes and accompanying tools that I would recommend are linting, unit testing, cookbook dependency resolution, cookbook convergence testing and integration testing.
For example, let’s examine ShellCheck and ruby -c. In most instances, there will be some shell scripts bundled into cookbooks. This is a common practice, yet not many users actually test or lint those shell scripts. By running ShellCheck during the linting phase, those scripts are linted so that by the time they are deployed onto an actual machine, they are not broken or DOA. Also, since Chef is written primarily in Ruby, ruby -c can run a syntax check against Ruby modules. Imagine if the Ruby modules that users wrote and shipped were never syntax-checked. The moment someone attempts to converge a node with these modules and get the instance up and running, things could fail. So, the goal here is that by the time these modules are bundled into the Chef cookbooks, they are well linted.
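To make the linting phase concrete, here is a minimal sketch of how the two checks might be wired together. It assumes that ruby and shellcheck are both on the PATH and that shell scripts in the cookbook end in .sh. The function names and the directory layout are hypothetical, not part of Chef or of any particular pipeline.

```ruby
require "open3"

# Pick the lint command for a cookbook file based on its extension.
# Returns nil for files that neither tool handles.
def lint_command_for(path)
  case File.extname(path)
  when ".rb" then ["ruby", "-c", path]   # syntax check only; the file is not executed
  when ".sh" then ["shellcheck", path]   # static analysis of the shell script
  end
end

# Lint every .rb and .sh file under cookbook_dir (hypothetical layout)
# and return the paths whose linter exited with a non-zero status.
# Raises Errno::ENOENT if ruby or shellcheck is not on the PATH.
def failed_lint_files(cookbook_dir)
  files = Dir.glob(File.join(cookbook_dir, "**", "*.{rb,sh}")).sort
  files.reject do |path|
    _output, status = Open3.capture2e(*lint_command_for(path))
    status.success?
  end
end
```

A CI step could call failed_lint_files on the cookbook directory and fail the build whenever the returned list is not empty. Scripts that live inside templates (for example, files ending in .erb) are not covered by this sketch and would need to be rendered before linting.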
Therefore, a well-defined test process with proper usage of tools can ensure the delivery of high-quality Chef cookbooks.
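One way to picture the five-stage process as a whole is as a fail-fast sequence, where a later and more expensive stage only runs once the earlier ones pass. The sketch below is an illustration under that assumption. The stage names mirror the list above, while the STAGES constant, the first_failed_stage function and the idea of passing each check in as a callable are hypothetical choices made for this example. The actual tool behind each stage is left to the reader.

```ruby
# The five stages, in the order listed in this section.
STAGES = [
  :lint,
  :unit_test,
  :dependency_resolution,
  :convergence_test,
  :integration_test
].freeze

# Run each stage's check in order and stop at the first failure.
# checks maps a stage name to a callable that returns true on success.
# Returns the name of the failed stage, or nil if every stage passed.
# Raises KeyError if a stage has no check, so no stage is silently skipped.
def first_failed_stage(checks)
  STAGES.find { |stage| !checks.fetch(stage).call }
end
```

In practice each callable would shell out to whatever tool the team uses for that stage and return whether it exited successfully.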
Deployment Choreography
Now let’s imagine an organization that is not yet using any Chef tools at a larger scale and wishes to write its own tooling and scripts. If it does not have any safeguards or risk mitigation strategies in its deployment choreography, changes are applied to 100 percent of Chef nodes at once.
This can cause system-wide outages, expose untested extreme edge cases, leave little to no bake time to verify work and incur expensive rollback procedures.
Yet with deployment choreography in place, such as canary releases, A/B testing or phased rollouts, organizations are able to benefit from risk mitigation strategies (risk acceptance, avoidance and limitation), allow for bake time to verify and monitor work, lessen the expenses associated with rollback procedures and ultimately gain higher confidence in deployed work.
Fundamentally, it’s all about pushing out the changes to a subset of instances prior to releasing those changes to all instances. By pushing the changes to a subset of instances first, teams can monitor those changes, verify that there are no failures and, if there are failures, analyze whether they are acceptable ones that fall under the risk acceptance category. If everything checks out, the changes can be signed off and released to all instances. This method clearly demonstrates a level of maturity by the team and shows that their changes are well thought out, thereby giving the overall organization a higher level of confidence. This level of confidence is particularly important when it comes to deploying changes into a production environment. All in all, always plan and assess risks before deploying changes.
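The subset-first approach described above can be sketched in a few lines. This is a minimal illustration, not a description of any particular Chef feature. The function names, the integer percentage and the allowed_failures threshold (standing in for the risk acceptance decision) are all assumptions made for the example.

```ruby
# Split nodes into a canary group and the remaining nodes.
# canary_percent is an integer from 1 to 100; the canary size is rounded up
# so that a non-empty node list always yields at least one canary.
def split_canary(nodes, canary_percent)
  canary_size = (nodes.size * canary_percent + 99) / 100
  [nodes.first(canary_size), nodes.drop(canary_size)]
end

# Decide whether the change may go out to the remaining nodes.
# canary_results holds one boolean per canary node (true = no failure seen).
# allowed_failures is how many failures the team accepts as known risk.
def promote?(canary_results, allowed_failures = 0)
  return false if canary_results.empty?   # nothing verified, nothing promoted
  canary_results.count(false) <= allowed_failures
end
```

A rollout script would apply the change to the canary group, wait out the bake time, collect one result per canary node and only continue to the remaining nodes when promote? returns true.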
In summary, when it comes to testing Chef cookbooks, there are three fundamental principles to keep in mind. First, a culture of testing Chef cookbooks helps establish trust while validating and verifying code. Second, a well-defined test process with proper usage of tools helps ensure the delivery of high-quality Chef cookbooks. And lastly, always plan and assess risks before deploying any changes. Happy testing!