New Security Challenges with Infrastructure-as-Code and Immutable Infrastructure
Over the past 10 years, we have seen some dramatic shifts in the way software applications are developed, deployed and managed. The need for rapid innovation and shorter release cycles led to the rise of Continuous Integration/Continuous Delivery (CI/CD) and DevOps practices and tools that make delivery faster, more consistent and less error-prone. Many of these changes came out of the hyperscale environments of Google, Facebook and Amazon, which managed massive infrastructure footprints and pushed hundreds of software changes each day. Thanks to the rise of the public cloud and the emergence of the DevOps movement, these practices are now available to any business building applications. Public cloud services have evolved to support these agile practices, offering API-driven automation and rapid provisioning and de-provisioning of resources.
As the chief technology officer of a cloud security company that provides a SaaS product hosted on AWS, I have witnessed firsthand both our own progress and our customers’ journey into DevOps, Infrastructure-as-Code (IaC) and Immutable Infrastructure.
IaC is the practice of deploying and configuring IT infrastructure automatically from machine-readable blueprints, rather than by manually fiddling with physical hardware or the cloud providers’ UI consoles. The AWS-native format is AWS CloudFormation templates (CFT); Google Cloud and Microsoft Azure provide their own implementations. Terraform by HashiCorp is a multivendor solution that is gaining momentum.
IaC speeds up infrastructure deployment and allows for rapid iteration. It also eliminates one-off, non-standard configurations (snowflakes) and brings consistency and repeatability to the infrastructure development and deployment process.
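To make the idea of a machine-readable blueprint concrete, here is a minimal sketch in Python that builds a CloudFormation-style template as plain data and serializes it to JSON. The logical ID `LogsBucket`, the bucket name and the helper names `build_template` and `render` are hypothetical; real templates are usually written directly in JSON or YAML. Because the same input always produces byte-identical output, the sketch also illustrates the repeatability point.

```python
import json


def build_template(bucket_name: str) -> dict:
    """Return a minimal CloudFormation-style template with one S3 bucket.

    The logical ID "LogsBucket" and the bucket name are illustrative placeholders.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Example logging bucket (illustrative)",
        "Resources": {
            "LogsBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "BucketName": bucket_name,
                    # Versioning keeps prior object versions, a common baseline control.
                    "VersioningConfiguration": {"Status": "Enabled"},
                },
            }
        },
    }


def render(template: dict) -> str:
    # Sorted keys and fixed indentation give identical text for identical input,
    # which keeps diffs in source control clean.
    return json.dumps(template, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(build_template("example-logs-bucket")))
```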
Automating infrastructure deployment makes security and compliance testing more important, not less: with IaC, a single push of a button can make highly impactful changes to your cloud environment. This is the flip side of agility. In the public cloud, where a simple configuration change can leave sensitive data and private servers exposed to the world, the security implications of automation are profound.
Another interesting change concerns who drives the infrastructure. In the traditional model, networking teams controlled the networks and the network security devices, application teams developed the software, and operations teams deployed and maintained it. Now that your entire infrastructure is defined in one or more template files (JSON, YAML or Terraform’s HCL), the DevOps folks have their hands on the keyboards. This puts the security folks in a weird new place: instead of being a chokepoint, they now struggle to keep up with the changes, sometimes only retroactively.
On the opportunity side, IaC lets teams bring proven software development practices into the infrastructure world. In this brave new software-defined world, infrastructure templates are checked into source control, where every infrastructure change can be audited, tested, reverted and incorporated into software collaboration and approval workflows such as pull-request reviews.
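As a rough illustration of why source control matters here, the sketch below produces the kind of unified diff a reviewer might see when a template change widens a firewall rule. The helper `template_diff`, the file labels and the example rule are hypothetical; in practice teams would rely on their version-control system’s own diff and review tooling.

```python
import difflib
import json


def template_diff(old: dict, new: dict) -> list[str]:
    """Return a unified diff between two versions of a template.

    Keys are sorted and indented so that only real changes show up.
    """
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="template.json (before)",
            tofile="template.json (after)",
            lineterm="",
        )
    )


if __name__ == "__main__":
    before = {"Ingress": [{"CidrIp": "10.0.0.0/8", "FromPort": 22, "ToPort": 22}]}
    # Hypothetical risky change: SSH opened to the whole internet.
    after = {"Ingress": [{"CidrIp": "0.0.0.0/0", "FromPort": 22, "ToPort": 22}]}
    print("\n".join(template_diff(before, after)))
```

Sorting keys before diffing is a deliberate choice: it keeps cosmetic reordering out of the review so the reviewer’s attention goes to the one line that actually changed the exposure.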
In terms of maturity, we see organizations going through several stages:
- Manual security assessment: This approach involves manually reviewing the architecture and templates before they are deployed to a live environment, and inspecting the live infrastructure after deployment. While DevOps teams are experts in writing CloudFormation and Terraform templates, most traditional security teams are not familiar with these technologies. Checking the security posture of these templates is a slow, manual process that requires extensive back-and-forth between ops and security, putting the brakes on CI/CD pipelines.
- Continuous monitoring of live environments: In this approach, the live production environment is continuously monitored for security and compliance violations, commonly using dedicated cloud security services that provide continuous security and compliance testing. While reactive in nature, this type of monitoring covers both automated changes and manual or rogue ones.
- Deploying and testing in sandbox environments: The repeatability of IaC is an enabler of this new approach, where the template is first deployed into a temporary environment. Automated (and sometimes manual) tests verify that there were no security and compliance regressions before pushing the changes to a production environment. All of that is orchestrated by an automated CI/CD pipeline which then terminates the temporary environment. While this approach is effective in ensuring that code deployed to production environments meets security and compliance requirements, it can be expensive in terms of time, resources and solution complexity.
- Static (infrastructure) code analysis prior to deployment: In this approach, we treat templates just like any other software code and run security and compliance unit tests on every commit as part of the standard CI process. This is a very promising new approach that brings security and compliance as close as possible to the source, drastically cutting both the time to detection and the operational complexity compared with manual assessment or sandbox deployments.
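A static check of the kind described in the last stage can start as a function that walks a parsed template and flags risky settings. The sketch below is an illustration under stated assumptions, not a substitute for a real policy engine: it inspects only literal values (intrinsic functions such as `Ref` are skipped, not resolved), and the rule set (SSH/RDP open to `0.0.0.0/0` or `::/0`, S3 buckets with a `PublicRead`/`PublicReadWrite` canned ACL) and the function names are hypothetical choices. In a CI job, a non-zero exit code would fail the build before anything is deployed.

```python
import json
import sys

OPEN_CIDRS = {"0.0.0.0/0", "::/0"}
ADMIN_PORTS = {22: "SSH", 3389: "RDP"}
PUBLIC_ACLS = {"PublicRead", "PublicReadWrite"}


def _is_open_cidr(value) -> bool:
    # Non-string values (e.g. {"Ref": ...}) are intrinsic functions we do not resolve.
    return isinstance(value, str) and value in OPEN_CIDRS


def _exposes_port(rule: dict, port: int) -> bool:
    """True if an ingress rule opens `port` to the entire internet."""
    if not (_is_open_cidr(rule.get("CidrIp")) or _is_open_cidr(rule.get("CidrIpv6"))):
        return False
    if str(rule.get("IpProtocol")) == "-1":  # "-1" means all protocols and ports
        return True
    try:
        from_port, to_port = int(rule["FromPort"]), int(rule["ToPort"])
    except (KeyError, TypeError, ValueError):
        return False  # missing or non-literal ports are not evaluated here
    return from_port <= port <= to_port


def check_template(template: dict) -> list[str]:
    """Return human-readable findings for a parsed CloudFormation-style template."""
    findings = []
    for name, resource in template.get("Resources", {}).items():
        rtype = resource.get("Type")
        props = resource.get("Properties", {})
        if rtype == "AWS::EC2::SecurityGroup":
            for rule in props.get("SecurityGroupIngress", []):
                for port, label in ADMIN_PORTS.items():
                    if _exposes_port(rule, port):
                        findings.append(f"{name}: {label} (port {port}) open to the internet")
        elif rtype == "AWS::S3::Bucket":
            acl = props.get("AccessControl")
            if isinstance(acl, str) and acl in PUBLIC_ACLS:
                findings.append(f"{name}: bucket uses public ACL {acl}")
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical CI step: python check_template.py template.json
    with open(sys.argv[1]) as f:
        problems = check_template(json.load(f))
    for p in problems:
        print(p)
    sys.exit(1 if problems else 0)
```

The same rules could in principle be applied to snapshots of live configuration as well, which is one way the static and continuous-monitoring stages can share a single policy definition.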
Immutable infrastructure takes IaC to the next level. It is a paradigm for infrastructure deployment and maintenance in which servers are never modified after they are deployed. To apply any update or fix, new servers are built from an updated base image and deployed, and the old servers are deprovisioned. Immutable infrastructure grew out of the need for greater predictability and reliability. By eliminating patching and in-place server upgrades, it simplifies maintenance and removes corner cases and inconsistencies from the deployed server footprint.
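The replace-instead-of-modify idea can be modeled in a few lines. In this hypothetical sketch a server is a frozen dataclass, so its image cannot be changed in place; a rollout launches a fresh set of servers from the new image, and the old ones are simply retired. The `Server` type, the `launch` and `roll_out` functions and the ID formats are illustrative assumptions; a real rollout would go through the cloud provider’s launch APIs and health checks.

```python
import itertools
from dataclasses import dataclass

_next_id = itertools.count(1)


@dataclass(frozen=True)
class Server:
    server_id: str
    image_id: str  # the only thing we need to know about a server's state


def launch(image_id: str) -> Server:
    """Stand-in for launching a new instance from an image."""
    return Server(server_id=f"srv-{next(_next_id)}", image_id=image_id)


def roll_out(fleet: list[Server], new_image_id: str) -> list[Server]:
    """Replace every server with a new one built from `new_image_id`.

    Nothing is patched in place: replacements are launched first, and the
    old servers are then deprovisioned (here, simply dropped).
    """
    replacements = [launch(new_image_id) for _ in fleet]
    # A real pipeline would health-check `replacements` before retiring `fleet`.
    return replacements


if __name__ == "__main__":
    fleet = [launch("ami-v1"), launch("ami-v1")]
    fleet = roll_out(fleet, "ami-v2")
    print(fleet)
    # fleet[0].image_id = "ami-v3" would raise dataclasses.FrozenInstanceError
```

Making the type frozen mirrors the policy itself: the only way to change a server is to replace it.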
Immutable infrastructure has many positive security implications. For example:
- No more one-offs and configuration drift. Because a live server is never changed, we know everything about it; there are no surprises or unexpected flaws. It also means that the server’s exact state (OS version, patch level, application version) can be summarized by a single parameter: the image ID it was launched from. This makes it easy to reason about an environment without complex inventory management systems (e.g., the unpatched servers are all servers running images built two or more weeks ago).
- No need to keep administrative ports (SSH, RDP) open on servers, which drastically reduces the attack surface of the environment.
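Because the image ID captures a server’s whole state, the “which servers are unpatched?” question reduces to a lookup. A minimal sketch, assuming you already have a mapping from server IDs to image IDs and from image IDs to build times (both hypothetical inputs here), with the two-week threshold taken from the example above:

```python
from datetime import datetime, timedelta, timezone


def stale_servers(
    server_images: dict[str, str],
    image_built_at: dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=14),
) -> list[str]:
    """Return the IDs of servers running images built `max_age` ago or earlier."""
    cutoff = now - max_age
    return sorted(
        server_id
        for server_id, image_id in server_images.items()
        if image_built_at[image_id] <= cutoff
    )


if __name__ == "__main__":
    now = datetime(2024, 3, 1, tzinfo=timezone.utc)
    images = {
        "ami-old": now - timedelta(days=20),
        "ami-new": now - timedelta(days=3),
    }
    servers = {"i-1": "ami-old", "i-2": "ami-new", "i-3": "ami-old"}
    print(stale_servers(servers, images, now))  # ['i-1', 'i-3']
```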
On the downside, the added agility that IaC and Immutable Infrastructure bring can also create challenges for traditional security systems that were not designed for such dynamic use cases.
Consider, for example, host vulnerability scanners or file integrity monitoring systems: by the time they finish scanning an instance, it may already be gone, replaced by a new version.
These challenges require a paradigm shift: instead of scanning live servers for vulnerabilities, we scan the “master” image as part of the image-baking process (the same goes for container images). Because we know the image is clean, the server is never modified, and it is recycled in a timely manner, we can be confident in its integrity.
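A bake-time gate can be sketched as a pure function over scanner output. Here the `Finding` shape, the severity scale and the `approve_image` name are hypothetical; in practice the findings would come from whatever image or container scanner the pipeline runs, and the gate decides whether a freshly baked image may be promoted for deployment.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Finding:
    package: str
    severity: str  # one of SEVERITY_RANK's keys


def approve_image(image_id: str, findings: list[Finding], fail_at: str = "high") -> bool:
    """Return True if the baked image may be promoted.

    Any finding at or above `fail_at` severity blocks promotion, so an image
    with serious known issues is not rolled out to servers.
    """
    threshold = SEVERITY_RANK[fail_at]
    blocking = [f for f in findings if SEVERITY_RANK[f.severity] >= threshold]
    for f in blocking:
        print(f"{image_id}: blocked by {f.severity} finding in {f.package}")
    return not blocking


if __name__ == "__main__":
    scan = [Finding("openssl", "critical"), Finding("zlib", "low")]
    print(approve_image("ami-candidate", scan))
```

Since running servers are never modified, this single check at bake time stands in for repeated scans of every live instance.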
As the pace of change and the demand for organizational agility keep increasing, and as DevOps practices like IaC and Immutable Infrastructure become the new norm, it is crucial that we think critically about our existing processes and how to better align security with DevOps.
I hope this post has highlighted some practices that need to be revisited, as well as opportunities for new, better-aligned paradigms.