
The Pillars of Platform Engineering: Part 3 — Provisioning

Give platform teams workflows and checklists for building provisioning into their platforms.
Sep 22nd, 2023

This guide outlines the workflows and checklists for the six primary technical areas of developer experience in platform engineering. Published in six parts, part one introduced the series and focused on security. This installment, part three, addresses infrastructure provisioning. The other parts of the guide are listed below, and you can download the full PDF version for the complete set of guidance, outlines and checklists.

  1.   Security (includes introduction)
  2.   Pipeline (VCS, CI/CD)
  3.   Provisioning
  4.   Connectivity
  5.   Orchestration
  6.   Observability (includes conclusion and next steps)

In the first two pillars, a platform team provides self-service VCS and CI/CD pipeline workflows with security workflows baked in to act as guardrails from the outset. These are the first steps for software delivery. Now that you have application code to run, where will you run it?

Every IT organization needs an infrastructure plan at the foundation of its applications, and platform teams need to treat that plan as the foundation of their initiatives. Their first goal is to eliminate ticket-driven workflows for infrastructure provisioning, which aren’t scalable in modern IT environments. Platform teams typically achieve this goal by providing a standardized shared infrastructure provisioning service with curated self-service workflows, tools and templates for developers. Then they connect those workflows with the workflows of the first two pillars.

Building an effective modern infrastructure platform hinges on the adoption of Infrastructure as Code. When infrastructure configurations and automations are codified, even the most complex provisioning scenarios can be automated. The infrastructure code can then be version controlled for easy auditing, iteration and collaboration. There are a few solutions for adopting Infrastructure as Code, but the most widely used by a wide margin is Terraform.

Terraform is the most popular choice for organizations adopting Infrastructure as Code because of its large integration ecosystem. This ecosystem helps platform engineers meet the final major requirement for a provisioning platform: extensibility. An extensive plugin ecosystem allows platform engineers to quickly adopt new technologies and services that developers want to deploy, without having to write custom code.
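As a minimal sketch of what "infrastructure as code" means in practice (the region, AMI ID and resource names below are placeholders for illustration, not taken from the article), a Terraform configuration declares the desired infrastructure and is committed to version control like any other code:

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1" # assumed region for illustration
}

# A single compute instance, declared as code and version controlled
resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type = "t3.micro"

  tags = {
    Name = "app-server"
  }
}
```

Running `terraform plan` previews the change and `terraform apply` provisions it, giving every change an auditable, reviewable history in the VCS.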

Provisioning: Modules and Images

Building standardized infrastructure workflows requires platform teams to break down their infrastructure into reusable, and ideally immutable, components. Immutable infrastructure is a common standard in modern IT that reduces complexity and simplifies troubleshooting, while also improving reliability and security.

Immutability means deleting and re-provisioning infrastructure for all changes, which minimizes server patching and configuration drift, helping to ensure that every service iteration launches a new, tested and up-to-date instance. It also forces runbook validation and promotes regular failover testing and canary deployment exercises. Many organizations put immutability into practice by using Terraform, or another provisioning tool, to build and rebuild large swaths of infrastructure by modifying configuration code. Some also build golden image pipelines, which focus on the building and continuous deployment of repeatable machine images that are tested and confirmed for security and policy compliance (golden images).
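A golden image pipeline is often driven by an image builder such as HashiCorp Packer. The sketch below (region, base-image filter and hardening steps are illustrative assumptions) builds a patched Ubuntu AMI that downstream teams can consume instead of configuring servers by hand:

```hcl
packer {
  required_plugins {
    amazon = {
      source  = "github.com/hashicorp/amazon"
      version = ">= 1.2"
    }
  }
}

# Build a hardened base image from the latest upstream Ubuntu 22.04 AMI
source "amazon-ebs" "golden" {
  region        = "us-east-1" # assumed region
  instance_type = "t3.micro"
  ssh_username  = "ubuntu"
  ami_name      = "golden-ubuntu-{{timestamp}}"

  source_ami_filter {
    filters = {
      name                = "ubuntu/images/*ubuntu-jammy-22.04-amd64-server-*"
      virtualization-type = "hvm"
    }
    most_recent = true
    owners      = ["099720109477"] # Canonical's AWS account
  }
}

build {
  sources = ["source.amazon-ebs.golden"]

  # Patch the image; in a real pipeline this stage would also run
  # security scans and compliance tests before publishing the image
  provisioner "shell" {
    inline = [
      "sudo apt-get update",
      "sudo apt-get -y upgrade",
    ]
  }
}
```

Because every change produces a freshly built, tested image, servers are replaced rather than patched in place, which is the essence of the immutable approach described above.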

Along with machine images, modern IT organizations are modularizing their infrastructure code to compose commonly used components into reusable modules. This is important because a core principle of software development is the concept of not “reinventing the wheel,” and it applies to infrastructure code as well. Modules create lightweight abstractions to describe infrastructure in terms of architectural principles, rather than discrete objects. They are typically managed through version control and interact with third-party systems, such as a service catalog or testing framework.

High-performing IT teams bring together golden image pipelines and their own registry of modules for developers to use when building infrastructure for their applications. With little knowledge required about the inner workings of this infrastructure and its setup, developers can use infrastructure modules and golden image pipelines in a repeatable, scalable and predictable workflow that has security and company best practices built in on the first deployment.
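Consuming such a module might look like the following sketch, where the registry host, namespace and input variables are hypothetical stand-ins for an organization's own private registry:

```hcl
# Consume a vetted network module from the organization's private registry.
# The registry path, module name and inputs below are hypothetical.
module "network" {
  source  = "app.terraform.io/acme-corp/network/aws"
  version = "~> 2.0" # pin to a reviewed, tested release

  vpc_cidr    = "10.0.0.0/16"
  environment = "staging"
}

# Downstream resources reference the module's outputs
# rather than hard-coded resource IDs
output "private_subnet_ids" {
  value = module.network.private_subnet_ids
}
```

Version pinning lets the platform team publish improvements to the module while consumers upgrade deliberately, keeping the abstraction stable for developers.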

Workflow: Provisioning Modules and Images

A typical provisioning workflow will follow these six steps:

  1. Code: A developer commits code and submits a task to the pipeline.
  2. Validate: The CI/CD platform submits a request to your IdP for validation (AuthN and AuthZ).
  3. IdP response: If successful, the pipeline triggers tasks (e.g., test, build, deploy).
  4. Request: The CI/CD platform runs an automated workflow to build modules, artifacts, images and/or other infrastructure components.
  5. Response: The response (success/failure and metadata) is passed to the CI/CD platform.
  6. Output: The infrastructure components such as modules, artifacts and image configurations are deployed or stored.

Module- and image-provisioning flow

Provisioning: Policy as Code

Agile development practices have shifted the focus of infrastructure provisioning from an operations problem to an application-delivery expectation. Infrastructure provisioning is now a gating factor for business success. Its value lies in driving organizational strategy and the customer mission, not purely in controlling operational expenditures.

In shifting to an application-delivery expectation, we need to shift workflows and processes. Historically, operations personnel applied workflows and controls to the provisioning process through tickets. These tickets usually involved validating access, approvals, security, costs and so on, and the whole process was audited for compliance and control practices.

This process now must change to enable developers and other platform end users to provision via a self-service workflow. This means that a new set of codified security controls and guardrails must be implemented to satisfy compliance and control practices.

Within cloud native systems, these controls are implemented via policy as code. Policy as code is a practice that uses programmable rules and conditions for software and infrastructure deployment that codify best practices, compliance requirements, security rules and cost controls.
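Dedicated policy engines (HashiCorp Sentinel and Open Policy Agent are common examples) evaluate planned changes against organization-wide rules. As a lightweight, in-language illustration of the same idea, a guardrail can even be encoded directly in a Terraform variable validation (the approved instance types below are an assumed example, not a recommendation):

```hcl
# A cost-control guardrail encoded directly in Terraform. Illustrative only:
# dedicated policy engines such as Sentinel or OPA express richer,
# centrally managed rules across many workspaces.
variable "instance_type" {
  type    = string
  default = "t3.micro"

  validation {
    condition     = contains(["t3.micro", "t3.small", "t3.medium"], var.instance_type)
    error_message = "Instance type must be one of the approved, cost-controlled sizes."
  }
}
```

Like a policy engine, the validation rejects a non-compliant plan before anything is provisioned, giving developers feedback early in the workflow rather than in a post-hoc audit.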

Some tools and systems include their own policy system, but there are also higher-level policy engines that integrate with multiple systems. The fundamental requirement is that these policy systems can be managed as code and will provide evaluations, controls, automation and feedback loops to humans and systems within the workflows.

Implementing policy as code helps shift workflows “left” by providing feedback to users earlier in the provisioning process and enabling them to make better decisions faster. But before they can be used, these policies need to be written. Platform teams should own the policy-as-code practice, working with security, compliance, audit and infrastructure teams to ensure that policies are mapped properly to risks and controls.

Workflow: Policy as Code

Implementing policy-as-code checks in an infrastructure-provisioning workflow typically involves five steps:

  1. Code: The developer commits code and submits a task to the pipeline.
  2. Validate: The CI/CD platform submits a request to your IdP for validation (AuthN and AuthZ).
  3. IdP response: If successful, the pipeline triggers tasks (e.g., test, build, deploy).
  4. Request: The provisioner runs the planned change through a policy engine and the request is either allowed to go through (sometimes with warnings) or rejected if the code doesn’t pass policy tests.
  5. Response: A metadata response packet is sent to CI/CD and to external systems from there, such as security scanning or integration testing.

Provisioning flow with policy as code

Provisioning Requirements Checklist

Successful self-service provisioning of infrastructure requires:

  • A consolidated control and data plane for end-to-end automation
  • Automated configuration (infrastructure as code, runbooks)
  • Predefined and fully configurable workflows
  • Native integrations with VCS and CI/CD tools
  • Support for a variety of container and virtual machine images required by the business
  • Multiple interfaces for different personas and workflows (GUI, API, CLI, SDK)
  • Use of a widely adopted Infrastructure-as-Code language — declarative language strongly recommended
  • Compatibility with industry-standard testing and security frameworks, data management (encryption) and secrets management tools
  • Integration with common workflow components such as notification tooling and webhooks
  • Support for codified guardrails, including:
    • Policy as code: Built-in policy-as-code engine with extensible integrations
    • RBAC: Granularly scoped permissions to implement the principle of least privilege
    • Token-based access credentials to authenticate automated workflows
    • Prescribed usage of organizationally approved patterns and modules
  • Integration with trusted identity providers with single sign on and RBAC
  • Maintenance of resource provisioning metadata (state, images, resources, etc.):
    • Controlled via deny-by-default RBAC
    • Encrypted
    • Accessible to humans and/or machines via programmable interfaces
    • Stored with logical isolation maintained via traceable configuration
  • Scalability across large distributed teams
  • Support for both public and private modules
  • Full audit logging and log-streaming capabilities
  • Financial operations (FinOps) workflows to enforce cost-based policies and optimization
  • Well-defined documentation and developer enablement
  • Enterprise support based on an SLA (e.g., 24/7/365)

Stay tuned for our post on the fourth pillar of platform engineering: connectivity. Or download the full PDF version of The 6 Pillars of Platform Engineering for the complete set of guidance, outlines and checklists.
