Build Your IDP at Light Speed with a Platform Reference Architecture
Platform engineering is gaining traction among top-performing software engineering organizations, and for good reason. Internal developer platforms (IDPs) can boost developer productivity, remove Ops bottlenecks, minimize developer cognitive load and enforce standardization by design. But the best news? There’s now a much simpler way to build an effective enterprise-grade IDP that’s creating a major stir in the platform engineering community.
While every platform looks different, common patterns emerge among effective setups. During their PlatformCon 2023 talk, Stephan Schneider, digital expert associate partner, and senior DevOps engineer Mike Gatto from McKinsey shared how they’d synthesized real-world platform designs of hundreds of organizations into standard patterns.
These patterns have formed the basis of the reference architecture for enterprise-grade platforms. Organizations now have a standard, proven, scalable and repeatable pattern to follow, one that applies to any tooling choice. It lets you build IDPs quickly that shorten time to market (TTM), improve software supply chain practices, drive revenue growth and keep you ahead of the competition. So how exactly does it work?
According to McKinsey, there are five main planes that make up different areas of the platform architecture that cluster certain functionalities.
Note: The components and tools referenced below apply to an AWS-based setup, but all are interchangeable. Similar reference architectures can be implemented for GCP, Azure, OpenShift or any hybrid setup. Use this reference as a starting point, but prioritize incorporating whatever components your setup already has in place.
- The Developer Control Plane level contains the primary “interfaces” developers can choose from when using the platform. Following the golden paths design principle, it’s best to leave interface changes to the developer on a workload-by-workload basis, and to keep existing developer workflows intact where possible by defaulting to code.
- The Integration And Delivery Plane level contains the tools that build, store, configure and deploy requests coming from the developer control plane.
- The Resource Plane level contains all resource components necessary to run the app. The resources can be configured as code using tools like Terraform.
- The Monitoring And Logging Plane provides real-time metrics and logs for apps and infrastructure. Developers can use this plane for observability, monitoring and making data-driven decisions.
- The Security Plane manages secrets and identity to protect sensitive information, for example by storing, managing and securely retrieving API keys and passwords.
Platform teams are responsible for binding the individual components of the planes to each other, as well as one plane to the other. They should also test and refine the end-to-end flow of the architecture to ensure a smooth developer experience (DevEx).
Zooming in on Golden Paths
To understand how the planes and components of this architecture work together, it’s helpful to follow a deployment from the initial git-push to the running application. A “golden path” refers to the tools and workflows that the platform team binds together to standardize and accelerate software delivery. Golden paths should be defined in a way that improves DevEx and maintains developers’ freedom to go off-road where necessary. This is where the Humanitec Platform Orchestrator comes into play. Platform engineering teams use the Platform Orchestrator to design golden paths and define clear conventions for their organization. Developers simply describe what resources their workloads need to run using the open source workload specification Score (or the UI, CLI or API).
Let’s see how this all works.
Golden Path 1: Deploying to Dev
Let’s say a developer wants to deploy the changes made on a workload to dev.
The golden path would look something like this:
- Developer modifies a workload and git-pushes the code.
- The CI pipeline picks up the code and runs it.
- The image is built and stored in the image registry.
- The Platform Orchestrator is notified. It generates the necessary app and infrastructure configs and prepares all components for the deployment by running an RMCD (read, match, create and deploy) execution pattern:
  - Read phase: The Orchestrator interprets the workload spec.
  - Match phase: The Orchestrator resolves the context (CI tag or metadata) and identifies the correct resources to wire the workload to.
  - Create phase: The Orchestrator generates app configs by applying the workload spec to the workload profile.
  - Deploy phase: The Orchestrator either orchestrates the resources and performs the deployment or hands them over to the dedicated CD systems.
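The four RMCD phases can be sketched as a small pipeline. This is only an illustrative model, not the Platform Orchestrator’s actual API: the `WorkloadSpec` class, the `RESOURCE_DEFINITIONS` and `WORKLOAD_PROFILE` tables, the host names and all function names are hypothetical and chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkloadSpec:
    """Hypothetical, simplified stand-in for a Score-style workload spec."""
    name: str
    image: str
    resources: dict[str, str]  # resource name -> resource type


@dataclass
class Deployment:
    workload: str
    env: str
    config: dict = field(default_factory=dict)


# Hypothetical resource definitions, keyed by (resource type, environment).
RESOURCE_DEFINITIONS = {
    ("postgres", "dev"): {"host": "dev-postgres.internal", "port": 5432},
    ("postgres", "prod"): {"host": "prod-postgres.internal", "port": 5432},
}

# Hypothetical workload profile holding organization-wide defaults.
WORKLOAD_PROFILE = {"replicas": 1, "cpu": "250m", "memory": "256Mi"}


def read(spec: WorkloadSpec) -> dict[str, str]:
    """Read: interpret the workload spec and list the resources it requests."""
    return dict(spec.resources)


def match(requested: dict[str, str], context: dict[str, str]) -> dict[str, dict]:
    """Match: use the deployment context to pick one definition per request."""
    env = context["env"]
    matched = {}
    for name, res_type in requested.items():
        key = (res_type, env)
        if key not in RESOURCE_DEFINITIONS:
            raise LookupError(f"no definition for {res_type!r} in {env!r}")
        matched[name] = RESOURCE_DEFINITIONS[key]
    return matched


def create(spec: WorkloadSpec, matched: dict[str, dict]) -> dict:
    """Create: apply the workload spec on top of the workload profile."""
    return {**WORKLOAD_PROFILE, "image": spec.image, "resources": matched}


def deploy(spec: WorkloadSpec, context: dict[str, str], config: dict) -> Deployment:
    """Deploy: return a record; a real setup would deploy or hand over to CD."""
    return Deployment(workload=spec.name, env=context["env"], config=config)


def run_rmcd(spec: WorkloadSpec, context: dict[str, str]) -> Deployment:
    requested = read(spec)
    matched = match(requested, context)
    config = create(spec, matched)
    return deploy(spec, context, config)


spec = WorkloadSpec(
    name="checkout",
    image="registry.example/checkout:abc123",
    resources={"db": "postgres"},
)
deployment = run_rmcd(spec, {"env": "dev"})
```

The point of the sketch is that the same spec deployed with `{"env": "prod"}` would be wired to a different Postgres without the developer changing anything, because only the context differs.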
Golden Path 2: Creating a New Resource
Let’s say a developer needs an ArangoDB, but this isn’t yet known to the setup. Dynamic configuration management (DCM) enables developers to extend or customize the available resources simply by adding a resource definition to the general baselines of the organization. As with the deployment to dev, the Platform Orchestrator takes care of the rest.
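To make the idea of extending the baselines concrete, here is a minimal sketch of a resource catalog. It assumes a plain in-memory registry; the `ResourceCatalog` class, its methods and the definition fields (`driver`, `module`) are hypothetical and do not reflect the Orchestrator’s real resource definition format.

```python
class ResourceCatalog:
    """Hypothetical registry of resource definitions shared across the organization."""

    def __init__(self):
        self._definitions = {}

    def register(self, res_type, definition):
        """Add (or replace) the definition used to provision a resource type."""
        self._definitions[res_type] = dict(definition)

    def resolve(self, res_type):
        """Return the definition for a resource type, or fail if it is unknown."""
        try:
            return self._definitions[res_type]
        except KeyError:
            raise LookupError(f"resource type {res_type!r} is not known to the setup")


catalog = ResourceCatalog()
catalog.register("postgres", {"driver": "terraform", "module": "modules/postgres"})

# Before the new definition is added, a request for ArangoDB cannot be resolved.
try:
    catalog.resolve("arangodb")
    known_before = True
except LookupError:
    known_before = False

# Adding one definition to the baselines makes the resource available to every workload.
catalog.register("arangodb", {"driver": "terraform", "module": "modules/arangodb"})
arango_definition = catalog.resolve("arangodb")
```

In this model the developer’s workload spec only names the resource type; how it is provisioned lives entirely in the registered definition.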
Golden Path 3: Updating a Resource
This is an area where platform engineers can use the platform to maintain a high degree of standardization across the organization. Let’s say a platform engineer wants to update all of the Postgres resources, across the workloads that depend on them, to the latest Postgres version. To achieve this, the platform engineer would:
- Update the resource definition of dev Postgres resources. If Postgres is configured in Terraform, this simply involves updating the Terraform module. If not, the driver would be adapted.
- If inputs and outputs should be changed, update the resource definition in the Terraform provider of the Orchestrator.
- Find which workloads currently depend on the resource definition of “dev Postgres” in the Platform Orchestrator. This can be done by querying the Orchestrator API or looking at the user interface.
- Auto-enforce a deployment across all workloads that depend on the “dev Postgres” resource definition.
Just like that, the new version is rolled out across all workloads and applications.
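The update flow above boils down to three operations: change one definition, look up its dependents, redeploy them. A minimal sketch under stated assumptions follows; the data structures, the `dependents` and `roll_out` functions, the workload names and the version numbers are all hypothetical, and a real setup would query the Orchestrator rather than an in-memory dictionary.

```python
# Hypothetical resource definitions and their current versions.
definitions = {
    "dev-postgres": {"type": "postgres", "version": "14"},
    "dev-redis": {"type": "redis", "version": "7"},
}

# Hypothetical mapping of workloads to the definitions they depend on.
workloads = {
    "checkout": ["dev-postgres", "dev-redis"],
    "catalog": ["dev-postgres"],
    "frontend": ["dev-redis"],
}


def dependents(workloads, definition_id):
    """Return the workloads that depend on a given resource definition."""
    return sorted(
        name for name, deps in workloads.items() if definition_id in deps
    )


def roll_out(definitions, workloads, definition_id, new_version, redeploy):
    """Update one definition, then trigger a redeploy for every dependent workload."""
    definitions[definition_id]["version"] = new_version
    affected = dependents(workloads, definition_id)
    for name in affected:
        redeploy(name)
    return affected


redeployed = []
affected = roll_out(
    definitions, workloads, "dev-postgres", "16", redeploy=redeployed.append
)
```

Passing `redeploy` as a callable keeps the sketch independent of any particular CD system; here it simply records which workloads would be redeployed.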
Start Your Platform Journey the Right Way
So there you have it. No matter your setup, the IDP reference architecture as discussed by McKinsey is a complete game changer for organizations starting their platform engineering journey. It not only enables platform teams to learn proven IDP design principles, it also shows how architectural components fit together and how to design great interaction patterns for engineers and developers.
With the ability to zoom in on building golden paths for greater developer self-service, platform teams can streamline the way developers work and reduce the need for manual work and ticket ops, allowing Ops teams to focus on making improvements rather than handling ad hoc requests. By following these proven design patterns, you can ensure your IDP meets the needs of your developers and your overall business objectives.
My team at Humanitec created several white papers inspired by McKinsey’s talk, not only for AWS, but also GCP and Azure. They illustrate how to integrate the planes and components, showcase how developers, ops and platform teams can use the platform, and provide more examples of golden paths. You can explore all three platform reference architectures by heading here.