
Measuring Key KPIs and Platform Engineering Success

Leading and lagging indicators help us connect the efforts of platform engineering with its results, offering deeper understanding of its advantages.
Oct 23rd, 2023 7:02am by

Platform engineering makes technology work better for different needs and improves the experience for developers. Think of it as a tool that connects and improves various tech parts, helping businesses run smoothly and innovate more. This helps companies reach new levels in their industries.

Yet the real effects of platform engineering can be hard to see. Benefits such as a better developer experience and smoother operations are difficult to evaluate because they often take time to appear.

To help with this, we use two well-known concepts: leading and lagging indicators. Leading indicators are like early signs that predict what might happen in the future. In contrast, lagging indicators look back at past events to confirm long-term patterns. These tools help us connect the efforts of platform engineering with its results, giving us a deeper understanding of its many advantages for a company. Let’s dive deeper into these.

Leading and Lagging Indicators

When putting platform engineering into action, the goal is to make operations smoother, improve the way developers work and boost how we use cloud technology. Here’s how it usually goes:

First, we use “leading indicators.” These early signs or hints quickly show whether our operations are working well. Since we can see these signs early on, we can quickly change or improve things if needed. They’re like the dashboard lights in your car, alerting you to any immediate issues.

After that, we look at “lagging indicators.” These are the results we see after some time, showing how well our platform engineering changes have worked in the big picture. They give feedback on things like how efficient our tech setup is, how well software gets put into action and how tough and reliable our operations are. They’re like looking back at a journey to see how well you did, in contrast to leading indicators that focus on what’s happening right now or about to happen.

In simple terms, we first use leading indicators to check on things as they’re happening, and then lagging indicators to see the full results later on. Now, let’s explore some important lagging and leading indicators we use to see how good our platform engineering is.

  • Ops task reduction:
    1. Description: This indicator refers to the decrease in routine and manual operational tasks, which are typically time-consuming and repetitive.
    2. Significance: By reducing these tasks, the operations team can focus on more important work, like making the system work better and improving the quality of the service they provide. This frees the IT organization to innovate and adapt quickly to change, making the whole organization work better.
    3. Dependent leading indicators:
      • Reduction in ops tickets: Measure the decrease in manual ops tickets raised by development teams post-implementation of platform engineering.
      • Reduction in environment provisioning time: Assess the time reduction in launching new environments for testing or business needs through self-service and automation.
      • Reduction in operational incidents: Evaluate the decrease in the number of infrastructure outages and the improvement in mean time to recovery that comes from empowering developers to fix issues independently.
  • Improvement in developer experience and local productivity:
    1. Description: Enhancements in the experience and productivity of developers.
    2. Significance: A better developer experience, built on standardization and consistency across teams, means faster onboarding of new members and more effective collaboration. This can lead to faster and more reliable software delivery, contributing to organizational success.
    3. Dependent leading indicators:
      • Time to onboard new team members: Measure the reduction in the time it takes for a new team member to become productive due to standardizations and consistencies across teams.
      • Increase in release velocity: Assess the increase in deployment frequency in both nonproduction and production environments to indicate enhanced release confidence and independence in releases.
      • Time spent on the internal developer platform (IDP): Evaluate the amount and effectiveness of time developers spend on the IDP to gauge the improvement in developer experience.
  • Improvement in cloud posture:
    1. Description: Enhancement in the stability, security and efficiency of cloud infrastructure.
    2. Significance: An improved cloud posture is integral for organizations relying on cloud services. Better adherence to cloud best practices results in reduced security vulnerabilities and operational risks, enabling organizations to leverage the benefits of the cloud more effectively. This translates to improved service availability and data security, which are crucial for maintaining trust with stakeholders and customers.
    3. Dependent leading indicators:
      • Cost-based reduction: Measure baseline cost reductions in nonproduction environments as a preliminary indicator for potential widespread cost reductions.
      • Time spent on upgrades: Assess the reduction in time consumed for upgrades to ensure up-to-date versions and fewer security vulnerabilities.
      • Number of noncompliances in scans: Evaluate the recurrence of noncompliances in scans like quarterly audits or penetration tests to ascertain enhanced cloud posture.
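
Two of the leading indicators above, mean time to recovery and release velocity, can be computed directly from timestamped records. The following is a minimal sketch, assuming incidents and deployments are available as plain timestamps; the `Incident` class and both function names are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record: when the outage began and when service recovered."""
    started: datetime
    recovered: datetime


def mean_time_to_recovery(incidents: list[Incident]) -> timedelta:
    """Average of (recovered - started) across all incidents."""
    if not incidents:
        raise ValueError("no incidents to average")
    seconds = mean((i.recovered - i.started).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


def deployments_per_week(deploy_times: list[datetime],
                         start: datetime, end: datetime) -> float:
    """Deployment count in [start, end), normalized to a 7-day week."""
    if end <= start:
        raise ValueError("end must be after start")
    count = sum(1 for t in deploy_times if start <= t < end)
    weeks = (end - start) / timedelta(weeks=1)
    return count / weeks


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    incidents = [
        Incident(datetime(2023, 10, 2, 9, 0), datetime(2023, 10, 2, 10, 30)),
        Incident(datetime(2023, 10, 5, 14, 0), datetime(2023, 10, 5, 14, 30)),
    ]
    deploys = [datetime(2023, 10, d, 12, 0) for d in (2, 4, 9, 13)]
    print("MTTR:", mean_time_to_recovery(incidents))
    print("Deployments per week:",
          deployments_per_week(deploys, datetime(2023, 10, 2), datetime(2023, 10, 16)))
```

Running the same functions over a window before and after a platform change gives a like-for-like comparison of each indicator.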

To provide clarity on the functionality of these indicators, let’s delve into some illustrative real-world scenarios.

| Lagging Indicator | Leading Indicator | Before Implementation | After Implementation |
| --- | --- | --- | --- |
| Reduction in Ops Mundane Tasks | Reduction in Ops Tickets | 50 tickets per week raised to Ops for various tasks. | Reduced to 10 tickets per week. |
| Reduction in Ops Mundane Tasks | Reduction in Provisioning New Environments | Used to take a week for ops teams to provision new environments for testing. | With self-service and automation, developers can provision their environments, reducing the provisioning time to a few hours. |
| Improvement in Developer Experience and Local Productivity | Time to Onboard New Team Members | Two months to fully onboard a new team member. | Reduced to two weeks due to standardization. |
| Improvement in Developer Experience and Local Productivity | Increase in Release Velocity | Deployments occur once every two weeks. | Deployments occur daily with increased confidence and less dependency. |
| Improvement in Cloud Posture | Cost-Based Reduction | $10,000 a month for running non-production environments. | Reduced to $6,000 a month with optimized, standardized templates. |
| Improvement in Cloud Posture | Time Spent on Upgrades | A week every quarter, causing downtime and reduced productivity. | Reduced to a day with frequent upgrades, reducing downtime. |
| Better Cloud Security Posture | Reduction in Non-Compliances in Scans | 30 non-compliances found in quarterly penetration testing. | Reduced to five non-compliances with no recurring issues. |
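
To compare indicators measured in different units, it can help to express each before/after pair as a percentage reduction. The sketch below does this for the three scenarios above that give explicit numbers on both sides (ops tickets, non-production cost and scan non-compliances). The function names and unit labels are illustrative assumptions, and the time-based rows are left out because converting "a week" or "two months" to a common unit would require assumptions the scenarios do not state.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# (indicator, before, after, unit) taken from the illustrative scenarios above.
# The unit labels are assumed for readability.
SCENARIOS = [
    ("Ops tickets", 50, 10, "tickets/week"),
    ("Non-production cost", 10_000, 6_000, "USD/month"),
    ("Non-compliances in scans", 30, 5, "findings/quarter"),
]


def summarize(scenarios):
    """Map each indicator name to its percentage reduction, rounded to one decimal."""
    return {name: round(percent_reduction(before, after), 1)
            for name, before, after, _unit in scenarios}


if __name__ == "__main__":
    for name, pct in summarize(SCENARIOS).items():
        print(f"{name}: {pct}% reduction")
    # Ops tickets: 80.0% reduction
    # Non-production cost: 40.0% reduction
    # Non-compliances in scans: 83.3% reduction
```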

Driving Progress with Platform Engineering

Platform engineering helps operations run smoothly, even if it often goes unnoticed. Metrics are how we see and measure the progress it brings. Think of leading indicators as previews of what’s coming next, helping us make quick fixes for future success. Lagging indicators, on the other hand, show us the results, like how efficiently things are working and how happy our developers are.

Together, these indicators give us a full picture of our progress. They help organizations decide how to improve the work environment, solve problems, use cloud technology better and boost productivity.

In short, platform engineering is about continuous improvement. By regularly checking our metrics, we make sure we’re moving in the right direction. This attention to detail is key for any business wanting to make the most of platform engineering. It helps improve operations and sets the stage for long-term success in tech. The goal is to stay focused, keep improving and adapt to the fast-paced tech world.

This is part of a series on platform engineering.
