Thundra sponsored this post.
Promises, promises, promises. DevOps processes in general, and continuous integration/continuous deployment (CI/CD) pipelines in particular, promise substantial business value. They can help organizations effectively and innovatively address dynamic market demands by shortening the time to market for new products and features. The value captured can be both direct (such as increased revenues) and indirect (such as greater customer satisfaction).
But how can non-technical business stakeholders understand if DevOps is delivering on its promise for their organization? The answer lies in tracking the right DevOps Key Performance Indicators (KPIs) and, perhaps most importantly, presenting them in a meaningful context.
This article looks at the most important DevOps KPIs to monitor, and how they should be presented to business stakeholders so that they can make informed decisions.
The DevOps Ecosystem
The agile approach to software development, with its stress on flexibility and rapid iteration, has been around since the turn of the millennium. By 2008, however, forward-thinking engineers were seeking to break down the barriers between development and operations and to apply agile methods to both application development and infrastructure. DevOps, as the new approach came to be known, gained traction slowly. It would be nearly a decade before Forrester Research declared 2017 “the year of DevOps,” by which time 50% of organizations were implementing DevOps initiatives.
In the DevOps model, development and operations teams — and often QA and security teams as well — collaborate closely throughout the application lifecycle, from dev/test to production. DevOps teams use technologies and tools that support self-service provisioning of resources, as well as the automation of traditionally manual processes.
In addition to the cloud native DevOps stacks offered by cloud service providers, many other tools are available. Leading tools in the DevOps ecosystem include Docker and Kubernetes for container management and orchestration; Git and GitHub for source code control; Jenkins, Ansible, Chef and Puppet for IT automation and configuration management; Splunk, Prometheus and Thundra APM for logging, monitoring and observability; ServiceNow and Jira for IT ticketing; and many more.
KPIs That Reflect DevOps Value
In general, a KPI provides quantifiable, objective evidence of progress over time toward a desired business performance result. Whether used to measure inputs, processes, outputs or outcomes, KPIs contribute to informed, analytical and focused decision-making. Of course, it is important to remember that KPIs are a means to an end (improving performance), not an end in themselves. You don’t want teams focused on producing good KPI “scores” rather than good products and services.
With this in mind, here are some quantifiable DevOps KPIs that give business stakeholders insight into the effectiveness of their DevOps activities.
Deployment Frequency and New Request Lead Time
Deployment frequency measures the volume of deployments over a certain period of time, while lead time measures the overall time it takes to move a new feature or product from request to implementation.
Deployment frequency is relatively easy to measure: configure a webhook at the end of the CI/CD pipeline that automatically increments a deployment counter whenever a deployment completes. New product/feature lead times can be tracked through the organization’s project management platform.
Together, these two KPIs provide critical insights into both the velocity and the productivity of the organization’s DevOps workflows. Decreases in deployment frequency and/or increases in lead times can indicate bottlenecks that are holding back progress. In general, business stakeholders should be looking for moderate but steady improvements in these metrics.
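Once deployment timestamps are being captured (for instance, by the webhook described above) and request and deployment dates can be exported from the project management platform, both KPIs reduce to simple arithmetic. The following Python sketch assumes those records are already available as `datetime` values; the `FeatureRequest` type and the function names are hypothetical, not part of any particular tool's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class FeatureRequest:
    """Hypothetical record exported from a project management platform."""
    requested_at: datetime
    deployed_at: datetime


def deployment_frequency(deploy_times: list[datetime], start: datetime, end: datetime) -> float:
    """Average number of deployments per day in the half-open window [start, end)."""
    days = (end - start) / timedelta(days=1)
    if days <= 0:
        raise ValueError("end must be later than start")
    deployments = sum(1 for t in deploy_times if start <= t < end)
    return deployments / days


def median_lead_time(requests: list[FeatureRequest]) -> timedelta:
    """Median time from request to production deployment."""
    if not requests:
        raise ValueError("no completed requests in this period")
    # The median is less sensitive than the mean to one unusually slow feature.
    seconds = median((r.deployed_at - r.requested_at).total_seconds() for r in requests)
    return timedelta(seconds=seconds)


if __name__ == "__main__":
    week_start, week_end = datetime(2024, 1, 1), datetime(2024, 1, 8)
    deploys = [datetime(2024, 1, 2, 10), datetime(2024, 1, 4, 15), datetime(2024, 1, 5, 9)]
    print(f"{deployment_frequency(deploys, week_start, week_end):.2f} deployments/day")
    requests = [
        FeatureRequest(datetime(2023, 12, 20), datetime(2024, 1, 2)),
        FeatureRequest(datetime(2023, 12, 28), datetime(2024, 1, 4)),
    ]
    print("median lead time:", median_lead_time(requests))
```

Computing these values per week or per sprint produces the trend line in which stakeholders can look for moderate, steady improvement.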
Deployment Failure Rate
High deployment frequency and short project lead times are meaningless if there’s a high failure rate when deploying into production.
At a tactical level, the deployment failure rate is measured by tracking how often a batch deployment causes a system outage or slowdown, or requires a subsequent fix or even a rollback. At a more strategic level, it is also valuable to track how many code commits never make it into the production environment at all.
If these tactical and strategic failure rate metrics are increasing, business leaders should consider scaling back DevOps activities until underlying issues can be identified and addressed.
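Both failure-rate views can be computed from records that most pipelines and ticketing systems already hold. In the Python sketch below, the `Deployment` fields and the rule that any outage, slowdown, follow-up fix or rollback marks a deployment as failed are assumptions chosen to mirror the definition above; the field names are hypothetical and would need to match whatever the organization's tooling actually records.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical production deployment record; field names are illustrative."""
    caused_outage_or_slowdown: bool = False
    required_fix: bool = False
    rolled_back: bool = False

    @property
    def failed(self) -> bool:
        # Any one of the tactical failure signals marks the deployment as failed.
        return self.caused_outage_or_slowdown or self.required_fix or self.rolled_back


def deployment_failure_rate(deployments: list[Deployment]) -> float:
    """Tactical view: share of production deployments that failed."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


def commit_drop_rate(commits_made: int, commits_in_production: int) -> float:
    """Strategic view: share of commits that never reached production."""
    if commits_made <= 0:
        return 0.0
    if not 0 <= commits_in_production <= commits_made:
        raise ValueError("commits_in_production must be between 0 and commits_made")
    return 1 - commits_in_production / commits_made
```

Comparing these two rates from one reporting period to the next shows whether they are rising, which is the signal that should prompt a closer look before scaling DevOps activities further.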
Change Volume and Change Request Lead Time
While the first set of KPIs deals with deployment frequency and the lead time to put a new request into production, this set measures the number of change requests between deployments, as well as how long it takes to put them into production.
To measure these KPIs, each patch, bug fix and other remediation must be tracked as a discrete work ticket in a unified development and operations backlog. Webhooks can then close each ticket automatically after a successful deployment, which records both the overall ticket volume and the lifetime of each ticket.
Although frequent changes between batch deployments are a sign that DevOps is on top of performance and security issues, increasing change volumes may indicate that the major deployments are being rushed into production. Change request lead times are an important measure of the efficiency of the organization’s DevOps workflows.
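With each change tracked as a ticket that a deployment webhook closes, both KPIs can be derived from ticket open and close timestamps. The Python sketch below assumes that `closed_at` is the moment the webhook closed the ticket after a successful deployment, and that the timestamps of the major (batch) deployments are known; the `ChangeTicket` type and function names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeTicket:
    """Hypothetical change ticket (patch, bug fix or other remediation)."""
    opened_at: datetime
    closed_at: datetime  # assumed to be set by the deployment webhook


def change_volume_between(major_deploys: list[datetime], tickets: list[ChangeTicket]) -> list[int]:
    """Count tickets closed in each window [deploy_i, deploy_i+1) between major deployments."""
    bounds = sorted(major_deploys)
    counts = [0] * max(len(bounds) - 1, 0)
    for ticket in tickets:
        # Index of the last major deployment at or before the close time.
        window = bisect_right(bounds, ticket.closed_at) - 1
        if 0 <= window < len(counts):
            counts[window] += 1
    return counts


def mean_change_lead_time(tickets: list[ChangeTicket]) -> timedelta:
    """Average ticket lifetime from opening to the deployment that closed it."""
    if not tickets:
        raise ValueError("no closed change tickets")
    total = sum((t.closed_at - t.opened_at for t in tickets), timedelta())
    return total / len(tickets)
```

Returning one count per window, rather than a single total, makes a rising change volume between major deployments visible as a trend.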
Availability and Performance in Service Level Agreements
Organizations have formal or informal objectives for application availability (uptime) and performance (error rates, response times, etc.). At the very least, it is important to verify that the organization’s DevOps practices are not an obstacle to meeting its Site Reliability Engineering (SRE) uptime and performance requirements. Ideally, DevOps workflows should be improving both application and service reliability, as well as the end-user experience.
Many IT monitoring methods and tools are used today to track application availability and performance. For example, black box monitoring uses simulated user agents to measure availability and performance from the outside, as a real user would experience them.
These monitored KPIs should show consistent availability and performance metrics that meet or exceed the organization’s defined objectives. It is also worth noting that these metrics should be monitored throughout the entire application lifecycle, not just in the production environment.
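Black box monitoring results can be checked against the organization's objectives with a few lines of code. This Python sketch assumes each synthetic probe yields a success flag and a response time; the `ProbeResult` type, the 95th-percentile latency target and the nearest-rank percentile method are illustrative choices, not requirements stated in this article.

```python
import math
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """One black box (synthetic) check of the application; fields are illustrative."""
    ok: bool
    latency_ms: float


def availability(results: list[ProbeResult]) -> float:
    """Share of probes that succeeded."""
    if not results:
        raise ValueError("no probe results")
    return sum(r.ok for r in results) / len(results)


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile, with pct between 1 and 100."""
    if not values or not 1 <= pct <= 100:
        raise ValueError("need values and 1 <= pct <= 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_objectives(results: list[ProbeResult], availability_target: float, p95_target_ms: float) -> bool:
    """True only if both the availability and the p95 latency objectives are met."""
    latencies = [r.latency_ms for r in results if r.ok]
    if not latencies:
        return False
    return availability(results) >= availability_target and percentile(latencies, 95) <= p95_target_ms
```

Because the check takes only a list of probe results, the same function can be run against probes aimed at test and staging environments, not just production.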
Mean Time to Recovery
No matter how well-oiled an organization’s environment, there will inevitably be downtime and slowdowns for applications and services in production. Well-designed and robustly implemented DevOps practices should support a short mean time to recovery (MTTR), minimizing how long users lose access to application or service features or experience degraded performance. DevOps tools like Thundra APM, for example, help reduce MTTR with automated distributed tracing and production debugging.
MTTR metrics, as measured by the organization’s ITSM system, provide invaluable insight into how well the organization is meeting its MTTR baseline. An organization should expect its DevOps practices to deliver a consistent MTTR that minimizes the number of users who abandon an application or service.
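MTTR itself is simple arithmetic over incident records. The Python sketch below assumes the ITSM system can export, for each incident, when it was detected and when service was restored; the tuple layout, the function name and the 60-minute baseline in the example are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (restored - detected) over all incidents in the period."""
    if not incidents:
        raise ValueError("no incidents in this period")
    total = timedelta()
    for detected, restored in incidents:
        if restored < detected:
            raise ValueError("restored time precedes detection time")
        total += restored - detected
    return total / len(incidents)


if __name__ == "__main__":
    baseline = timedelta(minutes=60)  # hypothetical MTTR baseline
    incidents = [
        (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 9, 30)),
        (datetime(2024, 3, 8, 14, 0), datetime(2024, 3, 8, 15, 30)),
    ]
    mttr = mean_time_to_recovery(incidents)
    print(f"MTTR {mttr}, within baseline: {mttr <= baseline}")
```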
Defect Volume and Escape Rate
Organizations do not want users to feel as if they are part of the QA team because they are the ones identifying and reporting runtime defects. Business stakeholders expect DevOps workflows to support a low defect escape ratio, i.e., the number of defects found by end users divided by the number of bugs found by QA in pre-production. A very simple way to collect metrics for this KPI is to compare customer-reported bug tickets with bug tickets opened by QA.
What is important here is for business stakeholders not to expect defect-free applications. This would be a good example of how a poorly used KPI metric can paralyze innovation and constrain deployment velocity, undermining the very business objectives that DevOps is meant to support.
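Given counts of customer-reported and QA-reported bug tickets, the ratio defined above is a single division. The Python sketch below implements that definition and, for comparison, a variant many teams use instead, which divides escaped defects by all defects found (production plus pre-production) so the result always stays between 0 and 1. The function names and the `"customer"`/`"qa"` ticket-source labels are assumptions.

```python
from collections import Counter


def count_bug_tickets(ticket_sources: list[str]) -> tuple[int, int]:
    """Count bug tickets by origin; the 'customer' and 'qa' labels are hypothetical."""
    counts = Counter(ticket_sources)
    return counts["customer"], counts["qa"]


def defect_escape_ratio(found_by_users: int, found_by_qa: int) -> float:
    """Definition used in this article: defects found by end users / defects found by QA."""
    if found_by_qa <= 0:
        raise ValueError("no QA-found defects to compare against")
    return found_by_users / found_by_qa


def defect_escape_rate(found_by_users: int, found_by_qa: int) -> float:
    """Common variant: share of all known defects that escaped to production."""
    total = found_by_users + found_by_qa
    return found_by_users / total if total else 0.0


if __name__ == "__main__":
    sources = ["qa"] * 45 + ["customer"] * 5
    users, qa = count_bug_tickets(sources)
    print(f"escape ratio {defect_escape_ratio(users, qa):.3f}, escape rate {defect_escape_rate(users, qa):.3f}")
```

Either number is only meaningful as a trend against the organization's own baseline; neither is expected to reach zero.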
Making Sense of KPI Data
Non-technical business stakeholders need powerful reporting and visualization tools that analyze and correlate DevOps KPI metrics into charts, graphs and tables that clearly show where expected baselines are being met, where they are being exceeded, and where they are being missed. These resources will show business leaders, on a project-by-project basis, whether their current DevOps practices are accelerating or constraining business outcomes.
Some examples of visualization tools are:
- Kanban boards, which come from the world of lean manufacturing and agile software development, are designed to visualize work, optimize work-in-progress and maximize workflow efficiency.
- Ticketing platforms such as Jira and ServiceNow come with built-in Kanban boards and other tools that visualize the volume and velocity of the ticketing activity at the core of many automated DevOps workflows.
- Collaboration tools such as Trello and Asana can be integrated with DevOps workflows to automatically provide reports and alerts to the relevant stakeholders across the organization.
- Dashboarding solutions such as Datadog, Grafana and Tableau Software turn big data into business intelligence and are important tools for presenting actionable DevOps KPI insights to decision-makers.
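Whatever dashboard tool is used, the underlying comparison is the same: each KPI is checked against its baseline, keeping in mind that for some KPIs (deployment frequency) higher is better, while for others (lead time, failure rate, MTTR) lower is better. The Python sketch below shows that met/exceeded/missed classification; the `classify` function, the 5% tolerance band, the KPI names and the sample values are illustrative assumptions, not recommended targets.

```python
from enum import Enum


class Status(Enum):
    MISSED = "missed"
    MET = "met"
    EXCEEDED = "exceeded"


def classify(value: float, baseline: float, higher_is_better: bool, tolerance: float = 0.05) -> Status:
    """Compare a KPI to its baseline; a relative change within +/- tolerance counts as met."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (value - baseline) / abs(baseline)
    if not higher_is_better:
        change = -change  # e.g. a drop in MTTR is an improvement
    if change > tolerance:
        return Status.EXCEEDED
    if change < -tolerance:
        return Status.MISSED
    return Status.MET


if __name__ == "__main__":
    # (name, current value, baseline, higher_is_better): hypothetical sample data
    kpis = [
        ("deployments per day", 1.2, 1.0, True),
        ("MTTR (minutes)", 70, 60, False),
        ("deployment failure rate", 0.10, 0.10, False),
    ]
    for name, value, baseline, higher in kpis:
        print(f"{name}: {classify(value, baseline, higher).value}")
```

A dashboard can then color each KPI by its status, project by project, so stakeholders see at a glance where baselines are being met, exceeded or missed.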
A Final Note
All of the DevOps KPIs discussed above are even harder to monitor and assess in today’s distributed application architectures, which are often deployed across multiple environment types. With this in mind, we at Thundra offer Thundra APM, which has been designed from the ground up to provide the observability, monitoring and dashboard capabilities that give business leaders insight into their company’s DevOps practices and whether those practices are delivering the expected business value.
Feature image via Pixabay.