Using DORA Metrics to Optimize CI Pipelines
In today’s highly competitive market, responding to rapidly changing demands is only possible when software stays high in quality and availability. Keeping the user experience at its best depends on continuously delivering quality software at speed.
This article highlights three DORA metrics that, when monitored proactively, can help improve the performance of continuous integration (CI) pipelines: change failure rate, change lead time and deployment frequency.
Use the DORA Metrics for CI Performance
In the book “Accelerate,” the researchers behind the DevOps Research and Assessment (DORA) program identify a set of metrics they consider indicators of software teams’ performance. They also give guidance on how DevOps teams can continuously improve their processes and capabilities. These metrics are known as the DORA metrics: change lead time, deployment frequency, mean time to restore (MTTR) and change failure rate.
The DORA metrics make DevOps measurable across the full development life cycle. In other words, they help engineering teams make data-driven decisions to refine their practices and deliver software faster and more reliably through a CI/CD pipeline.
Change Failure Rate
Change failure rate (CFR) is the percentage of code changes that lead to failures in production: code that has to be fixed or rolled back after it has been deployed. Only production defects caused by code changes count toward this metric.
When tracking change failure rate, we only consider failures that happen in production, not the ones caught and fixed during the testing phase. Defects can also surface on the users’ end through no fault of the developers, and these should be left out as well. The metric should only count deployments that change code, which can be anything from new features to quick fixes.
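The counting rules above can be sketched in a few lines of Python. This sketch assumes each deployment is recorded with three flags: whether it changed code, whether it caused a production failure, and whether the failure was a user-side issue. The `Deployment` record and its field names are hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    id: str
    has_code_change: bool          # hypothetical flag: only code changes count
    caused_prod_failure: bool      # hypothetical flag: needed a fix or rollback
    user_side_issue: bool = False  # hypothetical flag: defect not caused by the change


def change_failure_rate(deployments: list[Deployment]) -> float:
    """CFR as a percentage of code-change deployments that failed in production.

    Failures caught during testing never reach a deployment record,
    so they are excluded by construction.
    """
    changes = [d for d in deployments if d.has_code_change]
    if not changes:
        return 0.0
    failed = [d for d in changes if d.caused_prod_failure and not d.user_side_issue]
    return 100.0 * len(failed) / len(changes)


deploys = [
    Deployment("d1", has_code_change=True, caused_prod_failure=False),
    Deployment("d2", has_code_change=True, caused_prod_failure=True),
    Deployment("d3", has_code_change=False, caused_prod_failure=True),  # no code change: ignored
    Deployment("d4", has_code_change=True, caused_prod_failure=True, user_side_issue=True),
]
print(f"CFR: {change_failure_rate(deploys):.1f}%")  # CFR: 33.3%
```

Only d2 counts as a failure. d3 changed no code, and d4's defect is not attributable to the change.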
How to Improve CFR in Pre-production
Improving the change failure rate takes a holistic and continuous effort. Anomalies and defects should be monitored carefully, not only in production but also during the testing phases.
Reviewing a pull request (PR) is difficult because you might not know exactly how that code will behave in production. Naturally, the changed parts of the code matter most, so reviewers want to know and understand them. Likewise, it’s difficult for code authors to make sure their testing strategy matches the way their applications are actually used. Moreover, change management processes are mostly manual and daunting.
This metric reflects the quality of code reviews, which is why code reviewers have a direct impact on it. More precisely, pull requests should be rated by the risk they pose to production. Reviewers should then be automatically informed of the risk level of each PR they are about to review. This way, imprecise reviews can be reduced.
Ideally, the change failure rate improves because errors, bugs and failures are caught during the testing phase of CI workflows.
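One way to picture automatic PR risk rating is a simple additive heuristic. This is a minimal sketch only. The signals, critical paths, thresholds and weights below are all hypothetical. A real system would tune them against its own incident history or replace them with a learned model.

```python
from dataclasses import dataclass

# Hypothetical directories where a defect is especially costly in production.
CRITICAL_PATHS = ("payments/", "auth/", "migrations/")


@dataclass
class PullRequest:
    lines_changed: int
    files: list[str]
    touches_untested_code: bool  # hypothetical signal, e.g. from per-line coverage data


def risk_level(pr: PullRequest) -> str:
    """Rate a PR as low, medium or high risk using hypothetical weights."""
    score = 0
    if pr.lines_changed > 400:        # large diffs are harder to review well
        score += 2
    elif pr.lines_changed > 100:
        score += 1
    if any(f.startswith(CRITICAL_PATHS) for f in pr.files):
        score += 2
    if pr.touches_untested_code:
        score += 2
    if score >= 4:
        return "high"
    if score >= 2:
        return "medium"
    return "low"
```

A CI job could post the returned level as a PR label, so reviewers see the risk before they start reading the diff.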
Change Lead Time
Change lead time (CLT) helps you understand the efficiency of your software development process. It is measured as the time between the first commit for a given issue and the deployment of that code to production.
A short lead time only pays off if what ships is free of defects, so delivering defect-free software at speed makes all the difference. When focusing on CI performance and health, teams tend to look at test coverage as a way to accelerate fault-free deployments.
Test coverage should not be confused with code coverage. Test coverage is the percentage of software functions and features covered by tests or test suites. Code coverage is the percentage of code executed by test cases, as measured through testing frameworks and suites. Both relate to software testing that improves code quality and, in turn, your change lead time.
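The lead-time definition translates directly into code. This sketch assumes you can pair each issue's first commit timestamp with its production deployment timestamp, for example from your VCS and deployment logs. The helper names are hypothetical. It summarizes with the median because a few very slow changes would otherwise dominate a mean.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time(first_commit: datetime, deployed: datetime) -> timedelta:
    """Change lead time for one issue: first commit to production deployment."""
    return deployed - first_commit


def median_lead_time(changes: list[tuple[datetime, datetime]]) -> timedelta:
    # statistics.median works on timedelta values because they sort and average.
    return median(lead_time(commit, deploy) for commit, deploy in changes)


changes = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 15)),   # 6 hours
    (datetime(2024, 1, 2, 10), datetime(2024, 1, 4, 10)),  # 48 hours
    (datetime(2024, 1, 3, 8), datetime(2024, 1, 3, 20)),   # 12 hours
]
print(median_lead_time(changes))  # 12:00:00
```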
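The two percentages use the same arithmetic over different inventories. Test coverage divides tested features by all features, while code coverage divides executed lines by executable lines. The feature list and line numbers below are made up for illustration. In practice, executed lines would come from a coverage tool and the feature inventory from your requirements.

```python
def coverage_percent(covered: set, total: set) -> float:
    """Share of items in `total` that also appear in `covered`, as a percentage."""
    if not total:
        return 0.0
    return 100.0 * len(covered & total) / len(total)


# Test coverage: which features have at least one test (hypothetical inventory).
features = {"login", "checkout", "search", "refunds"}
features_with_tests = {"login", "checkout", "search"}

# Code coverage: which executable lines ran during the suite (hypothetical numbers).
executable_lines = set(range(1, 201))  # lines 1..200
executed_lines = set(range(1, 171))    # lines 1..170

test_cov = coverage_percent(features_with_tests, features)
code_cov = coverage_percent(executed_lines, executable_lines)
print(f"test coverage {test_cov:.0f}%, code coverage {code_cov:.0f}%")
# test coverage 75%, code coverage 85%
```

In this example, 85% code coverage says nothing about the untested "refunds" feature, which is why the two numbers are worth tracking separately.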
A lot of time is wasted when test failures can’t be debugged quickly. In addition, running every test on every commit is an outdated practice: the resulting long testing cycles delay releases.
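Test impact analysis is an alternative to running everything on every commit: run only the tests that exercise the changed files. The sketch below assumes a hypothetical mapping from source files to tests, which in practice could be built from per-test coverage data. When a change touches a file with unknown impact, it falls back to the full suite rather than risk skipping a relevant test.

```python
# Hypothetical map from source files to the tests that exercise them.
TEST_MAP = {
    "src/cart.py": {"tests/test_cart.py", "tests/test_checkout.py"},
    "src/payment.py": {"tests/test_checkout.py", "tests/test_payment.py"},
    "src/search.py": {"tests/test_search.py"},
}
ALL_TESTS = {
    "tests/test_cart.py",
    "tests/test_checkout.py",
    "tests/test_payment.py",
    "tests/test_search.py",
}


def select_tests(changed_files, test_map=TEST_MAP, all_tests=ALL_TESTS):
    """Return the set of tests to run for a commit's changed files."""
    selected = set()
    for path in changed_files:
        if path.startswith("tests/"):
            selected.add(path)            # a modified test always runs
        elif path in test_map:
            selected |= test_map[path]    # tests known to cover this file
        else:
            return set(all_tests)         # unknown impact: run the full suite
    return selected
```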
Long-running test suites and frequently failing tests are among the most common reasons for slow builds, which in turn reduce deployment frequency. You should have visibility into test runs so you can quickly debug test failures, detect flaky tests, identify slow tests and visualize performance over time to spot trends.
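Flaky and slow tests can both be detected from recorded test runs. This sketch assumes each run is a `(test, commit, passed, duration_s)` tuple, a hypothetical format. A test counts as flaky if it both passed and failed on the same commit, meaning the same code. A test counts as slow if its median duration exceeds a threshold. The 60-second default is an arbitrary placeholder.

```python
from collections import defaultdict
from statistics import median


def analyze_runs(runs, slow_threshold_s=60.0):
    """Return (flaky, slow) test-name sets from (test, commit, passed, duration_s) runs."""
    outcomes = defaultdict(set)    # (test, commit) -> outcomes seen on that commit
    durations = defaultdict(list)  # test -> every recorded duration
    for test, commit, passed, duration in runs:
        outcomes[(test, commit)].add(passed)
        durations[test].append(duration)
    # Both True and False on the same commit means the result changed without a code change.
    flaky = {test for (test, _), seen in outcomes.items() if len(seen) == 2}
    slow = {test for test, ds in durations.items() if median(ds) > slow_threshold_s}
    return flaky, slow


runs = [
    ("test_login", "abc", True, 2.0),
    ("test_login", "abc", False, 2.1),    # same commit, different result: flaky
    ("test_report", "abc", True, 95.0),
    ("test_report", "def", True, 90.0),   # consistently slow
    ("test_search", "abc", False, 1.0),
    ("test_search", "def", True, 1.2),    # fixed on a new commit: not flaky
]
flaky, slow = analyze_runs(runs)
print(flaky, slow)  # {'test_login'} {'test_report'}
```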
Deployment Frequency
Deployment frequency measures how often your team pushes changes to production. It reflects how quickly your team delivers software.
To improve deployment frequency, software teams should closely monitor their pipelines and follow the practices below:
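Deployment frequency can be computed from a list of production deployment dates. This is a minimal sketch under that assumption, and both function names are hypothetical. Counting per ISO week shows the trend. Dividing by the length of the observation window gives one average that also accounts for weeks with no deployments.

```python
from collections import Counter
from datetime import date


def deployments_per_week(deploy_dates: list[date]) -> Counter:
    """Count production deployments per ISO (year, week)."""
    return Counter(tuple(d.isocalendar()[:2]) for d in deploy_dates)


def average_deploys_per_week(deploy_dates: list[date], start: date, end: date) -> float:
    """Average deployments per week over the inclusive window [start, end]."""
    days = (end - start).days + 1
    in_window = [d for d in deploy_dates if start <= d <= end]
    return len(in_window) / (days / 7)


deploys = [
    date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 3),
    date(2024, 1, 10), date(2024, 1, 24),
]
print(deployments_per_week(deploys))  # weeks 1, 2 and 4 of 2024; week 3 had none
print(average_deploys_per_week(deploys, date(2024, 1, 1), date(2024, 1, 28)))  # 1.25
```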
- Identify potential bugs and mistakes in the pre-production phase by introducing debuggers or dedicated monitoring solutions.
- Test automation is a must, but manual quality assurance should also be part of the QA strategy. The testing strategy should be strong, strict and broad enough to make sure every part of the code is tested well.
- Most defects should be caught before they reach production. Pre-production environments should be closely monitored, and failing tests or tests with unexpected latency should be examined carefully.
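Spotting "tests with unexpected latency" requires a baseline to compare against. One simple option, swapped in here for illustration, is a one-sided z-score: flag a test when its latest duration sits more than a few standard deviations above its own history. The thresholds and input format are hypothetical. A test whose history has zero variance is never flagged by this sketch, which is a known limitation.

```python
from statistics import mean, stdev


def latency_anomalies(history, latest, z_threshold=3.0, min_samples=5):
    """Return tests whose latest duration is unusually high versus their history.

    history: test name -> list of past durations in seconds
    latest:  test name -> duration of the current run in seconds
    """
    flagged = set()
    for test, duration in latest.items():
        past = history.get(test, [])
        if len(past) < min_samples:
            continue  # too little data for a meaningful baseline
        mu, sigma = mean(past), stdev(past)
        if sigma > 0 and (duration - mu) / sigma > z_threshold:
            flagged.add(test)  # only slowdowns are flagged, not speedups
    return flagged


history = {
    "test_a": [1.0, 1.1, 0.9, 1.0, 1.05],
    "test_b": [2.0, 2.2, 1.9, 2.1, 2.0],
    "test_c": [3.0],
}
latest = {"test_a": 1.02, "test_b": 4.5, "test_c": 10.0}
print(latency_anomalies(history, latest))  # {'test_b'}
```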
Monitor Your CI Pipelines
Improving your CI performance through metrics means understanding your continuous integration process, architecture, runtimes, engineering teams and development process.
Being able to track, monitor and gain insights into CI performance requires clear observability into your CI pipeline. Today, existing tools such as application performance monitoring tools, general monitoring tools, error-tracking tools and CI/CD platforms do not fully meet the need for visibility into pipelines. An observable CI pipeline is one of the most effective ways to prevent production regressions.
While a metrics dashboard for CI pipelines may be enough for some teams, others need to trace individual CI steps and jobs.
Foresight provides sophisticated monitoring capabilities for CI pipelines and tests. You can monitor workflow resource metrics such as CPU load, memory usage, and disk and network I/O. You can also trace processes at the kernel level to see inside workflow steps, monitor test suites and tests, and understand the impact of code changes on the production environment. Finally, you can prioritize tests to optimize performance in every CI workflow. You can try Foresight’s GitHub application yourself.
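To illustrate what tracing CI steps means beyond a dashboard, here is a minimal standard-library sketch that records each step's name, status and duration as a span-like record. It is not Foresight's API or any vendor's. The `StepTracer` class is hypothetical, and a real setup would send these records to a tracing or metrics backend instead of keeping them in memory.

```python
import json
import time
from contextlib import contextmanager


class StepTracer:
    """Hypothetical in-memory recorder of CI step timings for one job."""

    def __init__(self, job: str):
        self.job = job
        self.spans: list[dict] = []

    @contextmanager
    def step(self, name: str):
        start = time.perf_counter()
        status = "success"
        try:
            yield
        except Exception:
            status = "failure"
            raise  # let the CI job still fail as usual
        finally:
            self.spans.append({
                "job": self.job,
                "step": name,
                "status": status,
                "duration_s": time.perf_counter() - start,
            })

    def export(self) -> str:
        return json.dumps(self.spans)


tracer = StepTracer("build")
with tracer.step("install"):
    time.sleep(0.01)  # stand-in for real work
try:
    with tracer.step("test"):
        raise RuntimeError("tests failed")
except RuntimeError:
    pass
print(tracer.export())
```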