AIOps Done Right: Delivery Automation for DevOps and SREs
AIOps has the potential to drive significant business value by enabling DevOps and site reliability engineering (SRE) teams to create better, more secure software, faster — if organizations take the right approach. But many simply aren’t leveraging AIOps correctly and are not maximizing its potential. In my previous article, I highlighted how long-term reliance on “Gen 1” AIOps solutions — older tools that may have worked for the IT environments of years ago — contributes to this. These tools are out of sync with today’s more dynamic multicloud environments, where changes happen too fast and production deployments occur too often for older machine learning algorithms to keep up.
In that piece, I shared one example of how organizations can start executing AIOps the right way by shifting AIOps “left” to create more test-driven operations. Now I want to delve into another use case that highlights the value of AIOps done right: scaling and improving delivery automation.
Many organizations have already begun automating their delivery pipelines through tools like Azure DevOps, GitLab Pipelines, GitHub Actions, Argo, Tekton, and Jenkins. AIOps can further accelerate this process, empowering DevOps and SRE teams to put higher-quality code into production and increase the throughput of their delivery pipelines.
There are two essential ways to integrate AIOps solutions into delivery automation: pushing data on deployment and configuration changes into the AIOps solution, or pulling AIOps-supported answers to facilitate more data-driven decision-making around software delivery.
1. Pushing Deployment Data to Your AIOps Solution
Attaching or linking events to a monitored entity — a service, process, container or host — makes it easier for the AIOps solution to analyze and correlate behavior. For more precise insights, AIOps should go beyond simple correlation to supply root-cause determination for this data.
Examples of events that can be linked to monitored entities include:
- Deploying a new application iteration in a testing environment.
- Load testing an application in a staging environment.
- Load balancing traffic in a production environment.
- Switching feature flags or restarting service instances in a production environment.
The context provided by linking an event (such as a deployment, load test, load balance, configuration change or service restart) to a monitored entity (such as a particular application, container or process) enables the AIOps solution to connect the dots between behavioral changes and the actions that were executed.
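To make the idea of linking an event to a monitored entity concrete, here is a minimal sketch of what such an event could look like before a pipeline step pushes it. The field names, the `DeliveryEvent` class, and the example values are all hypothetical; a real AIOps solution defines its own event schema and ingestion endpoint, so treat this only as an illustration of the kind of context worth attaching.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class DeliveryEvent:
    """Hypothetical event shape; not the schema of any specific AIOps product."""
    event_type: str    # e.g. "deployment", "load_test", "config_change"
    entity_type: str   # e.g. "service", "process", "container", "host"
    entity_id: str     # identifier of the monitored entity the event belongs to
    environment: str   # e.g. "testing", "staging", "production"
    timestamp: str     # ISO 8601 timestamp in UTC
    properties: dict = field(default_factory=dict)


def build_event(event_type, entity_type, entity_id, environment, **properties):
    """Create an event that ties a pipeline action to one monitored entity."""
    return DeliveryEvent(
        event_type=event_type,
        entity_type=entity_type,
        entity_id=entity_id,
        environment=environment,
        timestamp=datetime.now(timezone.utc).isoformat(),
        properties=properties,
    )


def to_payload(event):
    """Serialize the event to JSON, ready to be sent by an HTTP client."""
    return json.dumps(asdict(event), sort_keys=True)


# Example values are made up for illustration.
event = build_event(
    "deployment", "service", "checkout-service", "staging",
    version="1.4.2", pipeline_run="build-1087",
)
payload = to_payload(event)
print(payload)
```

A pipeline step would send `payload` to whatever ingestion endpoint the chosen AIOps solution documents. The point of the sketch is that the entity identifier and the environment travel with the event, which is what lets the solution tie a later behavioral change back to this action.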
Pushing this information and context to the AIOps solution also allows it to immediately alert DevOps and SRE teams in cases where a behavioral change produces a negative impact on either end users or service level agreements (SLAs). That alert from the AIOps solution not only notifies the team of a negative impact, but also identifies the root cause of the issue.
For example, imagine pushing information about an automated load test from your delivery pipeline to the AIOps solution. In that scenario, the AIOps solution is aware when the load test is executed, conducts a performance hotspot analysis over the length of time the load test was conducted and directly alerts the test engineers about potential user experience or SLA impacts. The AIOps solution can even create automated regression analyses in between test runs, making it easier to compare performance.
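The load test scenario can be sketched in a few lines: once the start and end of the test are known, the analysis can be limited to that window and checked against an SLA threshold. This is a deliberately simplified stand-in for the hotspot analysis an AIOps solution would run. The function names, the sample data, and the 200 ms threshold are assumptions made for illustration.

```python
from statistics import mean


def samples_in_window(samples, start, end):
    """Keep only the metric values recorded while the load test was running."""
    return [value for ts, value in samples if start <= ts <= end]


def check_sla(samples, start, end, sla_ms):
    """Summarize response times inside the test window against an SLA limit."""
    window = samples_in_window(samples, start, end)
    if not window:
        return None  # no data recorded during the load test
    avg = mean(window)
    return {"avg_ms": avg, "max_ms": max(window), "breached": avg > sla_ms}


# (timestamp in seconds, response time in ms); values are made up.
samples = [(0, 120.0), (10, 180.0), (20, 260.0), (30, 240.0), (40, 130.0)]

# The pipeline reported a load test running from t=10 to t=30.
result = check_sla(samples, start=10, end=30, sla_ms=200.0)
print(result)
```

Without the pushed event, the solution would have to guess which time range to examine. With it, samples outside the test window (here, the first and last) are excluded from the assessment.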
Pushing event information directly to the AIOps solution gives it more context about your delivery automation pipeline, so it can return precise, automated answers about how to improve delivery quality and scalability.
2. Pulling AIOps Answers to Create Data-Driven Delivery Decisions
Pushing deployment information and context to the AIOps solution makes it more aware of delivery activities. That, in turn, provides a new source of data that DevOps teams can pull from the AIOps solution for more informed decision-making.
This can work in a couple of ways:
- The AIOps solution may both present that data within its own dashboard and expose the performance data of individual releases or tests through an API.
- The AIOps solution may also provide teams with a choice for comparing either individual test run results or baseline results, across multiple tests or deployments, to pinpoint potential regressions during or between tests.
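As a rough illustration of comparing a test run against a baseline, the sketch below flags metrics that got worse by more than a tolerance. It assumes the metric values have already been pulled from the AIOps solution and that a higher value is worse for every metric. The metric names, the numbers, and the 10% tolerance are hypothetical.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return metrics whose relative increase over the baseline exceeds tolerance.

    Assumes higher values are worse (response time, error rate, CPU usage).
    """
    regressions = {}
    for metric, base_value in baseline.items():
        if metric not in current or base_value == 0:
            continue  # nothing to compare against
        change = (current[metric] - base_value) / base_value
        if change > tolerance:
            regressions[metric] = round(change, 3)
    return regressions


# Made-up results for a baseline run and the latest test run.
baseline = {"response_time_ms": 200.0, "error_rate_pct": 1.0, "cpu_pct": 50.0}
current = {"response_time_ms": 250.0, "error_rate_pct": 1.05, "cpu_pct": 48.0}

print(find_regressions(baseline, current))
```

In this example only the response time is reported, because it rose by 25% while the other two metrics stayed within the tolerance. A pipeline could fail the stage, or ask for manual approval, whenever the returned dictionary is not empty.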
Keptn, an open source Cloud Native Computing Foundation project that I’ve written about previously, provides a clear use case of how DevOps and SRE teams can integrate with their AIOps solution to compare and analyze the results of test runs or deployments. Keptn uses both service level objectives (SLOs) and an open standard communication protocol for tool integrations to analyze data ingested from various solutions, creating an overall “SLO quality score.” Instead of having to manually review AIOps dashboards and test reports, teams can use Keptn to automate this process, distilling the AIOps data analysis into a single SLO score, which then helps support delivery decision-making.
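The idea of distilling several measurements into one score can be sketched as a weighted evaluation. The code below is not Keptn's implementation or configuration format. It is a simplified illustration that assumes each objective has a pass limit and a warning limit, that meeting the pass limit earns the full weight and meeting only the warning limit earns half, and that the total is expressed as a percentage. All names and thresholds are hypothetical.

```python
def evaluate_objective(value, pass_max, warn_max):
    """Return the share of an objective's weight earned (lower values are better)."""
    if value <= pass_max:
        return 1.0
    if value <= warn_max:
        return 0.5
    return 0.0


def slo_score(measurements, objectives):
    """Combine all objectives into a single score between 0 and 100."""
    total_weight = sum(obj["weight"] for obj in objectives.values())
    earned = sum(
        obj["weight"]
        * evaluate_objective(measurements[name], obj["pass_max"], obj["warn_max"])
        for name, obj in objectives.items()
    )
    return 100.0 * earned / total_weight


def decide(score, pass_at=90.0, warn_at=75.0):
    """Turn the score into a delivery decision for the pipeline."""
    if score >= pass_at:
        return "pass"
    if score >= warn_at:
        return "warning"
    return "fail"


# Hypothetical objectives and measured values pulled from the AIOps solution.
objectives = {
    "response_time_p95_ms": {"weight": 2, "pass_max": 300.0, "warn_max": 500.0},
    "error_rate_pct": {"weight": 1, "pass_max": 1.0, "warn_max": 2.0},
}
measurements = {"response_time_p95_ms": 420.0, "error_rate_pct": 0.4}

score = slo_score(measurements, objectives)
print(round(score, 1), decide(score))
```

Here the response time earns only half of its weight and the error rate earns all of its weight, so 2 of 3 weighted points are earned and the run falls below the warning threshold. A pipeline can use the resulting decision to promote or hold back the build without anyone having to read a dashboard.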
This approach serves two purposes, both accelerating software delivery pipelines and ensuring that problems in the delivery process are quickly remediated before they negatively affect software releases — and, as a result, end users.
Keptn is just one example of an implementation that DevOps and SRE teams can use to reap the advantages of AIOps data. By applying these approaches, DevOps and SREs can pull AIOps data into their delivery automation process — not just the individual metrics, but the automatic root-cause analysis. This integration between AIOps and delivery automation means teams can make faster rollout decisions, quickly incorporate feedback into development and deploy better code into production.