Harness sponsored this post.
Machine learning and continuous delivery (CD): is it an awesome concept, or just the latest grating set of marketing buzzwords? Substance does lurk behind the flashy jargon, and it can be used to solve actual problems.
It helps to start with the familiar reality of a typical deployment pipeline. It usually looks like this:
1. Write code;
2. Commit code;
3. Build artifacts;
4. Test artifacts;
5. Deploy artifacts;
6. Verify artifacts;
7. Rollback artifacts.
Currently, the majority of organizations are in the process of automating steps one through five using their CD process or platform, which generally consists of Jenkins and a set of scripts. For most DevOps teams, CD stops at the deployment phase — which means the moment a new artifact hits production.
However, steps six and seven as usually practiced say nothing about the actual business impact of these production deployments. One possible solution is to use machine learning (ML) to understand this impact and, as we describe below, ML techniques are already up to the task.
Understanding the Business Impact of Deployments
Believe it or not, deployment velocity is not a true barometer of success (even though it feels good). I was always amazed at how many conference attendees claimed thousands of deployments per day, yet very few had any visibility into the impact of their changes, fixes and patches. How do you know whether you’re shipping good or bad changes to customers?
Consider referring to steps six and seven as “continuous verification (CV)” and then using unsupervised machine learning to automate them.
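To make the idea concrete, here is a minimal Python sketch of a pipeline tail in which verification decides whether a rollback happens. The function name and the three callables are hypothetical placeholders, not part of any particular CD platform; the verify step is where the ML analysis described later would plug in.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    name: str
    ok: bool


def run_with_verification(
    deploy: Callable[[], None],
    verify: Callable[[], bool],
    rollback: Callable[[], None],
) -> List[StepResult]:
    """Deploy, verify the deployment, and roll back only if verification fails."""
    results = []

    deploy()  # step five: deploy artifacts
    results.append(StepResult("deploy", True))

    healthy = verify()  # step six: verify artifacts
    results.append(StepResult("verify", healthy))

    if not healthy:
        rollback()  # step seven: roll back artifacts
        results.append(StepResult("rollback", True))

    return results
```

In this sketch the rollback is triggered automatically, but the same structure works if a failed verification only raises an alert for a human to act on.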
Machine Learning 101
Before I dive into how to apply machine learning to enable CV, let’s understand some ML basics.
Machine learning is a form of Artificial Intelligence (AI) that comes in two flavors:
● Supervised learning typically requires humans to train the ML algorithms and models by providing parameters, feedback or labels. For example, thresholds, ratings, rules, settings and configuration are types of parameters and feedback that algorithms can use to become more accurate over time. You also might hear buzzwords like “neural feedback,” which simply means soliciting feedback from humans.
● Unsupervised learning is where ML algorithms and their models infer meaning and relationships from data on their own, without the need for human assistance or intervention. It’s plug and play: you supply the data and the ML figures things out by itself.
The tradeoff is that supervised machine learning is generally more accurate, but requires extensive training and maintenance. In contrast, unsupervised machine learning is generally less accurate but is also hands-off. The benefits and value of supervised versus unsupervised learning therefore depend heavily on the use case and the datasets involved.
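A toy Python contrast may help, following this article's loose use of the two terms. In the first function a human supplies the threshold; in the second, the limit is inferred from historical data. Both function names and the default of three standard deviations are assumptions made for illustration.

```python
from statistics import mean, stdev


def flagged_by_rule(value, threshold):
    """Human-supplied parameter: someone has to pick and maintain the threshold."""
    return value > threshold


def flagged_by_baseline(value, history, k=3.0):
    """Data-driven: flag values more than k standard deviations from the history."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) > k * sigma
```

With a response-time history hovering around 30ms, a reading of 165ms is flagged by the baseline check, while a hand-set threshold of 200ms would miss it.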
Applying AI and Machine Learning to Continuous Delivery
To verify the impact of any production deployment, we first need to understand (and measure) several key performance indicators (KPIs) relating to applications.
Here are some example KPIs:
● Business: revenue, order volume, order throughput;
● Performance & Availability: response time, throughput, stalls, uptime;
● Resource: CPU, memory, I/O;
● Quality: events, exceptions, and errors.
Fortunately, nearly all of this data already exists in tools that a CD platform can draw on, such as:
● Application performance monitoring (APM): AppDynamics, New Relic, Dynatrace;
● Infrastructure monitoring: Datadog, CloudWatch and Nagios;
● Log monitoring: Splunk, ELK and Sumo Logic;
● AIOps/ITOA: Moogsoft and BigPanda;
● Synthetics: Selenium.
Most applications these days also have their own health check page where some simple HTTP assertions can return various KPIs.
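As a rough illustration, a health check assertion can be as small as the Python sketch below. The JSON fields `status` and `response_time_ms` and the 200ms limit are hypothetical; a real health page will expose whatever KPIs its authors chose.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def evaluate_health(status_code, body, max_response_ms=200.0):
    """Return a list of failed assertions; an empty list means the check passed."""
    if status_code != 200:
        return [f"unexpected HTTP status {status_code}"]
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return ["body is not valid JSON"]
    if not isinstance(payload, dict):
        return ["body is not a JSON object"]

    failures = []
    # Hypothetical field names, assumed for this sketch.
    if payload.get("status") != "ok":
        failures.append(f"status is {payload.get('status')!r}, expected 'ok'")
    response_ms = payload.get("response_time_ms")
    if not isinstance(response_ms, (int, float)) or response_ms > max_response_ms:
        failures.append(f"response_time_ms is {response_ms!r}, limit {max_response_ms}")
    return failures


def fetch_and_evaluate(url, timeout=5.0):
    """Fetch the health page and run the assertions against it."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return evaluate_health(resp.status, resp.read().decode("utf-8"))
    except HTTPError as err:
        # urlopen raises for 4xx/5xx responses, so map those to a failed check.
        return evaluate_health(err.code, "")
```

Keeping the assertions separate from the HTTP call means the same checks can run against a recorded response in tests.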
What’s required is to build connectors and webhooks that integrate with the tools listed above to observe application KPIs, data and metrics following every production deployment. It’s then possible to use unsupervised machine learning to automate the process of analyzing all the time-series metrics and event data from these sources. Doing so enables you to automatically verify production deployments and identify any regressions, anomalies or failures that may have been introduced.
Here’s a visualization of what it might look like:
In this view, you can verify production deployments with one or more verification sources regardless of whether the data is time-series metrics, unstructured events or simple HTTP assertions. In the above example, both Splunk and AppDynamics are being used to verify the deployment.
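One simple way to picture the post-deployment comparison is the Python sketch below, which checks each metric's samples after a deployment against a pre-deployment baseline using the median and the median absolute deviation. This is a simple statistical stand-in, not the algorithm any particular product uses; the metric names, the cutoff `k` and the assumption that higher values are worse are all illustrative.

```python
from statistics import median


def mad(values):
    """Median absolute deviation, a spread measure that tolerates outliers."""
    m = median(values)
    return median([abs(v - m) for v in values])


def find_regressions(baseline, post_deploy, k=5.0):
    """Return {metric: (baseline_median, post_median)} for metrics that got worse.

    Assumes higher is worse (response time, CPU, error counts).
    """
    regressions = {}
    for metric, before in baseline.items():
        after = post_deploy.get(metric)
        if not before or not after:
            continue  # no data on one side, nothing to compare
        base_median = median(before)
        spread = mad(before) or 1e-9  # avoid dividing by zero on flat baselines
        post_median = median(after)
        if (post_median - base_median) / spread > k:
            regressions[metric] = (base_median, post_median)
    return regressions


baseline = {"response_time_ms": [30, 31, 29, 32, 31], "cpu_pct": [40, 42, 41, 39, 40]}
post_deploy = {"response_time_ms": [160, 165, 170], "cpu_pct": [41, 40, 42]}
print(find_regressions(baseline, post_deploy))
```

In this made-up data only the response time is reported as a regression; the CPU figures stay within their normal spread.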
Detecting Performance Regressions
Nothing is more aggravating than a slow app, so verifying performance post-deployment is a no-brainer. Below is a screenshot showing the output of applying unsupervised machine learning (Symbolic Aggregate approXimation, or SAX, and Markov models) to an AppDynamics data set that’s monitoring an application. You can see that the ML has identified three performance regressions relating to key business transactions:
If we mouse-over any of these regressions, we can see why the ML algorithms have flagged these anomalies:
As you can see, the response time for Request Login increased roughly 5X, from 31ms to 165ms. The great thing about machine learning is that the algorithms execute in real time, so the insight you get is almost instant. There is no gut feel with ML; everything is measurable and quantifiable. In this case, ML can quantify the exact business impact of every production deployment.
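For readers curious about what SAX and Markov might look like in code, here is a small Python sketch. It is one plausible pairing, not the implementation behind the screenshot: the series is normalized against the baseline's mean and standard deviation (classic SAX normalizes each series on its own), averaged in short segments, mapped to letters, and then scored against first-order transition probabilities learned from the baseline. The four-letter alphabet, segment length and sample data are assumptions.

```python
from bisect import bisect_right
from collections import Counter
from math import log
from statistics import NormalDist, mean, pstdev

ALPHABET = "abcd"


def sax(series, mu, sigma, segment=2, alphabet=ALPHABET):
    """Turn a numeric series into a short string of symbols."""
    # Breakpoints that split a standard normal curve into equal-probability bands.
    cuts = [NormalDist().inv_cdf(i / len(alphabet)) for i in range(1, len(alphabet))]
    z = [(x - mu) / sigma for x in series]
    # Piecewise aggregate approximation: average each run of `segment` points.
    means = [mean(z[i:i + segment]) for i in range(0, len(z), segment)]
    return "".join(alphabet[bisect_right(cuts, m)] for m in means)


def transition_probs(word, alphabet=ALPHABET):
    """First-order Markov transition probabilities with add-one smoothing."""
    pairs = Counter(zip(word, word[1:]))
    probs = {}
    for a in alphabet:
        total = sum(pairs[(a, b)] for b in alphabet) + len(alphabet)
        for b in alphabet:
            probs[(a, b)] = (pairs[(a, b)] + 1) / total
    return probs


def surprise(word, probs):
    """Average negative log-likelihood of the word's transitions; higher is stranger."""
    steps = list(zip(word, word[1:]))
    if not steps:
        return 0.0
    return -sum(log(probs[s]) for s in steps) / len(steps)


baseline = [30, 31, 29, 32, 30, 31, 29, 30, 31, 30, 32, 29]  # response times in ms
post = [31, 30, 160, 165, 170, 168, 166, 165]
mu, sigma = mean(baseline), pstdev(baseline)

model = transition_probs(sax(baseline, mu, sigma))
print(surprise(sax(baseline, mu, sigma), model), surprise(sax(post, mu, sigma), model))
```

The post-deployment series spends most of its time in the top band, a pattern the baseline never showed, so its surprise score comes out higher than the baseline's own.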
Detecting Quality Regressions
Performance without reliability is nothing, and keeping an eye on application events, exceptions and errors post-deployment is critical. A major challenge these days is that most application consoles and logs are full of junk like ClassPath exceptions. What’s worse is that most of these exceptions have existed for years, and are a major contributor to noise.
You can actually apply ML to this problem to learn which events are benign and which are new, unique and anomalous to your apps and business. You do this by tokenizing event data from your log tools and applying several unsupervised ML techniques, such as entropy, k-means clustering, and Jaccard and cosine similarity, to understand the relevance, uniqueness and frequency of the events that your applications generate post-deployment.
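As a rough sketch of the building blocks, the Python below tokenizes two log events and computes Jaccard similarity, cosine similarity and Shannon entropy over the tokens. The tokenizer regex and the sample events are assumptions; real log pipelines usually strip timestamps, IDs and other high-cardinality fields first.

```python
import re
from collections import Counter
from math import log2, sqrt


def tokenize(event):
    """Lowercase the event and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", event.lower())


def jaccard(a, b):
    """Share of distinct tokens the two events have in common."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine similarity of the two events' token-count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def entropy(tokens):
    """Shannon entropy (in bits) of the token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


known = tokenize("NullPointerException in OrderService.checkout")
fresh = tokenize("NullPointerException in OrderService.refund")
print(jaccard(known, fresh), cosine(known, fresh), entropy(known))
```

High similarity to an event seen in earlier deployments suggests noise; low similarity to everything in the baseline suggests something new.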
In the example below, Splunk is being used to continuously verify application quality by analyzing the log event data from the application. This is done with a k-means algorithm that builds “clusters” of related event data. Grey dots represent baseline events that the algorithms have learned to treat as “normal,” meaning events that happen with every production deployment. Red dots represent unknown or new events that are being encountered for the first time, or events occurring with unusual frequency.
In this particular case, the developer accidentally used a semicolon instead of a colon. As you can see, the deployment introduced dozens of new exceptions. Now the error can be easily fixed.
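The grey-versus-red idea can be sketched in a few lines of Python: cluster the baseline events with a tiny k-means, then flag any new event that sits far from every baseline centroid. The bag-of-words vectors, the naive initialization, the distance cutoff and the sample events are all assumptions, and this sketch does not cover the "unusual frequency" case.

```python
import re
from collections import Counter
from math import dist


def tokens(event):
    return re.findall(r"[a-z0-9_]+", event.lower())


def vectorize(event, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(tokens(event))
    return [float(counts[t]) for t in vocab]


def kmeans(points, k, iterations=10):
    """Minimal k-means; seeds centroids with the first k points (naive)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def unknown_events(baseline, candidates, k=2, max_distance=1.5):
    """Return candidate events that are far from every baseline cluster."""
    vocab = sorted({t for e in baseline + candidates for t in tokens(e)})
    centroids = kmeans([vectorize(e, vocab) for e in baseline], k)
    flagged = []
    for event in candidates:
        v = vectorize(event, vocab)
        if min(dist(v, c) for c in centroids) > max_distance:
            flagged.append(event)
    return flagged


baseline_events = [
    "ClassPath warning duplicate jar",
    "cache miss for key user",
    "ClassPath warning duplicate jar found",
    "cache miss for key order",
]
new_events = ["ClassPath warning duplicate jar", "SyntaxError unexpected token semicolon"]
print(unknown_events(baseline_events, new_events))
```

Here the familiar ClassPath warning lands next to a baseline cluster and is ignored, while the syntax error is reported as unknown.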
Need for Speed
The bottom line is that using ML to assist and verify production deployments can dramatically reduce the amount of time it takes to identify and remove deployment errors. Yes, a team of 100 engineers can spend their time manually digging through logging and monitoring tools to discover what happened post-deployment, but why should they? What takes several people 60 minutes to do manually can now be done in one or two minutes using machines.
Now, of course, that doesn’t mean that DevOps teams become irrelevant, just that they get to focus on far more interesting problems instead of troubleshooting deployments or being perceived as a bottleneck to production releases. And that’s a reason to celebrate the Rise of the Machines.
Feature image via Pixabay.