
AIOps Readiness in 5 Steps

6 May 2020 9:34am, by

Paul Scully
Paul Scully is a vice president at Grok, which focuses on machine learning and AI Ops. He can be reached at paul.scully@grokstream.com.

It doesn’t take a lot of digging to show that AIOps, particularly within service provider organizations, is a booming industry. Forrester reports 68% of companies surveyed have plans to invest in AIOps-enabled monitoring solutions over the next 12 months. Gartner forecasts that by 2022, 40% of all large enterprises will combine big data and machine learning to support and partially replace monitoring. The list goes on and on.

But if you look one layer deeper, a different story emerges: companies see the value in AIOps, but they don’t always know how to realize it. In a Logicalis report, only 9% of respondents overall considered AI projects successful, and only 35% of all IT-focused projects were considered successful.

It’s clear that, even with the rapid adoption of AI, challenges remain that service assurance organizations must address. Conducting a readiness assessment sets a solid foundation and clearly defined objectives, both of which help ensure the success of future AIOps initiatives.

Step 1: Define the Objectives

A successful AIOps program has some really attractive outcomes once it’s established and proven out. Because of that, it’s tempting to try to implement a soup-to-nuts program right out of the gate. Stop there.

Trying to lump all of the machine learning needed into one project or phase creates a risk of not meeting your objectives. Don’t be afraid to start small with an iterative approach that defines a more limited strategic goal. Doing so will not only help you prove out your concept, but will also keep you and your team from the “analysis paralysis” that can come from trying to do too much at once.

You do want to outline the overall objective and high-level phases, but then start with smaller projects in a daisy chain: start, perhaps, with event clustering, then add log clustering, then anomaly detection, before getting to incident prediction. This approach lets you focus on the current project and objective, measure its success and make adjustments in future phases.
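
To give a feel for how small that first link in the chain can be, here is a minimal sketch of event clustering. It assumes events arrive as (timestamp in seconds, source, message) tuples and that grouping a source's events by time proximity is good enough for a first phase. The function name, the tuple layout and the 60-second window are illustrative assumptions, not part of any particular AIOps platform.

```python
from collections import defaultdict


def cluster_events(events, window_seconds=60):
    """Group events per source; start a new cluster whenever the gap between
    consecutive events from the same source exceeds window_seconds."""
    by_source = defaultdict(list)
    for ts, source, message in sorted(events):
        by_source[source].append((ts, message))

    clusters = []
    for source, items in by_source.items():
        current = [items[0]]
        for prev, item in zip(items, items[1:]):
            if item[0] - prev[0] > window_seconds:
                # Gap is too large: close the current cluster and start another.
                clusters.append((source, current))
                current = []
            current.append(item)
        clusters.append((source, current))
    return clusters


# Hypothetical sample data for illustration only.
sample_events = [
    (0, "router-1", "link down"),
    (10, "router-1", "link down"),
    (500, "router-1", "link up"),
    (5, "switch-2", "high cpu"),
]
clusters = cluster_events(sample_events)
```

Comparing the number of clusters with the number of raw events gives a simple, measurable outcome for the phase before the next one starts.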

Step 2: Evaluate Today’s Operational Processes

You don’t want to embark on a lengthy evaluation and analysis period. But you do need to understand at a high level what systems you have in place today, the data that’s available in those systems, how your operations teams (including DevOps) leverage that information, and how incidents are opened and worked by these groups.

Working in stages, you can evaluate the systems specific to the phase you’re in. If you are focused on deploying event clustering, then evaluate just the systems that collect event streams today. From there, you can shift focus to how to integrate with the eventing systems and how to feed the AIOps output back into your operational processes and systems.

Step 3: Evaluate Your Data

The goal here is to understand the type of data that will be ingested into the AIOps platform. Ask yourself these important questions:

  • Is the needed data even available?
  • Will there need to be any reformatting of the data?
  • Is the data dense enough for the algorithms to learn from?
  • Are there any disruptions in the data pipeline that may throw off how the algorithm learns?

The adage is true: garbage in, garbage out. This saying very much applies to AIOps, so it is important to ensure the data is available and ready. Only after you have a real sense of where your data is and how it stacks up can you move on in the process.
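
The density and disruption questions above can be turned into a quick check. The sketch below assumes a metric is collected at a known, fixed interval and that you have its sample timestamps in epoch seconds. The function name, the `tolerance` parameter and the sample values are hypothetical and would need to be adapted to your own data.

```python
def assess_series(timestamps, expected_interval, tolerance=1.5):
    """Estimate how complete a series is and where samples went missing.

    timestamps: sample times in epoch seconds.
    expected_interval: intended number of seconds between samples.
    tolerance: a step larger than tolerance * expected_interval counts as a gap.
    """
    ts = sorted(timestamps)
    if len(ts) < 2:
        return {"coverage": 0.0, "gaps": []}

    # How many samples a healthy pipeline would have delivered over this span.
    expected_count = (ts[-1] - ts[0]) / expected_interval + 1
    gaps = [
        (a, b)
        for a, b in zip(ts, ts[1:])
        if b - a > tolerance * expected_interval
    ]
    return {"coverage": min(1.0, len(ts) / expected_count), "gaps": gaps}


# Hypothetical one-minute metric with two missing samples.
report = assess_series([0, 60, 120, 300, 360], expected_interval=60)
```

A low coverage value or a long list of gaps is a signal to fix the pipeline before any model is trained on that series.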

Step 4: Identify and Fill Gaps

Now that you have a good understanding of your systems and data, you can evaluate that information to identify gaps and put plans in place to mitigate them.

For example, in a recent project, we noticed that the pipeline sending performance time-series data to the AIOps platform was broken. This issue caused dramatic troughs and spikes in the data that were not indicative of the real-time data coming from the devices. This, in turn, caused the machine learning algorithms to learn from bad data, which reduced the accuracy of the outcome. Identifying and fixing the issue early avoided lengthy troubleshooting to find the problem later, as well as having to retrain and re-implement the machine learning models that depended on the data.
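
A simple pre-check can surface this kind of problem before any training starts. The sketch below is not the method used in that project. It is one common approach, assumed here for illustration, that flags points lying far from the median of the preceding window, scaled by the median absolute deviation (MAD). The window size, the threshold and the sample values are all hypothetical.

```python
from statistics import median


def flag_outliers(values, window=5, threshold=3.0, min_scale=1e-9):
    """Return the indices of points that deviate from the median of the
    preceding `window` points by more than `threshold` times their MAD."""
    flagged = []
    for i in range(window, len(values)):
        recent = values[i - window:i]
        center = median(recent)
        mad = median(abs(v - center) for v in recent)
        # Guard against division by zero when the recent window is flat.
        scale = max(mad, min_scale)
        if abs(values[i] - center) / scale > threshold:
            flagged.append(i)
    return flagged


# Hypothetical readings with one spike (95) and one trough (0).
readings = [10, 11, 10, 12, 11, 95, 10, 11, 0, 12]
suspect = flag_outliers(readings)
```

Flagged points still need a human look: some will be genuine device behavior rather than pipeline damage, and only the latter should be repaired or excluded from training.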

Step 5: Leadership Buy-In

It’s hard to overstate the importance of securing buy-in from all stakeholders on the overall objectives and benefits of your AIOps program. Without it, your project could languish without the support it needs.

Make it a point at the beginning of each phase to secure support from each of the stakeholders specific to that phase. Going back to our daisy chain in Step 1, if you’re focused on event clustering, make sure not only that you have the backing of your operations executive, but also that you’ve agreed on what success looks like and how it will be measured.

An AIOps readiness assessment is a foundational step to take before you invest in an AIOps platform. It provides alignment on the benefits of the program and on what constitutes success, and it helps you uncover potential issues that need to be addressed to avoid pitfalls. Remember, if you take an iterative approach and outline each phase, you’re able to show success for each phase, which rallies support from key stakeholders throughout the project.

