5 Mistakes to Avoid with AIOps Projects

As with any groundbreaking new technology, the early days are fraught with frustration when IT teams deploy unproven solutions and expect immediate ROI. This is assuredly the case for artificial intelligence for IT operations (AIOps), which combines machine learning, data science, and other computational approaches for solving modern digital operational problems at scale.
IDC’s Worldwide CIO Agenda 2020 Predictions forecasts that 60% of IT organizations will deploy artificial intelligence (AI) to “augment, streamline, and accelerate IT operations” through 2022. A recent Gartner study, however, found that over the last two years, nearly 50% of enterprises have failed to move AI projects from the proof-of-concept stage to production deployment.
If IT operations teams wish to get maximum business value from AIOps deployments, they should pay attention to these five common mistakes that can trip up their best-laid plans:
Mistake #1: Not Analyzing Your Current State of IT Operations
Technology leaders planning to purchase an AIOps platform should take a close look at how their teams handle incidents. They should start with a playbook that documents how teams respond to problems and analyzes the effectiveness of incident resolution processes. Without understanding how existing event management workflows, staff skills, and tooling are hampering business outcomes and customer experiences, IT might buy an AIOps tool that’s a terrible fit.
Here are questions to consider while reviewing your incident management processes:
- Technology landscape. What application and infrastructure platforms are you currently supporting? How do you expect the IT estate to change over the next few years?
- Tools portfolio. Which IT operations tools (infrastructure monitoring, application performance monitoring, event correlation, and service desk) are you currently using? Are there any plans to retire or consolidate existing tools?
- Process. What does it currently take (stakeholders, tools, and workflows) to troubleshoot a critical outage? How long does it take to identify an incident and assign it to the right stakeholders?
- Challenges. What obstacles do your teams face while identifying, troubleshooting, and resolving incidents?
- Measurement. What metrics are you using to track customer satisfaction, and how do they inform your incident management key performance indicators (KPIs)?
- External support. Are you using managed service providers or external consultants to support your event management workflows?
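The Process and Measurement questions above are easier to answer with a numeric baseline in hand. Here is a minimal Python sketch, assuming incident records exported from a service desk with `opened`, `assigned`, and `resolved` timestamps. The field names and sample values are hypothetical and will differ from tool to tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical fields; map them to whatever your service desk exports.
    opened: datetime
    assigned: datetime
    resolved: datetime


def mean_minutes(deltas):
    """Average a list of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


def baseline_kpis(incidents):
    """Return mean time to assign and mean time to resolve, in minutes."""
    return {
        "mean_time_to_assign_min": mean_minutes(
            [i.assigned - i.opened for i in incidents]
        ),
        "mean_time_to_resolve_min": mean_minutes(
            [i.resolved - i.opened for i in incidents]
        ),
    }


# Illustrative sample data only.
t0 = datetime(2020, 1, 6, 9, 0)
sample = [
    Incident(t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=70)),
    Incident(t0, t0 + timedelta(minutes=30), t0 + timedelta(minutes=110)),
]
print(baseline_kpis(sample))
```

Running the same calculation on a few months of real incident history gives a "before" figure to compare against once an AIOps platform is in place.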
Mistake #2: Not Measuring the Business Outcomes You Wish to Achieve with AIOps
IT teams should assess the effectiveness of current incident resolution processes to determine how much they can improve infrastructure availability, enhance operational agility, and reduce management complexity with AIOps.
While there are clear benefits to a data-driven approach for event and incident management, IT leaders should also consider the tradeoffs involved in a successful implementation:
- Business problems. What issues are you trying to address with modern incident management tools, for example lowering failure rates or reducing support tickets?
- Productivity. How much time can your teams save if they no longer have to chase false alarms, build static rules for event suppression, and convene war rooms for root cause diagnostics?
- Automation. How much effort can you save by auto-assigning incidents to the right on-call teams or triggering process workflows for automatic problem resolution?
- Data requirements. Have you identified different sources of historical and streaming data that will feed into your AIOps platform and analyzed the time taken for data preparation, modeling, standardization, and cleansing?
- Expertise. What training will your staff need to work with modern event management tools that use machine learning algorithms and statistical insights? Will you need professional services to supplement your internal teams?
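To put a rough number on the Automation question above, it can help to check how many of today's alerts could be routed without a human dispatcher. The sketch below uses a simple keyword lookup as a stand-in; the routing table, team names, and alert fields are all hypothetical, and an AIOps platform would typically derive such assignments from historical data rather than a hand-written list.

```python
# Hypothetical routing table: keyword in the alert's service name -> on-call team.
ROUTING_RULES = [
    ("db", "database-oncall"),
    ("net", "network-oncall"),
    ("web", "frontend-oncall"),
]
DEFAULT_TEAM = "ops-triage"  # Fallback queue that still needs manual triage.


def assign_team(alert):
    """Pick the first team whose keyword appears in the alert's service name."""
    service = alert.get("service", "").lower()
    for keyword, team in ROUTING_RULES:
        if keyword in service:
            return team
    return DEFAULT_TEAM


def auto_assign_rate(alerts):
    """Fraction of alerts routed without falling back to manual triage."""
    if not alerts:
        return 0.0
    routed = sum(1 for a in alerts if assign_team(a) != DEFAULT_TEAM)
    return routed / len(alerts)


# Illustrative sample alerts only.
alerts = [
    {"service": "orders-db", "message": "replication lag"},
    {"service": "edge-network", "message": "packet loss"},
    {"service": "billing-batch", "message": "job failed"},
    {"service": "web-checkout", "message": "5xx spike"},
]
for a in alerts:
    print(a["service"], "->", assign_team(a))
print("auto-assigned:", auto_assign_rate(alerts))
```

The share of alerts that falls through to the default queue is a reasonable first estimate of the manual dispatch effort that automation could remove.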
Mistake #3: Not Drafting Tool Selection Criteria Driven by Organizational Priorities
IT professionals gravitate towards feature comparison checklists while evaluating different AIOps tools. While comparing technical tradeoffs is a useful exercise, tool selection should rest on specific use cases that contribute to business outcomes such as better customer support or quicker problem resolution.
Consider the following factors while drafting your tool selection checklist:
- Workflows. How does the vendor support and enhance current incident management workflows and critical use cases?
- Integrations. Does the vendor offer out-of-the-box support for your existing infrastructure and tools portfolio?
- Partner ecosystem. Does the provider have partnerships and alliances with leading managed service providers and popular IT operations tools?
- Secret sauce. What proprietary and industry-standard machine learning algorithms and data science techniques does the technology vendor incorporate?
- Approaches. Does the provider use different techniques (algorithmic, statistical, or topology-infused) for root cause analysis? What support will the vendor provide to replicate current rules-based approaches for event filtering, classification, and analysis?
- Product roadmap. How does the vendor plan to enhance product functionality and usability over the coming months and quarters?
- Metrics. How does the vendor track and surface critical KPIs for event management?
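One way to keep organizational priorities ahead of raw feature counts is a weighted scoring matrix built from the factors above. The sketch below is only an illustration: the criteria names, weights, vendor names, and 1-to-5 scores are all hypothetical and should be replaced with your own.

```python
# Hypothetical weights: higher means the criterion matters more to the business.
WEIGHTS = {"workflows": 5, "integrations": 4, "root_cause": 3, "roadmap": 1}


def weighted_score(scores, weights=WEIGHTS):
    """Weighted average of criterion scores; missing criteria count as 0."""
    total_weight = sum(weights.values())
    return sum(weights[c] * scores.get(c, 0) for c in weights) / total_weight


# Illustrative 1-5 scores from a hypothetical evaluation.
vendors = {
    "Vendor A": {"workflows": 4, "integrations": 5, "root_cause": 2, "roadmap": 5},
    "Vendor B": {"workflows": 5, "integrations": 3, "root_cause": 4, "roadmap": 3},
}

ranked = sorted(vendors, key=lambda v: weighted_score(vendors[v]), reverse=True)
for name in ranked:
    print(name, round(weighted_score(vendors[name]), 2))
```

In this made-up example, Vendor A has the higher scores on integrations and roadmap, yet Vendor B ranks first because it does better on the heavily weighted workflow and root cause criteria.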
Mistake #4: Not Staffing a Center of Excellence
Organizations that wish to achieve a successful and scalable AIOps adoption should build a cross-functional tiger team known as a Center of Excellence (CoE). The CoE ensures alignment with business requirements, delivers an incremental approach for deployment, and shares best practices for accelerating the AIOps journey. Here’s how IT leaders can support the CoE:
- Executive sponsorship. Does the CoE have strong executive support to prescribe and govern the AIOps implementation framework? Does senior leadership make it a point to emphasize the importance of the CoE’s work during their weekly staff meetings?
- Multidisciplinary organization. Does the CoE have the right skills that combine business context and technical chops for defining solution architecture, managing change, and driving value creation?
- Upskilling. Has the organization invested in refresher training on statistical pattern analysis and machine learning, as well as vendor-specific certifications, so that CoE staff can ensure a successful transformation?
Mistake #5: Not Marrying Human Insights with Machine Data Intelligence
An implicit goal of AIOps deployments is to shrink the number of staff working on incident management. While IT leaders can redeploy staff who work on incident resolution once their AIOps platform has matured, headcount reduction should not be the main focus of modernizing incident management workflows.
IT operations staff bring valuable business context, systemic understanding of enterprise applications and infrastructure, and process expertise. Your AIOps project will be an abject failure unless your employees share their insights to refine and optimize algorithmic recommendations for event management. Here are some considerations:
- Collaboration. How do IT staff work with data scientists to drive better pattern recognition, anomaly detection, and elimination of repetitive incidents?
- Explainability. How do you build trust and confidence in modern analytical approaches for IT performance management, such as by displaying data on the effectiveness of AI-based recommendations?
- Cognitive enhancement. How do you deliver data-driven insights to improve a human operator’s ability to recognize issues and enable faster resolution?
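For the Explainability point above, one simple form of "data on the effectiveness of AI-based recommendations" is the share of recommendations that operators actually accepted. The sketch below assumes a feedback log with hypothetical `type` and `accepted` fields; no particular AIOps product is implied.

```python
from collections import defaultdict


def acceptance_by_type(feedback):
    """Return {recommendation type: share of recommendations operators accepted}."""
    counts = defaultdict(lambda: [0, 0])  # type -> [accepted, total]
    for entry in feedback:
        tally = counts[entry["type"]]
        if entry["accepted"]:
            tally[0] += 1
        tally[1] += 1
    return {t: accepted / total for t, (accepted, total) in counts.items()}


# Illustrative operator feedback only.
feedback = [
    {"type": "root_cause", "accepted": True},
    {"type": "root_cause", "accepted": False},
    {"type": "alert_grouping", "accepted": True},
    {"type": "alert_grouping", "accepted": True},
]
print(acceptance_by_type(feedback))
```

Publishing these rates to the operations team, and reviewing the rejected recommendations with data scientists, gives staff a concrete way to feed their insights back into the platform.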
Conclusion
Before investing in an AIOps solution, technology leaders should first analyze their current state of IT operations, measure the business outcomes that they wish to achieve, and select tools driven by organizational priorities. Doing so helps avoid costly missteps and ensures that business priorities are the driving force behind enterprise AIOps deployments.