The Urgency Driving AIOps into Your Enterprise

20 Oct 2021 6:40am, by

Ali Siddiqui
Ali is chief product officer for BMC Software Inc. In this role, he has end-to-end responsibility for the company’s entire product portfolio, including the BMC Helix suite, Control-M and its Automated Mainframe Intelligence (AMI) solutions. Ali earned a Bachelor of Science degree in electrical engineering from the California Institute of Technology and a Master of Science degree from Stanford University.

AIOps was once considered just a back-office fundamental, a solid suite of tools simplifying routine security and network monitoring tasks that primarily served the IT shop. The accelerated pace of digital transformation is changing that. Now, IT service and operations teams are in the spotlight and tasked with enabling business performance that helps their companies provide seamless digital experiences and evolve in a fast-changing economic environment.

AIOps is key to making the business more agile because it helps IT become more proactive and predictive in anticipating challenges that could lead to costly downtime. AIOps allows machines to resolve IT issues by themselves, using a multilayered approach in which machine learning (ML) and analytics are applied to big data gathered from different tools. This combination can automatically spot and react to IT issues in real time and support continuous integration and deployment (CI/CD) for core technology functions. It’s also critical that AIOps tooling takes an open approach that can integrate with existing IT tools and data sources, given the broad range of data to observe and analyze.

As enterprise systems became more complex, IT practitioners needed ways to leverage the massive amounts of data at their fingertips. The application of ML to that data gave rise to AIOps. Just as AIOps has evolved to meet the needs of IT operations teams, IT operations teams have evolved to meet the needs of their enterprise. They are faced with a huge surge in operational data volumes, the increased complexities of IT environments brought on by multicloud and remote working environments, agile development methodologies and digital transformation initiatives involving newer application architectures such as containers and ephemeral workloads.

The pace of change has been remarkable in the last year alone. A recent study of IT departments at large organizations, conducted by Hanover Research, found that more than two-thirds of companies (69%) now apply AI to IT operations and IT service management. In addition, the AIOps platform market is expected to grow to $11.02 billion by 2023, promising speed and accuracy in solving wide-ranging IT problems at scale.

Automation is most effective when applied within strictly defined processes and workloads that are manual and repetitive. AIOps reduces the amount of time that highly skilled engineers devote to these tasks, and it allows them to focus on higher-value initiatives within the organization. AIOps helps IT address complex challenges and keep up with exponential data growth by automating the entire IT operations process across hybrid environments. It builds an accurate inventory so that machines can correlate data points independently, then applies ML to that data to detect patterns across four key practice areas: event noise reduction, predictive alerting, probable cause analysis and capacity analysis.

Event Noise Reduction and Predictive Alerting

IT teams struggle to manage the large numbers of false events and alerts from the various monitoring tools installed in their environment. It’s one of their largest challenges. While the alerts can be helpful at times, more often they clutter the inbox and create false alarms.

AIOps reduces event noise across an environment by learning how that environment behaves in both busy and slow times. This knowledge is then used to determine whether a specific alert indicates a bigger incident with potential service impact. As a result, IT teams are alerted only when the environment’s behavior is anomalous in a way that points to app or service degradation or system downtime. This helps with prioritization and drives efficiencies. For example, a hybrid IT solution provider was able to reduce event noise by 90% with the help of AIOps and cut costs by reducing 10,000 tickets per month down to just a few hundred.
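The idea of learning separate norms for busy and slow times can be illustrated with a minimal sketch. It assumes a per-hour mean and standard deviation as the baseline and a z-score threshold for alerting; the function names, the latency metric and the sample data are hypothetical, and a production AIOps platform would use far richer models.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baseline(history):
    """history: iterable of (hour_of_day, metric_value) pairs.

    Returns {hour: (mean, stdev)} so busy and quiet hours get separate norms.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    # stdev needs at least two samples, so skip hours with less data.
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def is_anomalous(baseline, hour, value, threshold=3.0):
    """Flag a reading only if it is far from the norm for that hour."""
    if hour not in baseline:
        return False  # no baseline yet: stay quiet rather than guess
    mu, sigma = baseline[hour]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical history: request latency in ms at 03:00 (quiet) and 14:00 (busy).
history = [(3, 20), (3, 22), (3, 21), (3, 19),
           (14, 200), (14, 210), (14, 190), (14, 205)]
baseline = build_baseline(history)

quiet_alert = is_anomalous(baseline, 3, 200)   # 200 ms at 03:00 is unusual
busy_alert = is_anomalous(baseline, 14, 200)   # 200 ms at 14:00 is normal
```

The same 200 ms reading raises an alert in the quiet hour and is suppressed in the busy hour, which is how a learned baseline cuts noise that a single static threshold would generate.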

The same intelligence gathered to reduce event noise can be applied to predictive alerting as well. In this scenario, AIOps identifies innocuous-looking events for further evaluation because those events in the past have contributed to larger issues. This enables a proactive approach to stopping problems and prevents service outages for customers.
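One simple way to picture predictive alerting is to measure how often each event type was followed by an incident in the past. The sketch below is an assumption-laden illustration, not a description of any vendor's method: the event names, timestamps and the fixed look-ahead window are all hypothetical.

```python
def precursor_rates(events, incidents, window):
    """Estimate how often each event type preceded an incident.

    events: list of (timestamp, event_type); incidents: list of timestamps.
    Returns {event_type: fraction of occurrences that were followed by an
    incident within `window` time units}.
    """
    seen, followed = {}, {}
    for ts, etype in events:
        seen[etype] = seen.get(etype, 0) + 1
        if any(0 < inc - ts <= window for inc in incidents):
            followed[etype] = followed.get(etype, 0) + 1
    return {e: followed.get(e, 0) / n for e, n in seen.items()}


# Hypothetical event log (timestamps in minutes) and incident start times.
events = [(1, "disk_latency_warn"), (20, "disk_latency_warn"),
          (30, "cert_renewal_info"), (50, "disk_latency_warn"),
          (60, "cert_renewal_info"), (70, "disk_latency_warn")]
incidents = [8, 25, 55]

rates = precursor_rates(events, incidents, window=10)
```

Here the low-severity `disk_latency_warn` event was followed by an incident in three of its four occurrences, so it would be worth escalating for evaluation, while `cert_renewal_info` never was.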

Probable Cause Analysis and Capacity Analysis

In a traditional environment, it can take an exorbitant amount of time and energy to understand why and how an issue originally occurred. However, this process can be automated using AI to surface the top causal nodes, showing exactly where the problem is and which events are associated with it. The AIOps solution can then analyze the data and identify the problem in minutes or even seconds instead of hours, helping to reduce the mean time to repair (MTTR).

AIOps can also offer a topology view, which shows the impact, the specific node, how many events have occurred and any completed change requests. This allows IT teams to investigate the changes and the events coming into those specific nodes and see the estimated probability that each node is the cause of the service degradation.
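A per-node probability of this kind can be sketched with a deliberately simple heuristic: walk the degraded service's dependencies, score each node by its event count plus a bonus for recent changes, and normalize the scores. The topology, the counts and the `change_weight` value below are hypothetical, and real platforms are likely to use more sophisticated causal models.

```python
def rank_probable_causes(depends_on, service, event_counts,
                         recent_changes, change_weight=5):
    """Rank the dependencies of `service` by a heuristic cause score.

    Score = event count + change_weight * number of recent changes.
    Returns [(node, share)] sorted by share, where the shares sum to 1.
    """
    # Collect every node the service depends on, directly or indirectly.
    reachable, stack = set(), [service]
    while stack:
        node = stack.pop()
        for dep in depends_on.get(node, []):
            if dep not in reachable:
                reachable.add(dep)
                stack.append(dep)

    scores = {n: event_counts.get(n, 0) + change_weight * recent_changes.get(n, 0)
              for n in reachable}
    total = sum(scores.values())
    if total == 0:
        return []  # nothing to rank
    return sorted(((n, s / total) for n, s in scores.items()),
                  key=lambda item: item[1], reverse=True)


# Hypothetical topology: checkout depends on api and db; api depends on cache.
depends_on = {"checkout": ["api", "db"], "api": ["cache"], "db": [], "cache": []}
event_counts = {"api": 2, "db": 12, "cache": 1, "mail": 40}
recent_changes = {"db": 1}

ranking = rank_probable_causes(depends_on, "checkout", event_counts, recent_changes)
```

The noisy `mail` node is ignored because `checkout` does not depend on it, and `db` ranks first because it combines the most events with a recent change.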

IT teams must also understand resource consumption on-premises and in the cloud. Through behavioral learning and advanced analytics, AIOps can enable better capacity management by showing which resources are being used and when. Even more importantly, AIOps can determine what resources will be needed to support the apps and services most in demand by customers. This allows for the planning of future needs and gives IT teams the intelligence to right-size resources, keeping costs down and applications performing as expected.
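As a rough illustration of planning for future needs, the sketch below fits a straight-line trend to past usage and estimates when it would reach a capacity limit. A linear trend is an assumption made for simplicity; the function names, the disk-usage series and the 800 GB limit are hypothetical.

```python
def fit_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ...

    Requires at least two observations. Returns (slope, intercept).
    """
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def periods_until_full(values, capacity):
    """Periods after the last observation until the trend reaches capacity.

    Returns None when usage is flat or falling.
    """
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None
    crossing = (capacity - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


# Hypothetical weekly disk usage in GB on a volume with 800 GB of capacity.
weekly_disk_gb = [500, 520, 540, 560, 580]
weeks_left = periods_until_full(weekly_disk_gb, capacity=800)
```

With usage growing by 20 GB per week, the trend line reaches 800 GB eleven weeks after the last observation, which gives the team a concrete window in which to add or right-size resources.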

Adoption and Integration Into DevOps Frameworks

Increasingly, AIOps is being integrated into DevOps frameworks, especially for log ingestion, analytics and identification of risks in code. In the future, AIOps usage in the DevOps framework will shift from a focus on pre-production to include production metrics like user engagement, quality and business relevance. All of this suggests that DevOps teams that use AIOps platforms to monitor and support applications can accelerate their timelines and streamline development.

The Time is Now

Digital transformation represents a shift from centralized IT to applications and developers, an increased pace of innovation and deployment, and the acquisition of new digital users — machine agents, Internet of Things (IoT) devices, APIs, etc. — that organizations previously didn’t need to service.

All of these new technologies and users are pushing traditional performance and service-management strategies and tools to the breaking point. AIOps is the IT operations team’s best strategy to handle these digital transformation issues. AIOps transforms IT operations so that automation and AI-based analytics are applied to a broad range of data ingested into a modern and open observability platform, allowing teams to focus on driving operational excellence and helping the company evolve into an autonomous digital enterprise.

AIOps will enable ITOps to intelligently orchestrate infrastructure, applications and services across hybrid cloud ecosystems to align with the business and address customer needs on demand. Business leaders must recognize the need to digitally transform the entire IT environment to support a smart enterprise that can meet the needs of the fast-moving digital market.

Photo by Burak Kebapci from Pexels.