How AIOps Conquers Performance Gaps on Big Data Pipelines
If your data pipelines are growing in complexity beyond the point where you can manage them, you’re not alone. Today, pipelines have become so massive and are crisscrossed by so many dependencies that it can be hard to see how all the components fit together, and hard to identify the issues and opportunities that affect app performance and availability. Data stacks combine many disparate elements for data gathering and analysis, among other functions, and exponential data growth in most organizations only adds to the challenge.
Figure: Example data services from AWS, Azure, and Google Cloud Platform.
In such an environment, simply monitoring performance and taking reactive measures when performance lags is no longer a viable approach.
Today, with AIOps (Artificial Intelligence for IT Operations), a correlated data model helps you discover the full context of your apps and system resources so that you can adequately plan, manage, and improve performance. At the same time, AIOps is maturing to the point of creating true efficiencies among DevOps teams as they struggle with the diversity, complexity, and rate of change across the entire stack.
But why AIOps, and why now? The gap between system complexity and human ability called for mature AIOps functionality and practices a decade ago, but the technology was not yet available. Organizations are now making up for lost time by automating, and they’re putting their best foot forward by harnessing AIOps for data performance optimization.
In fact, the best possible teamwork in all of IT operations may be a fluid synergy between DevOps and DataOps teams that share a single goal. DevOps primarily focuses on improving agility and flexibility; DataOps and AIOps focus on automating the path from development to production. When the two functions collaborate, they share a focus on the entire system and how to automate it, with a keen eye on remediating issues and optimizing performance.
DevOps and DataOps teams face challenges on multiple fronts: They need to find and remediate issues. They need to improve system performance. They need to develop operating efficiencies. And they are learning that AIOps and automation can help by:
- Detecting anomalies
- Predicting performance problems
- Detecting deviations from a baseline
- Suggesting optimizations
- Correlating signals across multiple platforms for troubleshooting
- Performing root cause analysis
- Automating remediation.
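To make the first few items concrete, here is a minimal sketch of detecting deviations from a baseline. It assumes you already collect a run duration (in seconds) for each pipeline execution; the function name `find_anomalies`, the window size, and the threshold are illustrative choices, not part of any particular AIOps product.

```python
from statistics import mean, stdev


def find_anomalies(durations, window=5, threshold=3.0):
    """Return indices of runs that deviate from a rolling baseline.

    The baseline for each run is the mean and standard deviation of the
    previous `window` runs (hypothetical defaults, tune for your data).
    """
    anomalies = []
    for i in range(window, len(durations)):
        baseline = durations[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: treat any change as a deviation.
            if durations[i] != mu:
                anomalies.append(i)
            continue
        # Flag runs more than `threshold` standard deviations from the mean.
        if abs(durations[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up run durations in seconds; the seventh run (index 6) is a spike.
runs = [60, 62, 61, 59, 60, 61, 120, 60, 62]
print(find_anomalies(runs))
```

A real platform would use far richer models, and this sketch lets a spike contaminate the baseline for the runs that follow it. Still, the core idea is the same: learn what normal looks like, then flag what is not.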
It’s common knowledge that data is expanding at unprecedented rates. “Data is doubling in size every two years, and by 2020 the digital universe — the data that people create and copy annually — will reach 44 zettabytes or 44 trillion gigabytes,” said IDC. IDC also noted that a quarter of the current big data “mountain” contains information that is useful for analysis. In 2020 this will grow to one third.
Moreover, in an October 2018 webinar, Gartner said that, by 2022, 40% of large enterprises would use AIOps to support and partially replace IT Operations Management activities – an increase from 5% today.
This data onslaught means different things to different people, of course. But, to data-driven organizations, it means that the ability to store, process, analyze, interpret, and consume that data is becoming a serious challenge.
Coping with Multiple, Complex Data Pipelines
The elements of a pipeline are often executed in parallel or in time-sliced fashion. The need for AIOps is illustrated by the magnitude of the infrastructure challenges that enterprises are struggling with today.
- Data no longer resides in just a single database or data warehouse. Today, enterprises are dealing with complex and highly distributed data ecosystems.
- Data from a variety of sources is being ingested and processed today, either in real time or in batches.
- A range of technologies is in use, such as Hive (for text) or even the older MapReduce; Kafka and Spark Streaming for data ingestion; Hive and Spark for analytics; and HBase and Cassandra for key-value stores. Then, in the cloud, you have born-in-the-cloud systems such as Redshift. Each of those technologies has its unique role in an interconnected ecosystem; all are vital, and none is dominant.
When you put all these apps together, complexity explodes.
From the very start, DataOps teams are challenged to operationalize the pipeline to be sure it meets business requirements. But they can’t do that when apps are slow, error-prone, or unreliable. The consequences are missed SLAs, higher costs, and a growing mean time to resolution (MTTR).
Worse, teams face the issue that problems diagnosed in one node of the infrastructure can cascade to other parts of the infrastructure and can impact applications and operations. In other words, the symptoms of the issue may not reveal the root cause or location of the issue.
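One simple way to reason about cascading failures is to walk the dependency graph. The sketch below assumes you can list which components are currently failing and which upstream components each one depends on; the component names and the `root_cause_candidates` helper are hypothetical. It picks out the failing components that have no failing upstream dependency, which makes them the likeliest places to start looking.

```python
def root_cause_candidates(depends_on, failing):
    """Return failing components that have no failing upstream dependency.

    depends_on maps a component name to the components it reads from.
    failing is an iterable of component names that currently show symptoms.
    """
    failing = set(failing)
    return sorted(
        component
        for component in failing
        if not any(upstream in failing for upstream in depends_on.get(component, ()))
    )


# Hypothetical pipeline: a dashboard reads from a Spark job, which reads
# from a Kafka ingest stage and a Hive table.
depends_on = {
    "dashboard": ["spark_job"],
    "spark_job": ["kafka_ingest", "hive_table"],
    "kafka_ingest": [],
    "hive_table": [],
}
failing = ["dashboard", "spark_job", "kafka_ingest"]

print(root_cause_candidates(depends_on, failing))
```

Here the dashboard and the Spark job both show symptoms, but only the Kafka ingest stage is failing without a failing dependency of its own. This is a heuristic, not a diagnosis: it depends on an accurate dependency map and on every failure being observed.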
Figure: Data pipelines become complex quickly.
In a December 2018 webinar, attendees were asked which DevOps tools they used to debug big data applications. The poll revealed that log management, a first-generation technology for debugging apps, far exceeded other technologies. But arguably the more compelling finding was captioned by Wayne Eckerson, president of The Eckerson Group, a research and consulting firm: “The stack is a proliferation of tools, with none of them being a single source of truth.”
AIOps Tames Pipeline Complexity
Today, with AIOps, you can automate the monitoring and correlation of activities in the data pipeline and can uncover the myriad issues that can arise when 1) data volumes grow and 2) a range of disparate applications are used to process the data. Data applications are complex, they run on a wide range of architectures, and there are a lot of dependencies between those apps.
And so it is, as in other markets, that AI is shaping up to be a true enabler. Planners who have adopted AIOps realize that there are three keys to managing their pipelines: gaining visibility into the full stack, deploying advanced analytics, and developing AI-powered automation.
With advances in AI, big data teams are now using AIOps as the backbone of a system for self-service application performance management. Big data and IT teams can optimize their AIOps platform for any desired SLA. And, while automation is clearly the big gain, AIOps also gives operations teams the confidence that the parts of the big data infrastructure they “own” are optimized for performance. That’s possible because AIOps enables role-based access control, allowing users to view just their own applications or those related to just their department or business unit.
Perhaps the completing piece is that AIOps, when applied at a deeper level, can automate remediation, solving problems on its own. It’s not quite wizardry; it just looks like it. And that’s what AI was meant to do in the first place.
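As a rough illustration of role-based access control in this setting, the sketch below filters a list of applications by who is asking. The `App` and `User` types and the three role names are assumptions made for this example; a real AIOps platform will define its own roles and permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    owner: str
    department: str


@dataclass(frozen=True)
class User:
    name: str
    department: str
    role: str  # hypothetical roles: "user", "dept_admin", "admin"


def visible_apps(user, apps):
    """Return the applications this user is allowed to see."""
    if user.role == "admin":
        return list(apps)
    if user.role == "dept_admin":
        # Department admins see everything in their own department.
        return [a for a in apps if a.department == user.department]
    # Everyone else sees only the applications they own.
    return [a for a in apps if a.owner == user.name]


apps = [
    App("nightly_etl", owner="ana", department="finance"),
    App("fraud_scoring", owner="raj", department="finance"),
    App("clickstream", owner="lee", department="marketing"),
]

ana = User("ana", department="finance", role="user")
print([a.name for a in visible_apps(ana, apps)])
```

The same filter can sit in front of dashboards, alerts, and recommendations, so each team sees only the part of the pipeline it is responsible for.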
Feature image by Jean-Paul Jandrain from Pixabay.