Telus Takes First Step Toward AI/ML with IT Automation
CloudBees sponsored this podcast.
In the telecommunications industry, the level of manual labor involved in maintaining large and complex applications at scale on a traditional software architecture has become untenable. In an effort to avoid costly service outages, system administrators are frequently up all night monitoring dashboards for spikes in activity that may signal trouble, said Sana Tariq, a senior architect of end-to-end (E2E) service orchestration at Canadian telecommunications service provider Telus.
“I don’t think that should exist anymore, the staying up all night,” Tariq said in a livestream and podcast with The New Stack at Open Source Summit held in Vancouver, B.C. this past August. “We need to advance to the point where we trust the algorithms to act on our behalf.”
To cut down on support costs and fatigue, Telus, and the broader telecom industry, have begun the difficult task of automating application management using artificial intelligence (AI) and machine learning (ML). But first, they must migrate to cloud native technologies, managed by DevOps processes using continuous integration and continuous delivery (CI/CD) tools and pipelines.
“Cloud native is really the foundation of our ability to start using ML and AI” to manage and automate infrastructure, Tariq said.
Telus has begun building a consolidated analytics model that absorbs real-time data in many formats, with a consolidated runtime vision. The aim is to manage all of the data the company currently produces, setting the stage for training future algorithms for IT automation.
The first step is to identify a good use case for ML automation that will produce a large return on investment in a short period of time, such as optimizing the use of cloud resources, Tariq said. Then the organization can create a good data model based on historical data for that use case. This involves combining data from multiple sources and formats to create a data lake that can then be analyzed in a “static closed loop.” Finally, the ML algorithm can learn from an action taken on the static data to create a “dynamic closed loop.”
At first, the algorithm might trigger human intervention to review its proposed actions, but eventually humans are taken out of the loop altogether.
“There is a mindset shift, there is a comfort level that will eventually come and I see that will happen in a couple of years when we see that the accuracy of our models has reached that stage where we can really trust it,” Tariq said. “That’s how to slowly build the roadmap to more complex use cases.”
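The progression Tariq describes, from a human reviewing what the algorithm proposes to the algorithm acting on its own once its accuracy is trusted, could be sketched roughly as follows. This is an illustration only, not Telus's implementation: the `propose` rule stands in for a trained model, and the action names, the `trust_threshold` parameter, and the `ask_human` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Proposal:
    action: str        # hypothetical action name, e.g. "scale_up" or "no_op"
    confidence: float  # how sure the stand-in model is, between 0 and 1


def propose(cpu_utilization: float) -> Proposal:
    """Stand-in for a trained model: a fixed rule, not real ML."""
    if cpu_utilization > 0.8:
        return Proposal("scale_up", confidence=min(1.0, cpu_utilization))
    return Proposal("no_op", confidence=1.0 - cpu_utilization)


def closed_loop_step(
    cpu_utilization: float,
    trust_threshold: float,
    ask_human: Callable[[Proposal], bool],
) -> str:
    """Run one pass of the loop and report what happened.

    Proposals at or above the trust threshold are applied automatically.
    Anything below it is routed to a human, who approves or rejects it.
    """
    proposal = propose(cpu_utilization)
    if proposal.confidence >= trust_threshold:
        return f"auto:{proposal.action}"
    if ask_human(proposal):
        return f"approved:{proposal.action}"
    return "rejected"


if __name__ == "__main__":
    # A cautious setup: the operator is asked about anything under 0.9.
    always_approve = lambda proposal: True
    print(closed_loop_step(0.95, 0.9, always_approve))
    print(closed_loop_step(0.85, 0.9, always_approve))
```

In a sketch like this, lowering `trust_threshold` over time is what gradually takes the human out of the loop; how quickly to lower it would depend on how accurate the model proves to be in practice.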
In this Edition:
1:40: Tell us a little bit about yourself
6:08: What were you seeing in the world post-graduation that has shaped your career going forward?
11:55: Discussing alert fatigue and the benefits of machine learning in this space
15:34: How do cloud-native architectures fit into the story here?
18:13: What are some of the things you’re learning as you’re starting to prepare that data for these container architectures?
21:56: Exploring the way that teams are organized at Telus