PagerDuty sponsored this podcast.
Monitoring today’s highly distributed and often immensely complex and decentralized environments requires unprecedented capabilities and reach.
SignalFx co-founder and CEO Karthik Rau and CTO Arijit Mukherji spoke with Alex Williams, founder and editor in chief of The New Stack, during the PagerDuty Summit in San Francisco about the new age of monitoring and how it has become a core capability in managing infrastructures.
Effective monitoring must first meet the demands of highly distributed applications. “In a traditional enterprise architecture, you have a monolith and it’s running on a single server. There’s a lot of monitoring that you can do locally to understand if instances fail,” Rau said. “Today’s architectures are more and more distributed: you have VMs, you have containers, you have distributed architectures and distributed databases that might be running on tens or hundreds of nodes. And so what’s happening on one instance is not as interesting as what’s happening across the collection of instances since performance matters across an entire service.”
Monitoring systems must also keep pace with a rate of change that is accelerating sharply, Rau said. “So instead of just doing one or two updates a year, more and more people are embracing DevOps. Containers give you a lot of flexibility because you separate the application stack from the underlying OS, and the security and operating system footprints,” he said. “Developers can thus push changes out directly with containers and that’s enabled tremendous developer velocity where you can have individual teams pushing changes out directly into their individual services and you can have a thousand times the number of code pushes happening in a given period of time.”
One of the more exciting modern trends to emerge in computing is how organizations often rely on their own developers to design their IT infrastructures and applications, while often working in open source environments. This trend has, of course, affected monitoring applications.
“In the old days, because your applications were monoliths and because you weren’t making as many changes, the primary value of a monitoring system was data collection and instrumentation. So you got an agent from a vendor and it automatically gave you visibility into your application because…which wasn’t the app that you developed [since you were] probably running someone else’s app,” Rau said. “In today’s world, more and more people are running an open source stack or they’re running a stack that they’ve built themselves, so it’s a lot easier to collect the data that you care about. The challenge is really becoming the analysis of all this data because there’s so much more data.”
As noted above, today’s highly distributed environments make it critical to monitor data and applications holistically, across the whole service rather than instance by instance. “We like to say monitoring has really turned into an analytics problem, so that if you don’t have a more progressive approach of collecting the data, doing real-time analysis on it and identifying the outliers and the patterns, then you’re really just going to drown in noise,” Rau said. “You’re just going to get further and further behind and you want to be able to operationalize these new models.”
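To make the idea of "identifying the outliers" across a collection of instances concrete, here is a minimal sketch in Python. It is not SignalFx's method; it is a generic illustration that compares each instance against the fleet median using the median absolute deviation. The function name `find_outliers`, the instance names, the latency figures and the 3.5 threshold are all assumptions made for the example.

```python
from statistics import median


def find_outliers(latency_by_instance, threshold=3.5):
    """Return the names of instances that deviate strongly from the fleet median.

    Hypothetical helper: uses a modified z-score based on the median
    absolute deviation (MAD), so one misbehaving node cannot skew the baseline.
    """
    values = list(latency_by_instance.values())
    fleet_median = median(values)
    # Typical distance of an instance from the fleet median.
    mad = median(abs(v - fleet_median) for v in values)
    if mad == 0:
        # Every instance reports (nearly) the same value, so nothing stands out.
        return []
    return sorted(
        name
        for name, value in latency_by_instance.items()
        if 0.6745 * abs(value - fleet_median) / mad > threshold
    )


# Made-up latency readings (milliseconds) for one service.
sample = {
    "web-1": 102.0,
    "web-2": 98.0,
    "web-3": 101.0,
    "web-4": 97.0,
    "web-5": 350.0,
}
outliers = find_outliers(sample)
```

With these made-up readings, only `web-5` is flagged. The point is that the question is asked of the service as a whole, so a single alert replaces five separate per-instance checks.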
Ultimately, effective monitoring systems must account for these new computing dynamics and their associated dependencies. At the same time, monitoring the ChatOps tools that DevOps teams use, such as Slack, can help detect and pinpoint problems, even before they emerge.
“In an organization where there’s not just a single team but there’s multiple different microservices teams perhaps located in different places, how do you know what’s going on?” Mukherji said. “Monitoring social signals, for example, gives you amazing insight into if there’s something brewing, something that might become a problem.”
No system can monitor and detect every possible combination of problems that might later occur, so monitoring human communications in ChatOps can help fill the gaps. “Humans are aware of what’s going on and they’re looking at the systems,” Mukherji said. “So a lot of times, looking at non-computer generated insights can actually help.”
In this Edition:
4:43: Communication across ChatOps and other points of data
6:11: When you think about these tools like Slack, and how they are connecting people, how can you use the data from that to better understand how both developer and operation teams work together?
9:48: Discussing teams, DevOps, and the role that people play on these teams.
13:40: Exploring the concept of observability
15:38: Should the developer be responsible for building those applications that are observable?
18:54: Is it the hopeful intention that this platform will focus the operator’s role to some extent by being able to focus more on the physical layer?