BigPanda and the App Incident Problem That Is Only Getting More Complex

We know that no one can manage the new stack manually. We also know that, with data and an automated data center, a team of just a few people can manage apps that scale to millions of users.
Now operations teams face a new set of problems, stemming from all the data generated by a new generation of specialized services designed to help manage the complexity of apps that run on distributed infrastructure.
BigPanda offers a service that approaches the problem by using data science to manage incidents for apps that run across thousands of servers. Today the company received an endorsement for its approach with the news that it had raised a $7 million Series A round from Mayfield Fund and Sequoia Capital.
Incident response has old-school roots. HP and others pioneered the space, giving small and large teams a way to manage what were then considered quite complex systems.
Today companies have to manage thousands of servers. To monitor their apps, companies have turned to services such as New Relic, AppDynamics, Splunk and Nagios.
But the data often sits in separate silos, which creates complexity of its own. All that data needs to be managed, and it is more useful in aggregate than spread across multiple separate queries.
BigPanda normalizes the data by collecting it into a single data model. It uses an IT taxonomy that classifies the data according to its source. It may process webhook data from cloud services, or use natural language processing to analyze the emails that Splunk generates.
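To make the idea of a single data model more concrete, here is a minimal Python sketch. It is not BigPanda's code: the Alert schema, the STATUS_MAP vocabulary, the webhook field names and the email subject format are all hypothetical, and a simple regular expression stands in for the natural language processing the company describes.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alert:
    """One alert in a single, source-agnostic data model (hypothetical schema)."""
    source: str   # where the alert came from, per a simple taxonomy
    host: str
    check: str
    status: str   # normalized to "critical", "warning" or "ok"


# Hypothetical mapping from each tool's status vocabulary to one shared set.
STATUS_MAP = {
    "CRITICAL": "critical", "DOWN": "critical", "error": "critical",
    "WARNING": "warning", "warn": "warning",
    "OK": "ok", "UP": "ok", "resolved": "ok",
}


def from_webhook(raw_json: str) -> Alert:
    """Normalize a JSON webhook payload (field names are assumed)."""
    payload = json.loads(raw_json)
    return Alert(
        source="webhook",
        host=payload["host"],
        check=payload["check"],
        status=STATUS_MAP.get(payload["state"], "warning"),
    )


# A regex stands in for NLP here; it assumes subjects like
# "[CRITICAL] disk_usage on web-01".
EMAIL_SUBJECT = re.compile(r"\[(?P<state>\w+)\]\s+(?P<check>\S+)\s+on\s+(?P<host>\S+)")


def from_email(subject: str) -> Optional[Alert]:
    """Normalize an alert email subject, or return None if it is unrecognized."""
    match = EMAIL_SUBJECT.search(subject)
    if match is None:
        return None
    return Alert(
        source="email",
        host=match["host"],
        check=match["check"],
        status=STATUS_MAP.get(match["state"], "warning"),
    )


alerts = [
    from_webhook('{"host": "web-01", "check": "cpu_load", "state": "WARNING"}'),
    from_email("[CRITICAL] disk_usage on web-01"),
]
for alert in alerts:
    print(alert)
```

Once both sources land in the same Alert shape, a single query can cover them, which is the aggregate view the silos make hard to get.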
A number of incident management companies address the issues that come with the new stack. VictorOps, based in Boulder, uses data it collects from different incidents and from the interactions among team members. Like BigPanda, VictorOps summarizes the data from different monitoring services and cloud management technologies such as New Relic, Loggly, Amazon CloudWatch, Pingdom, Crittercism and UserVoice. It also tracks the data exchanged between team members.
PagerDuty is another of the new operations management tools that have gained popularity over the past few years. It alerts operations teams via email, phone and SMS, and it aggregates alerts and routes them according to priority.
Hundreds of companies are filling a space once dominated by legacy providers. BigPanda and others like it reflect a bigger movement, one that manifests itself in the DevOps culture that has come to symbolize IT’s transformation.
It is also evidence of how manual labor is becoming increasingly irrelevant in the data center. Instead, operations staff now manage thousands of alerts every day, a very different job from what it used to be.
@alexwilliams it was opening up a bridge and waking up on-call resource for each one of the systems involved & everyone troubleshooting
— Ranga (@rchakra1) October 28, 2014
BigPanda’s approach uses a correlation algorithm to help IT quickly track down issues. Still, as can be expected, customers must fine-tune the BigPanda service to get the most out of it. If they don’t, they may get the wrong automated conclusion. The specter of that scenario will only grow as more services emerge to manage new stack environments.
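A toy example helps show why that tuning matters. The Python sketch below is not BigPanda's algorithm; it swaps in a deliberately naive rule that groups alerts from the same host arriving within a configurable time window. The correlate function, the window_seconds parameter and the sample alerts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Alert:
    timestamp: int  # seconds since an arbitrary start point
    host: str
    check: str


def correlate(alerts: List[Alert], window_seconds: int) -> List[List[Alert]]:
    """Group alerts on the same host into one incident when each arrives within
    window_seconds of the previous alert in that incident (a naive stand-in
    for a real correlation engine)."""
    incidents: List[List[Alert]] = []
    latest_by_host: Dict[str, int] = {}  # host -> index of its newest incident
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        idx = latest_by_host.get(alert.host)
        if idx is not None and alert.timestamp - incidents[idx][-1].timestamp <= window_seconds:
            incidents[idx].append(alert)  # close enough in time: same incident
        else:
            latest_by_host[alert.host] = len(incidents)
            incidents.append([alert])  # start a new incident for this host
    return incidents


alerts = [
    Alert(0, "db-01", "cpu_load"),
    Alert(30, "db-01", "slow_queries"),
    Alert(45, "web-01", "http_5xx"),
    Alert(400, "db-01", "disk_usage"),
]

for window in (60, 600):
    incidents = correlate(alerts, window)
    print(window, [len(group) for group in incidents])
```

With a 60-second window the run prints 60 [2, 1, 1], so the late disk alert on db-01 becomes its own incident. Widening the window to 600 seconds prints 600 [3, 1] and folds it into the earlier database incident. Neither setting is right by default: a badly chosen threshold is exactly how a customer ends up with the wrong automated conclusion.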
Feature image via Flickr.