
Kintaba Introduces Automation for Modern Incident Management

21 Dec 2020 11:00am, by

For Kintaba, when your company experiences an “incident” (servers and services going down, for example), there’s more to resolving it than simply alerting your site reliability engineers (SREs) and getting those services up and running again. Instead, Kintaba manages all the steps, from declaring an incident to handling the response to conducting a post-mortem, and helps automate the processes involved.

Now, Kintaba is launching an Automations feature that inserts business logic into its processes so that appropriate members of an organization beyond the SREs are automatically brought into the incident management process. The decision of who to include becomes predetermined, rather than being left to someone trying to handle an incident in a hectic moment.

“Automations is really meant to go and do things that we consider ‘busy work.’ I use that phrase in a pretty aggressive way, because we don’t want people wasting their time doing things like traversing the org hierarchy to make a decision about who should be added,” explained John Egan, CEO and co-founder of Kintaba, in an interview. “That’s a piece of business logic we should be able to capture.” The technology was built by former Facebook engineers.

Egan offered the example of an incident in which personally identifiable information (PII) was exposed, rather than a server simply going down. In that instance, the solution is more than getting the service back online and involves more personnel than SREs; it may include a legal team, public relations, and others. With Automations, the responder simply tags the incident with a label such as “PII,” which then triggers a series of events in Kintaba to notify all necessary parties. This is, Egan said, just the beginning of what Automations can do.
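To illustrate the idea of capturing that business logic in one place, here is a minimal sketch in Python of a label-to-team rule. It is not Kintaba's implementation or API; the `Incident` class, the `RULES` table, the team names, and `apply_automations` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field

# Hypothetical rule table: incident label -> teams to pull in automatically.
# The labels and team names are assumptions made for this illustration.
RULES = {
    "PII": ["legal", "public-relations"],
}


@dataclass
class Incident:
    title: str
    labels: set = field(default_factory=set)
    # Assume the SREs are already on every incident by default.
    responders: list = field(default_factory=lambda: ["sre"])


def apply_automations(incident, rules=RULES):
    """Add every team that a label's rule calls for; return the teams newly added."""
    added = []
    for label in sorted(incident.labels):
        for team in rules.get(label, []):
            if team not in incident.responders:
                incident.responders.append(team)
                added.append(team)
    return added


if __name__ == "__main__":
    incident = Incident("Customer records exposed", labels={"PII"})
    newly_added = apply_automations(incident)
    print("Added:", newly_added)
    print("Responders:", incident.responders)
```

The point of the sketch is that the responder only applies the label; who gets included is decided ahead of time by the rule table rather than in the middle of the incident.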

“It really gives us a platform inside of which we can continue to expand out the decision-making capabilities, where it’s appropriate for Kintaba to step in and make decisions automatically, and not in places where it’s better for humans to make those decisions,” said Egan. “It’s bringing in your PR people to go and respond externally. It’s bringing in your customer success folks. It’s bringing in the engineers who are working the problem. And then it’s that process of recording that human timeline of root cause analysis, and determining where the core problem is and what’s going to be done about it, allowing the owners and responders to reflect back with that context into what about the process that the company has allowed that failure to happen.”

While this sort of workflow might seem obvious, Egan explained that the current solution is often to stitch together numerous tools with scripts or manual processes. Kintaba Automations is meant to automate those human processes and unite them in one place.

“Automations is a good indicator of the direction that we want to go: more ability to simplify all of that painful overhead that humans have to deal with when they’re dealing with critical major outages or incidents. Everything we do is based on the principle of removing that overhead and busy work,” said Egan. “When I look at where Kintaba is going, I look at resilience within the organization through process, what changes as a result of these incidents. Kintaba should help you with that, it should help you with finding the team, assembling the team, working through the problem as effectively as possible, and then actually updating the knowledge that your organization has, so that it can operate more efficiently in the future and not ever have that incident again. Success in our industry is there’s no repetition. An incident that happens once never happens again.”

Feature image by NeONBRAND on Unsplash.
