Bringing Principles into the World of Incident Management 

Let’s imagine a world where incident management tools are guided by principles designed to help you learn from and improve your response. 
Mar 28th, 2023 8:41am by

Incident management tools are in need of a fresh approach.

Why? Well, let’s take a step back and consider some common scenarios that play out when companies integrate incident management into their tech stack.

  • Company A has a tool that promises to help with incident management, but it does little to support those involved in the process after the initial flurry of alerts fires. The tool simply isn’t helpful beyond the initial stages, and it puts pressure on engineers to carry the load.
  • Company B adopts an end-to-end incident management tool that creates more problems than it solves. It’s infinitely customizable to the point of being a detriment and generally hard to use as a result. Nobody wants to use it — especially non-technical users — and adoption suffers.
  • Company C goes another route: building its solution in-house. But this takes much more time than expected and ties up engineering resources the company cannot afford to spare. The tool does get used because it’s designed with the organization’s specific context in mind, but it’s painful to maintain and there are constant demands for changes from those using it.

In the end, all of these companies throw their hands up in frustration.

Something isn’t working here.

It feels obvious to say that companies adopt incident-management tools to streamline their response processes and reduce downtime. But ideally, these tools would also help businesses improve their processes and help them learn from their incidents too.

But if we consider the very real situations that these companies are dealing with, it’s clear that this isn’t happening.

So where do we go from here? We start to ask much more of our incident management tools, namely that they prioritize three principles: simplicity, automation and learning.

Incident Management Requires Principles Like Anything Else

We all have principles in our personal lives. And it’s rare for these not to carry over to our work. But it seems the world of incident management today suffers from a lack of principles.

It often feels that some incident management tools exist solely to manage incidents. There’s incident declaration, resolution, closure — and not much in between. No opportunity for learning or insight and no process improvement.

The problem with this approach is that it lacks an overarching set of principles designed to ultimately help you respond better.

And in many ways, responding to incidents is just a single aspect of a great incident management tool.

By focusing on these principles, incident management tools can help businesses respond to incidents seamlessly, improve their processes and ultimately build better, more reliable products.

Prioritizing Simplicity

By nature, resolving incidents can be pretty complex. But why should that complexity carry over to your response tools?

When you simplify your incident response, you remove a hurdle that inevitably prolongs your downtime.

This might look like using tools that favor everyday language over technical jargon and offer an intuitive UX that is approachable and accessible and leaves no room for ambiguity.

Prioritizing Automation

Ad hoc or ambiguous processes can only work for so long. And even if they are working right now, it doesn’t feel like a stretch to say that more efficient incident response processes can create much more desirable results, such as faster time to resolution.

What’s one way to create more efficient workflows? Automation.

By having deliberate and simple incident response processes powered by automation, you can ensure that everyone, regardless of how long they’ve been a responder, can feel confident and comfortable managing incidents.

Imagine this: You close out an incident and get an automatic prompt to create a post-mortem so that it doesn’t slip through the cracks. Or you escalate an incident to a higher severity and senior folks get looped in automatically, improving visibility into something that has the potential to grow into a bigger issue and nipping it in the bud before that happens.

It’s worth highlighting that automation can be a slippery slope, so it must be used judiciously. Machines are clearly very efficient, but it’s important they remain an assistive function during incidents, rather than the thing driving the entire response. Nobody wants their automation to helpfully delete all their servers because it’ll clear the errors.
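To make the two scenarios above a little more concrete, here is a minimal sketch of what assistive automation could look like. Everything in it (the `IncidentEvent` type, the `Severity` levels and the `suggest_actions` function) is a hypothetical illustration rather than any particular product’s API. Note that the function only returns suggestions for a human to act on; it never performs an action itself.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    # Hypothetical severity scale; real tools define their own levels.
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


@dataclass(frozen=True)
class IncidentEvent:
    incident_id: str
    kind: str  # "closed" or "severity_changed" in this sketch
    old_severity: Optional[Severity] = None
    new_severity: Optional[Severity] = None


def suggest_actions(event: IncidentEvent) -> list[str]:
    """Return suggestions for responders; nothing here executes an action."""
    suggestions = []

    # Closing an incident triggers a post-mortem prompt so it is not forgotten.
    if event.kind == "closed":
        suggestions.append(
            f"Prompt the incident lead to create a post-mortem for {event.incident_id}"
        )

    # Escalating to a higher severity (major or above) loops in senior folks.
    if (
        event.kind == "severity_changed"
        and event.old_severity is not None
        and event.new_severity is not None
        and event.new_severity > event.old_severity
        and event.new_severity >= Severity.MAJOR
    ):
        suggestions.append(f"Notify senior responders about {event.incident_id}")

    return suggestions
```

Keeping the output as a list of suggestions is a deliberate choice in this sketch: the machine does the remembering, and people stay in charge of the response.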

The clearer your processes are, the better your responses will be, and automation can help you get there.

Prioritizing Learning

Ideally, we’d try to avoid incidents at all costs, but unfortunately, they do happen. And even when we try our best to avoid them, repeat incidents can happen too. Whilst this might feel frustrating, there’s an underlying opportunity here: the opportunity to learn.

Prioritizing learning from your incidents not only helps you prevent those repeat incidents that cut into engineering resources, it also helps you build more resilient products over time.

Unfortunately, many tools today just don’t consider this part of the incident management pyramid, so learning feels like a complete afterthought.

Let’s Bring Calm into the Often-Chaotic World of Incident Management

Responding to incidents is already hard enough — your tools shouldn’t magnify that.

That’s why we created incident.io: to bring principles into the world of incident management that make it a truly effortless experience.

We believe in simplifying incident response so everyone can feel empowered to declare incidents. We believe in having automated processes around your incident response that are easily repeatable, simple to navigate and inspire confidence.

And even when incidents are resolved successfully, we believe in the value of learning from them, which is why we’ve made it easy to do just that with a dashboard full of valuable and actionable insights designed to help you identify areas for improvement.

In short, we’ve designed a product to take the pain out of incident response.

Here’s how we’ve embedded the principles we’ve laid out above in our product.

Easy to Use and Intuitive 

The simpler a tool is to use, the faster you can complete the task that tool is intended for.

This holds for incident management as well. We remove the complexity that typically comes with the incident response process so folks can spend less time running through tutorials and resolve their incidents more effectively.

What does this look like in-product?

Easy-to-understand incident declaration forms allow anyone to come in and raise an issue. An intuitive user interface prioritizes ease of use via Slack and our dashboard, with simple, approachable language, clear action buttons and a single command interface.

Because of this, adoption is simple and pain-free, and you can get up and running in just a few minutes.

Automated Processes That Give Responders Peace of Mind 

Every company is different and faces a unique set of challenges. Your incident management tool should accommodate this.

For the incident response process, you can either use the sensible defaults that we include — for instance, all incidents having a severity level and an incident lead — or customize your processes to suit your needs. Regardless of your route, there will still be guardrails to ensure that you’re running the most efficient incident response possible.
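As a rough illustration of "sensible defaults plus guardrails," consider the sketch below. It is not how incident.io is implemented; the `IncidentProcess` class, its field names and the default severity labels are all assumptions made for the example. The severity levels can be customized, but every incident must still carry a valid severity and an incident lead.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default severity labels, used when nothing is customized.
DEFAULT_SEVERITIES = ("minor", "major", "critical")


@dataclass(frozen=True)
class IncidentProcess:
    severities: tuple = DEFAULT_SEVERITIES

    def __post_init__(self):
        # Guardrail: a customized process still needs usable severity levels.
        if not self.severities:
            raise ValueError("at least one severity level is required")
        if len(set(self.severities)) != len(self.severities):
            raise ValueError("severity levels must be unique")

    def declare(self, title: str, severity: str, lead: Optional[str]) -> dict:
        # Guardrail: every incident has a known severity and an incident lead.
        if severity not in self.severities:
            raise ValueError(f"unknown severity {severity!r}")
        if not lead:
            raise ValueError("an incident lead is required")
        return {"title": title, "severity": severity, "lead": lead}


# Defaults work out of the box; a team can also bring its own levels.
default_process = IncidentProcess()
custom_process = IncidentProcess(severities=("sev3", "sev2", "sev1"))
```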

On top of this, you can also set up automation through our Workflows feature to help optimize your incident response. For example, you can alert a specific group of people whenever a high-severity incident gets declared.

You can also set up nudges, prompts and reminders so folks can always anticipate what they should be doing next. For example, you can remind someone to create a post-mortem shortly after an incident is closed.

The result is a process that everyone can follow without ambiguity or fear.

Dashboards That Give You Actionable Insights into Incidents

We’ve stressed the importance of learning from your incidents to help you build more resilient products. This is why we’ve created dashboards to help give you insights into how your organization really works and where you might want to invest time to improve things.

From more elementary data points such as average time to resolution, through to deeper human insights like incident workload and most active response teams, we give you the insights you need to improve your response processes further.
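For a sense of how data points like these can be derived, here is a small sketch that computes average time to resolution and a per-responder workload count. The record shape (dictionaries with `declared_at`, `resolved_at` and `responders` keys) is an assumption made for illustration, not the format any specific tool uses.

```python
from collections import Counter
from datetime import timedelta


def average_time_to_resolution(incidents: list[dict]) -> timedelta:
    """Mean of (resolved_at - declared_at) over resolved incidents only."""
    durations = [
        incident["resolved_at"] - incident["declared_at"]
        for incident in incidents
        if incident.get("resolved_at") is not None
    ]
    if not durations:
        raise ValueError("no resolved incidents to average")
    return sum(durations, timedelta()) / len(durations)


def workload_by_responder(incidents: list[dict]) -> Counter:
    """Count how many incidents each responder was involved in."""
    return Counter(
        responder
        for incident in incidents
        for responder in incident.get("responders", [])
    )
```

Unresolved incidents are skipped in the average rather than counted as zero, since including them would make response times look better than they are.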

And because we’re all about simplifying things, these dashboards are pre-built, so you can dive right in with zero upfront effort.

Bring a Fresh Approach to Your Incident Management 

Incident response is a critical function for any tech-forward business, so it’s important to ensure that your tools work as an accelerant, not a hindrance.

Having a tool that prioritizes simplicity, automation and learning will result in an end-to-end incident management process that gives everyone more confidence and more time to build a better product.

If you want to see what a tool that prioritizes simplicity, automation and learning looks like, sign up for our demo.
