
How to Tackle Tool Sprawl Before It Becomes Tool Hell

It’s no use complaining that context switching between tools causes problems. The key is to prove it with data and stories.
Aug 31st, 2023 6:18am by
Image from Tommaso Marangoni on Shutterstock

Today’s digital-first companies know their customers demand seamless, compelling experiences. But incidents are inevitable. That puts the pressure on operations teams already struggling with a heavy workload.

Teams looking for novel ways to tackle these challenges often hit a formidable roadblock in the form of tool sprawl. When the world is on fire, swivel-chairing between tools while trying to get the full picture is the last thing incident responders need as they try to resolve incidents and deliver a great customer experience. But complaining will get them nowhere. The key is to be able to articulate a business case for change to senior leaders.

Into the Valley of Tool Sprawl

Digital operations teams may have a slew of poorly connected tools across their environment, handling event correlation, internal communication, collaboration, workflows, status pages, customer-service case management, ticketing and more. Within each category, there may also be separate tools doing similar things. And they may be built to or governed by different standards, further siloing their operation and slowing things down.

Incident response is a collaborative process. It is also one where seconds and minutes of delay can have a real-world impact on customer experience and, ultimately, revenue and reputation.

Stakeholders from network teams, senior developers, database administrators, customer service and others may need to come together quickly to triage and work through incidents. Their ability to do so is impaired when much time and effort must be spent simply jumping between tools to get everyone on the same page and in the same place. That’s not to mention the extra licensing costs, the people needed to manage and maintain each tool, and the additional security patching.

How to Tell the Right Story

Incident responders need a unified platform that lets them tackle issues without constantly switching context. Integrating and consolidating tools can reduce sprawl and drive simplicity end to end, underpinned by a single set of standards. We’re talking about one common data model and one data flow, enabling teams to reduce costs and go faster, at scale.

Such platforms exist. However, engineers and developers typically don’t have the power to demand change and drive adoption. But that shouldn’t stop them from asking for change. To do this, they must play a longer game, one designed to influence those holding the purse strings. It’s about telling a story in the language that senior executives will understand. That means focusing on business impact.

Humans are naturally story-driven creatures, so senior leaders will likely respond well to real-life examples of how disruptive context switching can be. Teams should bring the problem to life with a story from their own operations.

Consider the most recent incident that affected customers. How did your team identify and triage it? In many cases, teams don’t have a centralized place to capture incident context, so they have to chase information across systems to understand what happened and gather the context needed to start remediation. This adds critical time to the process and, in larger incidents, can cost customer trust.

Once the issue has been identified, you then have to communicate with the right people. This often involves several tools just to pull in incident responders and subject matter experts. On top of this, teams also need to communicate about incidents to business and customer stakeholders, which again requires switching between different systems to craft and send messages.

Much of this is manual work that could be automated, but that’s only possible from one place, not disparate systems. The intent isn’t to get to a single pane of glass, which can be a fool’s errand as tools and processes evolve, but building a first pane of glass with the necessary context to immediately resolve issues is a great target.

Using this scenario, don’t be shy in naming all the specific tools and systems teams had to switch between to get to the end goal: uptime. Build a picture of the volume you are having to juggle. It’s also important to weave in the impact of the tool sprawl on the business.

A good starting point is to calculate how much time managing these disparate solutions added to resolving the last SEV 1 incident. Then multiply the figure by how many such incidents there were in the previous 12 months, and then work out how that translates into team costs.
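
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name, the per-incident responder count and the hourly cost are assumptions added here to turn time into money, and every number in the example is a placeholder to be replaced with figures from your own incident records.

```python
def annual_tool_sprawl_cost(extra_minutes_per_incident,
                            incidents_per_year,
                            responders_per_incident,
                            hourly_cost_per_responder):
    """Estimate the yearly responder time and cost lost to tool switching.

    All inputs are assumptions supplied by the team making the case:
    - extra_minutes_per_incident: time tool juggling added to the last SEV 1
    - incidents_per_year: SEV 1 incidents in the previous 12 months
    - responders_per_incident: people typically pulled into each incident
    - hourly_cost_per_responder: loaded hourly cost of one responder
    """
    extra_hours = (extra_minutes_per_incident / 60
                   * incidents_per_year
                   * responders_per_incident)
    return extra_hours, extra_hours * hourly_cost_per_responder


# Placeholder figures only: 45 extra minutes, 12 SEV 1 incidents,
# 8 responders per incident, 100 per responder-hour.
hours, cost = annual_tool_sprawl_cost(45, 12, 8, 100)
print(f"{hours:.0f} responder-hours, roughly {cost:,.0f} in team cost per year")
```

This estimate covers team time only. Lost revenue and customer trust come on top of it and are best conveyed through the incident story itself.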

These are the kinds of calculations that can make a big impact on senior decision-makers. It’s about showing the financial and temporal impact of tool sprawl on incident response, and ultimately, the business. If the figure is impactful, it might be enough to start a conversation with the people who can make a difference. The same capability can then be applied to lower severity but more frequently occurring issues, which can solidify your position.

By bringing the problem to life and showing the business and, most importantly, customer impact, teams can have practical conversations with decision-makers that can help to drive change and bring incident response processes into one place.

One Tool to Rule Them All

The valley of tool sprawl is bad enough. But combine it with a deluge of manual processes, and you have a recipe for too much toil and multiple points of failure. Maintaining and managing multiple tools is time-consuming, unwieldy and expensive. It requires continuous training for staff and disrupts critical workflows at a time when seconds often count. In this context, something as simple as an operations cloud to capture incident context from multiple systems of record and automate incident workflows can make a huge difference to responder productivity.

Centralizing on a single, unified platform for digital operations should be a no-brainer. But to get there, teams have to engage senior decision-makers. It’s no use complaining that context switching between tools is causing problems.

The key is to prove it with data and stories. It’s the way to win over hearts, minds and wallets, and to lay a pathway out of the valley of tool sprawl, toward optimized operations.

TNS owner Insight Partners is an investor in: Pragma.