Three Ways Automation Can Improve Workplace Culture

Talent has always been important to the success of IT operations. But in a world where expertise is increasingly hard to find, employers must work much harder to keep their best people. Unfortunately, job satisfaction today remains low in many organizations, with incident responders feeling overworked and burnt out. The culprit in many cases is toil: manual, repetitive and tactical work that lacks enduring value. If left unchecked it will destroy the kind of productive, innovative workplace culture that so many developers and engineers aspire to be a part of.
Automation can be the answer to this challenge. However, getting there is not easy for organizations whose past automation efforts have delivered mixed results.
How Toil Can Take Its Toll
Toil as a concept was first popularized by Google and its site reliability engineering (SRE) movement. Understanding what it is and how it can be contained is the first step toward freeing up engineering talent to work on the tasks that will build enduring value.
Let’s be clear: toil isn’t unnecessary work per se. In fact, many organizations would come to a halt if it didn’t get done. Rather, it is work that adds no lasting value for the organization. Examples include schema updates and rollbacks; network, storage quota and DNS changes; and adding capacity and users. Toil can also come from unforeseen incidents that demand manual intervention, such as diagnostics, performance checks and configuration changes.
Toil is particularly dangerous in that it pulls talented people into spending their time on low-value tasks. Organizations wanting to reduce it find they need to commit more engineering time, either to build automated processes that replace manual effort or to enhance a service so that it no longer requires maintenance intervention. If not managed carefully, toil might increase to such a level that there simply isn’t any engineering capacity left to correct it.
On the other hand, an IT operations function with no toil at all simply isn’t attainable. The volatility of modern organizations means new and unexpected developments will always appear and require toil to fix. A more practical goal is to keep toil to a minimum.
The Struggle Is Real
Unsurprisingly, high levels of toil can take a heavy toll on talented engineering and developer teams. It’s not just the burnout they might suffer from excessive manual, repetitive work, but also the feeling of career stagnation that comes from having no time to learn new skills.
Toil also creates a vicious circle where more manual work leads to more human error, which in turn demands more time to fix. For the organization, the knock-on effects are obvious: a shortage of capacity, excessive operational support costs, an inability to progress strategic initiatives and an inability to retain and acquire top talent.
These are not hypothetical challenges. Research from 2022 reveals that on average 54% of incident responders are being interrupted outside of normal working hours. And 42% worked more hours overall in 2021 than they did the previous year. That suggests processes are not being optimized with automation to reduce toil. The concern is a further vicious circle in which employees depart as a result of burnout, leaving a higher workload and a greater chance of burnout for those who stay.
Instead, talented teams should be working on value-add engineering work that is creative and strategic. So how do they get there?
How Automation Can Help
The truth is that automation still isn’t used to its full potential in most organizations. Despite significant outlays in the past, automated workflows are not always trusted and certainly not made available to all those who could benefit. Organizations need essential infrastructure for critical work that is able to automate workflows across digital operations. This essential infrastructure can become a significant workplace culture force multiplier in three areas.
1. Diagnostics Automation
Research reveals that 50% of a responder’s time is spent working out whether there is a problem and whom to contact for additional support. By this measure, half of the response effort goes into diagnosis and triage rather than remediation. Automating the diagnostic phase of an incident (pulling data to work out the severity of an incident, what went wrong and how) can save first responders significant time and compensate for their potentially limited subject matter knowledge.
Ultimately, it helps responders triage incidents more efficiently without needing to bother subject matter experts (SMEs) for diagnostic data locked away in production systems. It also means they contact only the engineers who can resolve a specific issue, and can hand over the diagnostic information when they do. As a further benefit, auto-collected diagnostic data can be used in post-mortem exercises for continuous improvement.
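As a rough illustration of what automated diagnostics collection could look like, the Python sketch below runs a set of checks when an incident opens and gathers the results into one report a responder can read or forward. Everything here is hypothetical: the DiagnosticReport class, the collect_diagnostics function and the two sample checks are invented for the example and stand in for real queries against monitoring or production systems.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DiagnosticReport:
    """Findings gathered for one incident (hypothetical structure)."""
    incident_id: str
    findings: Dict[str, str] = field(default_factory=dict)
    errors: Dict[str, str] = field(default_factory=dict)


def collect_diagnostics(incident_id: str,
                        checks: Dict[str, Callable[[], str]]) -> DiagnosticReport:
    """Run every check and record its output.

    A check that raises is recorded under `errors` instead of aborting
    the whole collection, so one broken data source does not hide the rest.
    """
    report = DiagnosticReport(incident_id)
    for name, check in checks.items():
        try:
            report.findings[name] = check()
        except Exception as exc:
            report.errors[name] = repr(exc)
    return report


# Hypothetical checks; real ones would call monitoring or production systems.
def disk_usage() -> str:
    return "91% used on /var"


def recent_deploys() -> str:
    raise TimeoutError("deploy API did not respond")


report = collect_diagnostics(
    "INC-1", {"disk": disk_usage, "deploys": recent_deploys}
)
print(report.findings)
print(report.errors)
```

Because the report is plain data, the same object could be attached to the incident record and reused later in a post-mortem.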
2. Runbook Automation
Runbooks provide standardized, detailed instructions for quickly and effectively working through common issues and tasks. They’re particularly useful in incident response, but in many organizations they are still executed by hand, which undermines their usefulness and creates extra work.
Runbook automation can help by standardizing operating procedures, defining automated jobs incorporating existing automation, and safely delegating these processes as APIs and self-service requests to other stakeholders. It will reduce toil and human error, freeing up the time of SMEs across incident response, service requests, business continuity and other IT operations use cases.
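To make the idea concrete, here is a minimal Python sketch of a runbook expressed as an ordered list of automated steps. It assumes each step is a function that returns True on success; the Step class, the run_runbook function and the sample steps are all hypothetical, and a real runbook tool would add features such as approvals, retries and audit trails.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str
    action: Callable[[], bool]  # returns True on success


def run_runbook(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Execute steps in order and stop at the first failure.

    Returns an overall success flag plus a log of what was attempted,
    so whoever ran the job can see exactly where it stopped.
    """
    log: List[str] = []
    for step in steps:
        ok = step.action()
        log.append(f"{step.name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            return False, log
    return True, log


# Hypothetical steps for a service restart; the lambdas stand in for real actions.
restart_runbook = [
    Step("drain traffic", lambda: True),
    Step("restart service", lambda: False),  # simulated failure
    Step("verify health", lambda: True),
]

succeeded, log = run_runbook(restart_runbook)
print(succeeded, log)
```

Stopping at the first failed step is a deliberate choice in this sketch: later steps often assume the earlier ones worked, so continuing could make the situation worse.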
3. Self-Service Automation
At the heart of automating diagnostic and runbook processes is the idea of empowering non-SMEs to tackle the jobs that they would otherwise be forced to escalate. It’s a great way to get rid of excessive ticket queues, which can create bottlenecks, silos, communication problems and management overheads. Replacing ticket queues with pull-based self-service interfaces cuts wait times and shortens feedback loops while reducing SME toil. The ticket queues that remain are for true exceptions, like logging bugs or making enhancement requests.
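One simple way to picture safe delegation is an allow-list that maps each role to the pre-approved jobs it may run on its own. The Python sketch below is illustrative only: the job names, the roles and the request_job function are assumptions made for the example, not the interface of any particular product.

```python
from typing import Callable, Dict, Set

# Hypothetical pre-approved jobs; each would wrap an automated runbook.
JOBS: Dict[str, Callable[[], str]] = {
    "restart_cache": lambda: "cache restarted",
    "rotate_credentials": lambda: "credentials rotated",
}

# Hypothetical allow-list: which roles may run which jobs without a ticket.
ALLOWED: Dict[str, Set[str]] = {
    "support": {"restart_cache"},
    "sre": {"restart_cache", "rotate_credentials"},
}


def request_job(role: str, job_name: str) -> str:
    """Run a job on behalf of a role if the allow-list permits it."""
    if job_name not in JOBS:
        raise KeyError(f"unknown job: {job_name}")
    if job_name not in ALLOWED.get(role, set()):
        raise PermissionError(f"{role} may not run {job_name}")
    return JOBS[job_name]()


print(request_job("support", "restart_cache"))
```

A request outside the allow-list raises PermissionError, which is the point at which a traditional ticket to an SME would still be appropriate.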
Happiness Matters
Job satisfaction can’t be ignored. Developer talent is hard to find but easy to lose if teams are swamped with repetitive manual tasks when they should be innovating for the company. They want to work on interesting projects. They want to feel productive. And they want a decent work-life balance. Automation can get everyone working toward the same goal, rather than drowning in a swamp of toil.