How Engineering Drives Revenue in an Economic Downturn

Though difficult, it’s better to resist cutting costs: A healthy software engineering team is often the difference between a company thriving and a company failing.
Oct 17th, 2022 10:00am by

Software-as-a-Service provider BetterCloud saw a 10% reduction in customer churn after revitalizing its incident management as part of its site reliability engineering (SRE) practices.

Companies concerned about inflation and a possible recession are looking closely at all costs to remain in good standing during these challenging times. 

The engineering function is often regarded as a cost center, and most companies will say it’s fair game for budget cuts, supposing they will still “survive.”

However, nothing could be further from the truth.

A healthy software engineering team is often the difference between a company thriving and a company failing. Though difficult, it’s better to resist cutting engineering costs, because those cuts rule out the larger gains the team could otherwise deliver. The saying “penny wise, pound foolish” comes to mind.

This is especially true of product reliability, which requires investment from all areas of engineering and has become more and more critical to business success. This approach, referred to as site reliability engineering, works to optimize uptime and reduce errors and sluggish performance, both of which translate to a bad customer experience.

Invest in Tools that Augment Engineering Talent

You don’t want to throw more engineers at the problem, which is already difficult given “the Great Resignation” or “quiet quitting” and a general software engineering talent crunch. A recessionary market is the perfect opportunity for forward-thinking companies to smartly invest in tools that augment human talent investments and reduce burnout by improving productivity with more streamlined workflows.

Simply put, reliability engineering helps companies deliver a better user experience. Happy customers stay loyal, grow revenue and justify the engineering function as a profit center.

No company promises 100% uptime (and no reasonable user expects it). Customer loyalty and the potential for increased revenue stem from trust in your brand, which is built over time. It helps to be transparent about service reliability levels. What that boils down to is how incidents are handled and resolved, and how you communicate with customers while they are happening. Resilient engineering teams should know pretty quickly when incidents occur, ideally before they affect the user. By acting quickly, they can immediately assemble a team to address the situation and provide ongoing updates. Taking this a step further, resilient engineers always want to learn from incidents. By studying retrospective reports, they can improve the system for the future.
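To make “service reliability levels” concrete, an availability target can be translated into the downtime it permits. The following is a minimal sketch, not a figure or method from any company named here; the function name and the 30-day default period are assumptions chosen for illustration.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in a period at a given availability target.

    Both the function name and the 30-day default are illustrative assumptions.
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The unavailable fraction of the period is the downtime budget.
    return total_minutes * (100 - availability_pct) / 100
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43 minutes of downtime, which is why the speed of detection and response matters so much.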

Current State of Play

Technology leaders must communicate to the rest of the organization what service reliability means for the business. How do you prove that a more reliable product is critical to success? You can start by demonstrating that a less reliable product is a significant barrier to success.

Proliferation of incidents: The increased complexity of cloud native digital services combines with pressure to deploy new features in competitive markets. In some cases, reliability is volatile because larger organizations are migrating from legacy tech stacks to modern cloud native architectures, which can cause issues downstream or make it harder to connect with APIs and other third-party systems and environments. Depending on the industry segment, more engineering teams also face stringent compliance regulations or increased cybersecurity threats.

Trouble finding engineers: While “the Great Resignation” has affected all types of industries and job functions, engineers in particular are leaving in droves. Some have chosen new careers; others are simply burned out from increasing pressure to deliver at a higher velocity. Still others are tired of drowning in technical debt or getting stuck in debugging cycles, which is not the most interesting or career-advancing work.

Lack of brand loyalty: The pandemic accelerated a trend that businesses had already begun to notice: Both consumers and B2B buyers are less brand loyal than ever. Brand switching might be cost-driven, but plenty of buyers also switch because they are let down by the product promise. SaaS providers with an increasing number of outages, especially ones that harm the business of their clients, will struggle to retain customers. Stephen M. Dick, vice president of cloud engineering and optimization at BetterCloud, saw firsthand the effect reliability engineering has on customer retention.

A year after BetterCloud implemented Blameless to revitalize its incident management and adopt SRE practices, it saw a 10% reduction in customer churn. Using Blameless has helped BetterCloud feel confident about product reliability and its new incident response process. Implementing Blameless led to more logged incidents, which paved the way for more meaningful metrics and ensured that retrospective (postmortem) reports were completed. The platform also established a focused method of communication across the organization because everyone is aligned on a standardized process.

Companies have options to not only solve the above issues but create revenue-generating opportunities during a recession. The best solution for resource-deprived engineering teams is to identify which processes can be automated to the benefit of their workers and the overall company. This way, engineering teams remain laser-focused on digital service performance, availability and, essentially, what is deemed an acceptable level of reliability. Once this is well understood and agreed upon, engineers can work to improve how to deliver that level of performance with a mix of tech tooling and skilled resources. 

Here’s how tooling, automation and the right data insights can help when facing these challenges:

Create a Playbook: Streamline How You Manage Incidents, Learn, and Avoid Repeating 

Too often, incident management responses are a patchwork of processes, some outdated, run by specific team members or “heroes” who carry tribal knowledge. When those people move on from the company, that knowledge is lost for good unless the process has been codified. A codified system coupled with a shared tool embodies a universal process that outlives the tenure of a few subject matter experts. It also speeds up onboarding for new hires because the process is discoverable.

Share Useful Data Insights across the Company during Incidents that Affect Business

System downtime affects the whole company, not just engineering or DevOps teams. When a high-severity incident occurs, engineers should be focused on resolving it. Sending comms and updates to stakeholders like executives and customers is an unfortunate distraction. Comms automation solves that issue with prebuilt message templates and saved recipients for every situation. This not only goes a long way toward speeding time to resolution, but it also gives the engineer back a significant amount of mental bandwidth. Dick, at BetterCloud, regularly shares reliability data with the BetterCloud board. His colleague, Clark Polo, director of SRE and DevOps, said, “We use our incident management process as a way to improve BetterCloud. If we don’t know where we are today, we don’t know where we need to be.”
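The idea of prebuilt message templates and saved recipients can be sketched in a few lines. This is an illustration only, not how Blameless or any other product works; the template wording, severity names, addresses and the `build_update` helper are all hypothetical.

```python
from string import Template

# Hypothetical prebuilt templates, keyed by incident severity.
TEMPLATES = {
    "high": Template(
        "[$severity] $service incident: $summary. "
        "Next update in $next_update_min minutes."
    ),
    "low": Template("[$severity] $service: $summary. No action needed."),
}

# Hypothetical saved recipient lists, keyed by the same severities.
RECIPIENTS = {
    "high": ["executives@example.com", "support@example.com"],
    "low": ["support@example.com"],
}


def build_update(severity, service, summary, next_update_min=30):
    """Fill the template for this severity and attach its saved recipients."""
    body = TEMPLATES[severity].substitute(
        severity=severity.upper(),
        service=service,
        summary=summary,
        next_update_min=next_update_min,
    )
    return {"to": RECIPIENTS[severity], "body": body}
```

With something like this in place, the responding engineer supplies only a severity, a service name and a one-line summary; the wording and the distribution list are decided ahead of time, when nobody is under pressure.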

Keep Engineers Fresh and Focused

Companies that do not have automated incident management (AIM) tend to throw more resources at the problem, which means they could be waking up a cadre of engineers at night to fix a burning problem. AIM uses advanced intelligence to communicate and to assign specific engineers who are experts in the relevant field. Rather than send an all-hands email that every engineer will feel responsible for answering, intelligent processing can secure the right resources required to solve the issue, thereby reducing the risk of employee burnout and potentially higher costs.
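The contrast between paging everyone and paging the right people can be shown with a simple lookup. This sketch stands in for the “intelligent processing” described above and is far simpler than a real tool; the roster, names and fallback role are hypothetical.

```python
# Hypothetical on-call roster: area of expertise -> engineer currently on call.
ON_CALL = {
    "database": "dana",
    "networking": "noor",
    "auth": "alex",
}

# Hypothetical fallback for components with no listed expert.
FALLBACK = "incident-commander"


def assign_responders(affected_components):
    """Return one responder per affected area, without duplicates, in order."""
    responders = []
    for component in affected_components:
        engineer = ON_CALL.get(component, FALLBACK)
        if engineer not in responders:
            responders.append(engineer)
    return responders
```

An incident touching the database and auth systems would page two people under this scheme rather than the whole team, which is the burnout reduction the section argues for.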

Produce a Culture of Reliability

It’s never easy when systems fail. The urgency of incident response merits a focused mind. You can achieve this by handing off administrative tasks to a trusted automation tool. Keep everyone on task, updated and aligned to a collective vision. Keep the customer happy, and keep the business thriving. As David Levinger, senior vice president of operations at Machinify, said of his team, “We hire fantastic engineers who are diligent and want to help the system improve. No one on our team that’s pulled into an incident doesn’t know what to do, where to go and how to solve it.”

How a company communicates and responds to incidents can be the difference between losing business and driving a stronger relationship that leads to more business. System reliability is too important for organizations to rely on ad-hoc or manual processes, especially when automation can lighten the workload, respond quickly and maintain a level of consistency. No more hiring additional engineers to solve a problem when it’s best solved with the right technology. Allow engineers the freedom to work on what matters and watch intelligent automation handle the rest.

TNS owner Insight Partners is an investor in: Simply.