
Black Friday Downtime: How to Avoid Impacts on Your Business

3 Sep 2021 10:00am, by

Hannah Culver
Hannah is a solutions marketer at PagerDuty interested in how real-time urgent work plays out across all industries in this digital era.

It’s a brisk Friday morning in November. You’re sipping your coffee and mentally preparing yourself for the day that’ll define your fiscal year. How will you fare this Black Friday? Are your teams prepared?

We’ve all heard the frightening downtime figures, with companies losing thousands per minute. That’s on a normal day. On Black Friday, these costs can be astronomical.

To have downtime or degraded website performance on such a critical day means more than just a single lump sum of lost revenue; it also means lower customer satisfaction, reduced brand loyalty and a potential leg up for the competition. It’s also highly demoralizing for teams.

With 2020 being the biggest year yet for online traffic, retailers are under even more pressure to deliver exceptional digital experiences. According to Adobe’s 2020 Holiday Shopping Season Report, “The online holiday season exceeded $188B resulting in a strong growth rate of 32% over the 2019 season.”

Chart of Online Holiday Spend by Year from Adobe

This trend didn’t start with COVID-19, however. A survey conducted by the National Retail Federation showed that in 2019, 124 million people shopped in stores while 142.2 million shopped on retailers’ websites, meaning online shoppers already outnumbered in-store shoppers before the pandemic. In 2021, Blackfriday.com predicts that combined online Black Friday and Thanksgiving spending will grow about 20% year over year and hit $17 billion.

It’s clear that a reliable digital experience will ultimately determine the success of retailers during this crucial time frame. So, how can your teams prepare? There are three things you can consider now that will help make your holiday season successful:

  1. Invest in best practices for peak traffic periods.
  2. Create a psychologically safe environment.
  3. Arm your teams with the right tools.

With these three, you can prepare your teams for Black Friday and what lies ahead afterward.

1. Invest in Best Practices for Peak Traffic Periods

Hypercare is the period of time during which an elevated level of support is available to ensure the seamless adoption or operation of a system. During times of peak traffic, when stakes are high and service needs to be nearly flawless, many teams will shift to this operating mode. While hypercare can take many forms, there are some common ways your team can get ahead of the curve:

  • Implement a code freeze: Make sure no changes happen close to and on the day of your peak periods unless they pertain to resolving customer-facing incidents.
  • Document and practice major incident processes: Ensure that on-call team members are prepared to be on call, are familiar with incident management processes and know how to engage with other teams. A few dry runs can help team members build confidence.
  • Set up proper observability: Establish real-time monitoring, logging and tracing, as well as synthetic monitoring to maintain desired levels of performance.
  • Set up dashboards for visibility: Create and validate meaningful business services so that stakeholders can be alerted to any issues occurring in the environment.
  • Establish a stakeholder communication process: Ensure there is a clear process for communicating with stakeholders, including where stakeholders can find more information about critical incidents.
  • Plan load tests, capacity and chaos experiments: Leverage existing load testing and chaos engineering tools to establish a baseline. Ensure capacity exists to handle the expected peak demand.
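
Two of the practices above, the code freeze and the capacity check, lend themselves to simple automated guards. The following is a minimal Python sketch under stated assumptions, not a prescribed implementation: the freeze dates, the function names `deploy_allowed` and `has_headroom`, and the 1.5x safety factor are all hypothetical values chosen for illustration.

```python
from datetime import date

# Hypothetical freeze window around Black Friday; the dates are
# illustrative assumptions, not a recommendation.
FREEZE_START = date(2021, 11, 19)
FREEZE_END = date(2021, 11, 30)


def deploy_allowed(today: date, is_incident_fix: bool = False) -> bool:
    """Return True if a deploy may proceed on the given day.

    During the freeze, only changes that resolve customer-facing
    incidents are let through.
    """
    in_freeze = FREEZE_START <= today <= FREEZE_END
    return (not in_freeze) or is_incident_fix


def has_headroom(tested_rps: float, expected_peak_rps: float,
                 safety_factor: float = 1.5) -> bool:
    """Return True if load-tested throughput covers the expected peak.

    safety_factor is an assumed margin on top of the forecast peak.
    """
    return tested_rps >= expected_peak_rps * safety_factor
```

A deploy pipeline could call `deploy_allowed(date.today())` as a gate before release, and `has_headroom` could be run against the latest load test results whenever the peak forecast is updated.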

See our Hypercare Checklist for more details on how PagerDuty helps support these initiatives.

It’s important to note, however, that these are process solutions. You also need to make sure that the people and technology are there to support them. In particular, making sure that team members are prepared is essential. One way to do this is by promoting psychological safety.

2. Create a Psychologically Safe Environment

Workplace psychological safety is a concept coined by Amy Edmondson of Harvard University and defined as, “A shared belief held by members of a team that the team is safe for interpersonal risk-taking.” In other words, this means that your teams feel safe to surface issues without fear of retribution, make mistakes without fear of punishment and try new ways of working knowing that they have safety nets and room to learn.

So, why does this matter to your teams? Well, studies have shown that psychological safety produces better results. According to data by Gallup, “Just three in 10 U.S. workers strongly agree that at work, their opinions seem to count. However, by moving that ratio to six in 10 employees, organizations could realize a 27% reduction in turnover, a 40% reduction in safety incidents and a 12% increase in productivity.”

Happier, safer and more productive teams. This is what psychological safety has to offer. Further studies have shown that specific leadership types are more effective at cultivating it. McKinsey & Company found that leaders who were both consultative and supportive were able to foster a safer environment for their teams.

McKinsey & Company’s ideal leadership style for psychological safety.

A change in leadership style can make a big impact on how your team operates day to day. For example, imagine that, despite your best efforts, you have an outage on Black Friday. As a leader, you take control in an authoritative style. You tell your teams exactly what to do, and as time is of the essence, there is no discussing it. Yet, this produces poor results as the SMEs (subject matter experts) resolving the incident have a more accurate view of the situation. However, due to the lack of psychological safety, they don’t speak up. This incident lasts longer than necessary and has a direct impact on revenue and customer experience.

Let’s reimagine this situation. You understand that there is an incident. You ask your team for their input on how to best resolve it, and if they need anything from you. You allow your SMEs to resolve the issue and offer help in the ways that they request. The incident is resolved speedily, if not in the way you had imagined. This positive result is a direct effect of psychological safety.

During these crucial moments, trusting your team to do the right thing matters. It matters to the incident resolution time and revenue, but it also matters in the long term to the team you lead. More psychologically safe teams are teams who will stay, and during a time like the “Great Resignation,” retaining talent is of utmost importance.

Failure is certain, and psychologically safe teams are able to handle incidents better, even on high-stress days like Black Friday. You can make sure your teams are even more prepared for these critical moments by putting the right tooling in place.

3. Arm Your Teams with the Right Tools

Tooling can’t prevent a failure on a day as important as Black Friday, but it can help teams recover from failure faster and learn so the problem won’t be repeated next year. Here are four key capabilities you should look for in your digital operations management platform that will help make Black Fridays — and every other day of the year — easier:

  • Eliminate noise and know who is on call: As the number of incidents increases year over year (19% from 2019 to 2020), so does the level of noise responders need to contend with. Find a solution that eliminates this noise so your teams can focus on what really matters and know which signals require their attention. Additionally, you should be able to manage on-call easily, without clunky spreadsheets or call trees. This helps you get the right SMEs on scene faster, resulting in better ack% (acknowledge percentage) and clear ownership between multiple teams.
  • Initiate seamless incident response: During incident response, you need to work quickly. An ideal solution would be one that allows you to work within the tools you already use — like Slack, Teams and Zoom — and helps you minimize toil. Automation is key to free up your response team for more mission-critical tasks. Use automation to coordinate the response process, assign roles to responders and even run diagnostic scripts or auto-remediations with the click of a button.
  • Learn from your mistakes: When the incident is finished, it’s natural to want to take a deep breath and pat yourself on the back. Once the feeling of relief ends, you may be curious — what happened to cause this issue? Especially on days as important as Black Friday, every incident is fair game for scrutiny. An ideal tool would help you understand what went wrong, how to fix it and how to prioritize the changes you need to make before the system fails the same way twice.
  • Keep an eye on your team health: Black Friday is still one of the most important shopping days of the year. Whereas most of the shopping used to be concentrated on this day, now holiday deals begin as early as September and carry on right until the holidays begin. With this in mind, it’s important to protect your teams against burnout. You need a platform that can help you understand how both your systems and people are faring under the holiday pressure.
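
To make the idea of noise reduction concrete, here is a minimal Python sketch of one common approach: suppressing repeat alerts for the same service and check within a time window. This is an illustration under assumptions, not how any particular platform implements it. The `Alert` class, the `deduplicate` function and the five-minute window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    service: str      # hypothetical service name, e.g. "checkout"
    check: str        # hypothetical check name, e.g. "latency"
    timestamp: float  # seconds since an arbitrary start


def deduplicate(alerts, window_seconds=300):
    """Keep an alert only if no alert with the same (service, check)
    was kept within the preceding window; later repeats are dropped."""
    last_kept = {}  # (service, check) -> timestamp of last kept alert
    kept = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.service, alert.check)
        previous = last_kept.get(key)
        if previous is None or alert.timestamp - previous >= window_seconds:
            kept.append(alert)
            last_kept[key] = alert.timestamp
    return kept
```

With this sketch, three latency alerts for the same service at 0, 60 and 400 seconds would reach responders as two notifications rather than three, while an alert from a different service would pass through untouched.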

If these capabilities sound like something your teams could benefit from, you still have time to adopt a solution that will help make your Black Friday a real doorbuster, whether it’s in person or online.

Try PagerDuty’s 14-day free trial today. Or, if you want to learn more about how retailers are approaching this period of rapid digitalization, check out our eBook, “Impact of Downtime on Retailers.”
