
Surviving Peak Online Traffic Events

21 May 2019 3:00pm, by

Dave Karow
Dave has three decades of experience in developer tools, developer communities and evangelizing ways to make software delivery more sustainable. Dave grew up just off-campus from Stanford as Silicon Valley morphed from defense to chips to software and finally internet services. Dave's front-row seat to all of this action equips him with a unique point of view on the long arc and repeating themes of technology evolution. As CD Evangelist at Split Software, Dave speaks on aligning progressive delivery (i.e. gradual rollouts of new code) with observability of system health, user experience, and user behavior.

The big day has arrived, and consumers are flocking to your digital properties in droves. Your site is breaking traffic records left and right. In a single hour, you have seen several times more sessions than you ever have in a day. Visitors are so excited about your offer that they can’t seem to spend their money fast enough. On the demand side, things could not be better. But your joy is short-lived. Suddenly, your site slows to a crawl before finally crashing.

That’s the e-commerce nightmare scenario when it comes to peak online traffic events, like Black Friday, Cyber Monday, Valentine’s Day, the Super Bowl, or a major product launch. Slowdowns, downtime and surprises send your customers running to your competitors. And when they occur during high-profile events, they can do real damage to your company’s reputation.

But while these big events are fraught with peril, they are also ripe with opportunity. If you fix issues before the big traffic arrives, you can avoid problems on the day itself. The key to coming out ahead is knowing in advance that your web and mobile apps can support the spike in activity. The best way to do that is to run load tests ahead of the big event to identify and remove potential bottlenecks.

Make Sure Your Tests Are Valid

Of course, load testing means little if it’s not done right. It has to be valid. It’s not uncommon, after systems degrade or crash, to be assured that load testing was done, only to find out that the person responsible for the tests simply “checked a box.” There was not enough information, time, or budget to do what was required to make the testing valid. As a result, the tests were not designed to exercise the conditions that were actually encountered.

It sounds like common sense, but it bears repeating. For a load test to be considered “valid,” it must model as accurately as possible the real-world performance challenges you expect your digital channels to face. By simulating problems customers could potentially encounter, you give yourself the opportunity to address those issues before real, live consumers actually arrive at your site. Just be sure to start the testing early so that there is enough time to fix problems. Valid testing will usually uncover multiple opportunities to address potential issues, but it won’t matter if you don’t allow enough runway to act on them. Your first big test should be designed and run weeks or even months before the big day.
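One way to make a test model real-world conditions is to drive it from an explicit traffic mix and a ramp toward the expected peak, rather than hammering a single URL. The sketch below is only an illustration of that idea: the journey names, the percentages and the helper functions are assumptions made for this example, and real numbers should come from your own analytics.

```python
import random

# Hypothetical traffic mix: share of sessions per user journey.
# These values are placeholders, not measurements.
TRAFFIC_MIX = {
    "browse_only": 0.60,
    "search_then_browse": 0.25,
    "add_to_cart": 0.10,
    "checkout": 0.05,
}


def build_session_plan(total_sessions, mix, seed=0):
    """Return a list of journey names drawn in proportion to `mix`."""
    rng = random.Random(seed)  # seeded so a test run can be repeated exactly
    journeys = list(mix)
    weights = [mix[j] for j in journeys]
    return rng.choices(journeys, weights=weights, k=total_sessions)


def ramp_stages(peak_sessions_per_min, steps=4):
    """Split the climb to peak into equal steps (25%, 50%, 75%, 100% by default)."""
    return [peak_sessions_per_min * i // steps for i in range(1, steps + 1)]
```

A load tool would then replay each planned journey at the rate given by the current ramp stage. Stepping up gradually makes it easier to see at which load level a bottleneck first appears.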

Besides valid load testing, there’s something else you can do to ensure your digital channels don’t get bogged down on the big day. 

Use Ops Toggles to Quickly Reduce Loads

Most major sites have the ability to de-feature themselves in areas where problems might arise on busy days. You can accomplish the same thing by using feature flags to quickly shed load during peak traffic. These flags, called “ops toggles,” control operational aspects of a system in production. If you are implementing a new feature with unclear performance implications, you might want to introduce an ops toggle so that you can disable or degrade the feature quickly, if needed. For example, you could disable a “Recommendations” panel on your home page that is relatively demanding to generate.
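As a rough sketch of the “Recommendations” example, the code below wraps the expensive panel in an ops toggle. The in-memory `OPS_TOGGLES` dictionary is a stand-in for whatever flag service or config store you actually use, and every name here is hypothetical.

```python
# In-memory stand-in for a feature flag service (hypothetical). A real system
# would read this from a flag platform or config store that ops can change live.
OPS_TOGGLES = {"recommendations_panel": True}


def is_enabled(name):
    # Unknown toggles are treated as off in this sketch, so a missing entry
    # sheds load rather than adding it.
    return OPS_TOGGLES.get(name, False)


def compute_recommendations(user_id):
    # Placeholder for the expensive work (model scoring, extra queries, etc.).
    return [f"item-{user_id}-{n}" for n in range(3)]


def render_home_page(user_id):
    page = {"user": user_id, "hero": "Today's offer"}
    if is_enabled("recommendations_panel"):
        page["recommendations"] = compute_recommendations(user_id)
    return page
```

Setting `OPS_TOGGLES["recommendations_panel"] = False` makes the home page render without the panel, with no deploy required.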

Another example common in e-commerce is inventory management. If you have a very large inventory and you do not expect to sell out of anything, it is usually far more efficient not to handle inventory management in real time as customers are placing orders, but to do it in a batch later. With an ops toggle, you can leave real-time inventory management on as you normally would, while keeping the ability to turn it off if traffic gets too heavy and it starts slowing down your site.
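The inventory case could look something like the following sketch, where the toggle decides whether stock is adjusted inline with the order or queued for a later batch. The toggle name, the toy `stock` table and both functions are assumptions for illustration only.

```python
from collections import deque

OPS_TOGGLES = {"realtime_inventory": True}  # hypothetical toggle name
stock = {"sku-1": 10}                       # toy inventory table
pending_adjustments = deque()               # orders waiting for the batch job


def place_order(sku, qty):
    if OPS_TOGGLES["realtime_inventory"]:
        stock[sku] -= qty                        # update inventory inline with the order
    else:
        pending_adjustments.append((sku, qty))   # defer the update to shed load
    return {"sku": sku, "qty": qty, "status": "accepted"}


def run_inventory_batch():
    """Apply deferred adjustments later, once traffic has eased."""
    applied = 0
    while pending_adjustments:
        sku, qty = pending_adjustments.popleft()
        stock[sku] -= qty
        applied += 1
    return applied
```

Note that this only makes sense under the condition stated above: while the toggle is off, orders are accepted without checking stock, so it suits catalogs where selling out is not expected.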

Or maybe you are using a third party for something that is fun or interesting but not necessarily essential. If you are not sure whether the provider can handle the heavy loads during peak traffic, you might want the ability to toggle it on and off in production. But you might also want the ability to toggle it off during testing. There are a couple of reasons for wanting to do this. One, you may not have permission to include the third party in a load test, so you may need to toggle it off at your larger load levels. And two, if the provider hits a bottleneck in the middle of the test, you don’t want to have to cancel the whole test. You just want to be able to take the provider out of the loop.
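A minimal sketch of taking a third party out of the loop might look like this. The “reviews” vendor, the toggle name and the function names are all hypothetical, and the vendor call is a stub rather than a real network request.

```python
OPS_TOGGLES = {"third_party_reviews": True}  # hypothetical toggle name


def fetch_reviews_from_vendor(product_id):
    # Stand-in for a network call to an external provider (hypothetical).
    return [f"review for {product_id}"]


def get_reviews(product_id, fetch=fetch_reviews_from_vendor):
    if not OPS_TOGGLES["third_party_reviews"]:
        return []  # vendor is out of the loop; the page renders without reviews
    try:
        return fetch(product_id)
    except Exception:
        return []  # a vendor failure degrades the page instead of breaking it
```

The same switch serves both purposes described above: it can be turned off for the heavier stages of a load test, and it can be turned off in production if the vendor starts to struggle.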

Ops toggles are an unusual and special sort of feature flag. With most feature flags, your goal is to retire them once you have gained confidence in the new feature. With ops toggles, you keep them in place indefinitely so they are always there for you to use at a moment’s notice.
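One way to keep that lifecycle difference visible is to record it alongside each flag, so cleanup tooling never proposes deleting an ops toggle. This is a sketch under assumed conventions: the `kind` labels and the `remove_by` field are inventions for this example, not features of any particular flag product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Flag:
    name: str
    kind: str                         # "release" or "ops" (labels assumed for this sketch)
    remove_by: Optional[date] = None  # cleanup deadline; None means keep indefinitely


def flags_due_for_removal(flags, today):
    """List release flags past their deadline; ops toggles are never listed."""
    return [
        f.name
        for f in flags
        if f.kind == "release" and f.remove_by is not None and f.remove_by <= today
    ]
```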

It’s never too early to start preparing by designing meaningful load tests and wrapping resource-intensive “nice to have” features with ops toggle feature flags.

