
How Expedia Prepped for Super Bowl-Sized Spikes of Traffic

21 Jun 2022 5:00am, by

Online travel agency Expedia produced a 2022 Super Bowl commercial and had to be ready to manage the resulting “traffic surge,” as Juliette Howland describes in a recent blog post. Howland states that the worst thing that could happen is to go to “great expense and effort” to get ready, only to find out that the “site can’t handle it.”

To prepare for the potential surge, Expedia followed a model similar to the one that online vacation rental service Vrbo (part of the Expedia Group) has used effectively for its own Citrus Bowl sponsorships for a number of years.

Vrbo found that traffic spikes very quickly, taking approximately 1-2 minutes to go from base to peak and somewhat longer, 5-15 minutes, to drop off. Since autoscaling doesn’t happen that quickly, developers had to come up with alternative ways to handle those spikes.
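The shape of such a spike can be sketched in a few lines of Python. This is only an illustration: the piecewise-linear shape, the function names and the request rates are assumptions, and only the 1-2 minute rise and 5-15 minute drop-off come from Vrbo's experience as described above.

```python
def spike_rps(t_min, base, peak, rise_min=2.0, fall_min=10.0):
    """Requests per second t_min minutes after the ad airs.

    Assumes a linear rise from base to peak over rise_min minutes,
    then a linear fall back to base over fall_min minutes.
    """
    if t_min <= 0:
        return float(base)
    if t_min <= rise_min:
        return base + (peak - base) * t_min / rise_min
    if t_min <= rise_min + fall_min:
        return peak - (peak - base) * (t_min - rise_min) / fall_min
    return float(base)


def minutes_until_overload(capacity_rps, base, peak, rise_min=2.0):
    """Minutes after airing at which demand first exceeds a fixed capacity.

    Returns None if the capacity covers the peak.
    """
    if capacity_rps >= peak:
        return None
    if capacity_rps < base:
        return 0.0
    return rise_min * (capacity_rps - base) / (peak - base)


# Hypothetical numbers: 100 rps baseline, 500 rps peak, 200 rps of capacity.
print(minutes_until_overload(200, base=100, peak=500))
```

With these made-up numbers, fixed capacity is exceeded half a minute into the spike, which illustrates why scaling that reacts only after load rises is too slow for this traffic pattern.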

Because ad placements and their timing fluctuate so much, engineering teams commonly receive little more than high-level information about the ads themselves, their calls to action and their timing. The teams then make decisions based on the information provided and adapt if and when they receive additional details.

To do this, the team must “plan, prepare, and monitor” big ad events and sponsorships, which it does with pregame testing and a good old-fashioned staff roll call.

Testing

Peak capacity tests (PCTs) on Expedia lodging pages began in the spring of 2021 in preparation for an unrelated project. They were scheduled to pause from January to early March of 2022 because those months are historically a peak traffic period for Vrbo, and Expedia and Vrbo share services. When news of the Super Bowl commercial reached developer teams in November of 2021, the decision was made to resume PCTs in early 2022.

PCTs for Expedia are still at an early stage compared with Vrbo’s. They consist of a handful of initial GET requests and a couple of POST requests for the pages with the highest traffic and the most potential for increased traffic from Hotels.com.

To mimic the gradual traffic increase heading into “peak season,” Expedia developers ran load tests that increased traffic gradually for 60 minutes and then held it for 30 minutes.
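A ramp-and-hold profile like this one is easy to express as a function of elapsed time. The sketch below is a minimal illustration, assuming a linear ramp; the function name and the request rates are hypothetical, and the article does not say which load-testing tool Expedia used.

```python
def ramp_and_hold_rps(t_min, start_rps, target_rps, ramp_min=60, hold_min=30):
    """Target request rate t_min minutes into the test.

    Ramps linearly from start_rps to target_rps over ramp_min minutes,
    holds target_rps for hold_min minutes, and is zero outside the test.
    """
    if t_min < 0 or t_min > ramp_min + hold_min:
        return 0.0
    if t_min < ramp_min:
        return start_rps + (target_rps - start_rps) * t_min / ramp_min
    return float(target_rps)


# Hypothetical rates: ramp from 100 to 400 requests per second.
for minute in (0, 30, 60, 90):
    print(minute, ramp_and_hold_rps(minute, start_rps=100, target_rps=400))
```

A load generator would call a function like this once per interval to decide how many requests to send.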

The first estimated run times caused concern for the engineering teams, as it emerged that a new Vrbo campaign was also debuting on Super Bowl Sunday, outside of the official Super Bowl commercials.

Without a specific run time released, engineers had to prepare for the biggest challenge: If the ads ran around the same time, shared services would be hit with far more traffic than usual.

Developers started running “tandem tests,” which meant performing Vrbo and Expedia lodging PCTs at the same time. For the initial tandem tests, Expedia’s traffic estimates were based on industry reports about other brands’ past Super Bowl commercials, while Vrbo used data from the 2019 and 2020 Citrus Bowls.
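The reason tandem tests matter can be shown with simple arithmetic: the load on a shared service is the sum of both brands' traffic, so its peak depends on how closely the two ads air. The following sketch assumes per-minute traffic series and a non-negative offset between them; the function name and all numbers are invented for illustration.

```python
def combined_peak(expedia, vrbo, offset_min):
    """Peak summed load on a shared service.

    expedia and vrbo are per-minute request rates; the Vrbo series is
    assumed to start offset_min (>= 0) minutes after the Expedia series.
    """
    length = max(len(expedia), offset_min + len(vrbo))
    total = [0.0] * length
    for i, rate in enumerate(expedia):
        total[i] += rate
    for i, rate in enumerate(vrbo):
        total[offset_min + i] += rate
    return max(total)


# Hypothetical per-minute request rates for each brand's spike.
expedia = [100, 300, 200]
vrbo = [50, 150, 100]
print(combined_peak(expedia, vrbo, offset_min=0))  # ads air together
print(combined_peak(expedia, vrbo, offset_min=3))  # ads air apart
```

When the two series overlap completely, the shared peak is the sum of the individual peaks; when they do not overlap, it is only the larger of the two.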

Shortly before Super Bowl Sunday, the official run times were revealed. The Expedia commercial was slated to run in the first quarter (Q1) of the game, which draws a large number of viewers and therefore called for more spike testing. The Vrbo ads would all run during the pregame show, which has a lower volume of viewers, so spike testing for them was no longer needed.

Since autoscaling doesn’t happen fast enough to handle these bursts, the engineering team’s first impulse was to have all teams prescale as much as possible. After discovering that this hit the upper limit of Amazon Web Services instances, they spent the last few days before the Super Bowl running tests with some autoscaling and some prescaling to find that “sweet spot” where they weren’t pushing instances to the point of falling over but could still handle a traffic spike.
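The trade-off between prescaling and autoscaling under an instance limit can be illustrated with a small calculation. This is not Expedia's method, which the article describes only as repeated testing; the function, its parameters and every number below are assumptions chosen to show the arithmetic.

```python
import math


def prescale_plan(expected_peak_rps, rps_per_instance, instance_limit, autoscale_reserve):
    """Split an instance limit between prescaled capacity and autoscaling room.

    All inputs are hypothetical: autoscale_reserve is the number of
    instances deliberately left unallocated so autoscaling can still add
    capacity without hitting the account's instance limit.
    """
    needed = math.ceil(expected_peak_rps / rps_per_instance)
    max_prescaled = instance_limit - autoscale_reserve
    prescaled = min(needed, max_prescaled)
    return {
        "prescaled": prescaled,
        "autoscale_room": instance_limit - prescaled,
        "covers_peak": prescaled >= needed,
    }


# Made-up figures: 9,000 rps expected, 100 rps per instance, limit of 100.
print(prescale_plan(9000, 100, instance_limit=100, autoscale_reserve=20))
```

If `covers_peak` comes back false, prescaling alone cannot absorb the expected spike within the limit, and the remaining demand has to be met by autoscaling or by a higher limit.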

Roll Call

The idea of a roll call was first introduced in 2020 for the Citrus Bowl, where it helped the Expedia Group avoid outages because engineers were available to act immediately. A roll call basically means all hands on deck: all available engineers wait on standby in case things go awry.

Having an engineer ready at a computer eliminates the time otherwise lost while developers get to a computer and become ready to respond. If the roll calls early in the day turn out not to be needed, those scheduled for later in the day are canceled.

Since the Expedia Group has teams all over the globe, there is always someone available to actively monitor the site and site traffic during the ad time and surrounding times in the event that the site goes down or experiences a failure somewhere.

In Conclusion

The Expedia Group saw increased traffic with no disruptions to any services and no degradation of the user experience. “Valuable lessons” were learned about how to prepare for the next big peak.
