How Kubernetes Prepared 8×8 for a 50x Spike in Videoconferencing Usage
One potential opportunity of our sudden work-from-home mandate is the permanent acceptance of remote work. If done right, it could eliminate long commutes for many and make way for the type of flexible work that drives inclusive workplaces. But what “remote done right” actually means is still to be determined, although cameras-on seems to be at the top of every list of remote-work rules.
The New Stack spoke to 8×8, a cloud communications and video collaboration provider, to learn how the company phased in remote-by-default, and how it is building system and team resiliency during a 50-fold increase in traffic over less than a month.
Prepare for the Worst, Hope for the Best
“It’s been a pretty insane couple weeks,” 8×8 Chief Product Officer Dejan Deklich told The New Stack over its 8×8 Video Meetings system.
Last September, 8×8 celebrated its 30th birthday by adding a new video meeting platform to its integrated voice, video, chat and contact solution for desktop and mobile. Then in November, it launched the freemium version. Both are based on the Jitsi open-source video conferencing tech acquired by 8×8 from Atlassian in 2018.
Now 8×8 video conferencing has more than 6.5 million users and is gaining roughly half a million new users per day, which represents 50-fold growth over just the last few weeks.
Deklich said the worldwide team of 1,600 employees plus another 400 contractors is working 24/7 to meet the demand. This growth isn’t limited to one region but is scaling internationally, including a sudden need to provision video conferencing for 30 percent of Italian high schools’ distance learning.
“Pretty much every part of the system exploded, which is awesome,” he said.
Luckily, nothing seemed to break. He attributes this scalability to pretty much everything running on the public cloud.
“I can’t imagine if you ran in actual data centers, what would happen,” Deklich said.
He said the company’s engineers “saw every technical problem,” including third-party monitoring tools that “explode,” because no one had ever planned to scale them 50-fold.
He says you can usually squeeze enough capacity out of a system to cover two to three times current traffic. Now 8×8 has to go to 350 times the norm, which was about 150,000 new users a day, to make sure it is prepared to handle a dramatic upturn in traffic that is expected to continue through at least April.
Because of the quality of the code and automation, Deklich said, “Everybody is very busy obviously, but there is no need to scale the team at this time.”
Automation Is Key to Surviving Unpredicted Scale
He says they’ve “invested tremendously in automation,” following the CI/CD golden rule that if you do it once, you do it by hand, but anything more than once needs automation.
The 8×8 architecture is also built on top of Kubernetes and distributed computing so, as Deklich said, there are no single points of failure or bottlenecks.
“If properly architecturally implemented, you can scale horizontally. That allows you to handle the load and have high uptime and availability,” he said.
Deklich added that the company can guarantee high uptime and availability because only a tiny percentage of traffic is on any single node. This is what has enabled the organization to scale globally over the last three years.
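As a rough illustration of why spreading traffic thinly across nodes helps availability, the sketch below assumes traffic is split evenly across identical nodes. That even split, and the function names, are assumptions made for illustration; the article does not describe how 8×8 actually distributes load.

```python
def traffic_share_per_node(node_count: int) -> float:
    """Fraction of total traffic each node carries under an even split (assumed)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return 1.0 / node_count


def load_after_node_loss(node_count: int, failed: int = 1) -> float:
    """Per-node load multiplier on the surviving nodes once `failed` nodes drop out."""
    survivors = node_count - failed
    if survivors < 1:
        raise ValueError("at least one node must survive")
    return node_count / survivors


if __name__ == "__main__":
    # Hypothetical cluster sizes, chosen only to show the trend.
    for nodes in (2, 10, 100):
        share = traffic_share_per_node(nodes)
        bump = load_after_node_loss(nodes)
        print(f"{nodes:>3} nodes: {share:.1%} of traffic per node, "
              f"survivors carry {bump:.2f}x after one failure")
```

Under this simplified model, the more nodes share the load, the smaller the slice of traffic affected by any one failure and the smaller the extra load each surviving node has to absorb.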
“Everything we build and design is for customers distributed all over the world — 4,000 applications in just about every country around the world,” Deklich said, with over 50,000 enterprise customers.
“The only way this is going to work is if your workloads are distributed around the world,” he continued.
Deklich says that in order to achieve this global scale, the company uses Amazon Web Services, Google, Oracle and any other provider that can support cloud computing in a given region. Kubernetes gives the company the ability to move workloads from one location to another relatively easily.
Deklich’s product team also leverages a lot of monitoring and issue tooling, including Dashbase, New Relic, Splunk and VMware products. Company engineers can see Italy coming online, going to lunch, or stopping school and work for the day by watching traffic spike and then flatten. And then they see the U.S. come online and up it spikes again.
They have full continuous integration and deployment automation for everything from functional testing to automation testing to performance testing. They rely heavily on Red Hat Ansible for scripting automation, monitoring and load balancing.
“If you don’t have it [automation at scale] there’s no way to survive 50 times in three weeks,” Deklich said.
During Times of Crisis, Teams Need Resiliency as Much as Systems
8×8 has an office in Singapore, which instituted a work-from-home mandate in the second week of February.
“Before anything really came to the U.S. we had the experience ‘Oh my god something really evil is coming,’” Deklich said.
The human resources and business continuity teams — which are well prepared with an HQ in California backed by earthquake contingency plans — updated the company guidelines to enable everyone to work from home.
8×8 started to stagger work-from-home rollouts the week of March 9 to test systems, like VPN, at team scale and to allow everyone to move their hardware home. The company went fully remote globally the week of March 16.
“This shows that this really works. Everybody is in the system. Everybody is reachable. To me, this now opens up a conversation with HR and finance,” Deklich said. “I can hire anybody in the world. I don’t care where they sit. I have a way to collaborate with them. A way to see them.”
Company executives made it clear upfront that family is a welcome interruption throughout the workday. Now, despite having just moved to a new Silicon Valley headquarters in January, employees are working from home for the foreseeable future.
Though Deklich admits: “It also helps that we develop the software for collaboration and communication.”