TikTok to Open Source ‘Cloud-Neutralizing’ Edge Accelerator

“In a sense, we are trying to hack the cloud’s backbone for our benefit,” noted Vikram Siwach, TikTok manager for product management infrastructure, explaining the benefit of the company’s soon-to-be open sourced “Global Services Accelerator,” a programmable edge platform that matches app needs to the optimal cloud service.
Siwach revealed the details of TikTok’s first major open source package at KubeCon+CloudNativeCon 2023, held earlier this month in Chicago.

TikTok’s Vikram Siwach.
TikTok runs its own data centers — cheaper that way, given the heavy computational demands of generating instant recommendations for millions of users. The cloud is just too expensive for that workload.
But TikTok has found that the network backbones of the cloud providers are the fastest way to reach its global audience. So the infrastructure management team developed GSA to “manage our user experience,” Siwach said.
The GSA was initially designed to optimize network performance, though TikTok also found additional benefits in cutting cloud costs. “It allows you to choose the best partner for you, based on endpoint cost or the routing cost,” he said.
Siwach has grand ambitions for the accelerator, even as cloud providers have their own similar accelerator services: “Once we open source this code, you won’t need to use them,” he said.
Why TikTok Is so Fast
Say what you will about TikTok, but even the most cynical system engineer must admit the mobile-first service is lightning fast, not only in delivering video shorts almost instantaneously but even in serving up personalized feeds in real time.
The company found out early on that relying on the public internet alone is chancy: It is neither fast enough nor reliable enough.
TikTok runs three data centers worldwide: one for the Americas, one for Europe and Africa, and a third for Asia and Australia. A Geo DNS service connects a user logging on with the data center that serves the user’s region.
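As a rough illustration of that regional lookup, here is a minimal Python sketch. The hostnames, the continent codes and the `resolve_data_center` function are all hypothetical; Siwach did not describe how TikTok’s Geo DNS is actually implemented.

```python
# Hypothetical hostnames for the three regional data centers. The article
# names the regions only; these values are placeholders.
DATA_CENTERS = {
    "americas": "dc-americas.example.com",
    "europe-africa": "dc-europe-africa.example.com",
    "asia-australia": "dc-asia-australia.example.com",
}

# Assumed mapping from a user's continent code to the serving region.
CONTINENT_TO_REGION = {
    "NA": "americas",
    "SA": "americas",
    "EU": "europe-africa",
    "AF": "europe-africa",
    "AS": "asia-australia",
    "OC": "asia-australia",
}


def resolve_data_center(continent_code: str) -> str:
    """Return the data center hostname serving the given continent code."""
    region = CONTINENT_TO_REGION[continent_code.upper()]
    return DATA_CENTERS[region]


# A user in Brazil (South America) lands on the Americas data center.
print(resolve_data_center("SA"))
```

A production Geo DNS service would derive the location from the resolver or client IP address rather than take a continent code as input.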
Experimentally, TikTok found that using a cloud provider’s backbone to connect users in Brazil to the U.S. data center resulted in lower latency than using the internet directly. Users were served recommendations much more promptly.
“The results were very promising,” Siwach said.
How TikTok Cuts Cloud Costs
It worked so well, in fact, that this approach was applied to the other global regions, across multiple competing cloud providers. “This is the first use case where we are investing with multiple cloud providers,” Siwach said (mentioning none by name).
Initially, the accelerators were only used for the recommendations, but other services will be supported over time.
Having a proxy close to the user, wherever they are, brings certain advantages for TikTok. If the service gets hit with a wave of Denial-of-Service packets, it can drop those faster and save network traffic costs.
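Siwach did not say how the accelerators decide which packets to drop. One common technique for shedding a flood at a proxy is a per-source token bucket, sketched below in Python. The class names, rate and burst values are assumptions made for illustration, not details of the GSA.

```python
from typing import Dict, Optional


class TokenBucket:
    """Allows bursts up to `burst` packets, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: int) -> None:
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.last: Optional[float] = None  # timestamp of the previous packet

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed since the last packet.
        if self.last is not None:
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class EdgeFilter:
    """Hypothetical edge filter that keeps one bucket per source address."""

    def __init__(self, rate_per_sec: float, burst: int) -> None:
        self.rate = rate_per_sec
        self.burst = burst
        self.buckets: Dict[str, TokenBucket] = {}

    def accept(self, source_ip: str, now: float) -> bool:
        bucket = self.buckets.get(source_ip)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.burst)
            self.buckets[source_ip] = bucket
        return bucket.allow(now)


edge = EdgeFilter(rate_per_sec=1.0, burst=3)
# Five packets arrive from one source at the same instant.
flood = [edge.accept("203.0.113.7", now=0.0) for _ in range(5)]
print(flood)
```

Packets rejected here never travel over the backbone to the data center, which is where the savings in network traffic costs would come from.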
For TikTok, the accelerators also serve to provide “neutrality” across cloud providers, allowing the company to pick the ones with the best price and performance.
The accelerators are based on Nginx, though modified to meet the heavier performance demands.
The service gets three different types of traffic from users: HTTPS, Google’s QUIC and WebSockets. All of it comes in through the cloud provider’s Anycast IP and is shuttled to Layer 4 load balancers.
The GSAs are run on the cloud provider’s Kubernetes services (usually a single cluster running 400 to 500 nodes). The cloud providers do a good job of managing Kubernetes, Siwach said, and the workloads can be easily moved across cloud providers, thanks to K8s.
The cloud providers also handle the autoscaling of network resources, which is important given the social media service’s spiky usage patterns.
The GSA handles aspects such as traffic, full-path encryption and certificate management, user privacy, application firewalls and other security measures.
Programmable Functions at the Edge
The GSA can provide rich details of the cloud provider’s network performance, and also provides unified management, freeing the user from relying on the cloud providers.
It also provides programmable functions, which can help in performance routing, security management and cost optimization.
Siwach predicted that the edge “is going to evolve into being a function-as-a-service, where you can write simple scripts and programs with logic.”
Here the programmable functions can be used in a wide variety of ways, depending on user requirements. In fine-grained detail, you learn how much the cloud provider endpoint is costing, or how much the load balancing is costing. You get metrics from both the user side and the server side. Then you can program against the results.
“In certain applications, the latency is more important. So we program for that. Certain applications don’t care about latency — uploading images — so we program for that,” he said.
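To make the idea of programming against those metrics concrete, here is a minimal Python sketch of a per-application routing policy. The `PathMetrics` fields, the provider names and the numbers are hypothetical; the talk did not show the GSA’s actual programming interface.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PathMetrics:
    """Measured figures for one cloud provider path (hypothetical fields)."""
    provider: str
    latency_ms: float
    cost_per_gb: float


def choose_path(latency_sensitive: bool,
                candidates: Sequence[PathMetrics]) -> PathMetrics:
    """Pick the fastest path for latency-sensitive apps, else the cheapest."""
    if not candidates:
        raise ValueError("no candidate paths")
    if latency_sensitive:
        # Lowest latency wins; cost only breaks ties.
        return min(candidates, key=lambda p: (p.latency_ms, p.cost_per_gb))
    # Lowest cost wins; latency only breaks ties.
    return min(candidates, key=lambda p: (p.cost_per_gb, p.latency_ms))


paths = [
    PathMetrics("provider-a", latency_ms=40.0, cost_per_gb=0.09),
    PathMetrics("provider-b", latency_ms=95.0, cost_per_gb=0.04),
]
print(choose_path(True, paths).provider)   # recommendations feed
print(choose_path(False, paths).provider)  # image uploads
```

In this sketch, the recommendations feed would go over the faster path and image uploads over the cheaper one, mirroring the two cases Siwach described.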
The GSA design also provides a way to address regulations, compliance laws and other requirements, whether they are generated internally or externally.
In fact, programmable functions (such as the ability to append more info on a packet header) may play a pivotal role for TikTok in this regard. The service has a dedicated team to manage user safety and data. They program policy for the company and would benefit from a central interface like GSA’s.
“This particular piece of software you can deploy at the periphery of the cloud provider, program the policy and hand over the reins to somebody else who you trust,” Siwach explained.
Into the Open Source Fray
For TikTok, this release will also be its first major foray into open source. Because the accelerator uses open source components (though Siwach only mentioned Nginx), TikTok wants to contribute back to the community.
“So please be kind to us. We don’t know the ways around here,” Siwach told the audience.