A Look at Meta’s Low-Latency Cloud Gaming Infrastructure
To deliver fast, smooth, jitter-free gameplay with very low end-to-end latency, social media giant Meta has built an infrastructure that runs multiple games on a single server for economic efficiency while keeping data secure, company software engineers asserted in a blog post Thursday.
This low-latency gaming platform could also serve as the base for Meta’s pending metaverse, they added.
Facebook launched its cloud gaming platform in 2020, providing users quick access to native Android and Windows mobile games across browsers. Along with a high volume of consumer access came a high volume of developer and engineering challenges.
Network, Hosting, and Cluster Management
The first step Meta took toward low end-to-end latency was a physical one: reducing the distance between the cloud gaming infrastructure and the players themselves. To do this, Meta used edge computing, deploying at edge sites close to large populations of players. The overall goal is to “have a unified hosting environment to make sure we can run as many games as possible as smoothly as possible,” Meta engineers Qunshu Zhang and Xiaoxing Zhu wrote.
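To see why physical distance matters, here is a minimal sketch of the floor that distance alone puts on round-trip time. It assumes signals travel through fiber at roughly 200,000 km/s (about two-thirds the speed of light in a vacuum); the function name and the sample distances are hypothetical and do not come from Meta's post.

```python
# Illustrative only: estimates the floor that distance puts on round-trip time.
# SPEED_IN_FIBER_KM_PER_MS is an approximation (about two-thirds the speed of
# light in a vacuum); real paths add routing, queuing, and processing delay.
SPEED_IN_FIBER_KM_PER_MS = 200.0  # roughly 200,000 km/s


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber for a given one-way distance."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    return 2 * distance_km / SPEED_IN_FIBER_KM_PER_MS


if __name__ == "__main__":
    for km in (50, 500, 5000):  # hypothetical player-to-server distances
        print(f"{km:>5} km -> at least {min_round_trip_ms(km):.1f} ms round trip")
```

Under these assumptions, a server 5,000 km away costs at least 50 ms per round trip before any rendering or encoding happens, while one 50 km away costs about half a millisecond.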
Next, Meta turned to the hosting environment itself.
For this, Meta partnered with Nvidia to build a hosting environment on top of Nvidia GPUs. Meta believes this step will provide the “high fidelity and low latency we need for loading and streaming games.”
Cluster management is the last piece of the end-to-end latency reduction challenge. For this, Meta used Twine, its in-house cluster management system. Twine coordinates the game servers at the edge while custom-built orchestration services manage the streaming signals. Separate hosting solutions for Windows and Android allow for more flexibility.
Audio and Video Streaming
High audio and video quality are arguably the most important things about cloud gaming. The characters can jump but the audio and video absolutely must not. Meta engineers selected WebRTC with Secure Real-Time Transport Protocol (SRTP) for streaming user inputs and video/audio frames.
The engineers took a hard look at the workflow then in use:
A player performed an action (making a character jump) -> the click event was captured and sent to the server -> the game emulator received the event -> the game rendered a frame containing the result of the action (the character jumped) -> Meta captured the rendered frame, copied it, and encoded it using a video encoder -> the encoded frame was packetized to fit into User Datagram Protocol (UDP) packets -> the packets were sent through the network to the player -> the packets were decoded into frames and rendered for the player.
Although each step executed quickly, the delays added up and could make gameplay feel sluggish.
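One step in that pipeline, packetizing an encoded frame so each piece fits in a UDP datagram, can be sketched as follows. This is an illustration only, not the RTP/SRTP packet format that WebRTC actually uses: the 8-byte header layout, the 1,200-byte payload budget, and the function names are all assumptions made for the example.

```python
import struct

# Hypothetical 8-byte header: frame id, chunk index, chunk count.
# This is NOT the RTP header format; it only illustrates the idea.
HEADER = struct.Struct("!IHH")
MAX_PAYLOAD = 1200  # assumed payload budget that keeps a datagram under a typical MTU


def packetize(frame_id: int, encoded_frame: bytes,
              max_payload: int = MAX_PAYLOAD) -> list[bytes]:
    """Split one encoded frame into datagram-sized packets with a small header."""
    chunks = [encoded_frame[i:i + max_payload]
              for i in range(0, len(encoded_frame), max_payload)] or [b""]
    return [HEADER.pack(frame_id, index, len(chunks)) + chunk
            for index, chunk in enumerate(chunks)]


def reassemble(packets: list[bytes]) -> bytes:
    """Rebuild a frame from its packets, which may arrive out of order."""
    parts = {}
    expected = None
    for packet in packets:
        _frame_id, index, count = HEADER.unpack_from(packet)
        expected = count
        parts[index] = packet[HEADER.size:]
    if expected is None or len(parts) != expected:
        raise ValueError("frame is incomplete")
    return b"".join(parts[i] for i in range(expected))
```

Because UDP does not guarantee ordering or delivery, the receiver in this sketch indexes chunks by position and refuses to rebuild a frame with missing pieces; a real streaming stack would also handle loss recovery and encryption.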
The revised workflow now includes fewer steps:
Now when a game renders a frame, the frame stays in GPU memory until it has been encoded. As a result, the new process puts far less traffic on the PCI bus between the GPU and the rest of the server, because only the encoded frame, which is smaller than the raw frame, has to leave the GPU.
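A rough calculation shows how much bus traffic is at stake. The figures below are assumptions chosen for illustration (1080p frames, 4 bytes per pixel, 60 frames per second, and a 10 Mbit/s encoded stream), not numbers reported by Meta.

```python
def raw_frame_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def raw_stream_mb_per_s(width: int, height: int, fps: int,
                        bytes_per_pixel: int = 4) -> float:
    """Bus traffic in megabytes per second if raw frames were copied off the GPU."""
    return raw_frame_bytes(width, height, bytes_per_pixel) * fps / 1e6


def encoded_stream_mb_per_s(bitrate_mbit_per_s: float) -> float:
    """Bus traffic in megabytes per second if only the encoded stream leaves the GPU."""
    return bitrate_mbit_per_s / 8


if __name__ == "__main__":
    # Assumed values for illustration only.
    raw = raw_stream_mb_per_s(1920, 1080, 60)
    encoded = encoded_stream_mb_per_s(10)
    print(f"raw: {raw:.1f} MB/s, encoded: {encoded:.2f} MB/s")
```

Under these assumptions, copying raw frames would move roughly 498 MB/s across the bus, while the encoded stream needs about 1.25 MB/s.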
To push latency down further, video is sent ahead of audio when it’s time to decode, which goes against the common practice of sending audio and video together. Meta can also take advantage of the inherent latency of the player’s computer monitor or phone screen. The screen renders frames one by one at a fixed rate (e.g., 30 fps or 60 fps), and Meta can use the imperceptible intervals between frames to help absorb some of the jitter and smooth out the video. On devices that support higher frame rates, latency can go down further.
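The idea of absorbing jitter in the gaps between screen refreshes can be illustrated with a simplified model. This is a sketch and not Meta's implementation: it assumes a display that refreshes at a fixed rate and shows each frame at the first refresh after the frame arrives. The function names and arrival times are hypothetical.

```python
import math


def frame_interval_ms(fps: float) -> float:
    """Time between two screen refreshes, in milliseconds."""
    return 1000.0 / fps


def refresh_ticks(arrivals_ms: list[float], fps: float) -> list[int]:
    """Index of the first screen refresh at or after each frame's arrival.

    Simplified model: it does not handle two frames landing in the same tick.
    """
    return [math.ceil(t * fps / 1000.0) for t in arrivals_ms]


if __name__ == "__main__":
    # Hypothetical arrival times with uneven gaps of 15, 20, 12, and 18 ms.
    arrivals = [5.0, 20.0, 40.0, 52.0, 70.0]
    print(refresh_ticks(arrivals, fps=60))
    print(f"{frame_interval_ms(60):.2f} ms per frame at 60 fps")
    print(f"{frame_interval_ms(120):.2f} ms per frame at 120 fps")
```

In this model the frames arrive unevenly but land on five consecutive refresh ticks, so the player sees an even cadence. The interval shrinks as the refresh rate rises (about 16.7 ms at 60 fps versus about 8.3 ms at 120 fps), which is consistent with latency going down on devices that support higher frame rates.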
Because it serves Windows- and Android-based games, the system inherently takes on the security challenges of those environments and also needs protection from threats like DDoS attacks.
To ensure safety, Meta’s cloud gaming infrastructure is completely separate from its core data. To protect the cloud gaming infrastructure itself, security is addressed at every stage of development, starting in design and continuing through implementation and testing. This includes threat modeling, security code reviews, fuzz testing, and security testing. Meta also has external companies perform security assessments as an extra layer of protection.
What’s Next for Cloud Gaming?
In terms of tech improvements, Meta is currently working with mobile network carriers to improve latency across mobile networks, as well as with chipset makers to improve latency in user devices. Engineering teams are also working on new container technologies to provide better streaming efficiency and continuing to ensure that security measures keep pace with all growing areas.
Developers can also look out for significant upcoming improvements in system compatibility and better tools for development, testing, debugging, and analytics.