Lessons in Thrift: How Facebook Keeps its Web Pages Snappy
With over 2.2 billion users worldwide, Facebook may very well be the most widely used software platform on the planet. YouTube or Google might still generate more traffic, though neither of those sites is nearly as personalized for each user as Facebook, making the social network a marvel of modern-day web performance engineering. Doubly so considering it is built mostly from open source software.
Part of the success can be attributed to the small Facebook Web Speed Team based in the company’s New York offices. “I guess our team mission is really simple: Make Facebook.com fast, by any means necessary,” said Aaron Bell, the Facebook engineering manager who leads the Web Speed team, adding that the “any means necessary” is important “because, as it turns out to make Facebook.com fast is a really big problem.”
“Definitely what makes it challenging is the constant change,” Bell said. “The site never stabilizes. We’re always adding new features.”
The Web Speed team concentrates solely on speeding the delivery of a fully composed, fully customized web page to each visitor’s browser. This team does not handle web apps, though it does cover mobile browsers.
Despite the rise of mobile apps, the web remains vitally important for the social networking giant. The website is still the platform of choice for most users who need to do heavyweight tasks such as managing a group or an event. And the majority of ad revenue still comes through the website as well.
Interestingly, this team is based in New York, rather than in Silicon Valley. It turns out that a lot of browser development takes place on the U.S. East Coast. Also, New York is home to many engineers who think a lot about high-performance, low-latency computing, thanks to the financial technology community nurtured by Wall Street. Machine learning also has a big presence in the Big Apple.
“So there’s a lot of cross-pollination of ideas and performance work that goes on here,” Bell said.
Tricks of the Trade
Improving performance involves a lot of tweaking around how the browser works and how the server software works.
One tweak is to execute computational operations in parallel wherever possible. “Basically we play tricks to make sure that we’re saturating everything at once,” said Nate Schloss, a Facebook engineer on the team. This is the idea behind a tool the company developed called BigPipe, which breaks pages into pagelets so they can be pipelined through multiple execution stages, both on the server side and in the browser.
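To make the pagelet idea concrete, here is a minimal sketch of rendering several page fragments concurrently and flushing each one as soon as it is ready. It is not BigPipe's actual code: the pagelet names, the render costs, and the functions render_pagelet and stream_page are all assumptions made up for illustration, and Python's asyncio stands in for whatever the real server uses.

```python
import asyncio

# Hypothetical pagelets and simulated render costs in seconds (not real data).
PAGELETS = {"header": 0.15, "news_feed": 0.30, "sidebar": 0.01}


async def render_pagelet(name: str, cost: float) -> tuple[str, str]:
    """Stand-in for server-side work such as data fetching and templating."""
    await asyncio.sleep(cost)
    return name, f"<div id='{name}'>...</div>"


async def stream_page() -> list[str]:
    """Start every pagelet at once and flush each as soon as it finishes."""
    tasks = [
        asyncio.create_task(render_pagelet(name, cost))
        for name, cost in PAGELETS.items()
    ]
    flushed = []
    for finished in asyncio.as_completed(tasks):
        name, html = await finished
        # A real server would write `html` to the open response here,
        # so the browser can start on it while other pagelets still render.
        flushed.append(name)
    return flushed


flush_order = asyncio.run(stream_page())
print(flush_order)
```

The cheap sidebar reaches the browser first instead of waiting behind the slower news feed, which is the sense in which the stages are kept busy at the same time.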
The company is a big proponent of A/B testing — trying something on a number of users before rolling it out system-wide. “In general the data that we can get from the wild is much more representative of the way that Facebook works because there’s so much diversity in terms of devices and networks and stuff that we see that it’s really, really hard for us to figure this out correctly inside of a lab,” Schloss said.
Machine Learning for Packaging
“It’s constantly changing so even if we were to find a really good bundling approach it would be out of date in 20 minutes. So we have to automate that,” Bell said.
The company developed a tool, called Packager, which uses machine learning to automate the process of deciding which files to bundle into a package for a specific end user. It relies heavily on statistical analysis: Which files will the users need right away? Which will they need eventually? Which files have been updated? Some files get updated constantly; others, not so much.
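The three questions above can be read as grouping criteria. The sketch below is a plain rule-based stand-in, not Packager and not machine learning: the file names, the statistics, the cutoffs, and the functions bundle_key and make_bundles are all hypothetical. It only shows how urgency and update frequency might split files into separate bundles, so that a frequently changing file does not force users to download a bundle of stable files again.

```python
from collections import defaultdict

# Hypothetical per-file statistics; a real system would learn these from traffic.
FILES = [
    {"name": "feed.js", "p_needed_now": 0.95, "updates_per_week": 1},
    {"name": "composer.js", "p_needed_now": 0.80, "updates_per_week": 25},
    {"name": "events.js", "p_needed_now": 0.10, "updates_per_week": 2},
    {"name": "groups_admin.js", "p_needed_now": 0.05, "updates_per_week": 30},
]


def bundle_key(f: dict, now_cutoff: float = 0.5, churn_cutoff: int = 10) -> str:
    """Classify a file by how soon it is needed and how often it changes."""
    urgency = "initial" if f["p_needed_now"] >= now_cutoff else "deferred"
    churn = "volatile" if f["updates_per_week"] >= churn_cutoff else "stable"
    return f"{urgency}-{churn}"


def make_bundles(files: list[dict]) -> dict[str, list[str]]:
    """Group file names into bundles that share urgency and churn."""
    bundles = defaultdict(list)
    for f in files:
        bundles[bundle_key(f)].append(f["name"])
    return dict(bundles)


bundles = make_bundles(FILES)
for key, names in bundles.items():
    print(key, names)
```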
ML can help in other ways as well, such as predicting what the user may click on next, so the servers can prepare the next batch of material to send.
Each person’s profile and history can offer clues as to which section of the home page they will click on next. Making these kinds of predictions, however, can lead to two possible pitfalls: over-estimating and under-estimating.
“You can either over-predict which is where you send too much then a bunch of it is unused,” Bell said, noting that this leads to low efficiency of the resources — network, CPU, server time, etc. “Then there’s under-prediction which is where you don’t send enough and then the user clicks on something and they don’t have all the resources they need and that’s by far the worst case.”
The team has concluded that if there is at least a small chance that the file would be used, then it is worth the cost, overall, to ship it to the user. For the most part. “There’s a bit of an art to it too. Sometimes it is too big of a file then we don’t send it,” Bell said.
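The rule of thumb Bell describes, ship a file if there is at least a small chance it will be used unless it is too big, can be sketched as a simple filter. The thresholds, the Resource type, and the function should_prefetch below are illustrative assumptions; the article gives no actual numbers or implementation details.

```python
from dataclasses import dataclass

# Illustrative thresholds only; the real values are not public.
MIN_PROBABILITY = 0.05   # "at least a small chance" of being used
MAX_BYTES = 200_000      # beyond this, the file is "too big" to send speculatively


@dataclass
class Resource:
    name: str
    use_probability: float  # predicted chance the user will need this file
    size_bytes: int


def should_prefetch(resource: Resource) -> bool:
    """Send speculatively when use is plausible and the file is not too large."""
    return (
        resource.use_probability >= MIN_PROBABILITY
        and resource.size_bytes <= MAX_BYTES
    )


candidates = [
    Resource("notifications.js", 0.60, 40_000),
    Resource("marketplace.js", 0.08, 90_000),     # unlikely, but cheap enough
    Resource("video_editor.js", 0.08, 900_000),   # unlikely and too big
    Resource("legacy_widget.js", 0.01, 10_000),   # almost never used
]

to_send = [r.name for r in candidates if should_prefetch(r)]
print(to_send)
```

Setting the probability bar low reflects the team's view that under-prediction is the worse failure, while the size cap keeps over-prediction from wasting too much bandwidth.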
Not all the work the Web Speed team does is strictly in-house. Facebook is a big believer in supporting Web standards, and its engineers can be found on many technical committees for Web technologies. “Ultimately our goal is to make the web as fast as possible for everybody. We really want the ecosystem to be healthy here,” Schloss said.
For instance, Facebook has been enthusiastic about Service Workers, which are client-side proxies that can tackle computational problems in the browser, such as deciding whether to pluck something from the cache or fetch it from the network.
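Real Service Workers are written in JavaScript against the browser's fetch and cache interfaces. The sketch below only models the cache-or-network decision itself, using one common strategy often called cache-first. The function cache_first, the dictionary cache, and the fake network function are hypothetical stand-ins, not Facebook's code or a browser API.

```python
from typing import Callable, Dict, Tuple


def cache_first(
    url: str,
    cache: Dict[str, str],
    fetch_from_network: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (body, source): serve from cache if present, else fetch and store."""
    if url in cache:
        return cache[url], "cache"
    body = fetch_from_network(url)
    cache[url] = body  # remember the response for the next request
    return body, "network"


network_calls = []


def fake_network(url: str) -> str:
    """Stand-in for a real HTTP request; records each call it receives."""
    network_calls.append(url)
    return f"contents of {url}"


cache: Dict[str, str] = {}
first = cache_first("/static/app.js", cache, fake_network)
second = cache_first("/static/app.js", cache, fake_network)
print(first[1], second[1], len(network_calls))
```

The second request never touches the network, which is the kind of decision the proxy can make on the client without a round trip to the server.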
Feature image: Aaron Bell (left) and Nathan Schloss. Images courtesy of Facebook.