How Airbnb and Twitter Cut Back on Microservice Complexities
Two recently posted talks from Airbnb and Twitter show how these web-scale companies are battling encroaching complexity in their microservices-based architectures. Both built robust data layers on GraphQL and streamlined core functionality into a simplified set of services, making it easier for developers to build new features at the edges.
Last decade, Airbnb moved to a service-oriented architecture (SOA) to rid itself of monolithic complexity, only to find that this microservices-based approach bred its own tangle of complications. The company currently has 2,000 services, managed by 500 engineers. The dependency graph, which is to say the overall system design, was “hard to reason about,” said Jessica Tai, Airbnb tech lead manager and core services infrastructure engineer, in a QCon talk posted by InfoQ last month.
Such complexity made it hard to debug services. It took longer to develop features, thanks to a growing number of changes that had to be made at integration points. Services started to duplicate functionality, and data was getting fragmented.
In response, the lodging giant created what its engineers call SOA v2. In v2, services are grouped into either internal services or presentation services. Between them sits a data aggregator, a new set of GraphQL-based APIs that relieves presentation services of the need to duplicate data-fetching logic.
Previously, for instance, the mobile app had to fetch data about an Airbnb listing and its users from different sources, so different parts of the app and website duplicated a lot of querying logic.
Using GraphQL, Airbnb’s “data aggregator” relies on a set of universal resolvers that know where to fetch each piece of data. It is also a good place to embed lightweight business logic. The aggregator can batch multiple queries into a single call to the underlying service, improving scalability.
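The batching idea can be sketched roughly as follows (a minimal illustration in the style of GraphQL's DataLoader pattern; the class and the stand-in "listing" service are hypothetical, not Airbnb's actual code): resolvers enqueue individual lookups, and the aggregator issues one batched call to the underlying service instead of many.

```python
# Hypothetical sketch of a batching data aggregator (DataLoader-style).
# All names, including the fake "listing service", are illustrative only.

class BatchingResolver:
    """Collects individual lookups, then fetches them in one backend call."""

    def __init__(self, batch_fetch):
        self.batch_fetch = batch_fetch  # callable: list of keys -> dict of results
        self.pending = []

    def load(self, key):
        # Called by many independent resolvers; no network I/O yet.
        self.pending.append(key)

    def dispatch(self):
        # One round trip to the underlying service instead of N,
        # with duplicate keys collapsed.
        results = self.batch_fetch(sorted(set(self.pending)))
        self.pending.clear()
        return results

# Stand-in for a "listing" service block behind the aggregator.
def fetch_listings(ids):
    return {i: {"id": i, "title": f"Listing {i}"} for i in ids}

resolver = BatchingResolver(fetch_listings)
for listing_id in [7, 3, 7]:    # duplicate requests from different parts of a page
    resolver.load(listing_id)
listings = resolver.dispatch()  # single batched call for keys {3, 7}
```

The deduplication step is what makes the extra network hop worthwhile: many scattered per-field fetches collapse into one call per service block.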
Further reducing complexity, the underlying data services are grouped into “service blocks” that handle core entities such as “user” and “home,” all invisible to the developer.
Within the data aggregator, each block provides “cohesive business logic around that entity,” Tai said. “Everything beneath the service is considered a black box to the client, and this helps to simplify the developer experience.”
While this approach introduces an additional network hop, it gives the engineers the chance to optimize for query patterns.
Twitter, meanwhile, went through a similar refactoring process.
The social media giant recently rolled out a set of public APIs, along with a multitenant microservice, to cut down the sprawl of other microservices, said Steve Cosenza, Twitter senior staff engineer, in another QCon talk.
Originally, Twitter ran its public APIs through a single Ruby on Rails application (“Monorail”), which had grown into one of the largest Rails codebases in the world and had become increasingly difficult to update. By 2014, Twitter went the route of microservices, migrating the API service to a set of 14 microservices running on an internal Java Virtual Machine (JVM)-based framework (“Maccaws”).
This first microservices approach worked well … for a while.
“While the microservices approach enabled increased development speeds at first, it also resulted in a scattered and disjointed Twitter API,” Cosenza said. Independent teams designed and built endpoints for their specific use cases with little coordination, which led to fragmentation and, inevitably, slowed developer productivity.
Over time, Twitter started building out a set of internal APIs, also based on GraphQL, to help its own developers move more quickly. In 2020, this internal architecture was then pressed into use for what would be version 2 of the Twitter public API.
The public API platform was designed to scale to a large number of endpoints, all served through the new multitenant service. The idea is to have developers worry about querying and mutating only the data they need, without setting up and running an HTTP API service for each new piece of functionality.
The goal was to minimize endpoint-specific business logic within the core HTTP service, “otherwise the system would quickly become yet another unmaintainable monolith,” Cosenza said.
Core and common API logic would be handled by a dedicated infrastructure team. To developers, this core service offered a “powerful data access layer that emphasized declarative queries over imperative code.”
“Twitter clients query for data and render UIs while the public Twitter APIs query for data and render JSON,” he said.
The two core data components are “resource fields,” which are bits of atomic data such as Tweets or users, and “selections,” which are ways to find and aggregate resource fields (e.g., “tweet lookup by ID”).
So where can developers add their own endpoint-specific business logic? By default, they use the domain-specific language of Twitter’s internal data cataloging system, Strato. For cases where imperative code is needed, developers can build a Scala microservice that is then exposed in a Strato column.
“In either case, since the platform provides the common HTTP needs for API endpoints, new APIs can be released without spinning up a new HTTP service,” Cosenza said.