The Unlikely Journey of GraphQL

MongoDB’s claim to fame has been its devotion to the preferences of developers, with its view that JSON is the natural way to represent data and that a JSON query is the intuitive way to access it. But when it came to mobile data, MongoDB took a different path to JSON with the acquisition of Realm, embracing a new protocol, GraphQL, as the more expedient means of querying mobile data.
From its beginnings at Facebook as a more declarative approach to querying data, GraphQL provides a data access API for exchanging data between apps that could pick up where the de facto standard REST API leaves off. GraphQL is becoming the little API that could.
REST Opened the Doors
Open sourced by Facebook back in 2015, GraphQL has since been on a surprising journey to the point where it has a rapidly growing ecosystem. The emergence of GraphQL reflects the dominant role of APIs in how applications exchange data, and its success is entirely due to the API that it will likely supplement, not replace: REST.
REST emerged as a standard, simple, straightforward means of feeding data to applications. It was a response to the complexity of first-generation service-oriented architectures that focused the handshaking at the business logic level. And it eventually made the world of microservices thinkable. As an answer to predecessors such as SOAP, which required apps to handshake at the process level to exchange data, REST reduced all that to a relatively simple matter of sending requests such as GET over HTTP to a specific endpoint. Just get the data, and let the apps handle the logic.
Not surprisingly, over the past decade, REST became the default approach for applications to embed logic and exchange data. It has become ingrained in countless applications and for developers, it’s pretty straightforward to code. REST is an imperative approach and works well in a static environment where the client requests a specific set of information.
But here’s the rub with REST: the calls are simple, but implementation can grow complex, especially if there are multiple back-end endpoints (data sources) involved. Satisfying a REST call can require chains of back-and-forth messages dealing with authentication, persistence, and load balancing. And in a microservices world, the output often involves aggregating bits of info from multiple sources. The result is that REST can get quite chatty, and such chattiness leads to latency. When you’re on a mobile device, latency is your enemy.
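To make the chattiness concrete, here is a minimal sketch that counts the round trips needed to assemble one product page from several REST resources. The routes, fields, and the `rest_get` helper are hypothetical stand-ins held in memory, not a real service or client library.

```python
# Hypothetical in-memory stand-ins for several REST back ends; none of these
# routes or fields come from a real service.
FAKE_ENDPOINTS = {
    "/products/42": {"id": 42, "name": "Trail Shoe", "review_ids": [7, 8]},
    "/reviews/7": {"id": 7, "stars": 5},
    "/reviews/8": {"id": 8, "stars": 3},
    "/offers?product=42": [{"seller": "ExampleMart", "price": 79.0}],
}

calls = []  # records every simulated round trip


def rest_get(path):
    """Simulate one HTTP GET; each call stands for one network round trip."""
    calls.append(path)
    return FAKE_ENDPOINTS[path]


def load_product_page(product_id):
    """Assemble one product page from several REST resources."""
    product = rest_get(f"/products/{product_id}")
    reviews = [rest_get(f"/reviews/{rid}") for rid in product["review_ids"]]
    offers = rest_get(f"/offers?product={product_id}")
    return {"name": product["name"], "reviews": reviews, "offers": offers}


page = load_product_page(42)
print(len(calls))  # 4 round trips for a single page
```

The count grows with the number of reviews, which is the pattern that hurts most on a high-latency mobile connection.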
Microservices Created the Opening for GraphQL
GraphQL is drawing the spotlight because refactoring or modernization of applications into microservices is stressing REST to its limits. As information consumers, we expect more from the digital platforms that we use. Shop for a product, and we also will want to find reviews, competing offers, autofill keyword search, and likely other options. Monolithic apps crack under the load, and for similar reasons, the same fate could be happening to REST, which requires pinpoint commands to specific endpoints. And with complex queries, lots of pinpoint requests.
Facebook developers created GraphQL as a client specification for alleviating the bottlenecks that were increasingly cropping up when fetching data from polyglot sources for a variety of web and mobile clients. With REST, developers had to know all the endpoints. By contrast, GraphQL is declarative: specify what data you need rather than how to produce it. The heavy lift is performed by the underlying knowledge graphs, which have already mapped the data and its typing, with the result that information can typically be retrieved in a single pass.
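As a rough illustration of the declarative style, the sketch below builds the kind of JSON body that is conventionally POSTed to a GraphQL HTTP endpoint, with the query text under `query` and its parameters under `variables`. The `product`, `reviews`, and `offers` fields belong to a hypothetical schema, and `build_graphql_request` is an illustrative helper, not part of any library.

```python
import json

# Hypothetical schema: a `product` field with nested `reviews` and `offers`.
QUERY = """
query ProductPage($id: ID!) {
  product(id: $id) {
    name
    reviews { stars }
    offers { seller price }
  }
}
"""


def build_graphql_request(query, variables):
    """Return the JSON body commonly POSTed to a GraphQL HTTP endpoint."""
    return json.dumps({"query": query, "variables": variables})


body = build_graphql_request(QUERY, {"id": "42"})
print(json.loads(body)["variables"])  # {'id': '42'}
```

The client names the fields it wants in one document and leaves it to the server to work out which back ends to consult.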
GraphQL initially gained traction in the places you’d expect — large destination online services like Facebook, Shopify, Expedia, GitHub and the like. And it’s become a favorite among JavaScript developers. Of course, no emerging technology is without its debate: Why use GraphQL and Why not use GraphQL.
While the simplicity of GraphQL is in the execution, the devil’s in the details of building those graphs. While Facebook had a relatively monolithic graph, for most organizations, the answer is likely to be a federation of multiple graphs.
And while we’re on the topic of graph, let’s pop one myth: Despite the name, GraphQL is not a query language for graph databases, although it could be used that way.
Here’s some commercial context. Facebook created the original client specifically for its own graph of data sources, so outside Facebook, the API needed to be generalized. Apollo GraphQL created an Apache-licensed open source client to fill the gap; it is one of a number that are currently commercially available. Apollo added a transport layer that could connect the client with a federated view of what are likely to be multiple knowledge graphs representing different data sources. Atop that, Apollo has added its own closed-source, Elastic-licensed governance tier and an MIT-licensed server that connects to REST and APIs for other data sources.
There are a number of cool features about GraphQL that make it attractive. For starters, thanks to the underlying graph, a GraphQL query can specify the exact data entities necessary, with an understanding of the relationships between those objects. And strong typing helps ensure that the results will come out in the right form. As noted, commands are much more compact, with much, if not all, of the chattiness of REST calls avoided.
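The idea of asking for exact data entities can be sketched with a toy projection: the response keeps only the fields the selection names and mirrors the selection's shape. This is a simplified stand-in for a GraphQL selection set, not a real executor; the `project` function and the sample record are hypothetical.

```python
def project(value, selection):
    """Keep only the fields named in `selection`, recursing into nested ones.

    `selection` maps a field name to either None (a leaf) or a nested
    selection dict. This is a toy stand-in for a GraphQL selection set.
    """
    if isinstance(value, list):
        return [project(item, selection) for item in value]
    result = {}
    for field, sub in selection.items():
        child = value[field]
        result[field] = child if sub is None else project(child, sub)
    return result


# Hypothetical record with more fields than the client needs.
product = {
    "id": 42,
    "name": "Trail Shoe",
    "warehouse_code": "W-9",
    "reviews": [
        {"id": 7, "stars": 5, "author": "a"},
        {"id": 8, "stars": 3, "author": "b"},
    ],
}

selection = {"name": None, "reviews": {"stars": None}}
print(project(product, selection))
# {'name': 'Trail Shoe', 'reviews': [{'stars': 5}, {'stars': 3}]}
```

Fields such as `warehouse_code` never reach the client, which is what keeps responses compact.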
There’s also the matter of resiliency. Because of the fragility of endpoints, which often change, REST APIs need to be versioned. By contrast, GraphQL APIs are more resilient because versioning is kept to the underlying knowledge graphs; while that’s not a get-out-of-jail-free card, it’s much less cumbersome than updating all those APIs. Furthermore, if some of the back end connections to data points go down, GraphQL can still return a partial answer.
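A partial answer can be sketched as follows: each field is resolved independently, a failing field becomes null, and the failure is reported alongside the data. The `data` and `errors` keys follow the response shape described in the GraphQL specification; the resolver functions are hypothetical, and the sketch assumes every field is nullable.

```python
def resolve_fields(resolvers):
    """Run each field resolver; a failure nulls that field and is reported."""
    data, errors = {}, []
    for field, resolver in resolvers.items():
        try:
            data[field] = resolver()
        except Exception as exc:
            # Assumption: the field is nullable, so it can be set to None.
            data[field] = None
            errors.append({"message": str(exc), "path": [field]})
    response = {"data": data}
    if errors:
        response["errors"] = errors
    return response


def failing_offers():
    """Hypothetical resolver for a back end that is currently down."""
    raise RuntimeError("offers service unavailable")


response = resolve_fields({
    "name": lambda: "Trail Shoe",
    "offers": failing_offers,
})
print(response["data"])  # {'name': 'Trail Shoe', 'offers': None}
```

The client can still render the product name and decide for itself how to handle the missing offers.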
Better Put up Those Guardrails
Like any API, GraphQL is only as useful as the guardrails placed around it. While, for instance, GraphQL can aggregate data from many sources, it is not on its own a data virtualization or data federation tool. For that, there is the need to formalize managing connections to data sources, not to mention managing security and authorizations. Absent tooling, IDEs, or applications that bundle GraphQL under the hood, working with the raw API may present problems because of its power. For instance, when building the underlying graphs, governance will be required so that an answer delivered to somebody in the US doesn’t contain data that cannot leave Germany. Or, when mapping all the connections between data sources, will the result burn down precious cloud compute cycles? All this is what commercial GraphQL tools are seeking to address.
What about knowing the data source? For developers, securing a RESTful API is much simpler because the data source is defined, whereas, with a GraphQL call, it may not be obvious to the developer what sources are being hit. And that’s where concerns over authentication, authorization, rate limiting, and so on could come in.
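One guardrail of the kind discussed above is a cap on how deeply a query may nest, so that a single request cannot fan out across an unbounded number of sources. The sketch below is a minimal illustration under stated assumptions: it works on a nested dict standing in for a parsed query, and `selection_depth`, `check_depth`, and the limit of 3 are hypothetical choices rather than any tool's actual API.

```python
def selection_depth(selection):
    """Return the nesting depth of a selection; leaves (None) count as 0."""
    if not selection:
        return 0
    return 1 + max(selection_depth(sub) for sub in selection.values())


def check_depth(selection, max_depth):
    """Reject selections that nest more deeply than `max_depth`."""
    depth = selection_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")
    return depth


# Hypothetical selections: product -> reviews -> stars is three levels deep.
shallow = {"product": {"name": None, "reviews": {"stars": None}}}
# A query that keeps walking relationships back and forth is five levels deep.
deep = {"product": {"reviews": {"author": {"reviews": {"stars": None}}}}}

print(check_depth(shallow, max_depth=3))  # 3

try:
    check_depth(deep, max_depth=3)
except ValueError as exc:
    print(exc)  # query depth 5 exceeds limit 3
```

A depth cap is only one option; cost estimates and per-client rate limits address the same concern from other angles.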
As we noted earlier, GraphQL will supplement, not replace REST or other data access APIs. It is suited strictly for getting quick answers involving fetching multiple pieces of the truth. It is not set up for supporting long-running queries. Admittedly, one could implement caching tiers, but why bother when there are more straightforward alternatives, such as with REST?
GraphQL Going Viral
GraphQL has built up a sizable technology landscape. There are the usual suspects in API management, where GraphQL is just the latest to be added to the list. As noted above, Apollo, which extended the original Facebook GraphQL client, aspires to become a next-generation data federation system. Then there’s Hasura, which seeks to make the legacy world, such as relational databases and RESTful APIs, GraphQL-friendly. Hasura just added a feature taken for granted with relational databases: the ability to make joins. But this time, the joins are not across relational tables per se, but across separate GraphQL instances.
Then there are others like WunderGraph, which takes an opposite tack: a server-side GraphQL back-end framework that crawls the database, generates a schema, and handles role-based access controls, caching, and state management. A growing number of databases, from Neo4j to DataStax, ArangoDB, and others, can expose data through GraphQL. Meanwhile, specialized engines such as Dgraph will autogenerate a persistent database once a GraphQL schema is developed. You can do the same thing, by the way, with Hasura and Yugabyte.
The flexibility of GraphQL has been the secret to all this extensibility. Like any API, it requires guardrails, even more so than REST where, at least, data sources are hard-coded in. Both GraphQL and REST involve complexity; it is just that with GraphQL, that complexity is buried under the hood. As noted above, as GraphQL grows more ambitious as a federated query engine, it will require more sophisticated guardrails and governance built in. And for highly regulated industries, it may require detailed data lineage to produce audit trails documenting that only the proper data was surfaced and nothing verboten breached. By comparison, that’s a lot more straightforward to do in REST, where the connections are coded by the developer rather than generated by the engine.
As noted above, GraphQL won’t displace REST. That is not only because REST is already so entrenched in modern application stacks, but also because it is better suited to highly complex, long-running queries. With its versatility, we expect that GraphQL will become the operational database counterpart, built for cloud native environments operating with microservices, where the need for simple, quick answers requires complexity that is buried under the hood. From its modest beginnings at Facebook, the GraphQL API has finally popped up under the spotlight. What a long, strange trip it’s been.