LinkedIn’s Real-Time Graph Database Is LIquid
Everything that LinkedIn knows about the global economy is kept in a single graph database, all in working memory.
“The Economic Graph is our digital representation of the global economy that we use to answer questions about and provide insights into the dynamics of the global workforce and job market,” wrote LinkedIn’s Director of Engineering Dr. Bogdan Arsintescu in a blog item posted Tuesday.
Arsintescu is the chief architect of LIquid, the real-time graph database system powering the Economic Graph. His post discusses the latest version, the fourth generation, of LinkedIn’s graph database system.
LinkedIn has always had unusual requirements for a graph system, driven by its demands of scale and throughput, he explained in a follow-up interview with TNS.
This Graph is massive, with over 270 billion connected entities, and it expands every time someone joins the service and adds their own info, forever joining their schools, skills, companies, positions, jobs, events, and groups to others with the same background. This allows users to keep up with peers (it is how you learn about, say, an esteemed colleague starting his own media company).
More importantly, though, the Graph is also geared towards the “second-degree connections,” vital for networking opportunities of all sorts. “The magic of LinkedIn is in the second degree,” Arsintescu explained.
In addition to hosting the enormity of this entangled glob of interconnectedness, this database system must field 2 million queries a second. And Arsintescu expects that number to double in the next 18 months.
At that speed, the entire graph needs to be in memory, as a single homogeneous set. It couldn’t be parceled off to analytics. “We needed it to be on the Daytona 500 side of the graph databases, to be extremely fast and able to scale to the size of LinkedIn,” Arsintescu explained to TNS.
Running a Giant Graph
Like many social networking concerns, LinkedIn has found the graph database to be a necessary component to connect users with their interests across the globe. Unlike most organizations, however, LinkedIn keeps its entire graph in working memory, thanks to a unique design.
LIquid is the database system delivering this real-time graph, one that can easily scale to ten times its existing size while maintaining 99.99% availability. It boasts “new database indexing techniques that made online querying of the data possible,” even for connections only a few seconds old. One technique: the triples that form the fundamental connective tissue of the graph get indexed.
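To make the idea of indexed triples concrete, here is a minimal Python sketch of an in-memory index over (subject, predicate, object) triples. It is only an illustration: the class name, the predicate names, and the choice of two lookup tables are assumptions made for this example, not a description of how LIquid itself is built.

```python
from collections import defaultdict


class TripleIndex:
    """Toy in-memory index over (subject, predicate, object) triples.

    Hypothetical illustration only; not LIquid's actual data structure.
    """

    def __init__(self):
        self.by_sp = defaultdict(set)  # (subject, predicate) -> objects
        self.by_po = defaultdict(set)  # (predicate, object) -> subjects

    def add(self, s, p, o):
        # Each triple is written to both tables, so a new edge can be
        # queried from either direction as soon as it is added.
        self.by_sp[(s, p)].add(o)
        self.by_po[(p, o)].add(s)

    def objects(self, s, p):
        """All o such that the triple (s, p, o) is stored."""
        return set(self.by_sp.get((s, p), ()))

    def subjects(self, p, o):
        """All s such that the triple (s, p, o) is stored."""
        return set(self.by_po.get((p, o), ()))


# Made-up sample data.
index = TripleIndex()
index.add("alice", "connected_to", "bob")
index.add("bob", "connected_to", "carol")
index.add("alice", "works_at", "acme")
index.add("carol", "works_at", "acme")
```

With both tables in place, "who does alice know?" and "who works at acme?" are each a single dictionary lookup rather than a scan over every triple.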
To provide the much-needed memory, the system both scales up and scales out, packing as much memory into each server as possible while also sharing memory across servers.
The Graph is held in a Replica, built on a cluster of roughly 20 to 40 servers, each with a terabyte or more of RAM. Each Replica can serve a certain number of queries per second (QPS), so the system’s total throughput can be increased by adding more Replicas.
While confident of the Economic Graph footprint’s scalability, Arsintescu is nonetheless looking at ways to break apart what is now one gigantic homogeneous set of data, he revealed in the blog post. Not all data is equally important, and some of it is consulted far less often. Nor does all data need to be accessed in real time. So, to cut costs, research is under way on tiered storage and workload optimization.
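The arithmetic behind scaling by Replicas can be sketched in a few lines of Python. The per-Replica capacity used here (50,000 QPS) is a made-up placeholder, since the article gives no such figure; only the 2 million QPS load and its expected doubling come from Arsintescu.

```python
def replicas_needed(target_qps: int, qps_per_replica: int) -> int:
    """Smallest number of Replicas whose combined capacity covers target_qps."""
    if target_qps < 0 or qps_per_replica <= 0:
        raise ValueError("target_qps must be >= 0 and qps_per_replica > 0")
    # Integer ceiling division: round up so capacity is never short.
    return (target_qps + qps_per_replica - 1) // qps_per_replica


# HYPOTHETICAL per-Replica capacity; not a number from LinkedIn.
ASSUMED_QPS_PER_REPLICA = 50_000

today = replicas_needed(2_000_000, ASSUMED_QPS_PER_REPLICA)
after_doubling = replicas_needed(4_000_000, ASSUMED_QPS_PER_REPLICA)
```

Under that assumption, doubling the query load doubles the number of Replicas, which is why throughput scales with Replica count rather than with the size of any single cluster.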
Graph For Developers
“Our primary client for this graph database inside LinkedIn is the developer,” Arsintescu explained.
For developers, the system provides a declarative query language based on Datalog, a deductive database programming language. It is composable; developers can build modules to better find their preferred data.
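As a rough illustration of what a declarative, composable query looks like, the Python sketch below evaluates two Datalog-style rules (shown in the comments) over a tiny made-up graph. The rule names, relation names, and data are assumptions for this example; this is not LIquid's query language or API.

```python
def connected(edges):
    """Treat every stored edge as symmetric: connected(A, B) holds both ways."""
    pairs = set()
    for a, b in edges:
        pairs.add((a, b))
        pairs.add((b, a))
    return pairs


def second_degree(conn):
    # second_degree(A, C) :- connected(A, B), connected(B, C),
    #                        A != C, not connected(A, C).
    neighbors = {}
    for a, b in conn:
        neighbors.setdefault(a, set()).add(b)
    result = set()
    for a, b in conn:
        for c in neighbors.get(b, ()):
            if c != a and (a, c) not in conn:
                result.add((a, c))
    return result


def second_degree_at(conn, works_at, person, company):
    # Composed rule, built on top of second_degree:
    # reachable_at(P, X, C) :- second_degree(P, X), works_at(X, C).
    return {
        x
        for (p, x) in second_degree(conn)
        if p == person and (x, company) in works_at
    }


# Made-up sample data.
conn = connected([("alice", "bob"), ("bob", "carol"),
                  ("bob", "dave"), ("alice", "dave")])
works_at = {("carol", "acme"), ("dave", "acme")}
matches = second_degree_at(conn, works_at, "alice", "acme")
```

Here dave is excluded because he is already a direct connection of alice, so only carol is returned. The second rule reuses the first without restating it, which is the sense in which such queries compose into modules.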
Also, A/B testing should be a breeze: A new experiment could be started just by changing the query parameters. “This system only generates the necessary data, minimizing the required compute resources,” Arsintescu reports.
Work is also ongoing to add more nuance and built-in sophistication to the querying process, such as creating algorithmically or ML-based derived data, as well as “improved reasoning.”
Currently, LinkedIn has no plans to release LIquid as open source in any form. Nor does it have immediate plans to commercialize the technology, though it is looking into ways it could be used elsewhere at Microsoft.