LinkedIn Finalizes its New Search Architecture, Galene
Last week, professional social networking service LinkedIn moved its search operations entirely over to a new architecture, called Galene, customized for its specific user base.
“A 100 percent of our search traffic is being served on the Galene search stack,” said Josh Walker, LinkedIn’s director of search and feed infrastructure. This means all 37 different search services on the site are now powered by Galene, named after a Greek goddess personifying calm seas.
“As a user, you see the graph nature of LinkedIn every time you do a search,” Walker said. A search on a user’s name will, for instance, show the number of connections between the two of you. A search for jobs can use the searcher’s resume to help determine the most appropriate positions.
Over 15 engineers were devoted to the task of building Galene, with considerable input from others around the company. All in all, more than 50 person-years of effort went into the project over a two-year period.
The architecture also lays a foundation for a set of interconnected information that LinkedIn is curating, called the economic graph.
LinkedIn was created as a site for people to share their professional information and connect with others. This alone is a challenge for a user base as large as LinkedIn’s, which, at last count, had accrued more than 400 million accounts filled with facts about people, jobs, companies, schools, groups and other professional content. The economic graph could glean a wealth of additional information by aggregating this user data and cross-indexing it with other data sources.
“With the existence of an economic graph, we could look at where the jobs are in any given locality, identify the fastest growing jobs in that area, the skills required to obtain those jobs, the skills of the existing aggregate workforce there, and then quantify the size of the gap,” explained LinkedIn CEO Jeff Weiner, in a blog post explaining the concept of the economic graph. “Even more importantly, we could then provide a feed of that data to local vocational training facilities, junior colleges, etc. so they could develop a just-in-time curriculum that provides local job seekers the skills they need to obtain the jobs that are and will be, and not just the jobs that once were.”
For the search services, the company started out using the Apache Lucene text search engine. As the company focused on the kinds of specific searches its users would find most valuable, its engineers added layers of functionality atop stock Lucene. “Lucene is there in the core, but there is a lot more that went into the platform,” Walker said.
Searching the economic graph poses a different set of challenges from a regular Google-style Web search, explained Tai-Ping Yu, LinkedIn’s lead for search infrastructure. For one, the service must keep pace with how often people update their profiles, which many users do frequently. LinkedIn is also working to bring in third-party sources of information to automatically update users’ profiles, at least for those users disinclined to do it themselves.
Originally, the search team ran into issues while scaling out Lucene. Increasingly, they found themselves spending more time figuring out ways to keep the system running as the user base grew. The multiple off-the-shelf components pressed into use, many of them open source, didn’t possess the flexibility needed. Rebuilding indices was time-consuming and Lucene, at the time, did not support live indexing. Updating an entity, which required deleting the old version and replacing it with a newer one, was a computationally expensive process.
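To make the cost of that delete-and-replace pattern concrete, here is a minimal sketch of a toy in-memory inverted index in Python. It is purely illustrative: the class name ToyInvertedIndex and its methods are hypothetical and do not reflect Lucene’s or Galene’s actual code. It shows only that an update touches every term of an entity, even when a single word has changed.

```python
from collections import defaultdict


class ToyInvertedIndex:
    """Hypothetical toy index; not how Lucene or Galene is implemented."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of entities containing it
        self.documents = {}               # entity id -> currently indexed text

    def add(self, entity_id, text):
        self.documents[entity_id] = text
        for term in set(text.lower().split()):
            self.postings[term].add(entity_id)

    def delete(self, entity_id):
        # Remove the entity from the posting list of every term it contained.
        text = self.documents.pop(entity_id, "")
        for term in set(text.lower().split()):
            self.postings[term].discard(entity_id)
            if not self.postings[term]:
                del self.postings[term]

    def update(self, entity_id, new_text):
        # Delete-then-add: the whole entity is removed and re-indexed,
        # regardless of how small the change is.
        self.delete(entity_id)
        self.add(entity_id, new_text)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


index = ToyInvertedIndex()
index.add(1, "software engineer search")
index.add(2, "data engineer")
index.update(1, "software engineer infrastructure")
```

After the update, searching for "engineer" still finds both entities, while "search" no longer matches anything. In this sketch the work per update grows with the size of the entity rather than the size of the change, which is one way to picture why frequent profile edits would be expensive under such a scheme.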
The idea was to have Galene work as a single unified system, rather than as a collection of discrete components. In this way, the engineers working on the ranking and relevance algorithms don’t have to fret over infrastructure issues, such as multi-threading the code or scaling a large workload across the system. Galene also gives engineers the flexibility to develop new types of search-related features.
“We’re not trying to presuppose what entities or data structures get searched,” Walker said.
Feature Image of the Mediterranean Sea via Pixabay.