Learning to Rank: A Key Information Retrieval Tool for Machine Learning Search
Ranking isn’t just for search engines, or even enterprise search, although it’s routinely used by services like Airbnb, Etsy, Expedia, LinkedIn, Salesforce and Trulia to improve search results.
The Learning To Rank (LETOR or LTR) machine learning algorithms, pioneered first by Yahoo and then by Microsoft Research for Bing, are proving useful for work such as machine translation, digital image forensics, computational biology and selective breeding in genetics: anywhere you need a ranked list of items. Ranking is also quickly becoming a cornerstone of digital work.
Beyond document retrieval, LTR is useful for expert search, definition search, collaborative filtering, question answering, keyphrase extraction, document summarization, expert finding, anti-web spam, sentiment analysis, and product ratings. LTR methods work for images as well as text (Etsy uses a combination of image and text search to distinguish between products that would get a very similar ranking based on the text description alone, and LTR can distinguish higher and lower quality images), or for ranking nodes in a graph. And they’re increasingly available to developers, whether you’re looking for a Python library or a full-scale enterprise search platform like Apache Solr/Lucene. (Lucene powered LinkedIn’s old search framework; it is open source and is used by sites like Pandora, Soundcloud and Wikipedia. Solr is the open source enterprise search platform built on Lucene.)
“LTR is suitable for pretty much any area where you can find unbiased training data,” Ganesh Venkataraman, engineering manager at Airbnb (and former head of relevance at LinkedIn) told the New Stack. “It is particularly suitable for cases where there are a large number of features that are harder to hand tune.”
“LinkedIn uses LTR on almost all of its search verticals: people search, job search, recruiter search and so on. We have shown this to work very well with sizeable performance gains and have also seen it scale,” Venkataraman said. For work such as job search, the technique ties in very closely with another algorithm, query understanding (QU), which predicts the user’s intent based on the query. For example, if the job search query is “google software engineer,” LinkedIn will understand that the member is most likely looking for jobs at the COMPANY “google” with the TITLE “software engineer.”
Using LTR for matching search queries with relevant ads on Microsoft Bing improved online ad revenue by $4 million in the first two months. Using it to rank job adverts on LinkedIn didn’t just get more people clicking through to look at search results; the search was finding jobs people were looking for, because application rates went up 5 percent in the first three weeks (and up 50 percent by the end of the project). When Bloomberg built an LTR plugin for Solr, not only was it a third faster than the hand-built ranking system they’d been using before, results were more relevant, and clicks went up by about 10 percent straight away.
How LTR Works
Learning To Rank uses supervised machine learning to train a model not for the usual single-item classification or prediction, but to discover the best order for a list of items, using features extracted from each item to give it a ranking. It’s not looking at the precise score for each item but the relative order: whether one item ranks above or below another.
For web results, features might include the domain or PageRank, whether the page title matches the query, the age or popularity of the page, whether it has standard document sections like an introduction or a summary, how long it is and how long people spend reading it, or even whether it will render on a phone screen or just a desktop browser. To make ranking efficient, features need to be extracted automatically; they should be something you can script rather than something you have to code by hand.
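As a rough illustration of what “something you can script” might look like, here is a minimal Python sketch that turns one query and one page into a list of numbers. The page dictionary, its field names and the four features chosen are all assumptions made up for this example; a real system would pull many more signals from its own index and logs.

```python
from datetime import date

def extract_features(query, page, today):
    """Turn one (query, page) pair into a fixed-order list of numeric features."""
    query_terms = set(query.lower().split())
    title_terms = set(page["title"].lower().split())
    # Fraction of the query terms that also appear in the page title.
    title_overlap = len(query_terms & title_terms) / len(query_terms) if query_terms else 0.0
    age_days = (today - page["published"]).days
    return [
        title_overlap,
        float(age_days),
        float(len(page["body"].split())),         # page length in words
        1.0 if page["mobile_friendly"] else 0.0,  # renders on a phone screen
    ]

# Hypothetical page record, invented for this sketch.
page = {
    "title": "Learning to Rank explained",
    "body": "Learning to rank trains a model to order items",
    "published": date(2024, 1, 1),
    "mobile_friendly": True,
}
features = extract_features("learning rank", page, today=date(2024, 1, 31))
print(features)  # [1.0, 30.0, 9.0, 1.0]
```

Every item gets the same features in the same order, which is what lets a ranking model compare one item with another.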
The three main LTR techniques are pointwise, pairwise and listwise. Pointwise LTR uses regression or classification to discover the best ranking for individual results. You can think of classification as putting similar documents in the same class and regression as giving similar documents a similar function value, so ranking assigns similar documents similar preferences.
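The pointwise idea can be sketched in a few lines of plain Python: fit a regression from each item’s features to a relevance grade, then sort new items by the predicted value. The two features (title match and popularity, both scaled from 0 to 1), the relevance grades and the training settings below are invented for illustration, and a simple linear model stands in for whatever regressor a production system would use.

```python
def train_pointwise(rows, labels, epochs=2000, lr=0.05):
    """Fit a linear scorer w.x + b to relevance grades by gradient descent."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            # Squared-error gradient for this single item.
            error = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
            b -= lr * error
    return w, b

def score(model, x):
    """Predicted relevance for one feature vector."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical training data: [title_match, popularity] -> relevance grade 0..2.
training_rows = [[1.0, 0.9], [0.5, 0.4], [0.0, 0.2], [1.0, 0.1], [0.0, 0.9]]
training_labels = [2, 1, 0, 1, 1]
model = train_pointwise(training_rows, training_labels)

# Rank unseen items by sorting on the predicted grade, highest first.
candidates = {"A": [0.2, 0.3], "B": [0.9, 0.8], "C": [0.6, 0.5]}
ranking = sorted(candidates, key=lambda name: score(model, candidates[name]), reverse=True)
print(ranking)
```

Note that each item is scored on its own; nothing in the training step looks at how two items compare, which is the gap the pairwise and listwise methods set out to close.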
Pairwise LTR uses classification or regression to discover the best order for a pair of items at a time, classifying all the different pairings for items in the list as correctly or incorrectly ranked and working through them to get the ranking for the whole group. There are multiple methods using different techniques for both the cost function and the learning algorithm, including neural networks, random forests, boosting and Support Vector Machines (SVM).
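To make the “correctly or incorrectly ranked” idea concrete, the sketch below walks through every pairing in a small list and reports the pairs that a set of model scores puts the wrong way round. The labels and scores are made-up numbers, and the function name is just for this example; a pairwise learner would then adjust its model to reduce the number of such pairs.

```python
from itertools import combinations

def misordered_pairs(labels, scores):
    """Return (better, worse) index pairs that the scores rank the wrong way round."""
    wrong = []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue  # equally relevant items carry no preference
        better, worse = (i, j) if labels[i] > labels[j] else (j, i)
        if scores[better] <= scores[worse]:
            wrong.append((better, worse))
    return wrong

labels = [2, 0, 1, 0]          # human relevance grades for four items
scores = [0.9, 0.7, 0.4, 0.1]  # what a hypothetical model currently predicts
wrong = misordered_pairs(labels, scores)
print(wrong)  # [(2, 1)]: item 2 is more relevant than item 1 but scored lower
```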
Early pairwise methods like RankNet used a cost function that is about minimizing the number of times a lower-rated item is ranked above a higher-rated one, with Stochastic Gradient Descent to train the neural network. LambdaRank gets faster, more accurate results by working directly with the gradient of that cost (whether an item should move up or down because of its ranking, and by how much), and LambdaMART improves on that using gradient boosted decision trees.
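A minimal sketch of that pair cost and its gradient, assuming the commonly published RankNet form (a logistic cost on the score difference, with the scaling factor fixed at 1). The helper names are invented here, and the extra weighting by the change in a ranking metric that LambdaRank applies to each pair is left out to keep the example short.

```python
import math

def ranknet_pair_cost(s_better, s_worse):
    """Cost for one pair in which the first item should rank above the second."""
    return math.log1p(math.exp(-(s_better - s_worse)))

def ranknet_lambda(s_better, s_worse):
    """Gradient of the pair cost with respect to the better item's score."""
    return -1.0 / (1.0 + math.exp(s_better - s_worse))

def accumulate_pushes(labels, scores):
    """Sum the pair gradients per item: the sign gives the direction, the size the strength."""
    push = [0.0] * len(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                lam = ranknet_lambda(scores[i], scores[j])
                push[i] -= lam  # negative gradient: move the better item up
                push[j] += lam  # and move the worse item down by the same amount
    return push

# Item 1 is more relevant but currently scored lower, so it should be pushed up.
pushes = accumulate_pushes(labels=[0, 1], scores=[1.0, 0.0])
print(pushes)
```

The cost shrinks toward zero as the better item pulls ahead and grows when the pair is the wrong way round, so a badly misordered pair produces a larger push than one that is nearly right.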
Listwise LTR methods like ListNet rank the whole list rather than working through pairs, using probability models for the cost. For images, using a subset of the list of images can work better, and combining the pointwise, pairwise and listwise cost functions can also improve ranking performance.
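One common way to describe ListNet’s probability model is the “top-one” probability: a softmax that says how likely each item is to be ranked first, computed once from the human labels and once from the model’s scores, with cross-entropy measuring the gap between the two. The sketch below assumes that formulation; the function names and the example numbers are made up.

```python
import math

def top_one_probabilities(values):
    """Softmax: the probability that each item would be ranked first."""
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def listnet_cost(labels, scores):
    """Cross-entropy between the label-based and score-based top-one distributions."""
    target = top_one_probabilities(labels)
    predicted = top_one_probabilities(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

labels = [2, 1, 0]  # relevance grades for a three-item list
good = listnet_cost(labels, scores=[3.0, 2.0, 1.0])  # same order as the labels
bad = listnet_cost(labels, scores=[1.0, 2.0, 3.0])   # order reversed
print(good < bad)  # True
```

Because the cost is computed over the whole list at once, a single training step accounts for every item’s position rather than one pair at a time.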
The first search engine to use LTR was AltaVista (which was bought by Yahoo), and the key work on LTR was done between 2008 and 2011, with Yahoo and Microsoft releasing training data sets and holding competitions. The techniques from that era are the ones that are commonly used, but there’s still plenty of ongoing research. Venkataraman called “the intersection of deep learning and IR (information retrieval)” particularly important; recently, he says, “Word embeddings (converting query tokens into a higher dimensional space), as well as neural nets, got quite some attention.”
There’s also research into semi-supervised versions of LTR. Supervised learning relies on well-labeled training data from human experts (like the human judges Google uses to get test scores for changes to its search algorithm), which offers high-quality data — but it’s expensive and only covers a fraction of whatever it is that you want to rank. Analysing search logs and click-through rates is cheaper and works at a much larger scale, but there’s often bias in that kind of implicit data (the top results will get more clicks but may not be the best match, and if none of the results are clicked you get no useful information).
Tools for LTR
Several LTR tools that were submitted to LTR challenges run by Yahoo, Microsoft and Yandex are available as open source and the Dlib C++ machine learning library includes a tool for training a Ranking SVM. The pyltr library is a Python LTR toolkit with ranking models, evaluation metrics and some handy data tools.
For most developers, LTR tools in search tools and services will be more useful. LinkedIn open sourced sample code for building an end-to-end ‘instant’ search system on Elasticsearch that uses LTR. The Azure Search service for building custom search engines has LTR functionality built in for relevance ranking; the custom code Ernst and Young built to create a custom search engine for the US tax code is available as Python scripts and Jupyter notebooks. That’s a good example of combining LTR with natural language processing (in this case using Microsoft’s LUIS Language Understanding Service API) for extracting features for ranking documents.
Bloomberg’s LTR plugin is now built into Solr; it extends the existing RankQuery in the platform and reranks a number of documents using your ranking model (which is deployed as a Solr managed resource) and feature information (so features can be built in Solr with short, simple scripts and rich search options like synonyms and phrase matching still work). That makes sophisticated results ranking available relatively simply, in a popular search platform.
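For a sense of what reranking looks like from the client side, the sketch below builds the request parameters for a reranked query in Python. It follows the rerank query syntax described in the Solr reference guide as best understood here (the rq parameter with the ltr parser, plus the [features] field to return the feature values); the model name myModel is a placeholder for a model you have already uploaded, and the details are worth checking against the documentation for your Solr version.

```python
from urllib.parse import urlencode

def ltr_rerank_params(user_query, model_name, rerank_docs=100):
    """Build Solr query parameters that rerank the top hits with an LTR model."""
    return {
        "q": user_query,
        # Rerank only the top `rerank_docs` results of the normal query.
        "rq": f"{{!ltr model={model_name} reRankDocs={rerank_docs}}}",
        # Ask Solr to return the extracted feature values with each hit.
        "fl": "id,score,[features]",
    }

# "myModel" is a hypothetical model name used only for this example.
params = ltr_rerank_params("solar panels", model_name="myModel")
query_string = urlencode(params)
print(params["rq"])
print(query_string)
```

The resulting query string would be appended to the select or query endpoint of your own collection; the original query still decides which documents are candidates, and the model only reorders the top of that list.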