Explore or Exploit? Trivago’s ML for New Recommendations
The never-ending question: do you continue to “exploit” something that’s already working, or “explore” the vast unknown in search of something better? In data science this problem has a name — the multi-armed bandit. The name comes from the slot machine (i.e. the “one-armed bandit”), and the challenge lies in learning how to choose the most profitable option through a series of choices.
Online travel site Trivago built a machine learning algorithm to address the explore-exploit dilemma in its accommodation ranking model. Trivago Data Scientist Aida Orujova wrote a blog post explaining why.
Trivago already has a list of top-performing accommodations within a vast inventory. But it wanted a machine learning algorithm that introduces new inventory with the potential to become popular with users, while still showing accommodations that match the user.
After six months of testing, Trivago added an algorithm to production that favors unseen accommodations whose features are popular with users, so that they can be added to the search rankings.
In Trivago’s case, the “bandit” is the search algorithm and the sevens and cherries are accommodation rankings. Continuing to show the historically best-performing accommodations is exploitation; venturing into the vast unknown of new or previously unseen accommodations in the hope of finding the next best performers is exploration.
Why Explore and Not Just Stick with What’s Working?
With this approach, users benefit from an inventory that is better overall in the long run. Advertiser bids are optimized by sending advertisers user data from a wider range of inventory.
What is the best way to explore the millions of accommodations so that the best are found and brought to the front of the rankings? There are three things to take into consideration when building the algorithm.
- Extent of exploration: increase in click share of accommodations that weren’t exposed before
- Cost: topline user and revenue metrics
- Quality: historical performance of inventory that wasn’t previously impressed
Data science engineers ran numerous iterative A/B tests in parallel over the course of six months, analyzing the results of each to recommend the next test setup. Model and engineering complexity, cross-test interactions, and other factors were reviewed. The details below focus on one test approach.
One of the features in the ranking model is calculated from historical information on the performance of inventory over a set of days. The feature’s value distribution follows a beta-binomial model, and Trivago currently exports the mean of the posterior — a conditional probability distribution — as the feature value. This default value sets the exploration level.
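As a rough sketch of how a beta-binomial posterior mean works (the prior values `alpha0` and `beta0` and the function names here are illustrative assumptions, not Trivago’s actual implementation):

```python
# Hedged sketch: the mean of a Beta posterior over click-through rate,
# updated from observed clicks and impressions. Prior parameters are
# illustrative assumptions, not Trivago's production values.
def posterior_mean(clicks, impressions, alpha0=1.0, beta0=1.0):
    """Mean of the Beta posterior after observing clicks/impressions."""
    alpha = alpha0 + clicks
    beta = beta0 + impressions - clicks
    return alpha / (alpha + beta)

# An accommodation with 30 clicks out of 1000 impressions:
print(posterior_mean(30, 1000))  # close to the raw click rate of 0.03
```

With lots of data, the posterior mean converges to the observed click rate; with little data, it stays near the prior.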
A potential issue with such a scheme is that it is too greedy, purely exploiting good-performing accommodations in the model.
Step 1 Naïve Approach
The idea behind this approach is “the less we know, the more potential an accommodation has.” The more data an accommodation has, the less its score will deviate from that data.
The algorithm’s score is the mean plus a fraction of the standard deviation of the posterior. This favors items with few impressions, since they have larger standard deviations. A lambda (λ) parameter controls the strength of exploration.
The approach is called “naïve” because it depends only on the number of clicks and impressions the accommodation received in the past.
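The naïve score described above — posterior mean plus λ times posterior standard deviation — can be sketched like this (function names and prior values are illustrative assumptions):

```python
import math

def beta_std(alpha, beta):
    """Standard deviation of a Beta(alpha, beta) distribution."""
    return math.sqrt(alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))

# Hedged sketch of the naïve exploration score: mean + lambda * std of the
# Beta posterior. Prior parameters are illustrative assumptions.
def naive_score(clicks, impressions, lam=1.0, alpha0=1.0, beta0=1.0):
    alpha = alpha0 + clicks
    beta = beta0 + impressions - clicks
    mean = alpha / (alpha + beta)
    return mean + lam * beta_std(alpha, beta)

# An item with the same 3% click rate but far fewer impressions gets a
# larger uncertainty bonus, so it ranks higher for exploration:
print(naive_score(3, 100))    # low-impression item
print(naive_score(30, 1000))  # high-impression item
```

Setting λ = 0 recovers pure exploitation (the posterior mean alone), which is why λ is the knob that sets the exploration level.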
Step 2 …Step N
Cycling through all the unknown accommodations at random (the naïve approach) isn’t necessarily the best approach: Trivago has a wealth of data on users’ accommodation preferences, so some unseen accommodations should have an advantage over others. That is the idea behind Step N, a “model-based approach.”
Trivago engineers train a model f on the historical performance of high-impression inventory using item features, then apply f(item features) to low-impression inventory to predict its potential. Items with a high model score but little or no historical impression data benefit most from this scheme.
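A minimal sketch of the model-based idea, assuming a simple linear model fit by least squares (the feature set, model class, and synthetic data here are all illustrative assumptions — the blog post does not specify Trivago’s model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for high-impression inventory: item features
# (e.g. rating, price tier, location score) and observed performance.
n_high, n_feat = 500, 3
X_high = rng.normal(size=(n_high, n_feat))
w_true = np.array([0.5, -0.2, 0.3])
perf_high = X_high @ w_true + rng.normal(scale=0.05, size=n_high)

# Step 1: fit f on well-explored (high-impression) items.
w_hat, *_ = np.linalg.lstsq(X_high, perf_high, rcond=None)

# Step 2: score unexplored (low-impression) items with the fitted f.
X_low = rng.normal(size=(10, n_feat))
scores = X_low @ w_hat

# Items with high predicted scores but few impressions are the best
# candidates to explore first.
top_unseen = np.argsort(scores)[::-1][:3]
```

The key property is that exploration is no longer uniform: the model transfers what was learned from explored inventory onto the unexplored inventory.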
Exploratory Data Analysis (EDA), feature engineering, data preprocessing (cleaning and imputing missing feature values), and standardization/normalization of model outputs are some of the data mining techniques used along the way.
- Test variants of the same color illustrate different levels of the same test.
- The ideal location for the dots is the upper right quadrant.
- If one draws the Pareto Front, test variants on the line represent the best possible trade-off, while tests below that line offer suboptimal trade-offs.
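To make the Pareto Front idea concrete, here is a small sketch that identifies non-dominated test variants, where each variant is scored on exploration gain and a topline metric and higher is better on both axes (the variant names and numbers are made up for illustration):

```python
# Hedged sketch: each variant maps to (exploration_gain, topline_metric).
# Values are illustrative, not real experiment results.
variants = {
    "A1": (0.10, -0.5), "A2": (0.30, -0.8), "B1": (0.20, -0.3),
    "B2": (0.40, -1.5), "C1": (0.15, -1.0),
}

def pareto_front(points):
    """Return variants not dominated on both axes by any other variant."""
    front = []
    for name, (x, y) in points.items():
        dominated = any(
            (ox >= x and oy >= y) and (ox > x or oy > y)
            for other, (ox, oy) in points.items() if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)

print(pareto_front(variants))  # ['A2', 'B1', 'B2']
```

Variants on the front each offer a trade-off no other variant strictly beats; variants below it (here A1 and C1) are dominated and can be discarded.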
Some Other Lessons
User value increased: a higher share of quality unexplored inventory was exposed at no short-term revenue cost. The explore-exploit dilemma normally trades off impressing unexplored inventory against a cost to users and partners, but Trivago achieved the “golden mean” in its pursuit of optimizing value for all.
Lambda acts as a good parameter for controlling the extent of exploration. It was important to control the level of exploration, and the lambda parameter allowed Trivago to map the Pareto Front between the level of exploration and topline metrics.
The tunable parameter helps to set up pseudo-control variants for testing new A/B experiments.
Models don’t always do what you expect them to. There’s only so much one can expect from feeding a set of features into a black-box model. Understanding how the model’s outputs translate into the metrics is incredibly important, though even that didn’t guarantee better results than the naïve exploration.
There was no significant shift in partner clickshares due to the exploring mechanism. Exploring low-impressed inventory didn’t shift clicks toward a particular partner.
Status Quo and Next Steps
The exploration mechanism is up and running in production within Trivago’s Accommodation Ranking model. The learnings from the A/B experiments were documented, and the Spark code base and auxiliary tooling were improved upon.
To stay on top of trends in the model’s exploration levels over time, monitoring dashboards were created, with charts that monitor:
- What the distribution of feature values used in the model looks like.
- The change in cost metrics during exploration.
- The change in quality metrics during exploration.
This information is just a starting point for when the company decides to tackle the exploration problem in even greater detail.