EBay Uses Machine Learning to Refine Promoted Listings
Online marketplace eBay incorporated additional buying signals such as “Add to Watchlist,” “Make Offer,” and “Add to Cart” into its machine learning model to improve the relevance of recommended ad listings, based on the initial items searched for. Chen Xue goes into great detail in this recent article.
EBay’s Promoted Listings Standard (PLS) is a paid option for sellers. With one option, PLSIM, eBay’s recommendation engines suggest sponsored items similar to something a potential buyer just clicked on. PLSIM is billed on a CPA model (the seller pays eBay only when a sale is made), which gives eBay a strong incentive to build the most effective model for promoting the best listings. This tends to work out for sellers, buyers, and eBay.
The PLSIM journey looks like this:
- The user searches for an item.
- The user clicks on a result from their search and lands on a view item (VI) page for a listed item (eBay refers to this as the seed item).
- The user scrolls down the VI page and sees the recommended items in the PLSIM.
- The user clicks on an item from the PLSIM and either takes action (watch, add to cart, buy it now, etc.) or checks out another new set of recommended items.
The PLSIM journey from a machine learning perspective:
- Retrieve a subset of candidate Promoted Listings Standard items (the “recall set”) most relevant to the seed item.
- Apply a trained machine learning ranker to rank the listings in the recall set according to the likelihood of purchase.
- Re-rank the listings based on ad rate in order to balance seller velocity enabled through promotion with relevance of recommendations.
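The three stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Listing` fields, the function names, and especially the linear blend of purchase score and ad rate in the last stage are hypothetical, since the article does not describe how eBay actually combines the two.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    item_id: str
    similarity: float      # hypothetical similarity to the seed item
    purchase_score: float  # hypothetical output of the trained ranker
    ad_rate: float         # hypothetical ad rate offered by the seller


def retrieve_recall_set(candidates, k):
    """Stage 1: keep the k candidates most similar to the seed item."""
    return sorted(candidates, key=lambda l: l.similarity, reverse=True)[:k]


def rank_by_purchase_likelihood(recall_set):
    """Stage 2: order the recall set by the ranker's purchase score."""
    return sorted(recall_set, key=lambda l: l.purchase_score, reverse=True)


def rerank_with_ad_rate(ranked, ad_weight):
    """Stage 3: blend relevance with ad rate (assumed linear blend)."""
    return sorted(
        ranked,
        key=lambda l: l.purchase_score + ad_weight * l.ad_rate,
        reverse=True,
    )


def recommend(candidates, k=3, ad_weight=0.5):
    recall_set = retrieve_recall_set(candidates, k)
    ranked = rank_by_purchase_likelihood(recall_set)
    return rerank_with_ad_rate(ranked, ad_weight)
```

In this sketch, a listing with a slightly lower purchase score can move ahead of a neighbor if its ad rate is high enough, while a listing that never enters the recall set cannot be promoted regardless of its ad rate.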
The ranking model is based on the following historical data:
- The recommended item data
- Recommended item to seed item similarity
- Context (country, product category)
- User personalization features
EBay uses a gradient boosted tree, which, for a given seed item, ranks items according to items’ relative purchase probabilities.
From Binary Feedback to Multirelevancy Feedback
In the past, purchase probability relied on binary purchase data. An item was “relevant” if it was purchased with a seed item and “irrelevant” if it was not. This was not a failed method, but there were major areas for optimization:
- False negatives: since users generally buy only one item from a list of recommendations, good recommendations that did not lead to a purchase are treated as bad ones.
- Purchases are rare: compared with other user events, purchases occur infrequently, making it challenging to train a model with sufficient volume and diversity of purchases to predict the positive class.
- Missing out on data: user actions ranging from clicks to add to cart carry a host of information about probable outcomes.
From all of this, eBay engineers considered the following user actions in addition to the initial click and how to add them to the ranking model.
- Buy It Now (only applied to Buy It Now (BIN) listings)
- Add to Cart (only applied to BIN listings)
- Make Offer (only applied to Best Offer listings)
- Place Bid (only applied to Auction listings)
- Add to Watchlist (applied to BIN, Best Offer, or Auction listings)
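The applicability rules in the list above map naturally onto a lookup table. The sketch below is a hypothetical illustration: the identifiers (`APPLICABLE_LISTING_TYPES`, `is_valid_feedback`, `valid_actions`) and the string codes for listing types are made up for this example and are not eBay's.

```python
# Which listing types each user action can occur on, per the list above.
APPLICABLE_LISTING_TYPES = {
    "buy_it_now": {"BIN"},
    "add_to_cart": {"BIN"},
    "make_offer": {"BEST_OFFER"},
    "place_bid": {"AUCTION"},
    "add_to_watchlist": {"BIN", "BEST_OFFER", "AUCTION"},
}


def is_valid_feedback(action, listing_type):
    """Return True if the action can occur on the given listing type."""
    return listing_type in APPLICABLE_LISTING_TYPES.get(action, set())


def valid_actions(listing_type):
    """List the actions that can be observed for a listing type."""
    return sorted(
        action
        for action, types in APPLICABLE_LISTING_TYPES.items()
        if listing_type in types
    )
```

A check like this would let a training pipeline discard impossible events, such as a "make_offer" recorded against a BIN listing, before they are turned into labels.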
Relevance Levels of Multirelevance Feedback
EBay knows that purchases are the most relevant signal and that other actions need to be added, but the new question is: where do these actions fall on the scale of relevancy?
The journey always starts with a click on the recommended item; only then can the user take any of the newly ranked actions. This leads eBay to rank “no selection” and “selecting a recommendation” (clicking on a recommendation) as the least and second least relevant outcomes, respectively.
The charts below illustrate how eBay ranks the remaining possible actions: “Make Offer,” “Buy It Now,” “Add to Watchlist,” and “Add to Cart.”
In the historical training data for a seed item, each potential item was labeled with a relevance level on the following scale.
The consequence of the labeling was that during training, the ranker penalized misranked purchases more heavily than misranked “buy it nows” and so on down the list.
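To make the difference between binary and multirelevance labels concrete, here is a minimal sketch. The numeric levels in `RELEVANCE_LEVEL` are an assumption: only the two lowest levels (no click, click) and the top position of purchase follow from the text, and the order of the middle actions simply mirrors the order in which the article later lists them. The function names are hypothetical.

```python
# Assumed ordinal scale; the article's actual scale is not reproduced here.
RELEVANCE_LEVEL = {
    "no_click": 0,
    "click": 1,
    "add_to_watchlist": 2,
    "add_to_cart": 3,
    "make_offer": 4,
    "buy_it_now": 5,
    "purchase": 6,
}


def binary_label(actions):
    """Old scheme: relevant only if the recommended item was purchased."""
    return 1 if "purchase" in actions else 0


def graded_label(actions):
    """New scheme: the highest relevance level among observed actions."""
    return max((RELEVANCE_LEVEL[a] for a in actions), default=0)


# Hypothetical user actions on three recommended items for one seed item.
observed = {
    "item_a": ["click", "add_to_cart"],
    "item_b": ["click", "purchase"],
    "item_c": [],
}
binary = {item: binary_label(acts) for item, acts in observed.items()}
graded = {item: graded_label(acts) for item, acts in observed.items()}
```

Under the binary scheme, `item_a` and `item_c` are indistinguishable negatives. Under the graded scheme, the add-to-cart on `item_a` is preserved as a partial signal of relevance.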
Sample Weights of Multirelevance Feedback
But it’s not that simple, because the relevance levels only express an order: a “Click” is not exactly one point less relevant than an “Add to Watchlist,” any more than an “Add to Cart” is exactly two points more relevant than a “Click.” The gradient boosted tree supported multiple labels to capture a range of relevance, but there was no direct way of expressing the magnitude of relevance.
EBay had to run tests iteratively until it found numbers that made the model work. The researchers incorporated additional weights (referred to as “sample weights”), which were fed to the pairwise loss function. They ran a hyperparameter tuning job for over 25 iterations before arriving at the best sample weights: “Add to Watchlist” (6), “Add to Cart” (15), “Make Offer” (38), “Buy It Now” (8), and “Purchase” (15). Without the sample weights, the new model performed worse. With the sample weights, the new model performed better than the binary model.
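One way to picture how sample weights enter a pairwise loss is the sketch below. It is not eBay's implementation: it assumes a plain pairwise logistic loss, assumes each pair is weighted by the sample weight of its more relevant item, and assumes a weight of 1.0 for a bare click, none of which the article specifies. Only the five quoted weights come from the text.

```python
import math

# Sample weights as quoted in the article; the 1.0 for "click" is assumed.
SAMPLE_WEIGHT = {
    "click": 1.0,
    "add_to_watchlist": 6.0,
    "add_to_cart": 15.0,
    "make_offer": 38.0,
    "buy_it_now": 8.0,
    "purchase": 15.0,
}


def weighted_pairwise_loss(scores, levels, weights):
    """Weighted logistic loss summed over pairs where item i outranks item j.

    scores:  model scores, one per recommended item
    levels:  ordinal relevance labels, one per item
    weights: sample weights, one per item (applied to pairs the item wins)
    """
    total = 0.0
    for i, (s_i, l_i) in enumerate(zip(scores, levels)):
        for s_j, l_j in zip(scores, levels):
            if l_i > l_j:
                # Penalty grows when the more relevant item scores lower.
                total += weights[i] * math.log1p(math.exp(-(s_i - s_j)))
    return total
```

With this formulation, the ordinal labels decide which item in a pair should rank higher, while the sample weights decide how costly it is to get that pair wrong. A misranked “Make Offer” item, for instance, would be penalized 38 times as heavily as a misranked click under the assumed weights.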
They also experimented with adding only the click as additional relevance feedback, applying a tuned “Purchase” sample weight of 150. The offline results are also shown below, where “BOWC” stands for the actions “Buy It Now,” “Make Offer,” “Add to Watchlist,” and “Add to Cart.” Purchase rank reflects the average rank of the purchased item; the smaller, the better.
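The purchase rank metric described above can be computed as follows. This is a minimal sketch: the function names and the use of 1-based ranks are assumptions, and the sessions in the usage example are invented purely for illustration.

```python
def purchase_rank(ranked_item_ids, purchased_item_id):
    """1-based position of the purchased item in the model's ranking."""
    return ranked_item_ids.index(purchased_item_id) + 1


def average_purchase_rank(sessions):
    """Mean purchase rank over (ranking, purchased item) pairs.

    Lower is better: the model placed purchased items closer to the top.
    """
    ranks = [purchase_rank(ranking, bought) for ranking, bought in sessions]
    return sum(ranks) / len(ranks)


# Hypothetical sessions: each pairs a model ranking with the item bought.
sessions = [
    (["a", "b", "c"], "b"),  # purchased item ranked 2nd
    (["x", "y"], "x"),       # purchased item ranked 1st
    (["p", "q", "r"], "r"),  # purchased item ranked 3rd
]
avg_rank = average_purchase_rank(sessions)
```

Comparing this average between two models on the same held-out sessions gives an offline indication of which model surfaces purchased items earlier.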
In total, over 2,000 model instances were trained. The A/B tests were run in two phases. The first phase included only the additional selection (click) labels and showed a 2.97% increase in purchase count and a 2.66% increase in ad revenue on the eBay mobile app, which was deemed successful enough to launch the model into worldwide production.
The second phase added more actions, such as “Add to Watchlist,” “Add to Cart,” “Make Offer,” and “Buy It Now,” to the model, and the A/B test showed even better engagement (e.g., more clicks and BOWCs).