eBay Rolls Out Kubernetes for Performance-Sensitive Search Operations
Is it possible to run a large-scale, latency-sensitive workload like a search application on Kubernetes at nearly the same performance as running it directly on the servers themselves? This is what a group of engineers at e-commerce giant eBay is investigating now. And so far, their results, with a little tuning, look promising, they reported at the KubeCon+CloudNativeCon North America conference in November.
Currently, eBay runs about 60 Kubernetes production clusters, managing about 160,000 application pods across 30,000 servers. In addition to user-facing search functions, the K8s instances also handle duties such as managing the Hadoop AI/ML pipelines. To get closer to the end-users, some K8s clusters are run on the edge, via the Envoy proxy.
The biggest job on the work docket is to run a portion of eBay’s home-built distributed search platform, called Cassini. At any point in time, Cassini serves over 1.4 billion active listings. On average this leads to about 300,000 queries per second at each data center, which adds up to about 30-40% of the data center footprint globally. The platform team strives for 99.999% availability (the proverbial “five-nines”) for this service.
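To put the “five-nines” target in perspective, the short sketch below converts an availability figure into a yearly downtime budget. It is plain arithmetic on the 99.999% figure quoted above; the function name and the 365-day year are assumptions made for illustration, not anything eBay described.

```python
# Illustrative arithmetic only: how much downtime a given availability
# target allows per year. The 365-day year is a simplifying assumption.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_budget_seconds(availability: float,
                            period_seconds: int = SECONDS_PER_YEAR) -> float:
    """Return the seconds of downtime permitted over the period."""
    return (1.0 - availability) * period_seconds


budget = downtime_budget_seconds(0.99999)
print(f"{budget:.1f} seconds per year, or about {budget / 60:.2f} minutes")
```

At five nines, the budget works out to a little over five minutes of downtime in a whole year.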
The goal was to get the Kubernetes-driven search to run as fast as it would just running on the servers themselves. eBay found that, out of the box, a set of Kubernetes clusters could execute 3,200 queries per second (QPS) compared to the bare metal performance of 3,600 QPS, with both consuming about 18-20% of CPU resources. Further performance boosts were obtained by updating to the latest Linux kernels, doing some CPU tuning and adopting the high-performance IPVLAN, which can be added into the Linux kernel as a driver. With these additional changes in place, they were nearly able to match the bare-metal performance at 80% CPU utilization, with a blazing 9,500 QPS.
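The out-of-the-box gap can be made concrete with a quick calculation. The sketch below uses only the two throughput figures reported above (3,200 and 3,600 QPS); the function name is hypothetical and chosen for illustration.

```python
# Illustrative arithmetic on the reported out-of-the-box figures.
def relative_throughput(k8s_qps: float, bare_metal_qps: float) -> float:
    """Return Kubernetes throughput as a fraction of bare-metal throughput."""
    return k8s_qps / bare_metal_qps


ratio = relative_throughput(3200, 3600)
gap_percent = (1.0 - ratio) * 100
print(f"Kubernetes reached {ratio:.1%} of bare metal, a gap of {gap_percent:.1f}%")
```

In other words, before any tuning the Kubernetes setup delivered roughly 89% of bare-metal throughput at a similar CPU cost.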
Using Kubernetes, eBay was “able to achieve the speed, flexibility of the search application, without compromising on the performance of that application,” said Yashwanth Vempati, an eBay developer working for the platform team, which handles all things Kubernetes.
The search team decided to try Kubernetes for a number of reasons. Historically, adding a new feature to Cassini, or even upgrading one, could take a lot of time, involving consultation with the hardware team, and provisioning of the OS and various services. Kubernetes could cut this provisioning time considerably. K8s also promised a way to bring apps to the edge. Kubernetes affinity rules provide a way to automate a secure topology, so that a single failure doesn’t take down all the copies of a particular shard. And, of course, Kubernetes would provide scalability: if the number of users, or items being sold, were to spike, additional pods could be easily spun up to accommodate the increase.
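As a rough illustration of how affinity rules can keep copies of a shard apart, the sketch below builds a pod anti-affinity stanza as a Python dictionary and prints it as JSON. The field names follow the standard Kubernetes pod spec, but the `search-shard` label, the shard name and the choice of spreading by host are assumptions made for this example; the article does not describe eBay's actual rules.

```python
import json


def shard_anti_affinity(shard_id: str) -> dict:
    """Build an affinity stanza that keeps replicas of one shard on separate nodes.

    The "search-shard" label key is hypothetical; any label that identifies
    the shard a pod serves would play the same role.
    """
    return {
        "podAntiAffinity": {
            # Hard rule: the scheduler refuses to co-locate matching pods.
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": {"search-shard": shard_id}},
                    # Spread by node; a zone or rack label would spread wider.
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


affinity = shard_anti_affinity("shard-17")
print(json.dumps(affinity, indent=2))
```

With a rule like this in the pod template, losing one node can remove at most one copy of any given shard.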
eBay’s search space is different from that of most search use cases in that the content is always changing, as items are posted and then sold, said Mohnish Kodnani, an eBay developer for the search team. In eBay’s test setup, Kubernetes runs the bottom level of a five-layer query serving stack, the one that runs the query across multiple shards holding the billion-or-so listings, and that manages scoring, ranking and returning the results to the user. This was the most complex of the layers, but one that, if handled correctly, could ensure success with the other layers, Kodnani said.
In order to make use of Kubernetes, eBay converted the search functionality into a set of microservices, which turned out to be a difficult task. The Query Serving Pod, for instance, is made up of three containers. One runs the application logic, one exports the log data, and the third routes the operational metrics to Prometheus. The team wrote a file distribution Operator to distribute the data across the grid.
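The three-container layout can be pictured with a minimal pod manifest, built here as a Python dictionary. This is a sketch under stated assumptions: the pod name, container names and image references are all hypothetical placeholders, and only the split into application, log export and metrics export comes from the description above.

```python
import json


def query_serving_pod(name: str) -> dict:
    """Build a minimal pod manifest with three cooperating containers.

    All names and images are placeholders, not eBay's actual configuration.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                # Runs the application logic for query serving.
                {"name": "query-server", "image": "example.invalid/query-server:latest"},
                # Exports the log data.
                {"name": "log-exporter", "image": "example.invalid/log-exporter:latest"},
                # Routes operational metrics to Prometheus.
                {"name": "metrics-exporter", "image": "example.invalid/metrics-exporter:latest"},
            ]
        },
    }


pod = query_serving_pod("query-serving-example")
print(json.dumps(pod, indent=2))
```

Because the containers share one pod, they are scheduled together and can share volumes and a network namespace, which is what lets the exporters read what the application produces.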
One challenge the company had is that Kubernetes offers no easy way for two pods to share a persistent volume. In order to solve the problem, the eBay team created a “mutating webhook” so that a Persistent Volume Claim (PVC) can be shared across different pods. It was a bit of a hack, the pair of engineers admitted, but it did the trick. This webhook draws a list of all the PVCs on that pod from the K8s API Server, and attaches them as volumes to the Data Distribution DaemonSet Pod for that node. Instead of running sidecars, eBay runs the Data Distribution Nodes as a DaemonSet to make upgrading individual components easier.
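For readers unfamiliar with mutating webhooks, the sketch below shows the general shape of the response such a webhook returns: an `AdmissionReview` carrying a base64-encoded JSON Patch that adds PVC-backed volumes to the incoming pod. It is not eBay's implementation. The PVC names are passed in as a plain list rather than fetched from the API server, the `shared-` volume name prefix is invented, and the function names are hypothetical.

```python
import base64
import json


def build_patch(pvc_names):
    """Return JSON Patch operations that append one PVC-backed volume per claim.

    Appending to "/spec/volumes/-" assumes the pod already has a volumes
    list; a real webhook would need to handle the case where it does not.
    """
    return [
        {
            "op": "add",
            "path": "/spec/volumes/-",
            "value": {
                "name": f"shared-{name}",  # hypothetical naming scheme
                "persistentVolumeClaim": {"claimName": name},
            },
        }
        for name in pvc_names
    ]


def build_admission_review(request_uid, pvc_names):
    """Wrap the patch in the AdmissionReview response the API server expects."""
    patch_bytes = json.dumps(build_patch(pvc_names)).encode("utf-8")
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request_uid,  # must echo the uid of the incoming request
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(patch_bytes).decode("ascii"),
        },
    }


review = build_admission_review("example-uid", ["shard-0-data", "shard-1-data"])
print(json.dumps(review, indent=2))
```

The sketch only adds volumes to the pod spec; mounting them inside a container would also require matching `volumeMounts` entries, which are omitted here for brevity.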
Despite the success of the initial trial, it will be a few years before eBay moves all of its search on top of Kubernetes, Kodnani said. The company wants to keep disruptions to its service at an absolute minimum. The developers don’t see a difference between the two backends thanks to a proxy layer the company developed in-house that gives developers a single API to write against, no matter the underlying platform.