Nordstrom Builds Flexible Backend Ops with Kubernetes, Spark and JanusGraph
As customers come to expect more flexibility in the way they shop, Nordstrom has been experimenting with ways to optimize its supply chain.
The Seattle-based retailer has been online since 1998 and today operates 115 department stores. It also runs what it calls Omni hubs, which store inventory but offer no retail services, and what it calls local stores, which don’t hold inventory but offer an array of retail services. For instance, you can get a pair of pants hemmed there. And then there are “vertical optimized fulfillment centers” that handle subsets of inventory, such as small beauty items (a tube of lipstick, for instance) that are handled differently.
“Our customers expect a lot more flexibility. They want to know when they can get things, where they can get them, at what cost. So that’s what we’re working within today’s environment,” said senior software engineer Jeff Callahan, speaking recently at ApacheCon North America on how Nordstrom uses the JanusGraph open source graph database, Cassandra and Spark in backend operations.
One idea the company has been working on lately is called cost-based routing. As an example, he showed a slide of Nordstrom stores in the Los Angeles area arranged in a cloverleaf pattern, though that’s not how they actually appear on the map. Suppose a customer orders socks, shoes and a jacket, each located at a store in a different circle of the cloverleaf, and wants to pick them all up at a store in the fourth circle. The question is how to fulfill that order most efficiently, and at what cost.
“As soon as you look at this across the entire country, … gets really complex really quickly,” he said. “Inventory is constantly selling, moving. So we have to know where the inventory is and how soon it’s going to be available. Staffing can obviously affect these things. So if a store doesn’t have enough staff to go out on the floor, take items to ship, then it’s going to be hard to get it on that truck. …And then of course, in L.A. no less, traffic can definitely be a big factor.”
The company is supporting new fulfillment options, including pickup in store, next-day pickup and courier delivery, from a total of 150 sites with a variety of carriers and levels of service. Meeting customer expectations is its first priority, followed by reducing the company’s cost.
The technology also has to serve facility managers, who may need to stop a facility from receiving more orders if it becomes overwhelmed.
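To make the routing idea concrete, here is a minimal Python sketch, not Nordstrom's actual algorithm. It assumes each item has a handful of candidate "offers" with a transfer cost and an arrival estimate; the `Offer` class, the store names and all the numbers are hypothetical. It reflects the stated priorities by first filtering on what the customer was promised and only then minimizing cost, and it skips any facility a manager has taken offline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One hypothetical way to get an item to the pickup store."""
    item: str
    source_store: str
    transfer_cost: float    # dollars to move the item to the pickup store
    hours_to_arrive: float  # estimate; staffing or traffic could stretch it


def cheapest_plan(offers, items, max_hours, offline=frozenset()):
    """Pick, per item, the lowest-cost offer that meets the deadline.

    Offers from stores listed in `offline` are ignored. Returns
    (plan, total_cost) and raises ValueError if an item has no viable offer.
    """
    plan = {}
    for item in items:
        viable = [
            o for o in offers
            if o.item == item
            and o.hours_to_arrive <= max_hours
            and o.source_store not in offline
        ]
        if not viable:
            raise ValueError(f"no offer for {item!r} within {max_hours} hours")
        plan[item] = min(viable, key=lambda o: o.transfer_cost)
    total = sum(o.transfer_cost for o in plan.values())
    return plan, total


# Hypothetical inventory spread across three stores.
offers = [
    Offer("socks", "store_a", 2.0, 20),
    Offer("socks", "store_b", 1.5, 30),
    Offer("shoes", "store_b", 4.0, 18),
    Offer("jacket", "store_c", 6.0, 22),
    Offer("jacket", "store_a", 5.0, 40),
]
plan, total = cheapest_plan(offers, ["socks", "shoes", "jacket"], max_hours=24)
```

A real system would have to weigh far more than this sketch does, including the constantly changing inventory, staffing and traffic that Callahan described.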
Paired with Data Science
The technology has to be flexible enough to support new concepts, such as cost-based routing, or others the business might come up with later. And it must fit within Nordstrom’s existing technology. It was developed in partnership with the data science team.
“Kubernetes is a big part of what we do at Nordstrom,” he said.
JanusGraph “plays nicely with it with a pluggable set of backend components. And it orchestrates a client’s interaction among those backend components.” Nordstrom uses SolrCloud for the indexing system and Cassandra for the data layer. JanusGraph uses ZooKeeper as coordinator.
The JanusGraph data model stores facilities as vertices, transport options as edges and real-time telemetry data as properties. The system includes a graph backend and graph client that Nordstrom created. It adopted an “embedded JanusGraph” pattern for the graph client, which includes JanusGraph libraries and runs in the same JVM as the application. Data pipelines define the flow of data across the backend in support of client services.
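The shape of that data model can be illustrated without JanusGraph at all. The sketch below is plain Python, not the JanusGraph API: facilities are vertices, transport options are edges, and properties hang off both. The facility names, the `backlog` and `cost` properties and the values are all hypothetical. A standard Dijkstra search over the edge costs then finds the cheapest route between two facilities.

```python
import heapq

# Facilities as vertices, each with telemetry-style properties (hypothetical).
vertices = {
    "omni_hub": {"kind": "hub", "backlog": 12},
    "store_a": {"kind": "store", "backlog": 3},
    "store_b": {"kind": "store", "backlog": 0},
    "local_store": {"kind": "local", "backlog": 0},
}

# Transport options as directed edges: (from, to, properties).
edges = [
    ("omni_hub", "store_a", {"mode": "truck", "cost": 3.0}),
    ("omni_hub", "local_store", {"mode": "courier", "cost": 9.0}),
    ("store_a", "local_store", {"mode": "truck", "cost": 4.0}),
    ("store_b", "local_store", {"mode": "courier", "cost": 5.0}),
]


def cheapest_route(edges, source, target):
    """Dijkstra over edge costs; returns (total_cost, path) or None."""
    adjacency = {}
    for src, dst, props in edges:
        adjacency.setdefault(src, []).append((dst, props["cost"]))

    best = {source: 0.0}
    queue = [(0.0, source, [source])]
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for nxt, step in adjacency.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return None


route = cheapest_route(edges, "omni_hub", "local_store")
```

In this toy graph the two-hop truck route through `store_a` beats the direct courier edge. In the embedded pattern the article describes, queries like this would run against JanusGraph inside the application's JVM rather than over in-memory dictionaries.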
“The back end is really what we refer to as our solution,” he said.
Cassandra and SolrCloud are each single Helm charts.
“You can just single-command deploy that and it’s up and running in Kubernetes. And then the same thing with ZooKeeper. You can express dependencies and boundaries. But it really abstracts the sort of details of that deployment and configuration so that’s very well encapsulated,” he said.
It’s all rolled into a single Helm chart, so the team can deploy the entire back end with it, come back in about 10 minutes, and it’s ready to run in Kubernetes.
“All these pieces are already configured at the end of that; they interact with each other. It leaves a config map, which is basically a set of properties that clients can use to connect to the back end,” he said.
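The talk did not spell out which properties the config map holds. As a rough illustration only, the Python sketch below parses a properties-style block into a dictionary a client could read. The keys are standard JanusGraph configuration options for a CQL storage backend and a SolrCloud index, but choosing them here is an assumption, and the hostnames are placeholders.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Hypothetical contents; hostnames are placeholders, not Nordstrom's.
example = """
# storage layer (Cassandra via CQL)
storage.backend=cql
storage.hostname=cassandra.example.internal

# index layer (SolrCloud, coordinated through ZooKeeper)
index.search.backend=solr
index.search.solr.mode=cloud
index.search.solr.zookeeper-url=zookeeper.example.internal:2181
"""

props = parse_properties(example)
```

The appeal of the approach is that a client needs only this one set of properties, and no knowledge of how the back end was deployed.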
Today, the fully populated graph has about 1 million vertices and 100 billion edges, which includes SKUs [individual item numbers] as well as all the different shipping options.
“There are relationships there that, once you have those million vertices, just getting those connections, we end up with about 100 billion edges,” he said.
Today, the system processes 100 million daily events through that back end, and the company expects the load to increase by one to two orders of magnitude, he said.
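For a sense of scale, the daily figure can be converted to a per-second average. This short Python calculation assumes events are spread evenly across the day, which real traffic almost certainly is not, so peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_events = 100_000_000


def per_second(events_per_day):
    """Average events per second, assuming an even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


current = per_second(daily_events)        # roughly 1,157 events per second
tenfold = per_second(daily_events * 10)   # one order of magnitude more
hundredfold = per_second(daily_events * 100)  # two orders of magnitude more
```

On those assumptions, the expected growth would take the average from about 1,157 events per second to somewhere between roughly 11,600 and 116,000.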
The only real problem the team encountered with JanusGraph was an issue that blocked the use of SparkGraphComputer with the CQL backend, which the team worked around by coding custom Spark jobs.
The company has seen about a 10x reduction in actual dollars spent versus DynamoDB, but has yet to determine total cost of ownership, he said.
The pipelines perform a form of OLAP [analytics]- and OLTP [transactions]-style interactions with the graph. The system performs between 20 and 100-plus concurrent interactions with the graph.
Among the lessons learned:
- Complexity is real. “For a system like this, when you’re trying to operate in production, you can’t be naive about it. You just acknowledge that and plan for it,” he said.
- Discipline in DevOps is a huge key. “We had to all agree that that was a priority for us,” he said.
- JAR dependencies could become confusing at times, especially with Spark involved. With several different, complicated projects coming together in this back end, JAR conflicts sometimes created bizarre problems that were difficult to resolve.
- Helm charts were “a huge win.” “I highly recommend it for Kubernetes users,” he said. “It really is configuration as code, so we’re getting source control and tracking. And when you have all these complex systems that individually are hard to manage, having Helm charts to help you really made it much easier.”