Lessons from Deploying Microservices for a Large Retailer
Microservices “saved our butts” every day, said Heath Murphy, director of consulting services at the IT consultancy CGI.
Murphy led the design of a microservices architecture for an order fulfillment system for a US-based specialty retailer that needed to serve over 1,200 brick-and-mortar stores, 15 e-commerce sites and 5,000 B2B partner company stores. He shared his experience at CodePaLOUsa, a regional development conference held this month in Louisville, Ky.
The term “microservices” has no single definition, Murphy said. He ultimately settled on a few key defining characteristics:
- Services “are often processes that communicate over a network to fulfill a goal using technology-agnostic protocols such as HTTP.”
- Services are organized around business capabilities.
- Services can be implemented using different languages, databases, hardware and software environments.
- Services are “small in size, messaging-enabled, bounded by contexts, autonomously developed, independently deployable, decentralized, and built and released with automated processes.”
CGI’s Microservices Architecture
The system CGI built for its client needed to be able to ingest EDI files, flat files and “a million other types of files,” Murphy said. It needed to handle REST API service calls, direct database access calls and message queue traffic as orders flowed through validation, stock reservations and the warehouse. Finally, at the end of the journey, the process needed to handle inbound customer phone calls and streaming data from hardware.
The business events it needed to handle included everything from new inbound orders to cancellations, outbound status updates and inventory updates.
CGI built a microservices architecture composed around one very large legacy SQL Server database to handle it all, which Murphy acknowledged is very much a microservices anti-pattern, but it was what they were required to do. Since the client was a .Net shop, the system relied on .Net and C#. All of the messaging was handled by RabbitMQ, which he noted is an “extremely important part of the architecture and one first dismissed.” Finally, it used an off-the-shelf product to handle EDI files, since the team didn’t want to process EDI files in .Net.
What’s notable is what’s missing, he added. The microservices architecture did not use containers, MongoDB or a similar big data solution, or even the cloud.
The Importance of Logging
Logging proved to be a crucial part of the microservice app. Whereas logging is straightforward and sequential in a monolithic app, in a microservices architecture logging happens across dozens of processes and modules. This creates challenges and makes the logging process all the more important.
“You got logs happening everywhere, dozens of places,” Murphy said.
Murphy recommended development teams pick a logging package (his example was Serilog) and use a boilerplate template. Check for good logging in PRs; make it easy to log and you get better logs, he explained. The system also required correlation IDs on all inbound service calls, and it rejected any call without that attribute. Finally, he advised building or leveraging a log aggregation platform. “Logs are helpful when searchable,” his presentation noted.
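The reject-calls-without-a-correlation-ID rule can be sketched as a small request handler. This is a minimal illustration in Python for brevity (the team’s actual stack was .Net, C# and Serilog); the `X-Correlation-Id` header name and the `handle_request` function are hypothetical, not from the talk.

```python
import logging

# Structured-ish log format: every line carries the correlation ID.
logging.basicConfig(format="%(asctime)s %(levelname)s cid=%(cid)s %(message)s")
log = logging.getLogger("orders")
log.setLevel(logging.INFO)

def handle_request(headers: dict, body: dict) -> tuple[int, str]:
    """Reject any inbound call that lacks a correlation ID header."""
    cid = headers.get("X-Correlation-Id")
    if not cid:
        return 400, "missing X-Correlation-Id header"
    # The `extra` dict injects the correlation ID into every log line.
    log.info("received order %s", body.get("orderId"), extra={"cid": cid})
    return 202, "accepted"
```

Enforcing the attribute at the boundary, rather than hoping each service remembers to log it, is what makes the aggregated logs searchable end to end.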
He recommended using the same correlation ID for all logging within a single business transaction. To correlate IDs: generate the correlation ID in the head-in (entry) service (GUIDs make fine IDs, he noted), then pass it along as a header on every downstream call. Finally, cross-reference the log keys to link all external business identifiers, such as order number, external partner order number, warehouse order number, shipment tracking number and customer ID, back to the log key.
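The generate-propagate-cross-reference steps above can be sketched as follows. Again a Python sketch rather than the team’s C#; the function names, header name and field names are illustrative assumptions.

```python
import uuid

def new_correlation_id() -> str:
    # Generated once, in the head-in service. GUIDs make fine IDs.
    return str(uuid.uuid4())

def outbound_headers(cid: str) -> dict:
    # The same ID is passed along as a header on every downstream call.
    return {"X-Correlation-Id": cid}

def cross_reference(cid: str, order: dict) -> dict:
    # Link external business identifiers back to the log key, so a search
    # on any one of them finds the whole transaction.
    return {
        "correlationId": cid,
        "orderNumber": order.get("orderNumber"),
        "warehouseOrderNumber": order.get("warehouseOrderNumber"),
        "trackingNumber": order.get("trackingNumber"),
        "customerId": order.get("customerId"),
    }
```

A support engineer who only knows a shipment tracking number can then look up the correlation ID from the cross-reference record and pull every log line in the transaction.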
Service Bus to Manage Microservices
One lesson learned by the consultancy: If you build microservices so they’re dependent on one another, one downed connection can crash the whole system. To avoid this, CGI set up the process so that API Management (APIM) sends messages to an Azure Service Bus, which triggers events between the microservices.
In the end, the team expected the system would need to handle about 100,000 fulfillment requests. What really happened was that it handled 500,000 requests, he said, thus “saving their butts.”
It was able to handle that load because it relied on lightweight JSON messages containing no order details, which triggered the fulfillment request microservice. RabbitMQ was able to scale up and handle bursts of traffic. Microservices would then fetch the full order details, perform full validation and leverage other microservices.
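The thin-message pattern described above can be illustrated with a short sketch: the queue carries only an event name and an order ID, and the consumer fetches the full order on demand. This is a Python illustration, not the team’s C# code; the event name, field names and the injected `fetch_order` callable are all hypothetical.

```python
import json

def make_fulfillment_event(order_id: str) -> str:
    # Thin message: only an event name and an order ID travel on the queue,
    # so the broker can absorb bursts of traffic cheaply.
    return json.dumps({"event": "FulfillmentRequested", "orderId": order_id})

def handle_fulfillment_event(message: str, fetch_order) -> dict:
    # The consuming microservice fetches the full order details itself
    # (e.g. from the order store), then performs its own validation.
    event = json.loads(message)
    order = fetch_order(event["orderId"])
    if not order.get("lines"):
        raise ValueError(f"order {event['orderId']} has no lines")
    return order
```

Keeping the payload thin also means a burst of 500,000 triggers does not mean 500,000 full order documents sitting in the broker at once.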
Murphy’s Advice: Monitor Everything
Murphy recommended monitoring everything. The team ran health checks to verify connectivity. They monitored message queues and set a threshold for the number of errors that were acceptable within a certain time frame; the system triggered an alert when that threshold was exceeded. They also monitored for what was expected to happen but hadn’t happened yet.
“Logs tell you what happened but not what hasn’t happened yet,” Murphy’s slides noted. “Build monitoring around when things should have occurred but have not.”
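Both monitoring ideas, an error-count threshold over a time window and an alert for expected-but-missing events, can be sketched briefly. A Python sketch under stated assumptions: the class and function names are invented for illustration, and a real deployment would use the team’s monitoring stack rather than in-process counters.

```python
from collections import deque

class ErrorThresholdMonitor:
    """Alert when more than `max_errors` errors occur within `window_seconds`."""

    def __init__(self, max_errors: int, window_seconds: float):
        self.max_errors = max_errors
        self.window = window_seconds
        self.errors: deque = deque()  # timestamps of recent errors

    def record_error(self, now: float) -> bool:
        """Record an error at time `now`; return True if an alert should fire."""
        self.errors.append(now)
        # Drop errors that have aged out of the sliding window.
        while self.errors and now - self.errors[0] > self.window:
            self.errors.popleft()
        return len(self.errors) > self.max_errors

def overdue(expected: dict, now: float, sla_seconds: float) -> list:
    """IDs whose expected follow-up event has not arrived within the SLA.

    `expected` maps a business ID to the time its triggering event was seen;
    anything still listed past the SLA is "what should have occurred but has not".
    """
    return [bid for bid, seen_at in expected.items() if now - seen_at > sla_seconds]
```

The second check is the one logs alone cannot give you: it fires precisely when nothing has been written.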
Finally, the team used synthetic messages, injecting fake requests into the correlated business transactions and monitoring the service level agreements on timing, errors, etc.
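A synthetic-message probe of this kind might look like the following. This is an illustrative Python sketch, not the team’s implementation; the `synthetic` flag, field names and function names are assumptions.

```python
import time
import uuid

def inject_synthetic_order(publish) -> tuple[str, float]:
    """Publish a fake, clearly flagged fulfillment request and note the start time."""
    cid = str(uuid.uuid4())
    publish({"event": "FulfillmentRequested",
             "orderId": f"synthetic-{cid}",
             "synthetic": True,           # downstream services can skip real side effects
             "correlationId": cid})
    return cid, time.monotonic()

def met_sla(started: float, completed: float, sla_seconds: float) -> bool:
    """True if the synthetic transaction completed within its timing SLA."""
    return completed - started <= sla_seconds
```

Because the synthetic request carries a correlation ID like any real order, the same log aggregation and monitoring described earlier can time it end to end.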
The system does rely on humans to manage edge cases, which Murphy noted are guaranteed to pop up, but the DevOps culture also feeds edge case fixes back into the sprint cycle.