Year-in-Review: 2023 Was a Turning Point for Microservices
Maybe we are doing microservices all wrong?
This was the main thesis of “Towards Modern Development of Cloud Applications” (PDF), a paper from a bunch of Googlers (led by Google software engineer Michael Whittaker) that was presented in June at HOTOS ’23: Proceedings of the 19th Workshop on Hot Topics in Operating Systems.
The problem, as Whittaker et al. pointed out, was that microservices largely have not been set up correctly, architecturally speaking. They conflate logical boundaries (how code is written) with physical boundaries (how code is deployed). And this is where the issues start.
Instead, the Google engineers suggested another approach: build applications as logical monoliths, but hand them off to automated runtimes that decide where to run workloads, based on what the applications need and what resources are available.
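The idea can be sketched in a few lines: components are defined by interfaces (logical boundaries), and a resolver standing in for the automated runtime decides how each call is dispatched. All the names here (`Ratings`, `resolve`, and the toy logic) are illustrative assumptions, not APIs from the paper.

```python
from typing import Protocol

# Logical boundary: a component is an interface, not a deployment unit.
class Ratings(Protocol):
    def rate(self, item: str) -> int: ...

class LocalRatings:
    """In-process implementation. A network stub could satisfy the same
    interface, so where the code runs becomes an implementation detail."""
    def rate(self, item: str) -> int:
        return len(item)  # toy logic for illustration

def resolve(component: str) -> Ratings:
    # Stand-in for the automated runtime: a real one would choose local
    # or remote placement based on load and cost; here everything is local.
    registry = {"ratings": LocalRatings()}
    return registry[component]

svc = resolve("ratings")
print(svc.rate("prime-video"))  # the caller never knows where this ran
```

The point of the sketch is that the calling code stays identical whether `resolve` hands back an in-process object or a remote proxy, which is what lets the runtime, rather than the developer, own the deployment decision.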
With this approach, they were able to lower latency by up to 15x and cost by up to 9x.
“If people would just start with organized modular code, we can make the deployment architecture an implementation detail,” Kelsey Hightower commented on this work in October.
What Went Wrong with Microservices?
A few months earlier, the engineering team at Amazon Prime Video posted a blog post explaining that, at least in the case of video monitoring, a monolithic architecture produced superior performance to a microservices- and serverless-led approach.
In fact, Amazon saved 90% in operational costs by moving off a microservices architecture.
For a generation of engineers and architects raised on the superiority of microservices, the assertion is shocking indeed.
“This post is an absolute embarrassment for Amazon as a company. Complete inability to build internal alignment or coordinated communications,” wrote analyst Donnie Berkholz, who recently started his own industry-analyst firm Platify.
“What makes this story unique is that Amazon was the original poster child for service-oriented architectures,” weighed in Ruby-on-Rails creator and Basecamp co-founder David Heinemeier Hansson. “Now the real-world results of all this theory are finally in, and it’s clear that in practice, microservices pose perhaps the biggest siren song for needlessly complicating your system. And serverless only makes it worse.”
Amazon Video’s Experience with Microservices
The task of Amazon engineers was to monitor the thousands of video streams that Prime delivered to customers. Originally this work was done by a set of distributed components orchestrated by AWS Step Functions, a serverless orchestration service, and AWS Lambda, a serverless compute service.
In theory, the use of serverless would allow the team to scale each service independently. It turned out, however, that at least as the team implemented the components, they hit a hard scaling limit at only 5% of the expected load. Scaling up to monitor thousands of video streams would also have been unduly expensive, due to the need to send data across multiple components.
Initially, the team tried to optimize individual components, but this did not bring about significant improvements. So, the team moved all the components into a single process, hosting them on Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Container Service (Amazon ECS).
“Microservices and serverless components are tools that do work at high scale, but whether to use them over monolith has to be made on a case-by-case basis,” the Amazon team concluded.
Downsides of Microservices
Arguably, the term “microservices” was coined by Peter Rodgers in 2005, though he called it “micro web services.” He gave a name to an idea many were already entertaining, as web services and service-oriented architecture (SOA) were gaining traction at the time.
“The main driver behind ‘micro web services’ at the time was to break up single large ‘monolithic’ designs into multiple independent components/processes, thereby making the codebase more granular and manageable,” explained software engineer Amanda Bennett in a blog post.
The concept took hold over the following decades, especially with cloud native computing, and has only recently started drawing criticism in some quarters.
In their paper, the Google engineers list a number of shortcomings with the microservices approach, including:
- Performance: serializing and sending data across the network to remote services hurts performance, and, if the application becomes complicated enough, could even lead to bottlenecks.
- Comprehension: Bugs are notoriously difficult to track down in distributed systems, given the many interactions across microservices.
- Management issues: It is considered an advantage that different parts of an application can be updated on their own schedules. But this leaves developers managing a huge number of binaries, each with its own release schedule. And good luck running end-to-end tests with a locally-run service.
- APIs get brittle: The key to microservice interoperability is that once a microservice is established, its APIs cannot change, lest they break any other microservice that relies on them. So APIs can only be extended with more APIs, creating bloat.
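The performance point above is easy to demonstrate: even ignoring network latency entirely, a cross-service hop pays to serialize the request and the response, while an in-process call passes objects by reference. This is a toy measurement of that overhead (not from the paper; the function names are made up):

```python
import json
import timeit

def handle(payload: dict) -> dict:
    # The actual work: trivially cheap, so overhead dominates.
    return {"total": sum(payload["values"])}

def local_call(values: list) -> int:
    # In-process: arguments and results are passed by reference.
    return handle({"values": values})["total"]

def rpc_style_call(values: list) -> int:
    # Cross-service: serialize the request, deserialize it, serialize
    # the response, deserialize it -- and no network latency is even
    # counted here.
    request = json.dumps({"values": values})
    response = json.dumps(handle(json.loads(request)))
    return json.loads(response)["total"]

values = list(range(1000))
assert local_call(values) == rpc_style_call(values)
t_local = timeit.timeit(lambda: local_call(values), number=1000)
t_rpc = timeit.timeit(lambda: rpc_style_call(values), number=1000)
print(f"serialization made the call {t_rpc / t_local:.1f}x slower")
```

Real services add network round trips, retries, and load balancing on top of this, which is why chatty call graphs between fine-grained services can become the bottleneck the Google engineers describe.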
A New Kind of Microservice?
When The New Stack first covered the Amazon news, many quickly pointed out to us that the architecture the video folks used was not exactly a monolithic architecture either.
“This definitely isn’t a microservices-to-monolith story,” remarked Adrian Cockcroft, the former vice president of cloud architecture strategy at AWS, now an advisor for Nubank, in an interview with The New Stack. “It’s a Step Functions-to-microservices story. And I think one of the problems is the wrong labeling.”
He pointed out that in many applications, especially internal applications, the cost of development exceeds the runtime costs. In these cases, Step Functions make a lot of sense to save development time, but can become costly for heavy workloads.
“If you know you’re going to eventually do it at some scale,” said Cockcroft, “you may build it differently in the first place. So the question is, do you know how to do the thing, and do you know the scale you’re going to run it at?”
The Google paper tackles this issue by making life easier for the developer while letting the runtime infrastructure figure out the most cost-effective way to run these applications.
“By delegating all execution responsibilities to the runtime, our solution is able to provide the same benefits as microservices but with much higher performance and reduced costs,” the Google researchers wrote.
Microservices only survive by accepting the fiction that network communication is free. https://t.co/6XmXB0cLdq
— Paul Snively (@paul_snively) December 28, 2023
A Year of Reconsideration
This year brought a lot of basic architectural reconsideration, and microservices are not the only ideal being questioned.
Cloud computing, for instance, has also come under scrutiny.
In June, 37signals, which runs both Basecamp and the Hey email application, procured a fleet of Dell servers and left the cloud, bucking a decades-long tradition of moving operations off-premises for vaguely defined greater efficiencies.
“This is the central deceit of the cloud marketing, that it’s all going to be so much easier that you hardly need anyone to operate it,” David Heinemeier Hansson explained in a blog post. “I’ve never seen it. Not at 37signals, not from anyone else running large internet applications. The cloud has some advantages, but it’s typically not in a reduced operations headcount.”
Of course, DHH is a race car driver, so naturally he wants to dig into the bare metal. But there are others willing to back this bet. Later this year, Oxide Computers launched their new systems hoping to serve others with a similar sentiment: running cloud computing workloads, but more cost-effectively in their own data centers.
And this sentiment seems to be getting more consideration now that the cloud bills are coming due. FinOps became a noticeable thing in 2023, as more organizations turned to companies like Kubecost to control their cloud spend. And how many people were taken aback by the news that a Datadog customer received a $65 million bill for cloud monitoring?
Arguably, a $65 million observability bill might be worth it for an outfit that generates billions in revenue. But as chief architects take a harder look at engineering decisions made in the last decade, they may decide to make a few adjustments. And microservices will not be an exception.
TNS cloud native correspondent Scott M. Fulton III contributed to this report.