There are six issues that every organization will run into when attempting to implement a microservice architecture at scale, according to Susan Fowler-Rigetti, an engineer at Stripe and formerly of Uber. She elaborated on them at the Microservices Practitioner Summit in San Francisco last month.
If you are running fewer than 100 microservices, you might be able to avoid these issues, she said, but scaling the services to any greater level brings its own set of problems that will need to be addressed in order to run your systems efficiently.
#1: Organizational Silo-ing and Sprawl
An inverse of Conway’s Law states that the organizational structure of the company is going to mirror its architecture. So a company moving to microservices often ends up with several microservices teams that are all siloed, said Fowler-Rigetti. In addition, because nobody knows what the other teams are doing, there is no standardization across microservices, and best practices are not shared, leading to tech sprawl.
“Microservices developers and developer teams become just like microservices,” said Fowler-Rigetti. “They get really good at doing one thing and only that thing.” This is great for the specific team but becomes a problem when the developer wants to change teams.
Fowler-Rigetti said that she’s heard from developers who’ve changed teams and felt like they’ve moved to a different company because the rules are all different.
#2: More Ways to Fail
Larger and more complex systems mean more opportunities to fail, and the systems will fail. They always do at some point. With hundreds or thousands of microservices deployed, every single one of them is a possible point of failure.
#3: Competition for Resources
Microservice organizations are like ecosystems, in that they are really, really complicated and really delicate, said Fowler-Rigetti.
Both hardware and engineering resources are scarce and expensive. And complicated. Unlike monoliths, one can’t just throw unlimited hardware at the problem or increase headcount. This may work in the beginning, she said, but it just doesn’t scale by the time you get to a few dozen microservices.
How does the organization prioritize when there are hundreds or thousands of microservices? Who gets prioritization? Who makes that decision?
#4: Misconceptions about Microservices
Misconceptions are rampant among developers and managers alike, and they can be really dangerous to the delicate microservices ecosystem.
The most popular myth is that microservices are the Wild West. You can do whatever you want, use whatever code, database, programming language etc., as long as you get the job done and other services can depend on you. There is a huge cost to this, as systems can end up having to maintain multiple libraries and database versions.
Another dangerous myth is that microservices are a silver bullet, in that they will solve all your problems. No, said Fowler-Rigetti. Microservices should be a step in the evolution of the company’s architecture when it has reached the limit of its capacity to scale — not a way out of engineering challenges.
#5: Technical Sprawl and Technical Debt
When developer teams build microservices using different languages, individual infrastructures, and custom launch scripts, the organization ends up with a huge system where there are a thousand ways to do every single thing.
It may end up with hundreds or thousands of services, some of which are running, most of which are maintained, and some of which are forgotten about. “You have some script running on a box somewhere doing God knows what and nobody wants to go clean that up,” said Fowler-Rigetti. “They all want to build the next new thing.”
Word to the wise: No customization is scalable.
#6: Inherent Lack of Trust
Since microservices live in complex dependency chains and are completely reliant on each other, and there’s no standardization or communication, there is no way to know for sure that the dependencies are reliable. There is, she said, no way of knowing that microservices can be trusted with production traffic.
Get Out of the Mess
If you’re a developer in a company moving to microservices, none of this is news to you. So how do you get out of the maze?
Step one, said Fowler-Rigetti, is getting buy-in from all levels of the organization. Standardization is not just a best practice, but mission-critical for microservices to work. As such, it needs to be adopted and driven at all levels of the stack.
Next, the company needs to “hold all microservices to high architectural, operational, and organizational standards across the entire organization, not on a service-by-service basis,” she explained. Only a microservice that meets these standards is deemed “production-ready.”
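Such an organization-wide standard lends itself to automation. The sketch below is a hypothetical production-readiness gate, not anything Fowler-Rigetti presented: the check names mirror her categories, while the `Service` fields and their meanings are illustrative assumptions.

```python
# Hypothetical production-readiness gate: every service is held to the
# same organization-wide checks, not judged service-by-service.
from dataclasses import dataclass


@dataclass
class Service:
    # Illustrative fields; a real gate would query CI, monitoring, etc.
    name: str
    has_staging_pipeline: bool = False
    load_tested: bool = False
    failure_modes_documented: bool = False
    dashboards_and_alerts: bool = False
    docs_up_to_date: bool = False


CHECKS = {
    "stability/reliability": lambda s: s.has_staging_pipeline,
    "scalability/performance": lambda s: s.load_tested,
    "fault tolerance": lambda s: s.failure_modes_documented,
    "monitoring": lambda s: s.dashboards_and_alerts,
    "documentation": lambda s: s.docs_up_to_date,
}


def production_ready(service: Service) -> list[str]:
    """Return the failed checks; an empty list means production-ready."""
    return [name for name, check in CHECKS.items() if not check(service)]


svc = Service("payments", has_staging_pipeline=True, load_tested=True)
print(production_ready(svc))  # → ['fault tolerance', 'monitoring', 'documentation']
```

Because the checks live in one shared table rather than in each team's scripts, adding a new requirement automatically applies it to every service.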
Need for Standardization
Fowler-Rigetti shared a chart showing the levels of the microservices environment from a microservices perspective. The only layer that the microservices teams need to be working on is Layer 4.
Everything else, she said, needs to be abstracted away from them in order for microservices to be successful. This will limit technical sprawl and increase accountability.
Next, there needs to be a consensus on production-readiness requirements, and those requirements need to be part of the engineering culture. Too often, she said, engineers see standardization as a hindrance, but in a new world of microservices, where everything lives in complex dependency chains, it is not.
No microservices or set of microservices should compromise the integrity of the overall product or system.
What Makes a Service Production-Ready?
Fowler-Rigetti gave a list of requirements, grouped into categories — stability, reliability, scalability, performance, fault tolerance, catastrophe-preparedness, monitoring, and documentation — and delved into each in more detail:
Stability and Reliability
With microservices, there are more changes and faster deployments, leading to instability. A reliable microservice, she said, is one that can be trusted by its clients, dependencies, and the ecosystem as a whole. She sees stability and reliability as linked, with most stability requirements having accompanying reliability requirements. A development pipeline, with several stages before production, is a good example of this.
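A staged pipeline like the one she describes can be sketched as follows. This is a minimal illustration under assumed stage names (staging, canary, production): a build is promoted only while each stage reports healthy.

```python
# Minimal sketch of a multi-stage deployment pipeline: promote a build
# stage by stage, stopping (and rolling back) at the first failure.
STAGES = ["staging", "canary", "production"]  # assumed stage names


def deploy(build: str, healthy_in: dict[str, bool]) -> str:
    """Promote `build` through STAGES; halt at the first unhealthy stage."""
    for stage in STAGES:
        if not healthy_in.get(stage, False):
            return f"{build}: rolled back at {stage}"
    return f"{build}: live in production"


print(deploy("v42", {"staging": True, "canary": True, "production": True}))
print(deploy("v43", {"staging": True, "canary": False}))
```

The point of the extra stages is that an unstable change fails in staging or in the canary, where the blast radius is small, rather than in front of all production traffic.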
Scalability and Performance
A lot of people think they get scalability for free with microservices, Fowler-Rigetti said, but that’s not true when you get to a crazy-large scale. They need to be able to scale appropriately with increases in traffic.
Some languages are not designed to scale efficiently, because they don’t allow for concurrency or partitioning, which makes it hard for microservices written in them to scale appropriately. Fowler-Rigetti declined to name any specific languages, but said, “I’m sure you can think of some.”
Scalability is how many requests a microservice can handle, she explained, and performance is how well the service can process those tasks. A performant microservice properly utilizes resources, processes tasks efficiently, and handles requests quickly.
A microservice that can’t scale with expected growth is likely to have a drastic increase in incidents and outages. The increase in latency leads to poor availability.
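To make the concurrency point concrete, here is a small sketch (my illustration, not Fowler-Rigetti’s) of an I/O-bound service handling requests concurrently: because all requests wait on their downstream calls at the same time, total latency stays near one request’s latency instead of growing linearly with traffic.

```python
# Sketch of concurrency for an I/O-bound service: 100 requests are
# handled at once, so latency stays close to a single request's latency.
import asyncio


async def handle_request(i: int) -> str:
    await asyncio.sleep(0.01)  # stand-in for a downstream service call
    return f"response {i}"


async def serve(n: int) -> list[str]:
    # All n requests are in flight simultaneously.
    return await asyncio.gather(*(handle_request(i) for i in range(n)))


responses = asyncio.run(serve(100))
print(len(responses))  # → 100
```

A runtime without this kind of concurrency would process the same 100 requests one at a time, and latency — and with it the incident rate Fowler-Rigetti warns about — would climb with every increase in traffic.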
Fault-Tolerance and Catastrophe-Preparedness
To ensure availability, the ultimate goal, the developers need to ensure that none of the ways the microservice can fail will take down the system. So developers need to know all the failure modes and have backups in case failure occurs.
Robust resiliency testing is key to successful catastrophe preparedness, she said, including code testing, load testing, and chaos testing among other pro-active tests. Every single failure mode should be pushed into production to see how it survives.
Given the complexity of the microservices environment and the complex dependency chains, failure is inevitable. Microservices need to be able to withstand both internal and external failures.
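One way to picture the resiliency testing she describes is a test that randomly fails a dependency and asserts the service still answers. The fallback value and failure rate below are assumptions for illustration, not a real chaos-testing framework.

```python
# Toy resiliency test: randomly fail a dependency and verify the
# service degrades gracefully (falls back) instead of going down.
import random


def get_recommendations(user_id: int, dependency_up: bool = True) -> list[str]:
    """Service endpoint with a backup for its known failure mode."""
    if not dependency_up:
        return ["default-item"]  # degraded but still available
    return [f"item-{user_id}-{i}" for i in range(3)]


def chaos_test(runs: int = 100) -> bool:
    """Fail the dependency at random; the service must always respond."""
    for _ in range(runs):
        result = get_recommendations(7, dependency_up=random.random() > 0.5)
        if not result:  # an empty response would mean an outage
            return False
    return True


print(chaos_test())  # → True: the known failure mode has a backup
```

This mirrors the principle in the text: enumerate the failure modes, give each one a backup, then actively push the failures to confirm the backups work.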
Monitoring and Documentation
“Something I discovered in a terrifying way,” Fowler-Rigetti said, “is that with a microservices architecture, the state of the system is never the same from one second to another. If you’re not aware of the state of the system, you won’t know when the system fails, and it will fail,” she stated.
Good monitoring tools showing the state of the system at all times are critical. The second most common cause of outages is a lack of good monitoring.
Logging is an essential part of monitoring because you will almost never be able to replicate a bug, according to Fowler-Rigetti. The only way to know what happened is to ensure that you recorded the state of the system at that time. And the only way to do that is through proper logging.
That visibility, she said, makes it much easier to trust your services.
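Recording the state of the system usually means structured logs. The sketch below is one common approach, offered as an assumption rather than anything from the talk: serialize the relevant request state as JSON so the log line can be indexed and queried after an incident.

```python
# Sketch of structured logging: capture enough request state in each
# log line to reconstruct what happened, since the bug won't replicate.
import json
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
log = logging.getLogger("payments")  # illustrative service name


def state_record(request_id: str, amount_cents: int) -> str:
    """Serialize the request state so the incident can be reconstructed."""
    return json.dumps({
        "event": "charge_attempt",
        "request_id": request_id,
        "amount_cents": amount_cents,
        "dependency_versions": {"billing-api": "2.3.1"},  # illustrative
    })


log.info(state_record("req-123", 999))
```

The design choice is to log a machine-readable snapshot rather than free-form text, so that when the system fails, the state at that moment is already on disk.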
Documentation is the bane of every developer, but it is critical. It removes technical debt and allows people from other teams, or new members of the team to come up to speed.
Check out Fowler-Rigetti’s book “Production-Ready Microservices” for more wisdom, including detailed requirements, and a roadmap for moving forward.