When You Do (and Don’t) Need a Service Mesh
One of the questions I often hear is: “Do I really need a service mesh?” The honest answer is “It depends.” Like nearly everything in technology (and, more broadly, nearly everything in life), it comes down to benefits and costs. But having helped users progress from exploration to production deployments in many different scenarios, I’d like to share my perspective on which inputs to include in your decision-making process.
A service mesh provides a consistent way to connect, secure and observe microservices. Most service meshes are tightly integrated with Kubernetes and other orchestration platforms. There’s no way around it: a service mesh is one more component to run, and at least part of your team will have to learn it. That’s a cost, and you should weigh it against the operational simplification you may gain.
But apart from costs and benefits, what should you be asking in order to determine if you really need a service mesh? The number of microservices you’re running, as well as urgency and timing, can have an impact on your needs.
How Many Microservices?
If you’re deploying your first or second microservice, I think it’s perfectly fine not to have a service mesh. Instead, focus first on learning Kubernetes and factoring stateless containers out of your applications. You will naturally build familiarity with the problems that a service mesh can solve, and that will leave you much better prepared to plan your service mesh journey when the time comes.
If you have an existing application architecture that provides the observability, security and resilience you need, then you are already in a good place. For you, the question becomes when to add a service mesh. Typically, organizations start to notice the toil of writing utility code to integrate each new microservice. Once that toil gets painful enough, they evaluate how to make that integration more efficient. We advocate using a service mesh to reduce this toil.
The exact point at which the benefits of a service mesh clearly outweigh its costs varies from organization to organization. In my experience, teams often realize they need a consistent approach once they have five or six microservices. However, many users push to a dozen or more microservices before they notice the growing cost of utility code and the increasing complexity of slight differences across their applications. And, of course, some organizations keep scaling and never adopt a service mesh at all, investing in application libraries and tooling instead. On the other hand, we also work with early birds who want to get ahead of the rising complexity wave and introduce a service mesh before they have half a dozen microservices. But the number of microservices isn’t the only factor. You’ll also want to consider urgency and timing.
Urgency and Timing
Timing is another part of the answer to “When do I need a service mesh?” How urgently you should consider a service mesh depends on your organization’s challenges and goals, but it is also shaped by your current processes and state of operations. Here are some conditions that may reduce or eliminate the urgency of adopting a service mesh:
- Your microservices are all written in one language (“monoglot”) by developers in your organization, built on a common framework.
- Your organization dedicates engineers to building and maintaining org-specific tooling and instrumentation that’s automatically built into every new microservice.
- You have a partially or totally monolithic architecture where application logic is built into one or two containers instead of several.
- You release or upgrade all at once after a manual integration process.
- You use application protocols that existing service meshes don’t support (that is, usually something other than HTTP, HTTP/2 or gRPC).
On the other hand, here are some signals that you will need a service mesh and may want to start evaluating or adopting early:
- You have microservices written in many different languages that may not follow a common architectural pattern or framework (or you’re in the middle of a language/framework migration).
- You’re integrating third-party code or interoperating with teams that are a bit more distant (for example, across a partnership or M&A boundary) and you want a common foundation to build on.
- Your organization keeps “re-solving” problems, especially in the utility code (my favorite example: certificate rotation, while important, is no scrum team’s favorite story in the backlog).
- You have robust security, compliance or auditability requirements that span services.
- Your teams spend more time localizing or understanding a problem than fixing it.
I consider this last point a three-alarm fire: a clear sign that you need a service mesh, and a good way to return to the quest for simplification. When an application fails to deliver a quality experience to its users, how does your team resolve it? Organizations we work with report that finding the problem is often the hardest and most expensive part.
Once you’ve localized the problem, can you alleviate or resolve it? It’s a painful situation if the only fix is to write new code or rebuild containers under pressure. That’s where you benefit from keeping resiliency capabilities independent of the business logic, as a service mesh does.
If this story sounds familiar, you may need a service mesh right now. If you’re getting by with your existing approach, that’s great. Just keep in mind the costs and benefits of what you’re working with, and keep asking:
- Is what you have right now really enough, or are you spending too much time finding problems instead of developing and delivering value for your customers?
- Are your operations working well with the number of microservices you have, or is it time to simplify?
- Do you have critical problems that a service mesh would address?
Keeping tabs on the answers to these questions will help you determine whether, and when, you really need a service mesh.