The Hitchhiker’s Guide to Service Mesh
HashiCorp sponsored this post.
At KubeCon EU last week, there was a lot of focus on service mesh, sidecars, microservices, and many more “meshy” things. To help navigate the resources, videos, sessions, and various other content from the event, we thought it would be good to provide a handy reference guide that explores what these concepts are and why they are so prominently featured in the market today. The goal of this guide is not to point the reader in one direction or the other, but to help pick apart the tools and terminology used in a new and changing technology space. So don’t panic, grab your towel, and let’s dive into it!
But Why Service Mesh?
There are times when listening to talks about service mesh can feel a little like listening to Vogon poetry, but it really is an important advancement in modern application networking. At their core, service meshes are trying to solve a simple problem: how do I connect my applications efficiently and securely? Followers of traditional networking may feel that this is an unnecessary complication of a problem that has already been solved, but, as many have pointed out, cloud and containers have really changed how these traditional networking concepts are applied. To break it down a little further, let’s use a basic, real-world example of what is driving the need for service mesh.
Imagine someone building a house. There are underlying components, like electricity, water, and plumbing, needed to make the home liveable, and some more customizable features like furniture, paint color, and appliances. Let’s say you want to ask the builder how things are going (for the sake of this example, it’s a pre-cell phone world). How do you reach the builder? If you know the address, you could simply drive over and ask. If you don’t, you need some other piece of information you can use to reach the builder, like a phone number. Now think about how you get a landline connected to a new home. First, someone from the phone company needs to come and wire the house and connect it to their network. Then they assign the home a phone number, the number gets published to some sort of directory, you look it up and dial it, and finally, you can ask the builder, “how’s it going?”
Now imagine how annoying this would be if the builder told the phone company, “I’m actually going to move this house every couple of days or weeks to a different location and need you to come rewire it for phone service every time it moves.” Wouldn’t it be easier if the builder just had something that followed him around and made him reachable regardless of where the house is? Something like a cell phone? This is the benefit of a service mesh. I don’t want to have to wait on a phone number (or IP address) to be assigned before a service can connect with the rest of my services. I should be able to have my services consistently reach out with a single tool (like a sidecar proxy), and that tool should get me connected to the other desired services.
Sounds Good, How Do I Get Started?
If the idea of a service mesh sounds as good as a Pan Galactic Gargle Blaster now, great! But like the drink, it’s important to know what exactly it’s made up of and probably best to only have one. For the sake of simplicity, we’ll break down some components of a service mesh and explain their role and significance. This blog post will cover:
- Foundation with Sidecar Proxies
- Traffic Management
- Global Mesh with Gateways
- Observability with Spans and Tracing
Foundation with Sidecar Proxies
Sticking with our theme here, think of sidecar proxies as your Babel fish. Pair these with your services to enable easy communication with other services. Sidecar proxies do exactly what the name implies. They attach to services running in the mesh and, as a result, a service only needs to send requests (HTTP or gRPC, for example) to its local proxy, instead of knowing the specific address or location of the service it’s sending the request to. These proxies are commonly referred to as the “data plane”: the plane that workload data moves through. Service mesh providers will often either deploy their own built-in sidecars or leverage third-party solutions like Envoy, HAProxy, or NGINX.
That’s well and good, but you might be asking how services get paired with a sidecar proxy. The answer is that it’s done at the service level. We’ll use HashiCorp Consul as an example here, but other service mesh providers follow a similar pattern to accomplish this. When adding new services to Consul, operators need to include a service definition. Within this definition, users can define the service type, which ports the service will run on, and other useful information. When adding the service to a Consul service mesh, the user just has to add a sidecar_service definition and Consul will take it from there. All this does is tell Consul that this newly registered service needs to be paired with a sidecar proxy and that it should follow any newly defined or existing traffic policies. With these sidecars in place, we now can start to layer in the capabilities that make service meshes so beneficial.
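To make the pairing step concrete, here is a minimal Python sketch that builds a service definition and prints it as JSON. The service name “frontend” and port 8080 are placeholders, and the helper function is hypothetical. The nesting (service, then connect, then sidecar_service) follows Consul’s JSON service definition format as we understand it, so check it against the current Consul documentation before relying on it.

```python
import json


def service_definition(name: str, port: int, with_sidecar: bool = True) -> dict:
    """Build a Consul-style service definition as a plain dict (illustrative helper)."""
    service = {"name": name, "port": port}
    if with_sidecar:
        # An empty sidecar_service block asks the mesh to register a sidecar
        # proxy for this service using default settings.
        service["connect"] = {"sidecar_service": {}}
    return {"service": service}


if __name__ == "__main__":
    # "frontend" and 8080 are placeholder values, not taken from a real deployment.
    print(json.dumps(service_definition("frontend", 8080), indent=2))
```

The point of the sketch is how little the operator has to say: the service describes itself, and a single extra block opts it into the mesh.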
Traffic Management
Now that we’ve established a communication method for services in the mesh, it’s time to apply some rules. Sidecar proxies, on their own, are simply facilitators of requests. They need to be given explicit instructions in order to prevent your service from ending up in the galley of a ship of a Vogon Constructor Fleet. Bad jokes aside, it is important to have some sort of guidance for service-to-service communication. Fortunately, service mesh providers are able to interface directly with sidecar proxies to shape and control that communication. This is why you will often hear a service mesh provider being labeled as a “control plane,” as opposed to the data plane we referenced earlier.
The control plane is where the operator of the mesh defines policies and protocols for how services in the mesh should interact with one another. For instance, let’s assume that there is a web service named “frontend” and a database service named “db”. In a service mesh, I could write a rule (in Consul we call these intentions) that any version of my “frontend” is allowed to communicate with that “db” service. As I scale up or down the number of “frontend” services running in my environment, I don’t have to worry about configuring their ability to send requests to “db”, since the sidecar proxies have already been given permission to facilitate that traffic.
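The “frontend” to “db” rule can be sketched as a tiny policy check. This is a toy model in Python, not Consul’s implementation: the Intention class, the is_allowed function, and the “most specific rule wins” precedence are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intention:
    source: str        # calling service name, or "*" for any service
    destination: str   # target service name, or "*" for any service
    allow: bool


def is_allowed(intentions, source, destination, default_allow=False):
    """Return the decision of the most specific rule matching this pair of services."""
    def matches(rule):
        return rule.source in (source, "*") and rule.destination in (destination, "*")

    def specificity(rule):
        # Exact names outrank wildcards (an assumed precedence for this sketch).
        return (rule.destination != "*", rule.source != "*")

    candidates = [rule for rule in intentions if matches(rule)]
    if not candidates:
        return default_allow
    return max(candidates, key=specificity).allow


rules = [
    Intention("*", "*", allow=False),          # deny everything by default
    Intention("frontend", "db", allow=True),   # but let frontend reach db
]
print(is_allowed(rules, "frontend", "db"))  # frontend is explicitly allowed
print(is_allowed(rules, "billing", "db"))   # falls through to the wildcard deny
```

Note that the rule names the service, not an instance or an IP address, which is why scaling “frontend” up or down requires no policy change.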
These traffic management policies can also be helpful as newer versions of applications are rolled out. Traditionally, testing in “real world” scenarios is a bit of a challenge for developers because it’s not ideal to deploy test applications in production environments. With a service mesh, though, this can be done in a way that won’t cause extreme panic (remember, don’t panic). CI/CD practices like canary deployments and A/B testing promote the idea of developing quickly and often. With a service mesh, networking is no longer a bottleneck for implementing those practices. Versioned apps can be added to the network with metadata tags that allow set amounts of traffic to be routed to them. In Consul, we call these “traffic splitters” and they can be weighted based on the amount of traffic I want a certain version to see. By doing this, I align networking with the application delivery process to enable faster testing and versioning.
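Weighted splitting is easy to picture in code. The following Python sketch picks a version for each request in proportion to its weight. The function name, the version labels, and the 90/10 split are illustrative assumptions, and a real mesh makes this decision inside the sidecar proxy rather than in application code.

```python
import random


def pick_version(weights, rng=random):
    """Pick a version name with probability proportional to its weight."""
    if not weights or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and non-negative")
    if sum(weights.values()) <= 0:
        raise ValueError("at least one weight must be positive")
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


if __name__ == "__main__":
    # Send roughly 10% of requests to the canary ("v2" is a placeholder label).
    split = {"v1": 90, "v2": 10}
    sample = [pick_version(split) for _ in range(1000)]
    print({version: sample.count(version) for version in split})
```

Raising the canary’s share is then a one-line change to the weights, with no changes to the services themselves.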
From a security perspective, rules like these are a great first step in preventing unapproved connections, but fortunately, service meshes offer even more advanced security capabilities. Using sidecar proxies enables the network to automate mutual TLS (mTLS) encryption and authentication between services. What this means is that on each connection request, both sides are asked to present a certificate to prove their identity. These certificates are automatically generated and rotated by the service mesh, helping avoid potential outages due to an expired or retired credential. TLS also adds the benefit of encrypting information in flight, thus preventing it from being viewed or copied during transmission. Service meshes can also provide more granular security controls, which help enable zero trust and prevent lateral movement within a larger datacenter environment.
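For a feel of what the sidecar takes off your hands, here is roughly what requiring client certificates looks like with Python’s standard ssl module. It is a sketch of the server side only. The function name is hypothetical and the file paths are parameters you would have to supply yourself; in a mesh, the proxy holds these certificates and the mesh issues and rotates them for you.

```python
import ssl


def mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Build a server-side TLS context that rejects clients without a valid certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # The "mutual" part: the server demands a certificate from the client too.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        # This service's own identity (paths are caller-supplied placeholders).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # The certificate authority trusted to sign peer certificates.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

Every service would need code and certificate handling like this without a mesh, which is exactly the burden that moving mTLS into the sidecar removes.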
Galactic Mesh with Gateways
Networks, like space, can be big. Really big. As a result, it’s nearly impossible to bring every service that needs to interact with the mesh into the mesh itself. Some of this may be a result of services residing in specific data centers that organizations do not want the mesh to have visibility into, or because certain services are not able to have a sidecar proxy attached to them. Either way, there should be a way to enable these non-mesh services to both interact with the service mesh and still have the same security and traffic policies attached to their requests upon entering or leaving the mesh. That is where gateways come in. Gateways essentially act as fixed endpoints that the service mesh recognizes as trusted for a variety of purposes. Here are some of the ways that gateways can be leveraged to extend mesh functionality:
- Mesh Gateway: Service meshes can span multiple environments, like multiple Kubernetes clusters, Nomad clusters, or virtual machines. These environments may operate independently, but need visibility into each other and the ability to communicate with each other. Mesh gateways are a way to exchange service information across these environments and facilitate requests.
- Ingress/API: External clients and services can leverage these gateways to send requests to mesh-based services. The ingress gateway consists of a proxy that has been exposed to the public internet on a single port. Requests that hit this gateway are authenticated and then forwarded based on the mesh’s traffic routing policies.
- Egress/Terminating: Managed service applications in the cloud or non-containerized legacy applications often cannot have a sidecar proxy. Instead, the service mesh provider will utilize a single proxy that facilitates requests to these services from inside the mesh.
Service meshes may use some or all of these in various combinations. Really, it depends on how large a network footprint an organization is looking to manage, but regardless, they are powerful tools for helping requests navigate through the mesh.
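One way to keep the three gateway roles straight is to ask which one a given request would pass through. The Python sketch below encodes that decision as a simplified mental model. The Endpoint class, its fields, and the example names are all hypothetical, and real meshes make this choice through configuration rather than a single function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    environment: str   # e.g. a cluster or datacenter name (placeholder values below)
    in_mesh: bool      # True if the service has a sidecar proxy


def gateway_for(source: Endpoint, dest: Endpoint) -> str:
    """Name the component that carries a request from source to dest."""
    if not source.in_mesh and not dest.in_mesh:
        return "outside the mesh"
    if not source.in_mesh:
        return "ingress gateway"      # external client entering the mesh
    if not dest.in_mesh:
        return "terminating gateway"  # mesh service calling a non-mesh service
    if source.environment != dest.environment:
        return "mesh gateway"         # mesh-to-mesh across environments
    return "sidecar proxy"            # ordinary in-mesh traffic


frontend = Endpoint("frontend", "k8s-east", in_mesh=True)
legacy_db = Endpoint("legacy-db", "vm-onprem", in_mesh=False)
print(gateway_for(frontend, legacy_db))
```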
Observability with Spans and Tracing
“Any man who can hitch the length and breadth of the galaxy, rough it, slum it, struggle against terrible odds, win through, and still knows where his towel is, is clearly a man to be reckoned with.”
Just as it’s important to know where your towel is at all times, so too is it important to know what services are running on your mesh. A service mesh is a powerful tool, but it can sometimes feel like there is a large amount of blind trust that operators have to give when adopting one. I have to trust that connections are actually being made by the mesh and are ones that I would approve.
Service meshes can actually provide a much richer level of observability than some other networking environments, as they tend to be more application-aware from a layer 7 standpoint. Sidecar proxies have the ability to capture application-level traffic as well as something called “span” data. Spans are essentially logs of a request as it moves throughout the mesh. At each point the request passes through, the proxy there records a span, which can then be forwarded to a multitude of monitoring tools. From there, operators can drill into each of these spans and extract much more detailed information. Now it’s not just, “a connection failed,” it’s “a connection failed at this specific point and here is the reason for the failure.” This process of capturing span data and interpreting it is called distributed tracing and is a very in-demand capability. Service mesh providers pair tools like OpenTracing, Zipkin, and Jaeger with various monitoring solutions to create robust dashboards and alerting capabilities.
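To show how spans turn “a connection failed” into “it failed here, for this reason,” here is a small Python sketch. The Span fields loosely mirror what tracing tools record (a trace ID, a span ID, and a parent ID), but the class, the failure_path function, and the sample data are made up for illustration and do not follow any particular tool’s schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str              # shared by every span of one request
    span_id: str
    parent_id: Optional[str]   # None for the first hop of the request
    service: str
    duration_ms: float
    error: Optional[str] = None


def failure_path(spans, trace_id):
    """Return (services from entry point to deepest failing hop, that hop's error)."""
    in_trace = {s.span_id: s for s in spans if s.trace_id == trace_id}
    failed = [s for s in in_trace.values() if s.error]
    if not failed:
        return [], None

    def depth(span):
        hops = 0
        while span.parent_id in in_trace:
            span = in_trace[span.parent_id]
            hops += 1
        return hops

    # The deepest failing span is the most likely origin of the failure.
    origin = max(failed, key=depth)
    path, node = [], origin
    while node is not None:
        path.append(node.service)
        node = in_trace.get(node.parent_id)
    return list(reversed(path)), origin.error


spans = [
    Span("t1", "a", None, "ingress-gateway", 120.0, "upstream error"),
    Span("t1", "b", "a", "frontend", 110.0, "upstream error"),
    Span("t1", "c", "b", "db", 5.0, "connection refused"),
]
print(failure_path(spans, "t1"))
```

Walking parent IDs back from the deepest failing span reconstructs the route the request took, which is the core idea behind the trace views in tools like Zipkin and Jaeger.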
A service mesh may not be the answer to life, the universe, and everything, but it is a very powerful tool for modern applications. We’ve seen great advancements in how applications are created, deployed, and iterated on, so it only makes sense to apply some of these practices to how networking is approached. The most important thing to remember is that service mesh adoption is not a switch that gets flipped so that suddenly everything is a part of the mesh. It’s a journey that requires incremental changes and a strategy for migrating services from existing environments to these newer, more dynamic ones. At HashiCorp, we’ve tried to capture the steps in this journey and provide materials to help guide organizations along this path. Hopefully, this guide has provided some good foundational knowledge and will help any new hitchhikers navigate KubeCon and other mesh-related events.
And most importantly, so long, and thanks for all the fish.