Mutual TLS: Securing Microservices in Service Mesh
The world is moving toward microservices-based applications. Service mesh is emerging as one of the main architectures to deploy and manage microservices environments, because of the benefits it brings with advanced traffic management, holistic observability and better security. Because microservices communicate with each other through APIs, securing the communication between individual services is more important than ever.
Mutual TLS (mTLS) secures communication between microservices in a service mesh. It uses cryptographically secure techniques to mutually authenticate individual microservices and encrypt the traffic between them.
According to Google, 90% of internet traffic is encrypted to prevent eavesdropping and man-in-the-middle attacks. Yet many cloud native application deployments today do not have encrypted communications between microservices, based on the weak assumption that traffic inside the cluster is secure and not susceptible to attacks. This is a risky assumption. Not only should communications between microservices be secured, but many regulations (like GDPR and HIPAA) also recommend end-to-end encryption to protect all data in transit.
In this era of zero trust security, each individual microservice communication (request-response) must be authenticated, authorized and encrypted. Here’s why:
- Authentication uniquely identifies each microservice and ensures that a rogue microservice cannot access your sensitive data.
- Authorization determines which microservices can communicate with each other. You wouldn’t want the microservice that handles your company’s credit card processing to communicate with the microservice that manages the door badge reader for your office building.
- Encryption not only prevents third parties from intercepting and viewing your data in transit, but also thwarts man-in-the-middle attacks. You definitely don’t want credit card data to be visible to unauthorized entities on the network.
As companies move towards zero trust security, mTLS provides a cryptographically secure way to authenticate, encrypt and enforce communication policies between microservices.
What Is mTLS?
Mutual TLS (or mTLS) refers to transport layer security in which the client and the server each authenticate the other before exchanging data over an encrypted channel. Today, mTLS is the preferred protocol for securing communications among microservices in cloud native applications.
While transport layer security (TLS) has been used to secure traffic between clients and servers on the internet for many years, it typically uses unidirectional identification — where a server presents a certificate to prove its identity to a client. A basic example of this one-way authentication is when you access your bank account online. The server sends your computer a certificate to prove it is actually the bank you are connecting to. That same certificate includes a public encryption key that is used to create a cryptographically secure encrypted link between you and the bank over which data passes.
Mutual TLS extends the client-server TLS model to include authentication of both parties. Where the bank relies on other, application-specific mechanisms to confirm a client’s identity, such as a user name and password (often accompanied by two-factor authentication), mTLS uses X.509 certificates to identify and authenticate each microservice. Each certificate contains a public key and an identity, and is signed by a trusted certificate authority that vouches that the certificate represents the entity presenting it.
In mTLS, each microservice in a service mesh verifies the other’s certificate and uses the public keys to create encryption keys unique to each conversation. This enables the communications between pairs of microservices to be authenticated and encrypted.
How mTLS Works in a Service Mesh
At NetScaler, we have learned that, at a high level, authenticating and establishing an encrypted channel using certificate-based mutual authentication in a service mesh involves the following steps:
- Microservice A sends a request for the certificate of microservice B.
- Microservice B replies with its certificate and requests the certificate of Microservice A.
- Microservice A checks with the certificate authority that the certificate belongs to Microservice B.
- Microservice A sends its certificate to microservice B and also shares a session encryption key (encrypted with the public key of microservice B).
- Microservice B checks with the certificate authority that the certificate it received belongs to microservice A.
- With both microservices mutually authenticated and a session key created, communication between them can be encrypted and sent via the secure link.
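In application code, these steps are normally carried out by a TLS library rather than written by hand. The following is a minimal sketch using Python's standard ssl module, showing only how each side could be configured so that the handshake authenticates both parties. The function names and the file path parameters are illustrative assumptions, not part of any service mesh.

```python
import ssl
from typing import Optional


def server_context(ca_file: Optional[str] = None,
                   cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """Context for the called service (microservice B in the steps above)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # This setting is what makes the TLS "mutual" on the server side: the
    # handshake fails unless the caller presents a certificate that chains
    # to a trusted certificate authority.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)   # CA used to verify callers
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # own identity
    return ctx


def client_context(ca_file: Optional[str] = None,
                   cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """Context for the calling service (microservice A in the steps above)."""
    # PROTOCOL_TLS_CLIENT verifies the server certificate and host name by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)   # CA used to verify the server
    if cert_file:
        # Loading a certificate on the client is the other half of mTLS:
        # it is what the client presents when the server asks for one.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

With real certificate files supplied, each context would be passed to wrap_socket on its side of the connection, and the library performs the certificate exchange, verification and session key agreement during the handshake.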
The Role of the Service Mesh Control Plane for mTLS
Istio is perhaps the most well-known, feature-rich and mature service mesh control plane that provides secure service-to-service communication, without the need for any application code changes. From an mTLS perspective, Istio and all service mesh control planes must offer:
- A certificate authority that handles certificate signing and management.
- A configuration API server that distributes communication policies (such as authentication policies, authorization policies and secure naming information) to the proxies.
The control plane distributes the certificates and authorization policies to the sidecars. When two microservices need to communicate, the sidecars establish a secure proxy-proxy link and are responsible for encrypting the traffic through it.
The Role of Sidecars for mTLS
While it is possible to define communication security policies and carry out authentication and encryption in the application microservices themselves, doing so requires implementing authentication mechanisms, authorization policies and traffic encryption in the code of each microservice.
This is inefficient because you must write this logic into each and every microservice, update it when the application changes, and test it on every release to ensure that the new code does not break communication. This burdens developers, leads to errors and prevents them from focusing on code that implements business logic. In a service mesh, the overhead of securing communications is offloaded to sidecar proxies, like NetScaler CPX or Envoy, that sit alongside each microservice.
When two microservices need to communicate, it is the sidecars that establish the mTLS connection through which encrypted traffic will flow. The sidecars exchange certificates and authenticate each other with the certificate authority. They check the authorization policies in the configuration pushed by the control plane, to see if the microservices are allowed to communicate. If they are, the sidecars will establish a secure link using a generated session key, so that all the data between the microservices will be encrypted. The actual microservice application code itself is not affected. Sidecars, therefore, make application development agile and more efficient.
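The authorization check a sidecar performs can be pictured as a lookup against the policy it received from the control plane. The sketch below is a simplified model, not the data format of any real proxy: the policy table, the service names and the is_allowed function are hypothetical, and the caller identity is assumed to come from a certificate that has already been verified.

```python
from typing import Dict, Set

# Hypothetical policy table as pushed by the control plane:
# destination service -> identities that are allowed to call it.
ALLOWED_CALLERS: Dict[str, Set[str]] = {
    "card-processing": {"checkout"},
    "badge-reader": {"facilities-portal"},
}


def is_allowed(source_identity: str, destination: str,
               policy: Dict[str, Set[str]] = ALLOWED_CALLERS) -> bool:
    """Return True only if the verified caller identity is listed for the
    destination. Anything not listed is denied by default."""
    return source_identity in policy.get(destination, set())
```

In this model, is_allowed("checkout", "card-processing") permits the call, while a request from "badge-reader" to "card-processing" is refused because no rule lists it.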
Why Non-mTLS Communication Is Still Important
Sometimes it is important for microservices to communicate with external sources or microservices that may not have mTLS enabled, or may not be part of the same mTLS ecosystem. In these cases, data must be sent in plain text over an unencrypted and/or unauthenticated channel.
Microservices may need to make or receive API calls to other applications, which may be owned by a different app team who are not in a position to enable mTLS — or even an external third party. Similarly, microservices may need to send telemetry data to a non-mTLS observability stack — after all, every SRE needs telemetry data to gain visibility for root cause analysis and troubleshooting.
Furthermore, as multicluster deployments become more popular, there will be an increase in the number of mTLS “mismatches” — as some clusters will have it enabled and others not.
Investigate your environment for where a microservice may need to accept both mTLS and non-mTLS traffic, so you can plan proactively.
Implementing mTLS in a Service Mesh
There are many service mesh control planes with varying levels of maturity and unique features. When it comes to mTLS, all service meshes work on the same principles to secure communications between microservices. Many service meshes offer a solid mTLS baseline, but they differ in their overall capability and the way they are deployed. You need to be aware of how your chosen service mesh control plane implements mTLS and what features are implemented by default, or you risk breaking your applications.
Istio, for example, is advanced and flexible with its mTLS implementation. It offers granular levels to define the extent of your mTLS deployment. Mutual TLS can be set for a specific service, across a namespace, or over the entire service mesh, and Istio applies the narrowest matching policy to each service.
This granularity enables you to assign namespace ownership to different organizational groups and lets them define their own mTLS settings. That said, each group needs to be mindful of the level of mTLS restriction they deploy — especially for microservices that communicate externally.
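The "narrowest policy wins" rule can be sketched as a lookup that tries the most specific scope first. This is a simplified model for illustration, not Istio's implementation or API: the effective_mode function, its dictionary arguments and the string mode names are assumptions.

```python
from typing import Dict, Optional, Tuple


def effective_mode(workload: str,
                   namespace: str,
                   workload_modes: Dict[Tuple[str, str], str],
                   namespace_modes: Dict[str, str],
                   mesh_mode: Optional[str] = None,
                   default: str = "PERMISSIVE") -> str:
    """Resolve the mTLS mode for one workload, narrowest scope first."""
    key = (namespace, workload)
    if key in workload_modes:          # 1. service-specific setting
        return workload_modes[key]
    if namespace in namespace_modes:   # 2. namespace-wide setting
        return namespace_modes[namespace]
    if mesh_mode is not None:          # 3. mesh-wide setting
        return mesh_mode
    return default                     # 4. nothing configured
```

For example, a team that owns the payments namespace could keep it permissive while pinning only its card-processing workload to strict, without affecting a mesh-wide setting chosen by another group.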
Watch Out for mTLS Defaults: Don’t Break Your Application While Trying to Secure It
You should pay attention to how your service mesh implements mTLS by default. Istio supports three mTLS modes that enable you to control how microservices communicate in a service mesh:
- Permissive: Proxies will accept both mTLS and plain text traffic.
- Strict: Proxies accept only mTLS traffic.
- Disable: Mutual TLS is disabled.
Sensibly, Istio configures each proxy to use mTLS in permissive mode by default, which allows a service to accept both plain text and mutual TLS traffic. This flexibility is a best practice for all service mesh implementations because it lets microservices accept non-mTLS traffic from other sources so that you do not break the applications.
Permissive mode helps you get started with mTLS with less risk of breaking your applications because you can deploy, test communications and tighten security incrementally. This is extremely useful during workload migrations, because it allows microservices that cannot use mutual TLS to be moved into the mesh and still communicate.
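The effect of the three modes on an incoming connection can be summarized in a few lines. This is a sketch of the behavior described above, not proxy code: the MtlsMode enum and accepts function are hypothetical names, and treating Disable as "plain text only" is a simplifying assumption.

```python
from enum import Enum


class MtlsMode(Enum):
    PERMISSIVE = "permissive"
    STRICT = "strict"
    DISABLE = "disable"


def accepts(mode: MtlsMode, is_mtls: bool) -> bool:
    """Would a proxy in this mode accept the incoming connection?"""
    if mode is MtlsMode.PERMISSIVE:
        return True        # both mTLS and plain text are accepted
    if mode is MtlsMode.STRICT:
        return is_mtls     # plain text is rejected
    # DISABLE: modeled here as plain text only (simplifying assumption).
    return not is_mtls
```

The migration path the text describes corresponds to starting every workload in MtlsMode.PERMISSIVE and moving it to MtlsMode.STRICT only once no caller still depends on the plain text branch.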
Permissive mode is a great default, but be aware that it weakens your security posture because it opens a door for plain text communication with other sources. While it may be tempting to implement strict mTLS from the start because it is more secure, it is a strategy that requires meticulous planning, full visibility, and analysis of your communication flows. There are many things that can break applications when you move to strict mode. For example:
- Microservices without sidecars will not complete an mTLS handshake; you may have to add a sidecar to those microservices without one.
- Incorrect naming of service ports will cause sidecars to reject mTLS requests; pay extra attention to Istio’s precise port naming convention of <protocol>[-<suffix>] (for example, a port named http-web).
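One way to approach the planning step is to list, for each service, the callers that would be cut off by strict mode. The sketch below assumes you already have an inventory of observed communication flows. The Flow record and the strict_mode_blockers function are hypothetical, and the only failure cause it models is a caller without a sidecar.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Flow:
    source: str
    destination: str
    source_has_sidecar: bool   # False means the caller can only send plain text


def strict_mode_blockers(flows: Iterable[Flow], destination: str) -> List[Flow]:
    """Return the flows into `destination` that would fail once it accepts
    only mTLS traffic, because the caller has no sidecar to do the handshake."""
    return [f for f in flows
            if f.destination == destination and not f.source_has_sidecar]


# Illustrative inventory; the service names are made up.
observed = [
    Flow("checkout", "card-processing", source_has_sidecar=True),
    Flow("legacy-batch", "card-processing", source_has_sidecar=False),
    Flow("checkout", "inventory", source_has_sidecar=True),
]
blockers = strict_mode_blockers(observed, "card-processing")
```

An empty result suggests the service is a candidate for strict mode; a non-empty one names the callers that need a sidecar, or an exception, first.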
Be Aware of mTLS Differences in Various Service Mesh Control Planes
Of course, Istio is not the only service mesh to offer mTLS to secure communications — others offer similar functionality, but there are differences.
Red Hat OpenShift Service Mesh is based on Istio and has similar mTLS features, including granular implementation and permissive mode by default, but replaces the underlying BoringSSL with OpenSSL.
Linkerd also offers mTLS, which is automatically enabled by default for HTTP-based communication between meshed pods via the Linkerd proxies. While Linkerd acknowledges some gaps in its mTLS offering, the 2.9 release addresses some of them and extends mTLS protection to all TCP connections, which is a big step on the road to zero trust communications.
In the Kuma service mesh, mTLS is not enabled by default. When it is enabled, every connection between data plane proxies is denied by default. While this is a laudable security stance, it does mean that you have to explicitly allow connections using the TrafficPermissions feature. That said, Kuma lacks the breadth of features for secure communications that Istio offers and it will take some development for Kuma to catch up.
Amazon Web Services’ AWS App Mesh also supports encryption between microservices. You can use certificates from AWS Certificate Manager or bring your own. AWS App Mesh supports “strict” and “permissive” modes.
Meeting Your mTLS Requirement
Mutual TLS is a critical component of zero trust networking and is vital to secure the communications between the microservices in your service mesh. Implementation, however, is not entirely straightforward. You need to be aware that microservices often communicate with non-mTLS entities and you should make allowances accordingly. You should choose the communication mode carefully by weighing convenience versus security. Lastly, whichever service mesh control plane you choose, pay attention to the specific implementation for mTLS — they are not all the same.
Proper planning prevents poor performance. It’s no different for mutual TLS.