
How eBPF Streamlines the Service Mesh

25 Oct 2021 6:50am, by Liz Rice

There are several service mesh products and projects today, promising simplified connectivity between application microservices, while at the same time offering additional capabilities like secured connections, observability, and traffic management. But as we’ve seen repeatedly over the last few years, the excitement about service mesh has been tempered by practical concerns about additional complexity and overhead. Let’s explore how eBPF allows us to streamline the service mesh, making the service mesh data plane more efficient and easier to deploy.

The Sidecar Problem

Today’s service mesh solutions for Kubernetes require you to add a proxy sidecar container such as Envoy or Linkerd-proxy to every single application pod. That’s right: even in a very small environment with, say, 20 services, each running five pods, spread across three nodes, you’ve got 100 proxy containers. However small and efficient the proxy implementation is, the sheer repetition is going to cost resources.

The memory used by each proxy increases in relation to the number of services that it needs to be able to communicate with. Pranay Singhal wrote about his experiences configuring Istio to reduce consumption from around 1GB per proxy (!) to a much more reasonable 60-70MB each. But even in our small, imaginary environment with 100 proxies on three nodes, this optimized configuration still needs around 2GB per node.
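The arithmetic behind that estimate is easy to check. Here is a minimal sketch, assuming the pods are spread evenly across the nodes (the constant and function names are just for illustration):

```python
# Back-of-the-envelope sidecar overhead for the example cluster above.
SERVICES = 20
PODS_PER_SERVICE = 5
NODES = 3


def sidecar_memory_per_node_mb(per_proxy_mb: float) -> float:
    """Average proxy memory per node, assuming an even spread of pods."""
    proxies = SERVICES * PODS_PER_SERVICE  # one sidecar proxy per pod
    return proxies * per_proxy_mb / NODES


low = sidecar_memory_per_node_mb(60)
high = sidecar_memory_per_node_mb(70)
print(f"{SERVICES * PODS_PER_SERVICE} proxies, {low:.0f}-{high:.0f} MB per node")
```

With 60-70MB per proxy this works out to roughly 2.0-2.3GB of proxy memory on each node, before the applications themselves use anything.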

From redhat.com/architect/why-when-service-mesh — every microservice has its own proxy sidecar

Why do we need all these sidecars? This model allows a proxy container to share a network namespace with the application container(s) in the pod. Network namespaces are the Linux kernel constructs that allow containers and pods to have their own independent network stacks, isolating containerized applications from each other. This keeps applications out of each other’s way, and it’s why, for example, you can have as many pods as you like running a web app on port 80: the network namespaces mean they each have their own port 80. The proxy has to share the same network namespace so that it can intercept and act on traffic to and from the application containers.
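To make the port 80 point concrete, here is a toy model of network namespaces. It is not how the kernel implements them; the class, the pod names and the proxy port number are all hypothetical, and the only idea it captures is that each namespace keeps its own table of listening ports.

```python
class NetworkNamespace:
    """Toy model: each namespace has its own independent set of ports."""

    def __init__(self, name):
        self.name = name
        self.listeners = {}  # port -> name of the process listening on it

    def bind(self, port, process):
        if port in self.listeners:
            raise OSError(f"port {port} already in use in {self.name}")
        self.listeners[port] = process


pod_a = NetworkNamespace("pod-a")
pod_b = NetworkNamespace("pod-b")

# Two pods can both listen on port 80 because their namespaces are separate.
pod_a.bind(80, "web")
pod_b.bind(80, "web")

# A sidecar shares its pod's namespace, so it sees the same port table.
pod_a.bind(15001, "proxy")  # hypothetical proxy port

# A second listener on port 80 inside the same namespace is rejected.
try:
    pod_a.bind(80, "second-web")
    conflict = False
except OSError:
    conflict = True

print(conflict)
```

The sidecar lands in the same port table as the application it fronts, which is what lets it sit in the path of that application's traffic.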

Enter eBPF

eBPF is a kernel technology that allows custom programs to run in the kernel. Those programs run in response to events, and there are thousands of possible events to which eBPF programs can be attached. These events include tracepoints, the entry to or exit from any function (in kernel or user space) or — importantly for service mesh — the arrival of network packets.

Importantly, there is only one kernel per node; all the containers (and hence all the pods) running on a node share the same kernel. If you add an eBPF program to an event in the kernel, it will be triggered regardless of which process caused that event, whether it’s running in an application container or even directly on the host.
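The "attach once, see everything" property can be sketched with a small simulation. This is plain Python, not real eBPF: the NodeKernel class, the event name and the process names are invented for illustration, and a real eBPF program would be loaded into the kernel rather than called like this.

```python
from collections import defaultdict


class NodeKernel:
    """Toy stand-in for the single kernel shared by everything on a node."""

    def __init__(self):
        self.hooks = defaultdict(list)  # event name -> attached programs

    def attach(self, event, program):
        self.hooks[event].append(program)

    def emit(self, event, process):
        # Every attached program runs, whichever process caused the event.
        for program in self.hooks[event]:
            program(process)


kernel = NodeKernel()
seen = []
kernel.attach("packet_received", seen.append)  # attached once for the node

# Two containerized processes and one running directly on the host.
for process in ["pod-a/web", "pod-b/api", "host/sshd"]:
    kernel.emit("packet_received", process)

print(seen)
```

The single attached program records all three processes, including the one that is not running in a pod at all.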

Applications run within pods in user space, and all applications on the same host share the same kernel

One kernel per host

This is why eBPF is such an exciting technology for any kind of instrumentation in Kubernetes — you only need to add the instrumentation once per node, and all the application pods will be covered. Whether you’re looking for observability, security or networking, eBPF-powered solutions can instrument applications without the need for a sidecar.

The eBPF-based Cilium project (which recently joined the Cloud Native Computing Foundation at Incubation level) brings this “sidecarless” model to the world of service mesh. As well as the conventional sidecar model, Cilium supports running a service mesh data plane using a single Envoy proxy instance per node. Using our example from earlier, this reduces the number of proxy instances from 100 to just three.
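The difference in scaling behavior can be written down in a few lines. This is only a sketch of the counting argument; the function and the model names "sidecar" and "per-node" are labels chosen for this example, not Cilium configuration values.

```python
def proxy_instances(model: str, pods: int, nodes: int) -> int:
    """Number of proxy instances needed under each deployment model."""
    if model == "sidecar":
        return pods   # one proxy inside every application pod
    if model == "per-node":
        return nodes  # one shared proxy on every node
    raise ValueError(f"unknown model: {model}")


pods, nodes = 20 * 5, 3  # the example cluster from earlier
print(proxy_instances("sidecar", pods, nodes))
print(proxy_instances("per-node", pods, nodes))
```

The sidecar count grows with every pod you deploy, while the per-node count only changes when you add or remove nodes.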

In the sidecar model there is one instance of Envoy within each application pod. In the sidecarless proxy mode, the Cilium CNI communicates with a single instance of Envoy on the node, so there is no need for a proxy inside each pod

Reducing proxy instances with the sidecarless proxy model

Less YAML

Liz Rice
Liz Rice is the Chief Open Source Officer at Isovalent and Chair of the Technical Oversight Committee at the Cloud Native Computing Foundation.

In the sidecar model, the YAML that specifies every application pod needs to be modified to add the sidecar container. This is usually automated – for example, using a mutating webhook to inject the sidecar at the point each application pod is deployed.

In Istio, for example, this requires labeling the Kubernetes namespace and/or pod to define whether the sidecar should be injected — and of course it requires mutating webhooks to be enabled for the cluster.

But what if something goes wrong? If a namespace or pod is incorrectly labeled, the sidecar won’t be injected and the pod will not be connected to the service mesh. And worse, if an attacker compromises the cluster and is able to run a malicious workload (say, a cryptocurrency miner), they are unlikely to label it so that it participates in the service mesh. That workload won’t be visible through the traffic observability that the service mesh provides.
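A simplified sketch of the labeling decision shows how easily a workload ends up outside the mesh. The label keys `istio-injection` and `sidecar.istio.io/inject` are the ones Istio documents, but the function below is a deliberate simplification written for this example, not Istio's actual webhook logic, and the workloads are hypothetical.

```python
def sidecar_injected(namespace_labels: dict, pod_labels: dict) -> bool:
    """Simplified injection decision: a pod-level label wins, otherwise
    fall back to the namespace label. Real Istio has more rules than this."""
    pod_setting = pod_labels.get("sidecar.istio.io/inject")
    if pod_setting is not None:
        return pod_setting == "true"
    return namespace_labels.get("istio-injection") == "enabled"


# A correctly labeled namespace: pods get a sidecar automatically.
shop = sidecar_injected({"istio-injection": "enabled"}, {})

# A typo in the namespace label value silently disables injection.
typo = sidecar_injected({"istio-injection": "enable"}, {})

# An attacker's workload carries no mesh labels at all.
miner = sidecar_injected({}, {"app": "miner"})

print(shop, typo, miner)
```

Only the first workload joins the mesh; the mislabeled one and the malicious one both run without a proxy, and so without the mesh's traffic visibility.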

In contrast, in the eBPF-enabled, sidecarless proxy model, the pods do not need any additional YAML in order to be instrumented. Instead, a CRD is used to configure the service mesh on a cluster-wide basis. Even pre-existing pods can become part of the service mesh without needing a restart!

If the attacker attempts to bypass Kubernetes orchestration by running the workload directly on the host, eBPF programs can see and control this activity, because it is all visible from the kernel.

eBPF-Enabled Network Efficiency

Eliminating sidecars isn’t the only way that eBPF optimizes a service mesh. eBPF-enabled networking allows packets to take short-cuts that bypass parts of the kernel’s network stack, and this can lead to significant performance improvements in Kubernetes networking. Let’s see how this applies in the service mesh data plane.

Cilium directly connects the app to the node-based proxy on the host

Network packets travel through a much shorter path in the eBPF-accelerated, sidecarless proxy model for service mesh

In the case of service mesh, with the proxy running as a sidecar in a traditional network, the path a packet has to take to reach the application is pretty tortuous: an inbound packet has to traverse the host TCP/IP stack to reach the pod’s network namespace via a virtual Ethernet connection. From there, the packet has to go through the pod’s network stack to reach the proxy, which forwards the packet through the loopback interface to reach the application. Bearing in mind that traffic has to flow through a proxy at both ends of the connection, this results in significant increases in latency compared to non-service mesh traffic.

An eBPF-based Kubernetes CNI implementation such as Cilium can use eBPF programs, judiciously hooked into specific points in the kernel, to redirect the packet along a much more direct route. This is possible because Cilium is aware of all the Kubernetes endpoints and service identities. When a packet arrives on the host, Cilium can dispatch it straight to the proxy or pod endpoint to which it is destined.
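As a rough illustration of why endpoint awareness shortens the path, the sketch below lists the inbound hops described above for the sidecar model and compares them with a direct lookup. Everything here is a toy: the IP addresses, endpoint names and fallback behavior are assumptions made for the example, and a real implementation does this lookup in eBPF maps inside the kernel rather than in a Python dictionary.

```python
# Inbound hops in the traditional sidecar model, as described in the text.
SIDECAR_INBOUND_HOPS = [
    "host TCP/IP stack",
    "virtual Ethernet connection",
    "pod network stack",
    "sidecar proxy",
    "loopback interface",
    "application",
]

# Hypothetical endpoint table: destination IP -> known endpoint on this node.
ENDPOINTS = {
    "10.0.1.7": "pod web-1",
    "10.0.1.8": "node proxy",
}


def dispatch(dest_ip: str) -> list:
    """Return the hops for a packet when the destination is already known."""
    # Assumption: unknown destinations fall back to the regular host stack.
    target = ENDPOINTS.get(dest_ip, "host TCP/IP stack")
    return ["eBPF program on the host", target]


direct = dispatch("10.0.1.7")
print(len(SIDECAR_INBOUND_HOPS), len(direct))
```

The hop counts are not a performance measurement, but they show the shape of the saving: once the destination is known at the point where the packet arrives, the intermediate steps are no longer needed.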

Encryption in the Network

Given a networking solution that is aware of Kubernetes services, and provides networking connectivity between the endpoints of those services, it is not surprising that it can offer the capabilities of a service mesh data plane. But these capabilities can go beyond basic connectivity. One example is transparent encryption.

It’s common to use a service mesh to ensure that all application traffic is authenticated and encrypted. This is achieved through mutual TLS (mTLS); the service mesh proxy component acts as the endpoint for the network connection, and negotiates a secure TLS connection with its remote peer. This connection encrypts traffic between the proxies, without having to make any changes at all to the application.

But TLS, managed at the application layer, is not the only way to achieve authenticated and encrypted traffic between components. Another option is to encrypt traffic at the network layer, using IPsec or WireGuard. Because it operates at the network layer, this encryption is entirely transparent not only to the application but also to the proxy, and it can be enabled with or without a service mesh. If your only reason for using a service mesh is to provide encryption, you may want to consider network-level encryption. Not only is it simpler, but it can also be used to authenticate and encrypt any traffic on the node; it is not limited to those workloads that are sidecar-enabled.

eBPF is the Data Plane for the Service Mesh

Now that eBPF is widely supported in the kernel versions used in Linux production distributions, enterprises can take advantage of it for a more efficient networking solution, and as a more efficient data plane for service mesh. Solo.io described this at the recent ServiceMeshCon as “Super Charging your Service Mesh with eBPF”.

Last year I made some predictions, on behalf of the CNCF’s Technical Oversight Committee, about consolidation and clarity in the area of service mesh. In the same keynote, I covered the likelihood of eBPF becoming the basis for more projects and for more widely deployed capabilities. These two ideas are now coming together, as eBPF appears to be the natural path for the service mesh data plane.