How Cilium’s Mutual Authentication Can Compromise Security

New mutual authentication approach for Kubernetes workloads suffers from eventual consistency, which can produce security vulnerabilities.
Nov 29th, 2023 9:14am by
Featured image by Markus Winkler on Unsplash.

Recently, the Cilium project announced support for a new mutual authentication mechanism that can be deployed transparently to applications with a simple configuration flag. On the surface, this seems to be an easy way to get service-to-service mutual authentication for Kubernetes workloads using Cilium. However, the design suffers from a serious drawback that should not be overlooked:

The entire foundation of mutual authentication in Cilium is eventually consistent.

Eventual consistency in a security implementation’s data path can lead to failures in the intended security properties and cause traffic to proceed between services when it should be disallowed.

How Cilium Mutual Authentication Works

Cilium’s custom mutual authentication mechanism transparently authenticates flows between services and builds on Cilium’s existing Extended Berkeley Packet Filter (eBPF) data plane. Cilium uses eBPF to implement things like service networking, network policy and connection handling.

Cilium uses “mutual Transport Layer Security (mTLS)-less” (or mTLess) to authenticate a service. I call it “less” because it’s not using mTLS for what mTLS is designed to do: authenticating, encrypting and checking the integrity of data flowing over the transport between two peers. Cilium’s mutual authentication implementation is not mTLS, as I’ll explain below.

When Service (or Pod) A wants to talk to Service (or Pod) B, Cilium attempts to authenticate the two peers and then marks a special node-local “auth cache” indicating whether the specific flow is allowed.

Figure 1: The initial connection will be dropped since it’s not authenticated.

When Pod A wants to talk to Pod B, it flows through the normal Cilium eBPF data plane, but the eBPF code will check whether this connection has been authenticated by checking the node-local auth cache. On the first attempt, the call will not be authenticated, so Cilium will drop the packets. But this will trigger a behind-the-scenes mechanism to try to authenticate the flow between Pod A and Pod B. If successful, it will update the node-local auth cache.
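The drop-then-retry behavior described above can be sketched as a small simulation. This is a toy model, not Cilium's code: the `Node` class, its method names and the pod-name cache keys are all hypothetical, and the real check runs in eBPF rather than Python.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of one node: a datapath check plus a user-space agent."""

    auth_cache: set = field(default_factory=set)  # authenticated (src, dst) flows
    pending: list = field(default_factory=list)   # flows awaiting authentication

    def handle_packet(self, src: str, dst: str) -> str:
        """Datapath decision: forward only if the flow is already in the auth cache."""
        if (src, dst) in self.auth_cache:
            return "FORWARD"
        # Not authenticated yet: drop the packet and queue an out-of-band auth request.
        if (src, dst) not in self.pending:
            self.pending.append((src, dst))
        return "DROP"

    def run_agent(self, handshake_ok) -> None:
        """User-space step: try to authenticate pending flows, then update the cache."""
        for flow in self.pending:
            if handshake_ok(*flow):
                self.auth_cache.add(flow)
        self.pending.clear()


node = Node()
first = node.handle_packet("pod-a", "pod-b")   # first attempt: not in the cache
node.run_agent(lambda src, dst: True)          # assume the handshake succeeds
retry = node.handle_packet("pod-a", "pod-b")   # retried packet: now in the cache
print(first, retry)
```

The point of the sketch is the ordering: the forwarding decision depends entirely on what the cache says at the moment the packet arrives, and the cache is written by a separate, asynchronous step.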

The expectation is that this out-of-band, behind-the-scenes mechanism for authenticating Pod A’s call to Pod B will complete quickly enough that the initially dropped packets are retried without causing much delay. The mechanism used behind the scenes is an “mTLess” connection from the cilium-agent (written in Go) to another cilium-agent on the node where Pod B is running. None of this happens in the eBPF data plane; it happens in the user-space Cilium agents.

Figure 2: A flow is considered authenticated if an mTLS connection succeeds between cilium-agents running on different nodes representing the identities of two particular services.

I call this connection “mTLess” because it’s used to test authentication and immediately closed, and all the session keys negotiated for encryption and integrity are thrown away. That is, Cilium does not preserve the mTLS security properties through the life of the connection; it only uses the authentication part of the handshake.

Figure 3: Cilium ends the mTLess connection after the handshake.

If this mTLess connection is successful (i.e., the handshake succeeds), Cilium will consider the flow from Pod A to Pod B to be “authenticated.” At this point, an entry in the node-local auth cache will be updated to indicate flows from Pod A to Pod B should be allowed.

Figure 4: Cilium updates the auth cache to indicate Pod A is authenticated to call Pod B.

Now, when the connection packets are retried, the auth cache indicates that the flow is authenticated, so Cilium allows the connection and it proceeds to the rest of the eBPF data plane (which enforces network and other policies). This node-local auth cache is itself eventually consistent and could get out of sync, but that’s not the most concerning eventual consistency property.

Figure 5: Once the cache has been updated and the packets retried, the connection will flow.

The Big Problem with Cilium’s Approach

With a real mTLS connection, after a successful handshake, you expect the rest of the data to be encrypted using the keys negotiated between (and known only to) the participating parties. A successful authentication of a flow in Cilium does not mean it is encrypted (it will be plain text), nor does it guarantee the traffic will be encrypted in such a way that it’s visible only to the parties involved. If you want encryption, you can use Cilium’s encryption options based on WireGuard (or IPSec), but that is simply encryption between two Kubernetes nodes, not specifically the authenticated workloads. A lot can happen between the “mTLess” connection check and putting the actual (sensitive) data on the wire.
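To make the difference in key scope concrete, here is a minimal sketch. It is purely illustrative: `derive_key` hashes names as a stand-in for real key negotiation (real session keys involve fresh randomness for each connection), and the workload names and node placement are assumptions, not taken from any real cluster.

```python
import hashlib


def derive_key(*parts: str) -> str:
    """Toy stand-in for key negotiation: hash the participants into a short 'key'."""
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:16]


# Hypothetical placement: workloads A and C share node-1, workload B runs on node-2.
node_of = {"A": "node-1", "B": "node-2", "C": "node-1"}


def workload_key(src: str, dst: str) -> str:
    """mTLS-style scope: the key belongs to the two workloads in the session."""
    return derive_key("workload", src, dst)


def node_key(src: str, dst: str) -> str:
    """Node-to-node scope: the key belongs to the two nodes, whatever the workload."""
    return derive_key("node", node_of[src], node_of[dst])


# With workload-scoped keys, A->B and C->B are protected by different keys.
print(workload_key("A", "B") != workload_key("C", "B"))
# With node-scoped keys, both flows share one key because they share a node pair.
print(node_key("A", "B") == node_key("C", "B"))
```

Under these assumptions, node-level encryption cannot distinguish traffic from A and traffic from C once both leave the same node, which is why it is not a substitute for encryption tied to the authenticated workloads.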

As some of the Cilium developers continue to iterate on this implementation, there is talk of transferring some of the keys negotiated in the mTLS connection to the preshared key mechanisms of the underlying node-to-node encryption approaches (e.g., IPSec). This implementation remains to be seen.

Figure 6: Traffic is encrypted with session keys that are different between A/B and B/C.

Figure 7: Encryption based on WireGuard uses the same keys.

The real problems for Cilium’s eventually consistent mutual authentication implementation crop up around Cilium’s core identity model. I glossed over the TLS handshake above, but if you read the Cilium docs, you’ll see that the X.509 certificates used for “mTLess” have an identity model optionally based on the Secure Production Identity Framework For Everyone (SPIFFE). In fact, when deploying the components necessary to implement Cilium’s mutual authentication, you have the option to deploy the SPIFFE Runtime Environment (SPIRE), an implementation of SPIFFE that Cilium uses to mint certificates that represent workloads and their identities.

This SPIFFE identity is used in the certs for the handshake, but SPIFFE is not the foundational, universal workload identity in Cilium. Instead, SPIFFE is a separate identity layer that maps to Cilium’s existing identity implementation. Cilium builds all of its networking policies around its CiliumIdentity concept. The CiliumIdentity implementation maps an integer to a group of IP addresses (the pod IPs associated with a group of pods). This “integer” and its mapping to pod IP addresses represent the core identity primitive in Cilium.
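As a rough illustration of that primitive, the sketch below models an identity as an integer mapped to a set of pod IPs and evaluates a policy purely on those integers. The identity numbers, IP addresses and the `is_allowed` helper are made up for the example; this is not Cilium's data structure or API.

```python
# Hypothetical identities: each integer maps to the pod IPs that carry it.
identity_to_ips = {
    1001: {"10.0.1.5", "10.0.2.7"},  # e.g., pods of a "frontend" workload
    1002: {"10.0.1.9"},              # e.g., pods of a "backend" workload
}

# Inverted view used at decision time: which identity does this IP belong to?
ip_to_identity = {
    ip: identity for identity, ips in identity_to_ips.items() for ip in ips
}

# Policy is written against identities (integers), never against pod names.
allowed = {(1001, 1002)}  # frontend may call backend


def is_allowed(src_ip: str, dst_ip: str) -> bool:
    """Resolve both IPs to identities, then check the identity pair against policy."""
    src = ip_to_identity.get(src_ip)
    dst = ip_to_identity.get(dst_ip)
    return (src, dst) in allowed


print(is_allowed("10.0.1.5", "10.0.1.9"))  # frontend -> backend
print(is_allowed("10.0.1.9", "10.0.1.5"))  # backend -> frontend
```

Note that the decision never sees a certificate or a SPIFFE ID: everything hinges on the IP-to-integer lookup being correct.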

Figure 8: Cilium’s core identity primitives are based on integers that map to IP addresses in a node-local cache on each node.

We covered this topic in detail in our blog post “Could network cache-based identity be mistaken?” Because of this issue, we recommend a defense-in-depth posture when thinking about network security with a Container Network Interface (CNI) and service mesh.

Here’s the crux of the problem:

The mapping of all IPs for a given identity, for every identity that exists in the cluster, exists in a local cache on every node in the cluster.

Figure 9: All IPs for each identity, for every identity in the cluster, exist in a separate cache on each node.

Figure 10: Eventual consistency could end up with wrong or stale IP mappings.

For Cilium’s mutual authentication and policy enforcement to work, these caches must be updated with the correct IP-to-identity mapping. However, updating separate caches across all the nodes in the cluster is an eventually consistent operation. When Cilium’s eBPF data plane attempts to reason about a connection’s policy, it refers to the IP-to-identity mapping in its node-local cache. If that cache is stale or its update is delayed, the result is incorrect network policy enforcement (which could be out of compliance, allow malicious activity, compromise data, etc.). Whether or not you’re using WireGuard or IPSec to encrypt traffic between nodes makes no difference to this identity confusion scenario.
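One way such a stale cache could produce identity confusion is IP reuse, sketched below. The scenario is hypothetical: the `NodeCache` class, the event list and the identity numbers are invented for illustration, and the sketch assumes a pod IP is released and then reassigned to a pod with a different identity before one node has processed the updates.

```python
FRONTEND, BACKEND, UNTRUSTED = 1001, 1002, 2002
allowed = {(FRONTEND, BACKEND)}  # only frontend may call backend


class NodeCache:
    """Hypothetical node-local IP-to-identity cache, updated by a stream of events."""

    def __init__(self) -> None:
        self.ip_to_identity: dict[str, int] = {}

    def apply(self, ip: str, identity) -> None:
        if identity is None:
            self.ip_to_identity.pop(ip, None)  # pod deleted, IP released
        else:
            self.ip_to_identity[ip] = identity  # pod created or IP reassigned


def policy_allows(cache: NodeCache, src_ip: str, dst_ip: str) -> bool:
    """Policy decision made only from what this node's cache currently believes."""
    pair = (cache.ip_to_identity.get(src_ip), cache.ip_to_identity.get(dst_ip))
    return pair in allowed


# Cluster events, in the order they actually happened.
events = [
    ("10.0.1.5", FRONTEND),   # frontend pod starts
    ("10.0.2.9", BACKEND),    # backend pod starts
    ("10.0.1.5", None),       # frontend pod is deleted
    ("10.0.1.5", UNTRUSTED),  # the same IP is reused by an unrelated pod
]

up_to_date, stale = NodeCache(), NodeCache()
for event in events:
    up_to_date.apply(*event)
for event in events[:2]:      # this node has not yet seen the last two events
    stale.apply(*event)

print(policy_allows(up_to_date, "10.0.1.5", "10.0.2.9"))  # denied: untrusted -> backend
print(policy_allows(stale, "10.0.1.5", "10.0.2.9"))       # allowed: cache still says frontend
```

Both nodes apply the same policy; they disagree only because one of them is working from an older view of which identity owns the IP.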

This demo illustrates identity confusion when using Cilium’s mutual authentication that leads to traffic that violates networking policies:

Wrapping Up

So to summarize:

  • The Cilium project introduced a novel mutual authentication mechanism for Kubernetes workloads.
  • The mutual authentication in Cilium is built on an eventually consistent foundation, which can compromise security.
  • Cilium uses “mTLess” for authentication but doesn’t maintain encryption for the entire connection.
  • Cilium’s identity model includes SPIFFE, but its core identity is a separate identity layer based on integers.
  • The core problem is that IP-to-identity mappings are stored in local caches on each node, and updating those caches is an eventually consistent operation.
  • Eventual consistency in Cilium’s mutual authentication can result in incorrect network policy and security vulnerabilities.

To safely use a CNI that relies on identity-to-IP-address mappings, consider a defense-in-depth posture that layers a service mesh (like Istio Ambient) on top. Istio Ambient implements a sidecarless service mesh that uses mTLS on the data path between services (regardless of their IP addresses). In a service mesh like Istio, the identity model is defined with SPIFFE and rooted in a certificate authority responsible for signing the certificates used to authenticate traffic.

Learn how Istio Ambient Mesh can cut service mesh overhead by 90% or more.