The Kubernetes Network Security Effect
Kubernetes has a built-in object for managing network security: NetworkPolicy. It lets the user define the relationships between pods with ingress and egress policies, but it is basic and requires a very precise mapping of a solution’s IP addresses, which change constantly. As a result, most users I’ve talked to are not using it.
Still Stuck with a Firewall?
Back in the day, a network security policy was defined with IP addresses and subnets. You would define the source and destination, then the destination port, then action and track options. Over the years, the firewall evolved and became application-aware, with added capabilities for advanced malware prevention and more. It is no longer a firewall, but a full network security solution.
However, most network security solutions, even today, use IP addresses and ranges as the source and destination. This was the first challenge when these devices moved to the cloud. How can you define source and destination IPs in such a rapidly changing environment, where an IP is assigned to a database workload one minute and to a web workload the next? In addition, if you want to understand the cloud and see connections before network address translation (NAT) is applied, you must be inside the application. In most cases in Kubernetes, when a pod connects to an external resource, the traffic goes through NAT, meaning the destination sees the worker node’s address as the source IP, not the pod’s.
For Infrastructure as a Service (IaaS) cloud deployments, most companies can solve this challenge by installing their network security solution with a proxy on a virtual machine (VM).
But when it comes to Kubernetes, this approach just does not work. Why?
- A normal pod in Kubernetes is just a few MBs, so you cannot deploy a full-fledged network security solution in a pod. Placing it outside of Kubernetes solves the North-South hygiene to some extent (traffic in and out of the network), but not the East-West (traffic within the network and in-cluster connectivity).
- Kubernetes is the cloud on steroids — pods scale up and down rapidly. IP assignment changes and the rules cannot be bound to IP addresses and subnets.
- A full-fledged network security solution is not required. For example, there is no requirement to do deep packet inspection inside Kubernetes. Most companies are looking for East-West micro-segmentation, which is basically firewalling.
Luckily for us, Kubernetes was created with the NetworkPolicy object. This object treats each pod as a perimeter of its own, and you can define an ingress policy and an egress policy for it. Both policies can leverage IP addresses, subnets (CIDR) and labels. Unfortunately, Kubernetes does not support fully qualified domain names (FQDNs) in the native security policy. This means that it’s impossible to create a policy that limits access to S3 or Twitter, for example.
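To make the FQDN limitation concrete, here is a minimal Python sketch of what a native egress peer can express. The helper name `egress_peer` is hypothetical; the only assumption about Kubernetes is that an IP-based peer is written as an `ipBlock` with a `cidr` field, so a hostname has nowhere to go.

```python
import ipaddress

def egress_peer(target: str) -> dict:
    """Build an IP-based NetworkPolicy peer; reject anything that is not a CIDR."""
    try:
        # strict=False tolerates host bits, e.g. 10.128.0.1/24 -> 10.128.0.0/24
        network = ipaddress.ip_network(target, strict=False)
    except ValueError:
        raise ValueError(
            f"{target!r} is not a CIDR; a native NetworkPolicy peer cannot be an FQDN"
        ) from None
    return {"ipBlock": {"cidr": str(network)}}

print(egress_peer("10.128.0.1/24"))
try:
    egress_peer("s3.amazonaws.com")
except ValueError as err:
    print(err)
```

The hostname is rejected because the only address-based peer is a CIDR range, and the addresses behind a public service can change at any time.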
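Network policy is enforced by the network layer (the CNI plugin), and the most common ones are Calico, Flannel and Cilium. Note that Flannel on its own does not enforce NetworkPolicy, so it is typically paired with another plugin such as Calico. By design, the Kubernetes network is flat: a microservice in one namespace can connect to any other microservice, even one in another namespace.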
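A common way to close a flat network is a default-deny policy per namespace. The sketch below builds such a manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The namespace name `payments` and the policy name are made up for illustration, and the policy only has an effect if the cluster's network plugin enforces NetworkPolicy.

```python
import json

def default_deny(namespace: str) -> dict:
    """NetworkPolicy that selects every pod in a namespace and allows nothing."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty selector matches all pods in the namespace.
            "podSelector": {},
            # Both directions are restricted, and no allow rules are listed.
            "policyTypes": ["Ingress", "Egress"],
        },
    }

print(json.dumps(default_deny("payments"), indent=2))
```

Every connection the application needs then has to be allowed explicitly by additional policies.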
Struggling with Building a Kubernetes Network Policy That Works
You would expect users to use network policies, but most are not using them.
Creating a network policy is an iterative task:
- Map the communication between different elements, the resources that access the application, the resources the application connects to, the ports and the protocols.
- Create a policy.
- Run the application and watch to see if everything works.
- Find and fix the things you missed.
- Repeat every time your network or an application changes.
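The "find and fix the things you missed" step boils down to comparing what the application actually does with what the policy allows. The sketch below shows that comparison with plain sets. The `Flow` type and the sample data are hypothetical, and it assumes you have some external source of observed flows, since Kubernetes does not provide one natively.

```python
from typing import NamedTuple, Set

class Flow(NamedTuple):
    source: str       # label of the pod that opens the connection
    destination: str  # label of the pod (or external name) it connects to
    port: int

def missing_rules(observed: Set[Flow], allowed: Set[Flow]) -> Set[Flow]:
    """Flows the application makes that the policy does not allow (these break)."""
    return observed - allowed

def stale_rules(observed: Set[Flow], allowed: Set[Flow]) -> Set[Flow]:
    """Flows the policy still allows although nothing uses them any more."""
    return allowed - observed

# Hypothetical data: the policy as written, and traffic seen after a new release.
allowed = {Flow("web", "orders", 443), Flow("web", "db", 5432)}
observed = {Flow("web", "billing", 443), Flow("web", "db", 5432)}

print("add:", sorted(missing_rules(observed, allowed)))
print("remove:", sorted(stale_rules(observed, allowed)))
```

Both lists matter: the first explains outages, the second shows where the policy has drifted away from least privilege.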
The catch is that in Kubernetes your application, which is composed of pods (microservices), can change on a daily basis. There is no way you can keep the same pace as your development team — updating the network policy every single time they push changes to an application.
Imagine you map all the communication patterns, create a network policy accordingly and everything works. A few hours later, a developer pushes a new version of a microservice that uses an API from a different pod and no longer communicates with the existing pod or with an external website. Since you forgot to update the network policy, your new microservice stops working. You cannot debug what is wrong, because Kubernetes has no built-in network logs. Not only that, but even if you do succeed in fixing the issue, you might keep the old rules that allow the microservice to reach the pod and website it no longer talks to, which leaves the network policy inaccurate and misses the micro-segmentation goals.
On top of that, Kubernetes does not have a built-in capability for visualizing network traffic, so if you break a connection between two microservices, good luck debugging it.
Kubernetes network policy is configured by allowing rather than blocking: rules can only permit traffic, and there is no explicit deny rule. This means that if you want to block an individual object from reaching a specific destination, you need to choose a different solution.
Lastly, the most annoying part is that the Kubernetes network policy is set in such a way that if pod A and B need to communicate, you need to define egress traffic for pod A and ingress traffic for pod B. This is prone to errors and incredibly challenging to debug.
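The two-sided nature of a single connection can be sketched as a small generator that emits both manifests at once. This is an illustration only: the function name, the `app` label key and the `default` namespace are assumptions, and the protocol is assumed to be TCP.

```python
import json

def _selector(app: str) -> dict:
    # Assumes pods are labeled app=<name>; the label key is illustrative.
    return {"podSelector": {"matchLabels": {"app": app}}}

def _ports(port: int) -> list:
    return [{"protocol": "TCP", "port": port}]

def allow_connection(src: str, dst: str, port: int, namespace: str = "default") -> list:
    """Return the two NetworkPolicy manifests needed for src -> dst on one port."""
    egress = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{src}-egress-to-{dst}", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": src}},  # applies to the caller
            "policyTypes": ["Egress"],
            "egress": [{"to": [_selector(dst)], "ports": _ports(port)}],
        },
    }
    ingress = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{dst}-ingress-from-{src}", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": dst}},  # applies to the callee
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [_selector(src)], "ports": _ports(port)}],
        },
    }
    return [egress, ingress]

for manifest in allow_connection("a", "b", 443):
    print(json.dumps(manifest, indent=2))
```

Generating both halves from one description is one way to avoid the case where only one side was updated and the connection silently fails.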
In the above example, we show a native Kubernetes policy for a pod which is labeled as “C”. The policy configures the objects so that pod C:
- Connects to pod “A” on ports 443 and 80
- Initiates traffic to pod “B” on ports 443 and 80
- Initiates traffic to 10.128.0.1/24
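A policy matching these three bullets might look roughly like the sketch below, written as a Python dictionary and printed as JSON. Several details are assumptions rather than facts from the original policy: the label key `app`, the policy name, TCP as the protocol, and above all the direction of the first bullet, which is read here as ingress from pod A.

```python
import json

WEB_PORTS = [{"protocol": "TCP", "port": 443}, {"protocol": "TCP", "port": 80}]

pod_c_policy = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "pod-c-policy"},  # hypothetical name
    "spec": {
        "podSelector": {"matchLabels": {"app": "C"}},
        "policyTypes": ["Ingress", "Egress"],
        # Assumption: pod A is the peer that opens connections towards C.
        "ingress": [
            {"from": [{"podSelector": {"matchLabels": {"app": "A"}}}], "ports": WEB_PORTS}
        ],
        "egress": [
            # C initiates traffic to pod B on 443 and 80.
            {"to": [{"podSelector": {"matchLabels": {"app": "B"}}}], "ports": WEB_PORTS},
            # C initiates traffic to the subnet; CIDR copied verbatim from the text,
            # no ports listed because the text names none.
            {"to": [{"ipBlock": {"cidr": "10.128.0.1/24"}}]},
        ],
    },
}

print(json.dumps(pod_c_policy, indent=2))
```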
Most organizations just do North-South network security (outside of the cluster) and pray that nothing will break this security control.
By design, Kubernetes security also suffers from the following issues:
- The identity problem: if pod A, with two containers, connects to pod B, pod B sees the incoming connection from pod A but does not know which container created it. This means there is no way to implement security guardrails at a finer granularity than the pod. As a result, if malicious software is running in a pod, it will be able to communicate with other pods.
- Cleartext connections: whether a connection in Kubernetes is encrypted depends entirely on how the developer wrote the application. If the app exposes a plain HTTP REST API, an attacker can intercept and decode the communication (as of today, most in-cluster communication is not encrypted).
This figure demonstrates the intention versus the actual flow. The network administrator set the policy to allow web-to-database connections. The intent was to allow NGINX, running in the web pod, to communicate with the SQL server. However, this also means that malware running in one of the web pods can communicate with the SQL server.
Istio to the Rescue
A service mesh is a way to control how different parts of an application share data with one another. Unlike other systems for managing this communication, a service mesh is a dedicated infrastructure layer built right into an application. This visible infrastructure layer can document how well (or not) different parts of an application interact, so it becomes easier to optimize communication and avoid downtime as an app grows.
Istio is the most popular service mesh solution available today.
To overcome the design issues we’ve discussed so far, Istio adds a sidecar container in order to identify individual workloads and moves the east/west traffic to mutual TLS (mTLS). Now if pod A connects to pod B, the two pods first authenticate each other’s certificates. A malicious attacker will have no way to intercept and decode the traffic.
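In Istio, requiring mTLS is expressed with a `PeerAuthentication` resource. The sketch below builds a mesh-wide one as a Python dictionary and prints it as JSON. It assumes the mesh uses the default root namespace `istio-system` and the `security.istio.io/v1beta1` API version; check both against your installation.

```python
import json

# Mesh-wide policy: applied in the root namespace with no workload selector.
strict_mtls = {
    "apiVersion": "security.istio.io/v1beta1",
    "kind": "PeerAuthentication",
    "metadata": {"name": "default", "namespace": "istio-system"},
    "spec": {
        # STRICT means sidecars reject plaintext connections from other workloads.
        "mtls": {"mode": "STRICT"},
    },
}

print(json.dumps(strict_mtls, indent=2))
```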
This is great! But the fact is that most organizations are still not using Istio either. In fact, the last CNCF report from late 2020 indicates that only 30% of Kubernetes users are using a service mesh (Istio or otherwise). This is probably because Istio is very complex and adds a performance and latency penalty.
Not only that, but it suffers from the same identity problem described above: if a malicious actor enters pod A and creates a connection to pod B, the connection will still be allowed as long as the Istio policy allows it.
In the above diagram, each pod has an Envoy, a proxy that secures the communication from the original container by using a mutual TLS tunnel. The proxy (Envoy) does not care about the identity of the container. A malicious container can communicate with other services and is awarded the identity that Istio/Envoy provides.
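The granularity limit is visible in how an Istio `AuthorizationPolicy` names its callers. The sketch below is a hypothetical helper that allows one service account to reach one workload. The function name, the `app` label key and the sample names are assumptions, and `cluster.local` is assumed as the trust domain. Note that the principal identifies a service account, so every container running in a pod under that account presents the same identity.

```python
import json

def allow_service_account(target_app: str, target_ns: str,
                          source_sa: str, source_ns: str,
                          trust_domain: str = "cluster.local") -> dict:
    """AuthorizationPolicy letting one service account call one workload."""
    principal = f"{trust_domain}/ns/{source_ns}/sa/{source_sa}"
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "AuthorizationPolicy",
        "metadata": {"name": f"allow-{source_sa}-to-{target_app}", "namespace": target_ns},
        "spec": {
            "selector": {"matchLabels": {"app": target_app}},
            "action": "ALLOW",
            # The caller is identified per service account, not per container.
            "rules": [{"from": [{"source": {"principals": [principal]}}]}],
        },
    }

print(json.dumps(allow_service_account("database", "prod", "web", "prod"), indent=2))
```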
Kubernetes Network Security Best Practices
While the challenges described above are quite limiting, there is still much that can be done.
A Kubernetes network security solution should follow these guidelines:
- Enforce “zero trust”: each microservice acts as its own perimeter. As such, it is recommended to follow the zero-trust model: never trust, always verify. Each request is authenticated and authorized before access is approved.
- Upgrade to Mutual TLS: it is recommended to use mutual TLS in order to encrypt the communication between the different microservices. This will ensure that even if an attacker is present on the host, it cannot intercept and decode the traffic.
- Provide network visibility: you cannot protect something that you cannot see. Visibility is the key to understanding communication patterns, not only what is working but also what is failing or being dropped.
- Apply robust policy to meet rapid changes: when it comes to policy language, use a language that handles the constant changes in microservices. In most cases the change will be inside the cluster: once you set the ingress/egress traffic from and to the cluster, most of the changes will happen in the communication between the microservices.