Evolving CNCF’s Telepresence: Adopting a TUN Device to Deliver Stability and Portability
In a microservice environment, your services issue network requests to other services and applications, typically using Kubernetes's DNS resolution mechanisms for service discovery. Easy enough when the application runs in-cluster, but how can you run it locally? It is neither practical nor sustainable to modify your application to be aware that it is running locally. Part of the beauty of cloud native is being able to develop the application once and know that it will work the same everywhere without modification. Telepresence removes the API server as a middleman, letting you talk directly to the cluster.
The goal of cloud native development is to become more flexible and scalable so software can be built and shipped quickly. Cloud native technologies provide many benefits, but it's also critical to understand the challenges a cloud native approach introduces. Developers no longer rely on the relative safety of separate local, test, and production environments. Instead, the developer works with remote Kubernetes clusters from a local machine, possibly without full insight or visibility into what is happening. Switching between local and remote contexts also increases cognitive load and introduces inevitable latency, which in turn slows development feedback loops. Against this backdrop, Telepresence was developed: to make the remote local.
Generally speaking, there are two ways to interact with a remote Kubernetes cluster: through its ingress, to access an application, or through its API server, usually via kubectl. For a developer, these approaches aren't always enough, because coding, shipping, and running a distributed, service-oriented application demands more than either provides.
Take It Away, Telepresence
The open source CNCF Telepresence tool solves these issues by providing a network connection from your local machine directly into the remote cluster. Telepresence lets you call services by their DNS name and get the same response as though you were running in-cluster. It makes the remote local, allowing you to run your application locally without any special logic, as though it were running in production.
But how does this work? How does Telepresence connect you to Kubernetes? It’s all about networking. Typically, Telepresence’s approach to networking includes a local daemon running on your machine that opens a tunnel to a pod on the cluster — the traffic manager — and moves traffic to and from the tunnel.
So how does traffic make it from a local application to the daemon? This is where the TUN device comes in.
Tapping the TUN Device
Or rather, this is where the TUN device should come in, and it will. But first, let’s look at the context that led to the inclusion of the TUN device.
The Networking Problem
Early versions of Telepresence used firewall rules for cluster connectivity. This isn’t unusual, as firewalls are often used to filter network traffic or to redirect traffic from one endpoint to another. But this approach was riddled with problems. To name a few, it:
- Introduces a dependency on SOCKS and SSH, which may not be present on every user’s machine.
- Is platform-specific and requires a different configuration for every platform.
- Offers no straightforward way for Telepresence to create such firewall redirects on Windows.
- Can create problems if the user has pre-existing firewall rules that conflict with Telepresence or that need to be manually cleaned up.
- Does not scale: large clusters generate thousands of rules that must be constantly maintained.
The TUN Solution
The aim with newer versions of Telepresence (2.3.0 onward), then, was to use a TUN device to solve some of these challenges. Now connecting to the cluster via Telepresence is much more like connecting to a VPN. The main use case for TUN devices is giving VPN clients access to private networks. As a side bonus, this new approach to networking provides stability and portability gains.
A TUN device is a virtual point-to-point connection, or virtual network card, that grants access to a specific network, much like the interface for a physical device. As an example, think of how a Linux system connects to the internet. There is a physical Ethernet card on the machine, and there is an Ethernet cable plugged into the card. The Linux kernel represents this Ethernet card as a network interface, often named eth0 or something similar. The kernel knows that this network interface has a specific IP address assigned to it, such as 192.168.1.103, and uses a specific gateway, such as 192.168.1.1, to route traffic to a specific network; if there's a single network card, this is typically 0.0.0.0/0, i.e., the entire internet.
A TUN device is represented by the kernel in the same way — it has an IP address, might have a gateway, and routes access to a given network. The only difference is that a TUN device is not a representation of an underlying physical device. Instead, when packets are sent to a TUN device, the kernel expects a user space process to read them from it and address them however it likes. In other words, a TUN device is a network interface that does not directly correspond to a physical network device.
As mentioned, VPNs are the most common use case for TUN devices. In a VPN, the client reads packets from the TUN device, encrypts them, and dumps them into the open tunnel to the VPN server. The VPN server receives the packets, decrypts them, and routes them on to their destinations.
Telepresence's root daemon takes the same approach when connecting to a cluster. On connect, it will:
- Create a new TUN device.
- Request cluster networking from the traffic manager.
- Allow the traffic manager to run various heuristics to determine the CIDR ranges for pods and services within the cluster and reply to the root daemon.
- Register the traffic manager’s provided CIDRs with the TUN device, allowing the operating system to route traffic to them via the device.
TUN into Telepresence: Everything We Hoped for?
So, problem solved — right? We have a way to plug into the cluster as though it were just any other network (or any other VPN), but did we solve the problems presented by the firewall approach? Here’s what we have determined:
- There’s much less platform-specific code now. All that’s required is the system calls to create and configure the TUN device; the rest is platform agnostic.
- Thanks to the fine folks at wintun.net, there is an easy way to create a TUN device on Windows.
- It requires far fewer dependencies: wintun on Windows, nothing on Mac or Linux.
- No more conflicts with the firewall — we no longer manipulate firewall rules at all (except for some Linux-specific edge cases that are unrelated to connection tunneling).
Telepresence is always evolving to improve. With this in mind, we continue our work on optimizing the TUN device, including enhancements that boost its effectiveness:
- Introducing TCP stress testing.
- Optimizing the logic involved in processing traffic for latency and throughput.
- Liaising with the community to make sure it works with all VPN providers and ensure users have a straightforward way to debug nonworking VPN configurations.
- Scaling the traffic manager to handle as much traffic as users throw at it.
In Conclusion: Starting with Telepresence
Building on Telepresence’s promise of delivering fast local development for Kubernetes, the introduction of the TUN device has simplified networking, creating greater stability and portability while paving the way for performance and scalability gains.
As with all CNCF projects, the success of Telepresence depends on community involvement. With such a variety of laptops, clusters and applications — and the networking challenges these introduce — we’re always looking to receive feedback and hear about both bugs and successful use cases. Please join us in our Slack to ask questions or share your story!
If you’d like to try Telepresence for yourself, it’s easy to get started. You can also contribute to the open source Telepresence project. Ambassador Labs runs regular “meet the maintainers” sessions that are open to beginners and experts alike.