How Deutsche Telekom Manages Edge Infrastructure with GitOps
Deutsche Telekom (DT) is the leading telecommunications company in Germany and the EU, with over 242 million mobile users and 100 billion euros in annual revenue. DT operates at a scale that few companies do today. Of particular interest is its foray into edge computing and how it manages its edge infrastructure at massive scale with a small team.
Edge computing, as the name suggests, means processing as much data as possible as close as possible to the end user. That may mean processing data on a local server, a nearby server or even directly on the device. The goal is to reduce latency and improve the user experience. Edge computing is a key technology for the Internet of Things and 5G, and not just for telecommunications companies.
Deutsche Telekom, being one of the largest telecommunications companies in the world, needed to leverage the power of edge computing to deliver advanced mobile services to its large customer base. While that is a worthwhile goal to have, it comes with complex challenges at every step.
Vuk Gojnic, squad lead for Cloud Native / Kubernetes Platform at Deutsche Telekom, was tasked with leading this project and delivering support to application owners. These application owners in turn provide infrastructure services to vendors and application developers.
Taking the Platform Approach
Gojnic soon realized that it would be futile to try to build this system the traditional way, especially considering that his team of about 10 to 15 people was already busy managing other infrastructure projects. To be successful, this edge platform required a new approach, one that reduces manual operations and does not compromise on essential requirements like performance and data security.
Gojnic decided to take a platform approach to building out DT’s edge infrastructure, a strategy recommended by the State of DevOps report over the past few years as a key indicator of high-performing DevOps teams. It involves building an internal developer platform from which resources can be templatized and easily created for consumption by development teams. A dedicated platform team like Gojnic’s manages and maintains the platform. For the platform team, the benefit is less duplicated manual effort and the ability to build solutions that scale easily across the entire organization.
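To make the idea of templatized resources concrete, here is a minimal sketch in Python. It is not DT’s tooling: the template fields (name, location, nodes), the "edge-" naming scheme and the function names are all assumptions made up for illustration. It only shows how one shared template can stamp out a definition per location.

```python
from string import Template

# Hypothetical cluster template; the field names are illustrative only
# and do not come from Das Schiff.
CLUSTER_TEMPLATE = Template(
    "name: $name\n"
    "location: $location\n"
    "nodes: $nodes\n"
)


def render_cluster(name: str, location: str, nodes: int) -> str:
    """Fill the shared template with the values for one cluster."""
    return CLUSTER_TEMPLATE.substitute(name=name, location=location, nodes=nodes)


def render_fleet(locations: list[str], nodes: int = 3) -> dict[str, str]:
    """Render one definition per location, keyed by a generated cluster name."""
    return {
        f"edge-{loc}": render_cluster(f"edge-{loc}", loc, nodes)
        for loc in locations
    }


if __name__ == "__main__":
    for name, definition in render_fleet(["bonn", "berlin"]).items():
        print(f"--- {name}")
        print(definition)
```

In a setup like this, the platform team would own the template while application teams supply only the per-cluster values, which is where the reduction in duplicated effort comes from.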
The new platform would be architected to support thousands of applications and services, and handle billions of events per second. The platform would run on more than 10,000 Kubernetes clusters, spread across 10 data centers and 10,000 edge locations.
GitOps to Manage the Platform
Gojnic’s team enlisted the team at Weaveworks to help them follow a GitOps approach to building out this platform. GitOps is a modern software delivery practice that relies on Git repositories as the single source of truth. It endorses key principles such as declarative infrastructure, where every part of the system is described in a Git repository. It leverages the built-in version control capabilities of Git to keep track of all changes and automate compliance. Additionally, it uses an open source tool like Flux to spot whenever the production system drifts from the state declared in Git and to reconcile the system back to that declared state. These practices ensure that systems such as edge infrastructure can be built in a scalable, modern way that reduces effort, delivers the required performance and does not compromise on vital security and compliance requirements. To get started with GitOps, we suggest downloading Weave GitOps Core.
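The drift detection and reconciliation idea can be sketched in a few lines of Python. This is a simplified illustration, not how Flux is implemented: the declared state and the live state are modeled as plain dictionaries, and the function names and keys are hypothetical.

```python
def diff(desired: dict, actual: dict) -> dict:
    """Return the changes needed to make `actual` match `desired`."""
    changes = {}
    # Anything declared but missing or different in the live state must be set.
    for key, value in desired.items():
        if key not in actual or actual[key] != value:
            changes[key] = ("set", value)
    # Anything live but not declared is drift and must be removed.
    for key in actual:
        if key not in desired:
            changes[key] = ("delete", None)
    return changes


def reconcile(desired: dict, actual: dict) -> dict:
    """Apply the computed changes to `actual` in place and return them."""
    changes = diff(desired, actual)
    for key, (action, value) in changes.items():
        if action == "set":
            actual[key] = value
        else:
            del actual[key]
    return changes


if __name__ == "__main__":
    declared = {"replicas": 3, "image": "app:1.2"}             # what Git says
    live = {"replicas": 5, "image": "app:1.2", "debug": True}  # what is running
    print(reconcile(declared, live))  # the drift that was corrected
    print(reconcile(declared, live))  # a second pass finds nothing to do
```

The second call returning no changes is the property that matters: reconciliation is idempotent, so it can run in a loop without side effects once the live state matches Git.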
Shipping at the Edge with Das Schiff
DT built this new platform and called it Das Schiff, German for “The Ship.” It even open sourced the platform, describing it as “the engine for establishment and supervision of autonomous cloud native infrastructure (self)managed in a GitOps loop.”
At the heart of Das Schiff is Flux, which manages all aspects of continuous delivery for the platform. Digging deeper, the key Flux components in use are its Kustomize controller and Helm controller. The Kustomize controller watches for changes to the sources it tracks, such as a Git repository, builds the Kubernetes manifests with Kustomize, applies them to the cluster and reconciles its Kustomization objects in their declared dependency order. The Helm controller manages Helm releases declared as Kubernetes objects and automates actions like installing, upgrading, testing, rolling back and uninstalling packages. Das Schiff also uses Prometheus and Logstash to monitor time-series metrics and logs. You can read more about Das Schiff’s architecture in this Weaveworks blog post.
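Applying things in a defined order is easiest to picture as a dependency graph. Flux Kustomization objects can declare what they depend on through a dependsOn field; the sketch below is not Flux code, but it shows the same idea using Python’s standard graphlib module. The layer names (infrastructure, monitoring, apps) and the apply_order function are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def apply_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return the names so that each one comes after everything it depends on.

    `depends_on` maps a name to the list of names that must be applied first.
    Raises ValueError if the dependencies form a cycle.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        raise ValueError(f"circular dependency: {err.args[1]}") from err


if __name__ == "__main__":
    # Hypothetical layers of a cluster configuration.
    layers = {
        "infrastructure": [],
        "monitoring": ["infrastructure"],
        "apps": ["infrastructure", "monitoring"],
    }
    for name in apply_order(layers):
        print("apply", name)
```

Rejecting cycles up front matters in practice: a circular dependency would otherwise leave every layer waiting on another one indefinitely.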
Das Schiff enables DT teams to provision edge-specific hardware such as radio or network adaptors and hardware accelerators on physical machines. It enables virtual networking at the edge to support DT’s push for 5G and edge processing. Das Schiff also supports “mixed mode,” where teams can run a combination of virtualized and non-virtualized bare metal nodes in Kubernetes clusters. Considering the resource constraints of the edge, Das Schiff uses AWS Firecracker to provision microVMs to significantly reduce the footprint of nodes in a pool.
While this is a huge achievement, it did not come without challenges. As a result of its cloud native architecture, the new platform is massively distributed and extremely dynamic. However, by being strategic and combining a platform model with GitOps, DT was able to build a system that can scale to the edge and support diverse teams and business requirements.
If DT’s project interests you, or if your organization is pushing for 5G networks or edge computing, I encourage you to check out the Das Schiff project on GitHub. To learn more about how GitOps aids 5G rollouts, visit our website.