A Deep Dive into the Microsoft Radius Architecture
In my previous article, I looked at the high-level workflow involved in using Microsoft’s newly launched Radius to deploy modern applications. This time, let’s look at the architecture and zoom in on how Microsoft built the Radius control plane.
What Problem Does Radius Solve?
Radius attempts to solve three major problems in modern application deployment:
- Complexity of Kubernetes: Kubernetes, the most popular container orchestration platform, introduces complexity through its highly flexible yet intricate architecture. Developers navigate a labyrinth of objects like pods, services, and deployments, ensuring seamless container interaction, scaling, and management. This complexity burgeons as clusters grow, demanding expertise in persistent storage, network configuration, and security protocols.
- No Clear Separation of Concerns Between Dev and Ops: Despite its advanced orchestration capabilities, Kubernetes often blurs the separation of concerns, a principle vital for system modularity and organizational efficiency. It intertwines infrastructure management, application deployment, and operational tasks, which can leave development and operations teams unsure of who owns what.
- Lack of Unified Runtimes between Kubernetes and Cloud Platforms: Cloud native applications face a significant challenge with the lack of unified runtimes across different cloud platforms, complicating multicloud strategies and cloud native transitions. Even when applications are deployed in cloud-based managed Kubernetes clusters, consuming other managed services such as databases, object storage, caching services, and message queues needs quite a bit of plumbing. Developers need a unified runtime environment that abstracts Kubernetes and cloud environments.
Radius enables developers to treat Kubernetes, public cloud and edge environments as unified and abstract runtimes to deploy and scale modern applications.
The Influence of Azure on Radius UCP
Before diving into the details of Radius, let’s do a quick recap of how Azure manages cloud services.
One of the core building blocks of Azure is a resource, which represents a deployable unit such as a virtual machine, a SQL database, a Hadoop cluster or a Kubernetes cluster. Resources that logically belong to one workload or an application are placed in a resource group. Resource groups are part of an Azure subscription, which provides the highest level of isolation for resources.
Each Azure resource is identified by the subscription, the resource group and finally the identifier associated with the resource itself. This address scheme helps the Azure control plane uniquely identify the resources and manage their lifecycle.
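To make the addressing scheme concrete, here is a minimal Python sketch that builds and parses an Azure resource ID. The `AzureResourceId` class and `parse_azure_resource_id` helper are hypothetical names used only for illustration, the sample values are made up, and the sketch handles only top-level resources (no nested child types).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AzureResourceId:
    """Illustrative model of a top-level Azure resource ID (hypothetical helper)."""
    subscription_id: str
    resource_group: str
    provider_namespace: str  # e.g. "Microsoft.Compute"
    resource_type: str       # e.g. "virtualMachines"
    name: str

    def __str__(self) -> str:
        # Subscription, then resource group, then the resource itself.
        return (
            f"/subscriptions/{self.subscription_id}"
            f"/resourceGroups/{self.resource_group}"
            f"/providers/{self.provider_namespace}"
            f"/{self.resource_type}/{self.name}"
        )


def parse_azure_resource_id(resource_id: str) -> AzureResourceId:
    """Split an ID of the form produced by AzureResourceId.__str__."""
    parts = resource_id.strip("/").split("/")
    well_formed = (
        len(parts) == 8
        and parts[0].lower() == "subscriptions"
        and parts[2].lower() == "resourcegroups"
        and parts[4].lower() == "providers"
    )
    if not well_formed:
        raise ValueError(f"not a top-level Azure resource ID: {resource_id!r}")
    return AzureResourceId(parts[1], parts[3], parts[5], parts[6], parts[7])


# Made-up sample values, not a real subscription or resource.
vm_id = AzureResourceId(
    "00000000-0000-0000-0000-000000000000",
    "demo-rg",
    "Microsoft.Compute",
    "virtualMachines",
    "web-vm",
)
```

Calling `str(vm_id)` yields the full path, and `parse_azure_resource_id` recovers the same fields from it, which is the property a control plane relies on to locate a resource and manage its lifecycle.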
Azure services, such as virtual machines and database instances, are handled by individual resource managers, which Azure calls resource providers. They are directly responsible for the lifecycle management of resources. When Microsoft adds a new cloud service, it starts with a resource manager that knows how to create, update, and delete individual resources. Azure Resource Manager (ARM) templates provide a declarative mechanism for defining and provisioning cloud services.
For more context and additional background on ARM, read my article from 2016.
The resource providers associated with each managed service are registered with Azure Resource Manager, which acts as the control plane that exposes the API. The Azure Portal, CLI, SDKs and third-party tools such as Terraform and Pulumi talk to this API.
Radius borrows heavily from this design for its own control plane. Where Azure treats services such as virtual machines and Kubernetes clusters as resource providers, Radius treats the Kubernetes API, the Azure Resource Manager API, and the AWS Cloud Control API as resource providers.
When an application definition combines resources running in a Kubernetes cluster with services running in Azure or AWS, the control plane federates the call and delegates resource management to the respective resource providers.
Each resource provider registered with the UCP is responsible for performing CRUDL (Create, Read, Update, Delete, List) operations on its resources. The control plane simply abstracts the underlying runtimes and exposes a unified API that the CLI and tools such as Bicep talk to.
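The federation pattern described above can be sketched in a few lines of Python. This is not how Radius is implemented (Radius itself is written in Go); the `ResourceProvider`, `InMemoryProvider`, and `ControlPlane` names are hypothetical, and the in-memory provider merely stands in for a real Kubernetes, Azure, or AWS backend.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

CRUDL = ("create", "read", "update", "delete", "list")


class ResourceProvider(ABC):
    """Hypothetical interface: one method per CRUDL operation."""

    @abstractmethod
    def create(self, name: str, spec: dict) -> dict: ...

    @abstractmethod
    def read(self, name: str) -> dict: ...

    @abstractmethod
    def update(self, name: str, spec: dict) -> dict: ...

    @abstractmethod
    def delete(self, name: str) -> None: ...

    @abstractmethod
    def list(self) -> List[str]: ...


class InMemoryProvider(ResourceProvider):
    """Stand-in for a real provider; keeps resources in a dictionary."""

    def __init__(self) -> None:
        self._resources = {}

    def create(self, name: str, spec: dict) -> dict:
        if name in self._resources:
            raise KeyError(f"{name} already exists")
        self._resources[name] = dict(spec)
        return self._resources[name]

    def read(self, name: str) -> dict:
        return self._resources[name]

    def update(self, name: str, spec: dict) -> dict:
        self._resources[name].update(spec)
        return self._resources[name]

    def delete(self, name: str) -> None:
        del self._resources[name]

    def list(self) -> List[str]:
        return sorted(self._resources)


class ControlPlane:
    """Exposes one entry point and delegates each call to the owning provider."""

    def __init__(self) -> None:
        self._providers: Dict[str, ResourceProvider] = {}

    def register(self, plane: str, provider: ResourceProvider) -> None:
        self._providers[plane] = provider

    def dispatch(self, plane: str, operation: str, *args):
        if operation not in CRUDL:
            raise ValueError(f"unsupported operation: {operation}")
        if plane not in self._providers:
            raise LookupError(f"no provider registered for plane: {plane}")
        return getattr(self._providers[plane], operation)(*args)


# One application definition touching two runtimes (sample data only).
ucp = ControlPlane()
ucp.register("kubernetes", InMemoryProvider())
ucp.register("aws", InMemoryProvider())
ucp.dispatch("kubernetes", "create", "frontend", {"image": "nginx"})
ucp.dispatch("aws", "create", "orders-bucket", {"type": "AWS.S3/Bucket"})
```

The caller only ever talks to `ControlPlane.dispatch`; supporting another runtime means registering one more provider, with no change to the callers.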
Microsoft calls the control plane of Radius the Universal Control Plane (UCP). The name is apt, given that the control plane can subsume any resource provider. For example, if Google Cloud is added as a supported runtime, a GCP resource provider would be registered with the UCP, and it would manage the lifecycle of GCP-based cloud services. If a resource provider becomes available for KVM, Radius could even provision VMs.
Back in 2020, I wrote about how Kubernetes is evolving as a universal control plane. Radius is a classic example that implements this idea.
Coming back to Radius UCP, it’s completely open source and written in Go as an extensible control plane.
Resources provisioned and managed by UCP have a universal addressing mechanism based on the target environment. For example, when dealing with Azure, the identifier will include the subscription, resource group, and resource. Similarly, AWS resources are identified through Amazon Resource Names (ARNs). The local provider maps to the environment and the application name as defined in the Bicep or Terraform file. This approach relieves UCP of the burden of managing resource identifiers; it relies on each resource provider to name and address its own resources.
When you install Radius on a Kubernetes cluster and check for available resource providers, you see the local-dev, AWS, and Azure providers registered with the UCP.
Below are examples of resource identifiers provisioned in a local Kubernetes cluster, Azure and AWS through Radius UCP.
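As an illustration, the Python sketch below lists three sample identifiers and extracts the plane that each one targets. It assumes the `/planes/<type>/<name>/...` layout described in the Radius documentation; treat that layout as an assumption here, since this article does not spell it out. Every subscription ID, account number, and resource name is made up, and `plane_of` is a hypothetical helper.

```python
from typing import Tuple

# Illustrative identifiers only; all IDs, accounts, and names are made up.
SAMPLE_IDS = {
    "local": (
        "/planes/radius/local/resourceGroups/default"
        "/providers/Applications.Core/containers/frontend"
    ),
    "azure": (
        "/planes/azure/azurecloud"
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/demo-rg/providers/Microsoft.Cache/redis/demo-cache"
    ),
    "aws": (
        "/planes/aws/aws/accounts/123456789012/regions/us-west-2"
        "/providers/AWS.S3/Bucket/demo-bucket"
    ),
}


def plane_of(resource_id: str) -> Tuple[str, str]:
    """Return (plane type, plane name), the part a control plane would route on."""
    parts = resource_id.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "planes":
        raise ValueError(f"not a plane-scoped resource ID: {resource_id!r}")
    return parts[1], parts[2]
```

Everything after the plane prefix follows the target environment's own conventions, which is why the Azure sample still carries a subscription and resource group while the AWS sample carries an account and region.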
It’s important to note that the UCP federates each CRUDL call to the respective resource provider, which also exposes a scalable REST API. Resource providers are designed to accept traffic only from UCP, so every request passes through the authentication and authorization enforced at the control plane.
Finally, Radius has tight integration with Dapr, which makes it easy to develop multicloud applications without tightly coupling them to specific cloud services. To get started with Dapr, refer to my tutorial. Dapr is one of the resource providers of Radius, which makes it easy for operators to swap the building blocks at deployment time.
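To show what this decoupling looks like from the application side, here is a small Python sketch that prepares a save-state request for the Dapr sidecar's HTTP state API. The component name `statestore` and the `build_save_state_request` helper are assumptions for illustration; the request is only constructed, not sent, so the sketch runs without a sidecar.

```python
import json
import os
import urllib.request

# The Dapr sidecar listens on localhost; 3500 is its default HTTP port.
DAPR_HTTP_PORT = os.getenv("DAPR_HTTP_PORT", "3500")
STATE_STORE = "statestore"  # hypothetical component name chosen by the operator


def build_save_state_request(key: str, value, store: str = STATE_STORE) -> urllib.request.Request:
    """Prepare a POST to the Dapr state API; the code names no cloud service."""
    url = f"http://localhost:{DAPR_HTTP_PORT}/v1.0/state/{store}"
    body = json.dumps([{"key": key, "value": value}]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_save_state_request("order-1", {"qty": 2})
# To actually send it, with a sidecar running: urllib.request.urlopen(request)
```

The application refers only to a component name. Whether that component is backed by a local Redis container or a managed cloud database is decided by whoever defines the component, which is what lets operators swap the backing service at deployment time.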
As Kubernetes gets more complex and the gap between cloud native apps and cloud services continues to widen, DevOps teams need platforms such as Radius.
Radius simplifies the workflow and lifecycle management of workloads and their dependencies so developers can concentrate on their apps. It brings the apps running in a Kubernetes cluster closer to the managed services available in the public cloud. This means developers can target a unified runtime without worrying about the implementation details of containerized workloads and cloud services.