Microsoft Azure Brings Confidential Computing to Kubernetes

There are plenty of solutions for protecting data at rest and in motion; protecting data while you’re using it is less common. Last year Microsoft introduced a Kubernetes SGX plugin to support “confidential computing” — running workloads like NGINX, Redis and Memcached that were built to use trusted execution environments, or your own apps written with its open source Open Enclave SDK, which supports running code in encrypted memory on both Intel SGX and Arm TrustZone.
“Through this device driver plugin, we’re bringing a level of security assurance down to the chip level that you just can’t get with a software-based solution,” director of Azure Compute Gabe Monroy told The New Stack at the time. “This is all about getting code and data effectively encrypted in a way that protects it not just within the operating system, but so that even the cloud providers can’t peek into it.”
At the time, that required creating a Kubernetes cluster on a VM that supported Intel SGX (in Azure or on your own hardware) and installing the confidential computing device plugin, which exposed the Enclave Page Cache (EPC) memory as a resource Kubernetes can schedule. Each CPU supports only a limited number of enclaves, so the Kubernetes scheduler plugin is needed to make sure a pod that needs an enclave lands on a node that has one available.
“One of the things we needed to do is actually teach Kubernetes about how many enclaves are available and that sort of thing, so that you could schedule properly and it could find the places where there are enclaves available,” Kubernetes co-founder and Microsoft Corporate Vice President Brendan Burns told the New Stack.
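Here is what that looks like from the developer side: a minimal sketch (not official Azure sample code) using the Kubernetes Python client to request EPC memory as an extended resource. The resource name and the container image are assumptions; the resource name follows what the device plugin’s documentation used at the time, so verify it against your plugin version.

```python
# Minimal sketch: request SGX enclave (EPC) memory for a pod so the
# scheduler only places it on a node whose device plugin advertises
# enough of that resource. The resource name below follows Azure's
# device plugin documentation at the time; verify it for your version.
from kubernetes import client, config

config.load_kube_config()  # uses your current kubectl context

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="sgx-demo"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="enclave-app",
                image="example.azurecr.io/my-enclave-app:latest",  # hypothetical image
                resources=client.V1ResourceRequirements(
                    # Extended resource exposed by the SGX device plugin.
                    limits={"kubernetes.azure.com/sgx_epc_mem_in_MiB": "10"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```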
Now confidential computing is available in AKS, in preview, using the same SGX Device Plugin to make EPC memory a resource type that you specify, plus an SGX Quote Helper service that attests to the identity and state of the application and the environment it’s running in. You can add confidential computing nodes — based on DCsv2-Series VMs — to a new or existing AKS cluster alongside standard node pools. These VMs are available in a limited number of Azure regions (with wider availability expected by the end of 2020); they run on Intel Xeon E-2288G processors with SGX (with up to eight vCPUs and 168 MiB of encrypted memory per VM) and are not shared with other tenants or subscriptions. You need at least six DCsv2 cores.
The container application executes directly on the CPU, without a guest OS or hypervisor in between, delivering attestation and process-level container isolation for enclave-aware containers: applications with trusted code (also known as an enclave) that runs in the encrypted memory. As well as the Open Enclave SDK, there are several commercial and open source tools to help developers build apps that take advantage of confidential computing, like Fortanix, SCONE, Anjuna, Graphene and Occlum.
Currently, confidential computing in AKS only supports Linux containers and Ubuntu 18.04 Gen 2 VM worker nodes, and during the preview you have to use the Azure CLI for deployment rather than the Azure portal.
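During the preview, deployment looks roughly like the following sketch, which drives the Azure CLI from Python. The confcom add-on name, the VM size and the resource names are assumptions drawn from the preview documentation, so check them against the current docs.

```python
# Sketch: stand up an AKS cluster with confidential computing nodes from
# the Azure CLI, driven from Python. The "confcom" add-on name and the
# Standard_DC2s_v2 VM size come from the preview documentation and may
# change; resource group and cluster names are hypothetical.
import subprocess

def az(*args: str) -> None:
    subprocess.run(["az", *args], check=True)

# Create a cluster whose default node pool uses SGX-capable DCsv2 VMs.
az("aks", "create",
   "--resource-group", "myResourceGroup",
   "--name", "myAKSCluster",
   "--node-vm-size", "Standard_DC2s_v2",
   "--enable-addons", "confcom")

# Or add a confidential node pool to an existing cluster, alongside
# the standard node pools.
az("aks", "nodepool", "add",
   "--resource-group", "myResourceGroup",
   "--cluster-name", "myAKSCluster",
   "--name", "confcompool",
   "--node-vm-size", "Standard_DC2s_v2")
```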
Making Confidential Common
Confidential computing isn’t yet at the stage where it will appeal to mainstream cloud users, but bringing it to Kubernetes is a sign that the platform is maturing in terms of the workloads it supports. “Customers are saying our most secure workloads, our most privileged workloads are coming to Kubernetes and containers and cloud native computing,” Burns said.
Confidential computing is currently suitable for big data analytics with data owned by several organizations, for secrets and key management, for blockchain, and for confidential microservices in regulated industries like finance, health care and government, he suggested. But as the technology matures, it’s going to become more widely relevant.
The U.S. Defense Department JEDI contract has certainly pushed Microsoft to add more security options to Azure, but Burns noted that expectations about the level of security the cloud can provide just keep ratcheting up.
“When we started AKS, the API servers were on the public internet but it became clear that you have to have API servers inside the private network and only inside the private network,” he pointed out. (AKS now works with Azure Private Link).
“It used to be that the cloud said ‘public networks — just deal with it and if you don’t like it, fine, go do something else.’ Now we supply private networks and private API endpoints, and I think confidential computing is the next thing where we’re saying ‘We’re going to put our super privileged, our high-value intellectual property in a privileged container.’ But I think over time, what we’re going to find is that it just becomes standard, where everybody just runs everything in it because that’s the way they think it needs to be.”
Getting to a point where memory in use is encrypted as routinely as storage at rest will require further developments, he noted. “We’ve done a lot of work collaborating with Intel around enclaves and around enclave support for containers, but really this is the beginning of evolving confidential computing and cloud native computing.” Hardware needs to develop further, for example having one CPU support multiple enclaves; Intel is also looking at encrypting more or even all memory in a system. And there’s more work to be done in the Kubernetes community before it can be the default.
“I think there’s too much opt-in right now for confidential computing; it’s not automatic enough. I can’t just tick the box next to my container and have it be all confidential. I have to actually enlighten my code to know that it’s running in an enclave. That’s great for the specific use cases where people need that now. But I think we’re going to get to a place where everybody just wants to tick the box and just know that their entire container is running securely.”
This is part of moving Kubernetes from being hardware agnostic to taking advantage of the full range of hardware and becoming the default “cloud native infrastructure.”
“In the initial draft of Kubernetes we had really generalized resources like CPU or memory and we actually called it logical compute. We sort of said you shouldn’t care about the CPU or the GPU or instruction sets. But we’ve had people come to us and say ‘I wrote code for this specific instruction that’s in this class of Intel Xeon and only this class of Xeon, so I need to know what processor is running on the node and schedule accordingly.’ There are all sorts of this hardware where you have interesting things like bonded GPUs, where you have two GPUs that have a really fast interconnect like InfiniBand: Azure has a whole bunch of InfiniBand. Not to mention Arm: everyone built their Raspberry Pi Kubernetes clusters and then suddenly we had to deal with the fact that the containers wouldn’t run on some hardware. It becomes a very multidimensional scheduling problem.”
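Kubernetes expresses that multidimensional problem today through node labels and selectors. Here is a minimal sketch, assuming a hypothetical custom label for a CPU feature alongside the standard architecture label:

```python
# Sketch: steer a pod to nodes with the right processor using node labels.
# kubernetes.io/arch is a standard label; "example.com/avx512" is a
# hypothetical feature label a cluster admin (or a tool like Node Feature
# Discovery) would apply to nodes that support that instruction set.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="avx512-app"),
    spec=client.V1PodSpec(
        node_selector={
            "kubernetes.io/arch": "amd64",    # keeps it off the Raspberry Pi nodes
            "example.com/avx512": "true",     # hypothetical hardware-feature label
        },
        containers=[
            client.V1Container(
                name="app",
                image="busybox",
                command=["sleep", "3600"],
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```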
In a sense, Burns suggested, Kubernetes and AKS are ‘the new VM’: “It’s no longer a specialized thing that some customers are going to use; it’s the thing that the vast majority of new application development is headed towards, and as such, it’s going to need to have every single feature. Every single feature that we add to a VM has to be in the Kubernetes service and has to be available to users. If there’s a processor family, we have to support it. If there’s an FPGA, we have to support it.”
Azure is the first public cloud with confidential computing support for Kubernetes, but others are also adopting the approach. Google also has confidential computing VMs, but they use the Secure Encrypted Virtualization technology in AMD second-generation EPYC processors rather than Intel SGX; when GKE 1.18 is released, Confidential GKE Nodes will be available in beta (likely in October).
Policy and Access
Other security and policy improvements to AKS will be broadly useful. Organizations using Azure are likely already using Azure AD for identity and access management (for both users and services). Now AKS supports Azure RBAC, which means access to Kubernetes resources can be managed with the same role assignments, in the same portal: one reason the team did the work to make AKS resources visible in the main Azure portal.
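In practice that looks something like the sketch below, which grants a user read access to one namespace in a cluster through an Azure role assignment. The built-in role name, the scope format and the identifiers are assumptions based on Azure’s documentation of the feature, so verify them before use.

```python
# Sketch: grant a user read access to a single namespace via an Azure
# role assignment instead of native Kubernetes RBAC. The built-in role
# name and namespace-scope format are assumptions from Azure's docs for
# this feature; the subscription, group, cluster and user are placeholders.
import subprocess

scope = ("/subscriptions/<subscription-id>"
         "/resourceGroups/myResourceGroup"
         "/providers/Microsoft.ContainerService/managedClusters/myAKSCluster"
         "/namespaces/default")

subprocess.run([
    "az", "role", "assignment", "create",
    "--role", "Azure Kubernetes Service RBAC Reader",
    "--assignee", "user@example.com",
    "--scope", scope,
], check=True)
```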
Those roles manage who can access a cluster and the resources already on it, but not what resources they can create there. That’s where AKS can draw on Microsoft’s expertise with enterprise and policy management and bring that to the open source community, Burns suggested. Azure Policy for Kubernetes is already GA on AKS, notable not just because it’s based on Gatekeeper, the validating admission controller webhook that Microsoft donated an implementation of to the Open Policy Agent project at the Cloud Native Computing Foundation, but because it’s needed to help customers move off Pod Security Policy.
This was a rare example of a Kubernetes feature that won’t be progressing from beta. “People started using it in beta because Kubernetes has had a steady track record of beta to 1.0 releases, but it was decided that Pod Security Policy is never going to leave beta,” Burns said. “Because the work that we did in Azure Policy, which the community picked up as the Gatekeeper project, is actually more general purpose than Pod Security Policy was, they’re getting rid of Pod Security Policy and they’re going to replace it with the Open Policy Agent-driven policy. We have a bunch of customers who took a dependency, so we needed to get the policy admission controller out to GA so that we could get customers migrated over to it before the other service was deprecated.”
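To make the Gatekeeper model concrete, here is a minimal sketch that applies a constraint from K8sRequiredLabels, one of the widely published Gatekeeper constraint templates, using the Kubernetes Python client. It assumes Gatekeeper and that template are already installed in the cluster; the constraint name and the required label are illustrative.

```python
# Sketch: apply a Gatekeeper constraint requiring a "team" label on every
# namespace. Assumes Gatekeeper and the widely published K8sRequiredLabels
# ConstraintTemplate are already installed; names here are illustrative.
from kubernetes import client, config

config.load_kube_config()

constraint = {
    "apiVersion": "constraints.gatekeeper.sh/v1beta1",
    "kind": "K8sRequiredLabels",
    "metadata": {"name": "ns-must-have-team"},
    "spec": {
        "match": {"kinds": [{"apiGroups": [""], "kinds": ["Namespace"]}]},
        "parameters": {"labels": ["team"]},
    },
}

# Constraints are cluster-scoped custom resources created from the template.
client.CustomObjectsApi().create_cluster_custom_object(
    group="constraints.gatekeeper.sh",
    version="v1beta1",
    plural="k8srequiredlabels",
    body=constraint,
)
```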
That kind of support is one of the key reasons for using a cloud Kubernetes service rather than running your own infrastructure. “It’s nice to say ‘beta’ but the minute customers take a dependency on it, you have to support it,” Burns said.
The Cloud Native Computing Foundation is a sponsor of The New Stack.
Feature image by PIRO4D from Pixabay.