How to Battle Azurescape, the Cross-Account Container Takeover Exploit

One of the nightmare scenarios of cloud computing has always been an attacker being able to break out of their containers into other users’ containers. Well, lucky us, it’s finally happened.
Palo Alto Networks’ Unit 42 researchers uncovered an interlocked exploit chain that allows a malicious Azure user to invade other users’ cloud instances within Microsoft’s container-as-a-service (CaaS) offering, Azure Container Instances (ACI). Unit 42 labeled this exploit Azurescape. Principal Unit 42 security researcher Yuval Avrahami also called it “the first cross-account container takeover in the public cloud.” I call it a disaster.
The Fix Is In
Before you get too wound up about Azurescape, the fix is already in. Microsoft patched ACI shortly after Unit 42 reported the problem, and there have been no reports of Azurescape being exploited in the wild. Yet.
Microsoft said, “Out of an abundance of caution we notified customers with containers running on the same clusters as the researchers via Service Health Notifications in the Azure Portal. If you did not receive a notification, no action is required with respect to this vulnerability.”
Ah… that’s not good enough. The question remains: “What if someone other than Unit 42 uncovered these holes?” That would be bad news with a capital B.
Microsoft Security Recommendations
So, even if Microsoft didn’t warn you, I’d follow their security recommendations. These are:
- Revoke any privileged credentials that were deployed to the platform before Aug. 31, 2021. Common places to specify configuration and secrets for container groups (a quick inventory sketch follows this list) include:
  - Environment Variables
  - Secret Volumes
  - Azure file share
- Consult Microsoft’s security best practices resources.
- As part of standard security practices, you should revoke privileged credentials on a frequent basis.
- Stay up to date on important security-related notifications like this one by configuring Azure Service Health Alerts.
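If you’re not sure where credentials might be lurking in your container groups, a quick inventory helps. Below is a minimal Python sketch that walks an exported container group definition, say the JSON from `az container show`, and lists the usual hiding spots. The field names assume the ARM container-group schema, so treat them as assumptions and adjust to whatever your export actually looks like.

```python
import json
import sys

# Rough sketch: list the places in an exported ACI container group
# definition where credentials commonly hide, so you know what to rotate.
# Field names assume the ARM container-group JSON schema; adjust them
# to match your own export.

def find_secret_spots(group):
    props = group.get("properties", group)  # some exports flatten the wrapper
    for container in props.get("containers", []):
        cprops = container.get("properties", container)
        for env in cprops.get("environmentVariables") or []:
            print(f"env var in {container.get('name')}: {env.get('name')}")
    for volume in props.get("volumes") or []:
        if "secret" in volume:
            print(f"secret volume: {volume.get('name')}")
        if "azureFile" in volume:
            print(f"Azure file share volume: {volume.get('name')}")

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        find_secret_spots(json.load(f))
```

Anything this turns up is a candidate for rotation, regardless of whether Microsoft notified you.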
Unit 42 adds that you should check your access logs for any irregularities. I quite agree. And, if you find any, regardless of the incident date, I’d reset privileged credentials.
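What does checking your access logs actually look like? Here’s a rough sketch that skims Azure activity-log records, for example the JSON from `az monitor activity-log list -o json`, for callers you don’t recognize. The allowlist and field names are illustrative assumptions, not gospel; adapt them to your own tenants and log schema.

```python
import json
import sys

# Hedged sketch: flag activity-log records from callers you don't
# recognize. Reads a JSON array of records on stdin.
KNOWN_CALLERS = {"alice@example.com"}  # hypothetical allowlist

records = json.load(sys.stdin)
for rec in records:
    caller = rec.get("caller", "")
    op = rec.get("operationName", {})
    op_name = op.get("value") if isinstance(op, dict) else op
    if caller and caller not in KNOWN_CALLERS:
        print(f"{rec.get('eventTimestamp', '?')} {caller} -> {op_name}")
```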
Specifically, this problem impacts only ACI instances hosted on Kubernetes clusters. In 2021, Microsoft began hosting ACI on Service Fabric Clusters as well. Today, Unit 42 estimates Azure uses Kubernetes to host around 37% of newly created ACI containers.
How Unit 42 Went to Work
Now, since CaaS security is all about stopping this kind of thing from ever happening, you may be wondering how it occurred in the first place. Unit 42 did it by creating WhoC, a container image that reads the container runtime executing it. This relies on a design flaw in Linux containers, which enables them to read the underlying host’s container runtime. Sound familiar? It should. A similar technique was used to break out of Docker via runC in CVE-2019-5736.
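To make the trick concrete: the core primitive is that, inside a running process, /proc/self/exe names whatever binary is currently executing. WhoC’s real payload has to be a native binary masquerading as the image’s dynamic linker, so it runs in that brief window when runC is executing the container’s entrypoint. This simplified Python sketch only shows the primitive itself:

```python
import shutil

# Greatly simplified illustration of the primitive WhoC builds on:
# /proc/self/exe points at the binary currently executing this process.
# Run from ordinary Python this copies the Python interpreter; in the
# CVE-2019-5736 pattern, a native payload invoked as the image's dynamic
# linker would instead be reading the host's runC binary.
shutil.copyfile("/proc/self/exe", "/tmp/runtime-copy")
print("copied the currently executing binary for offline inspection")
```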
The key security hole was that ACI was using an out-of-date version of runC, the industry-standard container runtime. Gosh. Using obsolete software as a fundamental layer in your stack: where have we heard of that kind of mistake before?
In this case, Microsoft was still using runC v1.0.0-rc2. This almost five-year-old version was already known to be vulnerable to at least two container breakout CVEs. There have been over 15 runC versions released since then. And, at long last, runC finally had a stable 1.0 release in June 2021. Since then, there have been two other releases. In short, Microsoft should have replaced this dangerously obsolete version years ago.
With this, Unit 42 was able to easily break out of the container to the underlying host, a Kubernetes node. This, while dangerous, wasn’t that bad by itself: the attacker would still be confined within the node virtual machine (VM).
Further checking out the environment revealed that ACI was running a variety of Kubernetes versions: v1.8.4, v1.9.10, and v1.10.9. Does that ring any alarm bells? It should. These are older versions, released between November 2017 and October 2018. And guess what? They’re all exposed to multiple known vulnerabilities.
One of these, CVE-2018-1002102, enables an attacker to play games with how the API-server communicates with Kubelets. This, in turn, lets a malicious Kubelet spread within the cluster. That by itself didn’t turn into a true security vulnerability here. But while exploiting it, the researchers found an Authorization header carrying a Kubernetes service account token. These are signed but unencrypted JSON Web Tokens (JWTs), so anyone who holds one can read its claims and replay it as a bearer credential.
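To see why that matters, here’s how little it takes to read a token’s claims. No key, no signature check, just base64. The stolen_token below is, of course, hypothetical:

```python
import base64
import json

def decode_jwt_claims(token: str) -> dict:
    """Read a JWT's claims without verifying its signature.

    JWTs are signed, not encrypted: the middle segment is just
    base64url-encoded JSON, readable by anyone holding the token.
    """
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload))

# Hypothetical stolen token: its claims reveal which service account,
# and therefore which permissions, the bearer can now impersonate.
# claims = decode_jwt_claims(stolen_token)
# print(claims.get("sub"))  # e.g. "system:serviceaccount:<ns>:<name>"
```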
With other users’ JWTs in hand, an attacker could pretend to be the owner of other accounts. And, adding insult to injury, properly exploited, these tokens let an attacker execute commands on any pod in the cluster, including the API-server pod!
And that, my friends, is that. An attacker can now act as cluster admin, with full control over the cluster and all its customer containers.
But wait! There’s more! The crew also found you could achieve the same result by using a server-side request forgery (SSRF) vulnerability in the bridge pod. The trick here is that the API-server doesn’t actually verify that the status.hostIP value is a valid IP address. That’s bad. What was far worse was that the API-server would also accept any string, including URL components. After some fiddling, the team came up with a hostIP value that would trick the bridge into executing a command on the API-server container instead of the attacker’s own container. And, once again, from there it was a simple step to becoming the de facto cluster administrator.
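To illustrate the shape of the flaw, here’s a simplified, hypothetical reconstruction, not Unit 42’s published payload. If the bridge splices status.hostIP straight into the URL it uses to reach a node’s Kubelet, an attacker-controlled string can rewrite both the target and the path:

```python
# Hypothetical sketch of why an unvalidated status.hostIP is dangerous.
# Suppose the bridge built its Kubelet URL roughly like this:
def build_exec_url(host_ip: str, namespace: str, pod: str) -> str:
    return f"https://{host_ip}:10250/exec/{namespace}/{pod}/container"

# With an honest value, the request goes to the intended node's Kubelet:
print(build_exec_url("10.0.0.4", "customer-ns", "customer-pod"))

# But since the API-server accepted arbitrary strings, a crafted "IP"
# could redirect the exec request and turn the rest of the URL into a
# harmless fragment (the trailing '#'):
evil = "10.0.0.7:10250/exec/kube-system/api-server-pod/container#"
print(build_exec_url(evil, "customer-ns", "customer-pod"))
```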
So, What Can You Do?
Well, first, while it appears that no real harm has been done using these vulnerabilities, you must make darn sure that your cluster infrastructure is kept up to date with the latest security patches.
That’s security 101, but as is so often the case, people are not doing it. You also cannot ever assume that Azure, Amazon Web Services (AWS), Google Cloud, or whoever is running your cloud for you is actually on the job. They’re not. Yes, I know you’re paying for the services so you don’t have to worry about these details. Tough. It’s still your job, whether you like it or not.
Unit 42 also recommends that you:
- Refrain from sending privileged service account tokens to anyone but the API server. If a recipient is compromised, an attacker can masquerade as the token owner.
- Enable BoundServiceAccountTokenVolume. This recently graduated feature gate ensures token expiration is bound to its pod. When a pod terminates, its token is no longer valid, minimizing the impact of token theft.
- Deploy policy enforcers to monitor and prevent suspicious activity in your clusters. Configure them to alert on service accounts or nodes that query the SelfSubjectAccessReview or SelfSubjectRulesReview APIs for their permissions. Prisma Cloud customers can download a relevant rule template and enforce it via the built-in admission control for Kubernetes; Unit 42 recommends setting the rule to Alert. Others can rely on open-source tools such as OPA Gatekeeper (a minimal detection sketch follows this list).
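For those rolling their own monitoring, here’s a minimal stand-in for that last rule: a Python filter over Kubernetes audit events that flags service accounts or nodes probing their own permissions. It assumes the audit.k8s.io/v1 JSON-lines schema, and it’s a detection sketch, not a substitute for a real admission controller:

```python
import json
import sys

# Minimal detection sketch: scan Kubernetes audit events (JSON lines)
# for service accounts or nodes querying their own permissions, the
# reconnaissance step Unit 42 calls out. Schema per audit.k8s.io/v1.
WATCHED = {"selfsubjectaccessreviews", "selfsubjectrulesreviews"}

for line in sys.stdin:
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        continue
    resource = event.get("objectRef", {}).get("resource", "")
    user = event.get("user", {}).get("username", "")
    if resource in WATCHED and user.startswith(
        ("system:serviceaccount:", "system:node:")
    ):
        print(f"ALERT: {user} queried {resource}")
```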
Yes, it’s a lot of work. But would you rather do the work now? Or explain to your customer or CEO why their precious data was just vacuumed out or their compute time was spent on Bitcoin mining instead of the job? I know which one I’d prefer.