Ubuntu Linux and Azure DNS Problem Gives Azure Fits

For over three years now, Linux, not Windows Server, has been the most popular virtual machine (VM) operating system on Microsoft Azure. And, of all the Linux distributions used on the cloud, Canonical Ubuntu has long been the most popular. Alas, this is not a “Yay for Linux!” story. It’s the opposite. Even Linux has its fair share of problems, and in the latest, a recent systemd security update for Ubuntu 18.04 that broke DNS has led to Azure VMs failing. Lots of Azure VMs failing.
The trouble began at 06:00 UTC on Aug. 30, 2022, and lasted until Aug. 31. Now that it’s history, we can only hope we won’t see a repeat, while bearing in mind that clouds, like any other technology, will fail from time to time.
Heart of the Problem
The heart of the problem is that when a security patch, systemd 237-3ubuntu10.54, was applied to Ubuntu 18.04 instances, it left them unable to resolve DNS queries. This, of course, broke networking, and that was that. Repeat after me: It’s always DNS.
The patch fixed CVE-2022-2526, a use-after-free memory vulnerability in how systemd handled DNS packets. Left unrepaired, it is a high-severity security problem that could let an attacker shut down systems and obtain root-level privileges. Besides Ubuntu 18.04, the security hole and its fix also apply to Red Hat Enterprise Linux (RHEL) 7 and 8.x and Debian Linux.
So, you ask, why isn’t this problem showing up everywhere and in all kinds of clouds? It’s because Microsoft Azure uses a specific netplan setup (netplan is Ubuntu’s own tool for configuring networking) that identifies network interfaces with a “driver” match. When the update ran a udevadm trigger, the device information that this driver match depends on was lost. Then, the next time netplan ran, the server lost its DNS information. In short, the blame doesn’t all fall on Ubuntu. Azure deserves its fair share, too.
How to Fix It
That said, the 64-bit question is: “How do I fix it?” There are several answers:
You can, of course, just reboot your instances. This will give your revived VM a fresh DHCP lease and new DNS resolvers.
Microsoft has also deployed an auto-remediation for Azure Kubernetes Service (AKS) clusters. But, and it’s a big but, some AKS nodes aren’t covered by the auto-remediation detection, so they’re not being fixed.
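If you’d rather find out which Ubuntu 18.04 VMs are actually hit before rebooting them all, a quick diagnostic can help. What follows is a minimal, hypothetical sketch, not a Microsoft or Canonical tool. It assumes a Debian-style system where `dpkg-query` is available to report the installed systemd package version, and it uses Python’s `socket.getaddrinfo` to test name resolution. The function names and the default test hostname (`microsoft.com`) are illustrative assumptions, and it treats only an exact match on the systemd build named above as suspect.

```python
import socket
import subprocess
from typing import Callable, Optional

# The systemd build named in the article as the trigger on Ubuntu 18.04.
# Assumption: only this exact version string is treated as suspect.
AFFECTED_SYSTEMD_VERSION = "237-3ubuntu10.54"


def installed_systemd_version() -> Optional[str]:
    """Return the installed systemd package version, or None if it can't be read.

    Hypothetical helper: shells out to dpkg-query, so it only works on
    Debian/Ubuntu-style systems. The list form avoids a shell, so ${Version}
    is passed literally to dpkg-query as its format string.
    """
    try:
        result = subprocess.run(
            ["dpkg-query", "-W", "--showformat=${Version}", "systemd"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # dpkg-query missing, or the systemd package isn't installed.
        return None
    return result.stdout.strip() or None


def dns_resolves(hostname: str = "microsoft.com",
                 resolver: Callable[..., list] = socket.getaddrinfo) -> bool:
    """Return True if the resolver yields at least one address for hostname.

    The resolver is injectable so the check can be exercised without a network.
    socket.gaierror is a subclass of OSError, so catching OSError covers it.
    """
    try:
        return len(resolver(hostname, None)) > 0
    except OSError:
        return False


def looks_affected(version: Optional[str], dns_ok: bool) -> bool:
    """Heuristic: the suspect systemd build is installed AND DNS lookups fail."""
    return version == AFFECTED_SYSTEMD_VERSION and not dns_ok
```

On a suspect VM you might call `looks_affected(installed_systemd_version(), dns_resolves())`; a `True` result points to a machine that likely needs the reboot described above. Keeping the DNS check separate from the version check means you can also run `dns_resolves()` on its own whenever something else breaks.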
The moral of the story, as always, is to check DNS first whenever anything goes wrong. Yes, I’m quite serious. And even simple fixes on complex cloud-based systems can lead to very complicated problems. So always be careful when you patch production systems, even when the patch is for the most minor problem.