With Unstable Meltdown Patches You Might Want to Consider Detection Instead

It’s been three weeks since serious vulnerabilities were announced in modern CPUs and the problems are far from being resolved. The patches, both the software-based and hardware-based ones, have caused instability on some systems, raising the question of whether it’s best to err on the side of caution and choose detection over patching.
As a quick reminder, there are three separate attacks that stem from speculative execution, a performance-enhancing feature of modern CPUs in which the processor guesses in advance the most likely execution path a program will take so it can return results much faster.
One attack variant is called Meltdown and allows userspace applications to read protected kernel memory that could include secrets, such as passwords, encryption keys and other sensitive information. Meltdown affects Intel CPUs and some ARM ones, and is mitigated by making significant changes to how operating systems isolate kernel from userspace memory.
Another more generic attack is called Spectre and comes in two variants, at least so far. It allows local attackers to trick applications into leaking secrets from their own memory and can be exploited on almost all CPUs. Fixing Spectre requires recompiling software with new protections, as well as making low-level changes to how CPUs behave through microcode updates.
Patch Troubles
First of all, it’s worth noting that both the Meltdown and Spectre patches will have a performance impact. That impact might not be noticeable for most tasks on consumer PCs, but certain server workloads could be significantly affected, especially those that involve a lot of I/O activity, such as database operations. Furthermore, the performance impact of the patches will generally be higher on older generation CPUs.
But if there’s anything that scares data center operators more than performance degradation, it’s server instability. And the Meltdown and Spectre patches have not been without problems in that regard.
Last week, Intel confirmed that the microcode updates it released on Jan. 9 to fix Spectre variant 2 caused reboot issues on datacenter and consumer systems powered by Haswell and Broadwell CPUs, though there are also user reports of instability with other CPU families. The company believes it has isolated the root cause and has developed an improved fix that has been shared with system manufacturers for testing.
In the meantime, many large OEMs, including Dell, HP and Lenovo, have stopped distributing the previously released BIOS/UEFI updates that incorporated the buggy CPU microcode from January 9. The vendors now have to provide BIOS downgrade options for affected customers and, after testing the new patch, they’ll have to convince those customers to upgrade their BIOS again, causing more scheduled reboots and downtime.
Fortunately, Linux provides a kernel-based mechanism for loading updated microcode, which allows users to test patches without actually updating the BIOS.
CPUs don’t have writable non-volatile memory, so the microcode built into their silicon cannot be permanently changed once the chips leave the factory. Any subsequent microcode patch is held in volatile memory and has to be re-applied after every system reboot, and the earlier that happens in the boot process, the better. That’s why the preferred method for microcode updates is through the system BIOS, as the code executes before the operating system.
However, since computer manufacturers are highly inconsistent with the frequency of their BIOS updates and motherboards, in general, have short support lives, some operating systems like Linux provide an alternative way to apply CPU microcode patches early in the kernel initialization process. This ensures that users of older systems can also benefit from the latest patches and bug fixes for their CPUs and makes microcode rollbacks easier if something goes wrong.
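To confirm which microcode revision is actually running after an update or a rollback, one option on x86 Linux is to read the "microcode" field that the kernel reports for each logical CPU in /proc/cpuinfo. The following is a minimal sketch under that assumption; the function name microcode_revisions is illustrative and not part of any vendor tool.

```python
from pathlib import Path


def microcode_revisions(cpuinfo_text: str) -> set:
    """Return the distinct microcode revision strings found in cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        # Lines look like "microcode\t: 0x22"; other keys are ignored.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(sorted(microcode_revisions(path.read_text())))
    else:
        print("/proc/cpuinfo is not available on this system")
```

If the set contains more than one value, the logical CPUs are not all running the same revision, which is worth investigating before drawing conclusions about stability.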
The problem is that even if Intel fixes the recent reboot issues, some people are not happy with the company’s overall approach to fixing Spectre. Linus Torvalds strongly criticized the new CPU “features” added by Intel to mitigate branch target injection, the flaw known as Spectre variant 2.
“All of this is pure garbage,” he commented on proposed kernel patches that attempt to make use of the new branch speculation control mechanism added in Intel’s CPU microcode. “Is Intel really planning on making this shit architectural? Has anybody talked to them and told them they are f*cking insane?”
“The whole IBRS_ALL feature to me very clearly says ‘Intel is not serious about this, we’ll have an ugly hack that will be so expensive that we don’t want to enable it by default, because that would look bad in benchmarks’,” Torvalds said in a follow-up message on the Linux kernel mailing list. “So instead they try to push the garbage down to us. And they are doing it entirely wrong, even from a technical standpoint.”
The Meltdown patches, which only require OS-level fixes, haven’t gone exactly smoothly for Linux users either, and their stability on older kernels has been questioned.
Meltdown is fixed through a new kernel feature called KPTI (Kernel Page Table Isolation) that originally started out under the name KAISER as a method of strengthening Kernel Address Space Layout Randomization (KASLR) against attacks. KASLR is a security feature present in all modern operating systems, in one form or another, and its goal is to randomize kernel memory addresses to make it much harder to achieve arbitrary code execution by exploiting memory corruption vulnerabilities.
The problem is that KPTI was intended as a new feature for the latest kernel version and was engineered under that premise. Later, when news of the Meltdown flaw came along, kernel developers realized the new feature they were already working on also happened to mitigate the CPU vulnerability. However, it also meant that KPTI would have to be backported to older, long-term support kernel versions, to protect them against Meltdown as well.
The patches backported to kernels prior to 4.14 are derived from a rather old KAISER version and don’t match what the 4.14 and 4.15 kernels do, Linux kernel hacker Andrew Lutomirski said on Hacker News earlier this month. This means they will have bugs, some of which won’t get patched because there’s minimal support upstream for these backported patches, he said.
Over time, the upstream KPTI version and the backported variant will diverge even more and improvements won’t get backported, which will cause additional problems down the road when other low-level architectural changes need to be backported.
“At least some versions of ‘KAISER,’ on meltdown-affected hardware, expose the kernel stack to userspace,” Lutomirski said. “If that’s not usable for rooting a box, I’ll eat my hat. KPTI doesn’t have this problem.”
This means that the best course of action is to use the latest stable version of the Linux kernel — currently 4.14 — which is also likely to get the Spectre-related patches that are still being worked on. However, many users will be stuck with whatever kernel version their Linux distribution provides, so upgrading is not really an option for many of them.
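For users who are unsure what their distribution kernel actually provides, kernels that carry the mitigation patches expose a status line per vulnerability under /sys/devices/system/cpu/vulnerabilities. The sketch below reads those files, assuming that directory is present; older or unpatched kernels will not have it, in which case the function returns an empty result. The helper names are illustrative.

```python
from pathlib import Path

# Directory populated by kernels that include the mitigation reporting patches.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(vuln_dir: Path = VULN_DIR) -> dict:
    """Map each vulnerability name to the status line the kernel reports."""
    if not vuln_dir.is_dir():
        return {}  # kernel does not expose this interface
    status = {}
    for entry in sorted(vuln_dir.iterdir()):
        if entry.is_file():
            status[entry.name] = entry.read_text().strip()
    return status


def is_unmitigated(status_line: str) -> bool:
    """True when the kernel reports the CPU as vulnerable without mitigation."""
    return status_line.startswith("Vulnerable")


if __name__ == "__main__":
    for name, line in read_mitigation_status().items():
        print(f"{name}: {line}")
```

An empty result does not mean the system is safe; it only means the running kernel does not report mitigation status through this interface.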
Long-Term Uncertainty
Even if all of the current issues with the CPU microcode and the OS-level patches get resolved, there are no guarantees that things will calm down anytime soon when it comes to the fallout of speculative execution.
The researchers who found Spectre noted in their paper that the two variants identified so far are most likely not the only ways to exploit this CPU feature. This means that future research will probably uncover additional attack methods and variations that might require additional patches.
Ultimately, speculative execution needs to be re-engineered, if not completely removed from future generations of CPUs, possibly at a significant cost to the performance we’re used to. But even that won’t guarantee that the industry won’t go through this painful process again because processors have many other features that can hide flaws. Meltdown and Spectre might have opened a can of worms.
“Attacks against hardware, as opposed to software, will become more common,” renowned cryptographer and security expert Bruce Schneier said in a recent essay published in The Atlantic. “Last fall, vulnerabilities were discovered in Intel’s Management Engine, a remote-administration feature on its microprocessors. Like Spectre and Meltdown, they affected how the chips operate. Looking for vulnerabilities on computer chips is new. Now that researchers know this is a fruitful area to explore, security researchers, foreign intelligence agencies, and criminals will be on the hunt.”
Focusing on Detection
The reality is that many organizations won’t be able to deploy the Meltdown and Spectre patches on all of their systems because they have old applications that need old operating systems to work and can’t easily be upgraded. For other companies, the performance hit to their workloads will be unacceptable, so they’ll simply decide to take the risk of not patching and isolate those systems as much as possible.
“If a workload seems unlikely to be practically exploitable without other major failures, detection could certainly be preferable,” researchers from Linux infrastructure security firm Capsule8 said in a blog post. “However, we feel that even in cases where there’s more legitimate risk, detection is still a decent alternative, especially if a response can be automated. For instance, it should often be feasible to detect and shut down an offending process before sensitive information is fully exposed.”
Both Meltdown and Spectre leak information by monitoring the CPU’s cache access times to reconstruct the data stored there, including transient data that results from speculative execution and should otherwise be discarded. This exploitation technique is known as a side-channel attack and involves the attacker putting the CPU cache in a known state and then measuring the time of operations to determine changes in the cache’s state.
Capsule8 has built a detection strategy using the Linux Perf subsystem that can spot attempts to exploit Meltdown and Spectre with lower performance impact than the actual patches. The detector’s code has been released under the Apache 2.0 license and works with Capsule8’s open-source monitoring sensor.
“This is a very low-impact way to continuously calculate and monitor the cache miss rate on an entire system,” the researchers said. “In our testing, running this detection consumes an average of 3 percent CPU on one core, peaking at 10 percent, during our simulated CPU and cache intensive workloads.”
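The idea of watching the cache miss rate can be illustrated with a small sketch. This is not Capsule8’s detector; it assumes that per-interval counts of cache references and cache misses have already been collected (for example from hardware performance counters) and that an exploitation attempt shows up as an unusually high miss rate. The threshold and minimum-event values are hypothetical placeholders that would need tuning against real workloads.

```python
def miss_rate(cache_references: int, cache_misses: int) -> float:
    """Fraction of cache references that missed; 0.0 when nothing was counted."""
    if cache_references == 0:
        return 0.0
    return cache_misses / cache_references


def suspicious_intervals(samples, threshold=0.9, min_references=10_000):
    """Return indexes of sampling intervals whose miss rate exceeds threshold.

    samples is an iterable of (cache_references, cache_misses) pairs.
    threshold and min_references are illustrative values, not tuned ones.
    """
    flagged = []
    for index, (references, misses) in enumerate(samples):
        if references < min_references:
            continue  # too few events in this interval to judge
        if miss_rate(references, misses) > threshold:
            flagged.append(index)
    return flagged
```

A caller would feed this one pair per sampling interval and treat flagged intervals as a trigger for closer inspection or an automated response, rather than as proof of an attack, since cache-intensive legitimate workloads can also produce high miss rates.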
“Every company will have their own risks in terms of operational risk versus security risk,” researchers from vulnerability management firm Qualys said in a blog post. “For that reason, Qualys believes that it may be better for some organizations not to patch and instead use a different compensating control to mitigate exposure as much as possible.”
At this point, many network security and antivirus companies should have detection signatures for the publicly released Meltdown and Spectre proof-of-concept exploits. It’s also worth keeping in mind that in order to carry out these attacks, hackers first need to be able to execute malicious code on vulnerable systems. The chance of that happening can be reduced with various other security controls.
Photo by Luke Flynt on Unsplash.