HTTP/2 Brings Rapid Reset Misery
Security bugs are a dime a dozen. Not a week goes by that we don’t see another one. Usually, though, they’re fixable. Then along comes one where the problem isn’t in the software; it’s in a fundamental standard, and all hell breaks loose.
I’m referring to the HTTP/2 Rapid Reset (AKA CVE-2023-44487) zero-day, which led to the largest distributed denial of service (DDoS) attacks we’ve ever seen… so far. While the victims — Amazon Web Services (AWS), Cloudflare, and Google Cloud — were able to fight these monster attacks off, the same attacks are more than capable of blowing away less well-prepared websites and services.
That’s because the HTTP/2 Rapid Reset vulnerability doesn’t spring from a specific software component but from the specification of the HTTP/2 protocol itself. Developed by the Internet Engineering Task Force (IETF) approximately eight years ago, HTTP/2 gave us a faster and more efficient successor to the traditional HTTP protocol. Its widespread adoption, especially in mobile applications, has solidified its role in modern internet infrastructure. Unfortunately for us, it also came with a built-in vulnerability.
To understand why this is so, you must understand how HTTP/2 differs from its predecessor, HTTP/1. Cloudflare explains this in great detail in “HTTP/2 Rapid Reset: deconstructing the record-breaking attack.”
The Reader’s Digest version is that all versions of HTTP share the same HTTP Semantics. That is the overall architecture, terminology, and protocol, such as request and response messages, methods, status codes, header and trailer fields, message content, and so on. Each individual HTTP version defines how these Semantics are transformed into a “wire format” for transmission and reception over the Internet.
With HTTP/1, the wire format is quite simple. It uses a serialized stream of ASCII characters, sent over a reliable transport layer, typically TCP. Request and response messages are exchanged. While a single TCP connection can carry multiple requests and responses, in HTTP/1, each message must be sent as a whole and in strict order. This means these messages are exchanged serially and cannot be multiplexed.
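To see just how plain that wire format is, here is a minimal sketch that serializes a bare-bones HTTP/1.1 request. The helper name `build_http1_request` and the host and path are placeholders of mine, not anything from Cloudflare’s write-up.

```python
def build_http1_request(method: str, path: str, host: str) -> bytes:
    """Serialize a minimal HTTP/1.1 request as ASCII text."""
    lines = [
        f"{method} {path} HTTP/1.1",  # request line
        f"Host: {host}",              # the one header HTTP/1.1 requires
        "",                           # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# Placeholder host and path, for illustration only.
request = build_http1_request("GET", "/index.html", "example.com")
print(request.decode("ascii"))
```

The whole message is human-readable text, and the next request on the same connection can only follow once this one has been sent in full.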
The Complexities of HTTP/2
HTTP/2 is much more complicated. In it, each HTTP message is serialized into a set of HTTP/2 frames. Each frame carries a type, length, flags, stream identifier (ID), and payload. The stream ID makes it clear which bytes on the wire apply to which message. With this, you can safely multiplex messages over a single connection for greater speed and concurrency. Improving performance even further, in HTTP/2, streams are also bidirectional.
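For the curious, here is a rough sketch of that framing. Every HTTP/2 frame starts with a 9-byte header: a 24-bit length, an 8-bit type, 8 bits of flags, and a 31-bit stream ID behind one reserved bit. The function names `pack_frame` and `unpack_frame_header` are my own; real servers use a full HTTP/2 library rather than hand-rolled packing.

```python
import struct

FRAME_HEADER_LEN = 9  # every HTTP/2 frame starts with a 9-byte header


def pack_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Prefix a payload with the 9-byte HTTP/2 frame header."""
    length = len(payload)
    header = struct.pack(
        ">BHBBI",
        length >> 16,             # top 8 bits of the 24-bit length
        length & 0xFFFF,          # low 16 bits of the length
        frame_type,
        flags,
        stream_id & 0x7FFFFFFF,   # the high bit is reserved and sent as 0
    )
    return header + payload


def unpack_frame_header(data: bytes) -> dict:
    """Decode the first 9 bytes of a frame back into its fields."""
    hi, lo, frame_type, flags, stream_id = struct.unpack(">BHBBI", data[:FRAME_HEADER_LEN])
    return {
        "length": (hi << 16) | lo,
        "type": frame_type,
        "flags": flags,
        "stream_id": stream_id & 0x7FFFFFFF,
    }


# A DATA frame (type 0x0) with END_STREAM (flag 0x1) on stream 1.
frame = pack_frame(0x0, 0x1, 1, b"hello")
print(unpack_frame_header(frame))
```

Because the stream ID rides along in every header, frames from different messages can be interleaved on one connection and still be sorted out on the other end.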
But this performance boost comes with a price. It’s possible to overwhelm a server with multiple HTTP requests. To prevent this from happening, you can set the maximum number of active concurrent streams with the SETTINGS_MAX_CONCURRENT_STREAMS setting. HTTP/2 streams also have a lifecycle that should help, in theory, to protect HTTP/2 from DDoS attacks.
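As an illustration, this is roughly what advertising that limit looks like on the wire. A SETTINGS frame (type 0x4) travels on stream 0 and carries 6-byte entries, each a 16-bit identifier followed by a 32-bit value; SETTINGS_MAX_CONCURRENT_STREAMS is identifier 0x3. The helper name and the limit of 100 are assumptions for the example.

```python
import struct

SETTINGS_FRAME_TYPE = 0x4
SETTINGS_MAX_CONCURRENT_STREAMS = 0x3  # setting identifier from the HTTP/2 spec


def build_settings_frame(max_streams: int) -> bytes:
    """Build a SETTINGS frame that advertises a concurrent-stream limit."""
    # One setting entry: 16-bit identifier followed by a 32-bit value.
    payload = struct.pack(">HI", SETTINGS_MAX_CONCURRENT_STREAMS, max_streams)
    length = len(payload)
    # SETTINGS always applies to the whole connection, so the stream ID is 0.
    header = struct.pack(">BHBBI", length >> 16, length & 0xFFFF, SETTINGS_FRAME_TYPE, 0, 0)
    return header + payload


# 100 is just an example value, not a recommendation.
print(build_settings_frame(100).hex())
```

Note that the limit only caps how many streams are open at the same moment. It says nothing about how fast a client may open and close them.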
However, HTTP/2 also makes it easier for a client to cancel an in-flight request. AKA, “Hey, Amazon, I don’t need to see that automated kitty litter box page after all.” Instead of tearing down the whole connection, a client can send a RST_STREAM frame for a single stream. When a server gets this message, it stops processing the request and aborts the response. The result? Less load on the server resources and no wasted bandwidth.
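Here is a small sketch of how cheap that cancellation is. An RST_STREAM frame is type 0x3, and its entire payload is a 32-bit error code; CANCEL is 0x8. The function name `build_rst_stream` is mine.

```python
import struct

RST_STREAM_FRAME_TYPE = 0x3
ERROR_CANCEL = 0x8  # "the stream is no longer needed"


def build_rst_stream(stream_id: int, error_code: int = ERROR_CANCEL) -> bytes:
    """Build an RST_STREAM frame that cancels a single stream."""
    payload = struct.pack(">I", error_code)  # the payload is just the error code
    header = struct.pack(
        ">BHBBI",
        0, len(payload),          # 24-bit length, always 4 for this frame
        RST_STREAM_FRAME_TYPE,
        0,                        # RST_STREAM defines no flags
        stream_id & 0x7FFFFFFF,
    )
    return header + payload


cancel = build_rst_stream(5)
print(len(cancel), cancel.hex())
```

That’s 13 bytes in total to cancel a request the server may already have started working on.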
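But what happens if a client opens a stream, cancels it right away, and then does it again and again? A reset stream no longer counts against the concurrent stream limit, so the client can keep firing off new requests without waiting for any responses. What if it sends so many that it overwhelms the server? Then, my friend, you have the start of a DDoS attack.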
For example, if you have, as most people do with a serious server or service, an HTTP/2 proxy or load-balancer in front of the rest of your server and software stack, it’s comparatively easy to overwhelm it with rapid resets.
You see, just because the server or proxy has a maximum number of concurrent requests doesn’t mean that a malicious client can’t flood it with high request rates. Besides, since clients are biased toward speed, they don’t wait around for server settings. The result is a race condition. That’s what happened. As a result, users saw HTTP 502 bad gateway error messages, and HTTP 499 client closed errors. The DDoS attacks were on.
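A toy model makes the point. Everything here is hypothetical: `ToyServer` and the two client functions are names I made up, the server is assumed to start backend work the moment request headers arrive, and no response is assumed to finish during the burst. It is a sketch of the accounting problem, not of any real server.

```python
class ToyServer:
    """Counts open streams and how much backend work gets started."""

    def __init__(self, max_concurrent_streams: int):
        self.max_concurrent_streams = max_concurrent_streams
        self.open_streams = set()
        self.backend_jobs_started = 0
        self.refused = 0

    def on_headers(self, stream_id: int) -> bool:
        """A new request arrives; refuse it if the stream limit is reached."""
        if len(self.open_streams) >= self.max_concurrent_streams:
            self.refused += 1
            return False
        self.open_streams.add(stream_id)
        self.backend_jobs_started += 1  # assumption: work starts immediately
        return True

    def on_rst_stream(self, stream_id: int) -> None:
        """The client cancels; the stream stops counting against the limit."""
        self.open_streams.discard(stream_id)


def polite_client(server: ToyServer, requests: int) -> None:
    """Sends requests and waits; nothing completes during the burst."""
    for i in range(requests):
        server.on_headers(2 * i + 1)  # client-initiated stream IDs are odd


def rapid_reset_client(server: ToyServer, requests: int) -> None:
    """Sends each request and cancels it straight away."""
    for i in range(requests):
        stream_id = 2 * i + 1
        server.on_headers(stream_id)
        server.on_rst_stream(stream_id)


polite = ToyServer(max_concurrent_streams=100)
polite_client(polite, 10_000)
abusive = ToyServer(max_concurrent_streams=100)
rapid_reset_client(abusive, 10_000)
print("polite: ", polite.backend_jobs_started, "started,", polite.refused, "refused")
print("abusive:", abusive.backend_jobs_started, "started,", abusive.refused, "refused")
```

In this model the polite client gets 100 requests through and 9,900 refused, while the rapid-reset client gets all 10,000 started without ever tripping the limit.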
For Cloudflare, which has been remarkably forthcoming on what went wrong, it works like this: when a client connects to Cloudflare to send HTTPS traffic, it first hits their TLS decryption proxy. This service decrypts the TLS traffic, processes the HTTP traffic, and then forwards it to their “business logic” proxy. Well, that’s what is supposed to happen.
This second proxy loads all the customer settings and then routes the requests to the proper upstream services. Under the attack, not only was the proxy overwhelmed, but it was not able to report properly on what was happening. Rubbing salt into the wound, the actual network connections themselves were being flooded by too much traffic.
Cloudflare has mitigated the problem so far by using a variety of different techniques. These include setting the SETTINGS_MAX_CONCURRENT_STREAMS value higher; monitoring connections for abuse of the RST_STREAM frame and blocking them; and tuning “IP Jail” so that IP addresses being used in such attacks are blocked not just from the targeted site, but also from using HTTP/2 to any other Cloudflare-protected domain.
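Cloudflare hasn’t published its detection code, so what follows is only my own sketch of the general idea behind watching a connection for RST_STREAM abuse: count client resets in a sliding time window and flag the connection once it crosses a threshold. The class name `ResetRateMonitor` and the thresholds are hypothetical.

```python
from collections import deque


class ResetRateMonitor:
    """Tracks client RST_STREAM frames on one connection in a sliding window."""

    def __init__(self, max_resets: int, window_seconds: float):
        self.max_resets = max_resets
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record_reset(self, now: float) -> bool:
        """Record one reset; return True if the connection looks abusive."""
        self._timestamps.append(now)
        cutoff = now - self.window_seconds
        # Forget resets that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_resets


# Hypothetical thresholds: more than 3 resets within 1 second is suspicious.
monitor = ResetRateMonitor(max_resets=3, window_seconds=1.0)
verdicts = [monitor.record_reset(t) for t in (0.0, 0.1, 0.2, 0.3)]
print(verdicts)
```

Timestamps are passed in explicitly so the behavior is easy to test. A real deployment would feed in a monotonic clock and close, or jail, any connection that returns True.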
So, is the problem fixed now? Oh no. I wish. As Cloudflare stated, “Because the attack abuses an underlying weakness in the HTTP/2 protocol, we believe any vendor that has implemented HTTP/2 will be subject to the attack. This included every modern web server.”
Let me emphasize that last phrase: “Every modern web server.” That’s not just proper web servers. It’s anything that delivers web services. For example, you’ll also find it in programs built with Microsoft .NET 8.0 RC1, .NET 7.0, and .NET 6.0; the Kubernetes API server; Node.js; and a host of other servers and programs.
More DDoS on the Way
The good news is that the HTTP/2 Rapid Reset vulnerability only enables DDoS attacks. You can’t use HTTP/2 Rapid Reset to take over servers or steal data. The bad news is the simplicity of the attack and the low number of machines needed to launch these Godzilla-sized assaults. Cloudflare estimated that the 201 million requests per second (RPS) attack that hit it, which was almost three times bigger than the previous biggest attack in its records, was generated by a mere 20,000 machines.
Yeah, we’re not out of the woods yet by any means. There will be many more HTTP/2 Rapid Reset attacks. And, I expect many of them to be more successful than this first wave. Few companies have the resources to deal with such an assault that AWS, Cloudflare, and Google have.
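A quick back-of-the-envelope calculation shows what those numbers mean per attacking machine. It assumes the load was spread evenly across the botnet, which is my simplification, not something Cloudflare has stated.

```python
peak_rps = 201_000_000   # peak requests per second reported by Cloudflare
botnet_size = 20_000     # estimated number of attacking machines

# Average load per machine, assuming an even split across the botnet.
per_machine_rps = peak_rps / botnet_size
print(f"{per_machine_rps:,.0f} requests per second per machine")
```

That works out to about 10,050 requests per second from each machine, on average.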
Your job for this week, if you haven’t started already, is to go over your entire infrastructure, looking for anything that delivers web services, and then update it to a newer, patched version. If you can’t get a patch, well, it’s time to consider drastic solutions such as firewalling those services from the internet until you can defend them from these attacks.
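As a starting point for that inventory, here is a minimal sketch that checks whether a TLS endpoint negotiates HTTP/2 via ALPN, using only Python’s standard library. The function names are mine, and the host in the usage comment is a placeholder. It only detects HTTP/2 over TLS; it won’t catch cleartext HTTP/2 or services tucked behind a proxy, and it doesn’t tell you whether the server is patched.

```python
import socket
import ssl


def build_alpn_context() -> ssl.SSLContext:
    """Create a TLS client context that offers HTTP/2 and HTTP/1.1 via ALPN."""
    context = ssl.create_default_context()
    context.set_alpn_protocols(["h2", "http/1.1"])
    return context


def negotiates_http2(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if the server picks "h2" during the TLS handshake."""
    context = build_alpn_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.selected_alpn_protocol() == "h2"


# Example usage (placeholder host; needs network access):
# print(negotiates_http2("example.com"))
```

Anything that answers with "h2" goes on the list of things to patch or protect.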
Good luck. We’re all going to need it.