Developers are Buzzing on Fuzzing
It’s important to find vulnerabilities in your code, because if you don’t, attackers will. A number of security testing techniques can help you find them, each with its own pros and cons, including static code analysis, penetration testing and manual code auditing. Another technique, fuzzing, has been getting a lot of attention recently thanks to its effectiveness, its steady improvement over time and its ease of use.
The Basics of Fuzzing
Fuzzing is a technique that sends randomized input to a program and monitors its behavior, typically watching for crashes or assertion failures. Fuzzing is interesting because it finds both security and functional bugs. Google, for example, has used fuzzing to find 50,000 bugs to date; more than 13,000 of those were security vulnerabilities.
Although there are many different ways to fuzz, the basic idea is really simple: create a lot of inputs, throw them at the program and watch what happens. If you’re using fuzzing to find security vulnerabilities, you want to prefer inputs that are more likely to trigger vulnerabilities and more likely to cover new code.
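The core loop can be sketched in a few lines. This is illustrative only, not any particular fuzzer's implementation: parse_record is a hypothetical target with a planted bug (raising an exception stands in for a real crash).

```python
import random

def parse_record(data: bytes) -> bool:
    """Hypothetical target with a planted bug: it mishandles inputs
    whose first two bytes are equal (a stand-in for a real crash)."""
    if len(data) < 4:
        return False                          # too short: rejected cleanly
    if data[0] == data[1]:
        raise ValueError("simulated crash")   # the bug a fuzzer would find
    return True

def fuzz(target, iterations=10_000, seed=0):
    """Throw random byte strings at the target and record failures."""
    rng = random.Random(seed)                 # fixed seed for reproducibility
    failures = []
    for _ in range(iterations):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(4, 16)))
        try:
            target(data)
        except Exception:
            failures.append(data)             # save the crashing input for triage
    return failures

crashing_inputs = fuzz(parse_record)
print(f"found {len(crashing_inputs)} crashing inputs")
```

Saving each crashing input matters: it is the reproducer a developer needs to debug and later verify the fix.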
Fuzzing is continuous by design. Over time, a fuzzer can hit roadblocks and stop reaching new code or finding new bugs, which is why developers need to keep improving it. Developers have to stay involved, understanding how their fuzzers are evolving and triaging the issues the fuzzers uncover.
How Fuzzing Is Used Today: Black-box and Coverage-Guided Fuzzing
The main fuzzing approaches in use today are black-box and coverage-guided fuzzing, each with advantages in different situations. In black-box fuzzing, someone writes a format-aware script that generates inputs for the program being fuzzed (the target), encoding knowledge of the input format so the fuzzer can reach deep into the target. In contrast, coverage-guided fuzzers (such as libFuzzer, AFL and AFL++) monitor the execution paths the target takes during fuzzing and use that feedback to automatically derive future inputs. Black-box fuzzing is commonly used by security researchers and tends to find more bugs than coverage-guided fuzzers, but writing a good black-box fuzzer takes effort and expertise. Coverage-guided fuzzers automate the creation of interesting inputs; the automatically found inputs may not be as good, but the fuzzers are much easier to use. As a result, coverage-guided fuzzers tend to be used by developers on programs they maintain, much like the unit tests a developer writes for their own project.
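A toy illustration of why coverage feedback matters, under the simplifying assumption that the target can report its own "coverage" directly (real fuzzers get this from compiler instrumentation). Hitting the four-byte magic value below by blind luck would take on the order of 256^4 tries; keeping any input that uncovers a new edge lets progress accumulate one byte at a time.

```python
import random

MAGIC = b"FUZZ"

def target(data: bytes) -> set:
    # Stand-in for an instrumented target: returns the set of "edges"
    # it executed instead of relying on compiler-inserted counters.
    cov = set()
    for i in range(min(len(data), len(MAGIC))):
        if data[i] != MAGIC[i]:
            return cov
        cov.add(("match", i))          # one edge per matched magic byte
    if len(data) >= len(MAGIC):
        cov.add("bug")                 # simulated crash: full magic matched
    return cov

def coverage_guided_fuzz(iterations=200_000, seed=1):
    # Keep any mutated input that reaches a new edge, and mutate corpus
    # entries rather than generating fresh random inputs each time.
    rng = random.Random(seed)
    corpus = [bytes(len(MAGIC))]
    seen = set()
    for _ in range(iterations):
        child = bytearray(rng.choice(corpus))
        child[rng.randrange(len(child))] = rng.randrange(256)  # 1-byte mutation
        cov = target(bytes(child))
        if not cov <= seen:            # new coverage: keep this input
            seen |= cov
            corpus.append(bytes(child))
    return seen

print("bug" in coverage_guided_fuzz())
```

The corpus stays tiny because only coverage-increasing inputs are retained; real engines like libFuzzer and AFL++ follow the same keep-what-teaches-us principle at far larger scale.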
What’s really interesting is combining techniques from black-box fuzzing with coverage-guided fuzzing. In this combined approach, the fuzzer both understands the format it is fuzzing and receives coverage feedback from the target. The most common implementation of this approach is a tool called libprotobuf-mutator, which gives users more control over which parts of the program they fuzz with coverage-guided fuzzing.
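libprotobuf-mutator itself mutates protocol-buffer messages inside a coverage-guided engine; the sketch below only illustrates the structure-aware idea with a hypothetical two-field wire format, so the names and the format are invented. The point is that mutating fields of a structured message, then serializing, keeps every input well-formed, so fuzzing time is spent past the parser's sanity checks.

```python
import random

def serialize(msg: dict) -> bytes:
    """Hypothetical wire format: version byte, length byte, payload."""
    payload = msg["payload"][:255]
    return bytes([msg["version"] & 0xFF, len(payload)]) + payload

def mutate(msg: dict, rng: random.Random) -> dict:
    """Structure-aware mutation: change one field of the message, so the
    serialized result is always well-formed (length field stays correct)."""
    out = dict(msg)
    if rng.random() < 0.5:
        out["version"] = rng.randrange(256)
    else:
        payload = bytearray(out["payload"])
        if payload and rng.random() < 0.5:
            payload[rng.randrange(len(payload))] = rng.randrange(256)
        else:
            payload.append(rng.randrange(256))
        out["payload"] = bytes(payload)
    return out

rng = random.Random(0)
msg = {"version": 1, "payload": b"hello"}
for _ in range(5):
    msg = mutate(msg, rng)
    wire = serialize(msg)
    assert wire[1] == len(wire) - 2   # length field always consistent
```

A raw byte-flipping mutator would corrupt the length field almost immediately and get stuck in the parser's error path; structure-aware mutation never does.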
One coverage-guided fuzzer worth additional discussion is AFL++. AFL (American Fuzzy Lop) was among the first widely adopted coverage-guided fuzzers and found many vulnerabilities, but after its original developer left Google, AFL stopped being maintained. Members of AFL’s active community forked it to create AFL++, which is under active development: the community continuously tests and adopts the latest improvements coming out of academia, as well as implementing its own new ideas. Today, Google has replaced AFL with AFL++ in ClusterFuzz and supports the development of AFL++ through funding efforts.
ClusterFuzz: Making Fuzzing Easier
Fuzzing has become much simpler over the years. With ClusterFuzz, developers write a fuzz target once, and it automatically links against each of the supported fuzzing engines (e.g., AFL++ or libFuzzer). Developers don’t have to understand how the engines work, and they get the benefits of multiple engines for free. All they have to do is write 20 to 50 lines of fuzz-target code; the rest of the fuzzing life cycle is fully automated. ClusterFuzz, which sits at the center of this infrastructure, handles it end to end: generating the right builds with the right sanitizers, compiling the fuzz targets, finding and de-duplicating bugs, assigning them to developers and later even verifying the patches developers create.
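For a sense of scale, a complete fuzz target is often just a thin wrapper like the one below. This is a sketch: parse_config is a hypothetical piece of code under test, and the entry-point name is invented (with libFuzzer the entry point is LLVMFuzzerTestOneInput; with Atheris, Google's Python fuzzer, the function is registered via atheris.Setup).

```python
def parse_config(text: str) -> dict:
    # Hypothetical code under test: parses "key=value" lines.
    out = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            out[key.strip()] = value.strip()
    return out

def fuzz_one_input(data: bytes) -> None:
    # The entire fuzz target: hand the engine-provided bytes to the code
    # under test and let any unexpected exception (or crash) surface.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return                          # malformed input, not a bug
    parse_config(text)

# Exercised by hand here; a fuzzing engine would call it millions of times.
fuzz_one_input(b"timeout=30\nretries=3")
fuzz_one_input(b"\xff\xfe")             # invalid UTF-8 is rejected cleanly
```

The wrapper deliberately swallows expected input errors and lets everything else escape: that is the contract that makes the same target reusable across engines.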
FuzzBench: Improving Fuzzer Engines
Previously, there was no standard way to evaluate potential improvements to coverage-guided fuzzing. FuzzBench was created to offer a real-world way to evaluate the fuzzing-engine improvements being developed in academia and industry. It is a free service for researchers: they integrate their fuzzer with a simple API, submit the integration and receive an evaluation that compares their fuzzer against other state-of-the-art fuzzers from academia and industry.
FuzzBench is now used for evaluation in a number of academic papers and has helped researchers assess improvements to their fuzzing engines. It benefits ClusterFuzz users, the researchers developing these engines, and fuzzer users in general, who can treat the results as a guide to which techniques they could be using. Like ClusterFuzz, FuzzBench keeps the workflow simple: a researcher just submits their fuzzer code, and it automatically runs at very large scale against real-world benchmarks drawn from OSS-Fuzz (a service that provides continuous fuzzing for open source software). As a result, coverage-guided fuzzing is continuously improving.
Case Study: Envoy Fuzzing
Envoy Proxy is an open source project initially started by Lyft. Today, many companies use it in a host of different applications, which makes security testing of it critical: Envoy is a network proxy with a large attack surface.
Fuzzing has identified three different classes of attacks against Envoy. The first is the classic query of death, where a single input simply crashes the proxy; fuzzing finds these directly, because the crash pinpoints the input that triggered it. The second class is performance-related and can cause a denial of service. These problems often show up as timeouts during fuzzing, and sometimes a timeout-triggering input can be crafted into a working DoS exploit, for example via stack overflows or uncontrolled resource consumption that makes the proxy consume enormous amounts of memory. The third class consists of high-severity issues, such as authentication bypasses, that can be identified from a simple fuzz test run with sanitizers.
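Findings of the second class can be made concrete with a harness that enforces a work budget, a stand-in for the wall-clock timeout a real fuzzer applies. Everything here is hypothetical: repeat_expand is an invented target with a planted amplification bug, not Envoy code.

```python
BUDGET = 100_000   # work-unit budget standing in for a wall-clock timeout

class TimeoutFound(Exception):
    """Raised by the harness when an input exceeds the work budget."""

def repeat_expand(data: bytes) -> bytes:
    """Hypothetical target with a planted DoS bug: each 0xFF byte doubles
    the buffer, so a tiny input can demand an enormous amount of work."""
    out = b"x"
    steps = 0
    for b in data:
        if b == 0xFF:
            out = out + out            # exponential blow-up
        steps += len(out)
        if steps > BUDGET:             # the harness's stand-in for a timeout
            raise TimeoutFound(f"{len(data)}-byte input exceeded the budget")
    return out

# A 24-byte input of 0xFF demands megabytes of work from a few bytes of
# input: the amplification shape a fuzzer reports as a timeout or OOM.
try:
    repeat_expand(b"\xff" * 24)
except TimeoutFound as e:
    print("DoS-style finding:", e)
```

Counting work units instead of wall time keeps the example deterministic; real fuzzing engines simply kill and report any run that exceeds a configured timeout or memory limit.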
Fuzz tests are finding issues that developers would never think to write unit tests for; fuzzing is good at exploring exactly those oversights by reviewers and developers alike.
If You Don’t Fuzz Your Code, Someone Else Will
Fuzzing is an important tool to have in your toolbox if you are searching for vulnerabilities. In an ideal world, fuzzing would be as ubiquitous and simple as writing a unit test. Coupled with the ease of tools such as OSS-Fuzz and ClusterFuzz, the value of fuzzing is far greater than it appears on the surface. Fuzzing is useful for finding stability issues and security vulnerabilities alike: it is productive at finding memory corruption vulnerabilities in languages such as C and C++, and even in memory-safe languages such as Go and Rust it has found denial-of-service and functional correctness issues.
So, fuzz your code because if you don’t, someone else will.