Directed fuzzing is currently a hot research topic. This repository provides a curated list of research papers on directed greybox fuzzing (see also the lists for directed whitebox fuzzing and miscellaneous).
Click to see the abstract!
Unlike coverage-based fuzzing that gives equal attention to every part of a code, directed fuzzing aims to direct a fuzzer to a specific target in the code, e.g., the code with potential vulnerabilities. Despite much progress, we observe that existing directed fuzzers are still not efficient as they often symbolically or concretely execute a lot of program paths that cannot reach the target code. They thus waste a lot of computational resources. This paper presents BEACON, which can effectively direct a grey-box fuzzer in the sea of paths in a provable manner. That is, assisted by a lightweight static analysis that computes abstracted preconditions for reaching the target, we can prune 82.94% of the executing paths at runtime with negligible analysis overhead (
[ICSE'22] WindRanger: A Directed Greybox Fuzzer driven by Deviation Basic Block
[paper] [project] [talk]
Click to see the abstract!
Directed grey-box fuzzing (DGF) is a security testing technique that aims to steer the fuzzer towards predefined target sites in the program. To gain directness, DGF prioritizes the seeds whose execution traces are closer to the target sites. Therefore, evaluating the distance between the execution trace of a seed and the target sites (aka, the seed distance) is important for DGF. The first directed grey-box fuzzer, AFLGo, uses an approach of calculating the basic block level distances during static analysis and accumulating the distances of the executed basic blocks to compute the seed distance. Following AFLGo, most of the existing state-of-the-art DGF techniques use all the basic blocks on the execution trace and only the control flow information for seed distance calculation. However, not every basic block is equally important and there are certain basic blocks where the execution trace starts to deviate from the target sites (aka, deviation basic blocks).
In this paper, we propose a technique called WindRanger which leverages deviation basic blocks to facilitate DGF. To identify the deviation basic blocks, WindRanger applies both static reachability analysis and dynamic filtering. To conduct directed fuzzing, WindRanger uses the deviation basic blocks and their related data flow information for seed distance calculation, mutation, seed prioritization as well as explore-exploit scheduling. We evaluated WindRanger on 3 datasets consisting of 29 programs. The experiment results show that WindRanger outperforms AFLGo, AFL, and Fairfuzz by reaching the target sites 21%, 34%, and 37% faster and detecting the target crashes 44%, 66%, and 77% faster respectively. Moreover, we found a 0-day vulnerability with a CVE ID assigned in ffmpeg (a popular multimedia library extensively fuzzed by OSS-fuzz) with WindRanger by supplying manually identified suspect locations as the target sites.
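The AFLGo-style seed distance described in the abstract above can be illustrated with a small sketch. It is a simplification, not AFLGo's or WindRanger's actual metric: it assumes a toy control-flow graph given as a dictionary, uses the hop count to the nearest target as the basic-block distance, and averages that distance over the executed blocks that can reach a target. The names `block_distances` and `seed_distance` are hypothetical.

```python
from collections import deque


def block_distances(cfg, targets):
    """Hop count from each block to its nearest target (BFS over reversed edges)."""
    reverse = {}
    for src, dsts in cfg.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    dist = {t: 0 for t in targets}
    queue = deque(targets)
    while queue:
        node = queue.popleft()
        for pred in reverse.get(node, []):
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)
    return dist  # blocks that cannot reach a target are absent


def seed_distance(trace, dist):
    """Mean block distance over executed blocks that can reach a target."""
    reachable = [dist[b] for b in trace if b in dist]
    if not reachable:
        return None  # the trace never touches a target-reachable block
    return sum(reachable) / len(reachable)


# Toy CFG (an assumption for illustration): block id -> successor block ids.
cfg = {
    "entry": ["a", "b"],
    "a": ["target"],
    "b": ["exit"],
    "target": ["exit"],
    "exit": [],
}
dist = block_distances(cfg, ["target"])
close = seed_distance(["entry", "a", "target"], dist)
far = seed_distance(["entry", "b", "exit"], dist)
```

A directed fuzzer would give more energy to the seed with the smaller distance (`close` here). Every executed block contributes equally in this sketch, which is the limitation the deviation-basic-block idea is aimed at.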
[ASIACCS'22] TargetFuzz: Using DARTs to Guide Directed Greybox Fuzzers
[paper]
Click to see the abstract!
Software development is a continuous and incremental process. Developers continuously improve their software in small batches rather than in one large batch. The high frequency of small batches makes it essential to use effective testing methods that detect bugs under limited testing time. To this end, researchers propose directed greybox fuzzing (DGF) which aims to generate test cases towards stressing certain target sites. Different from the coverage-based greybox fuzzing (CGF) which aims to maximize code coverage in the whole program, the goal of DGF is to cover potentially buggy code regions (e.g., a recently modified program region). While prior works improve several aspects of DGF (such as power scheduling, input prioritization, and target selection), little attention has been given to improving the seed selection process. Existing DGF tools use seed corpora mainly tailored for CGF (i.e., a set of seeds that cover different regions of the program). We observe that using CGF-based corpora limits the bug-finding capability of a directed greybox fuzzer. To mitigate this shortcoming, we propose TargetFuzz, a mechanism that provides a DGF tool with a target-oriented seed corpus. We refer to this corpus as DART corpus, which contains only 'close' seeds to the targets. This way, DART corpus guides DGF to the targets, thereby exposing bugs even under limited fuzzing time. Evaluations on 34 real bugs show that AFLGo (a state-of-the-art directed greybox fuzzer), when equipped with DART corpus, finds 10 additional bugs and achieves 4.03× speedup, on average, in the time-to-exposure compared to a generic CGF-based corpus.
[S&P'22] Exploit the Last Straw That Breaks Android Systems
[paper] [project]
Click to see the abstract!
The Android system services usually play a critical role in running multiple important tasks, and delivering seamless user experiences, e.g., conveniently storing user data. In this paper, we conduct the first systematic security study on the data storing process in Android system services, and consequently discover a novel class of design flaws (named Straw), which can lead to serious DoS (Denial-of-Service) attacks, e.g., permanently crashing the whole victim Android device. Then we propose a novel directed fuzzing based approach, called StrawFuzzer, to automatically vet all system services against the straw vulnerabilities. StrawFuzzer balances the tradeoff between path exploration and vulnerability exploitation. By applying StrawFuzzer on three Android systems with the latest security updates, we identified 35 unique straw vulnerabilities affecting 474 interfaces across 77 system services and successfully generated corresponding exploits, which can be used to conduct various permanent/temporary DoS attacks. We have reported our findings with suggestions for repairing the vulnerabilities to corresponding vendors. Up to now, Google has rated our vulnerability as high severity.
[ICSE'22] Linear-time Temporal Logic guided Greybox Fuzzing
[paper] [project] [talk]
Click to see the abstract!
Software model checking as well as runtime verification are verification techniques which are widely used for checking temporal properties of software systems. Even though they are property verification techniques, their common usage in practice is in "bug finding", that is, finding violations of temporal properties. Motivated by this observation and leveraging the recent progress in fuzzing, we build a greybox fuzzing framework to find violations of Linear-time Temporal Logic (LTL) properties.
Our framework takes as input a sequential program written in C/C++, and an LTL property. It finds violations, or counterexample traces, of the LTL property in stateful software systems; however, it does not achieve verification. Our work substantially extends directed greybox fuzzing to witness arbitrarily complex event orderings. We note that existing directed greybox fuzzing approaches are limited to witnessing reaching a location or witnessing simple event orderings like use-after-free. At the same time, compared to model checkers, our approach finds the counterexamples faster, thereby finding more counterexamples within a given time budget.
Our LTL-Fuzzer tool, built on top of the AFL fuzzer, is shown to be effective in detecting bugs in well-known protocol implementations, such as OpenSSL and Telnet. We use LTL-Fuzzer to reproduce known vulnerabilities (CVEs), to find 15 zero-day bugs by checking properties extracted from RFCs (for which 12 CVEs have been assigned), and to find violations of both safety as well as liveness properties in real-world protocol implementations. Our work represents a practical advance over software model checkers — while simultaneously representing a conceptual advance over existing greybox fuzzers. Our work thus provides a starting point for understanding the unexplored synergies among software model checking, runtime verification and greybox fuzzing.
[thesis] Directing greybox fuzzing to discover bugs in hardware and software - Sadullah Canakci
[paper]
Click to see the abstract!
Computer systems are deeply integrated into our daily routines such as online shopping, checking emails, and posting photos on social media platforms. Unfortunately, with the wide range of functionalities and sensitive information stored in computer systems, they have become fruitful targets for attackers. Cybersecurity ventures estimate that the cost of cyber attacks will reach $10.5 trillion USD annually by 2025. Moreover, data breaches have resulted in the leakage of millions of people’s social security numbers, social media account passwords, and healthcare information. With the increasing complexity and connectivity of computer systems, the intensity and volume of cyber attacks will continue to increase. Attackers will continuously look for bugs in the systems and ways to exploit them for gaining unauthorized access or leaking sensitive information. Minimizing bugs in systems is essential to remediate security weaknesses. To this end, researchers proposed a myriad of methods to discover bugs. In the software domain, one prominent method is fuzzing, the process of repeatedly running a program under test with “random” inputs to trigger bugs. Among different variants of fuzzing, greybox fuzzing (GF) has especially seen widespread adoption thanks to its practicality and bug-finding capability. In GF, the fuzzer collects feedback from the program (e.g., code coverage) during its execution and guides the input generation based on the feedback. Due to its success in finding bugs in the software domain, GF has gained traction in the hardware domain as well. Several works adapted GF to the hardware domain by addressing the differences between hardware and software. These works demonstrated that GF can be leveraged to discover bugs in hardware designs such as processors. In this thesis, we propose three different fuzzing mechanisms, one for software and two for hardware, to expose bugs in the multiple layers of systems. 
Each mechanism focuses on different aspects of GF to assist the fuzzing procedure for triggering bugs in hardware and software. The first mechanism, TargetFuzz, focuses on producing an effective seed corpus when fuzzing software. The seed corpus consists of a set of inputs serving as starting points to the fuzzer. We demonstrate that carefully selecting seeds to steer GF towards potentially buggy code regions increases the bug-finding capability of GF. Compared to prior works, TargetFuzz discovered 10 additional bugs and achieved 4.03× speedup, on average, in the total elapsed time for finding bugs. The second mechanism, DirectFuzz, adapts a specific variant of GF for software fuzzing, namely directed greybox fuzzing (DGF), to the hardware domain. The main use case of DGF in software is patch testing where the goal is to steer fuzzing towards recently modified code regions. Similar to software, hardware design is an incremental and continuous process. Therefore, it is important to prioritize testing of a new component in a hardware design rather than previously well-tested components. DirectFuzz takes several differences between hardware and software (such as clock sensitivity, concurrent execution of multiple code fragments, hardware-specific coverage) into account to successfully adapt DGF to the hardware domain. DirectFuzz relies on coverage feedback applicable to a wide range of hardware designs and requires limited design knowledge. While this increases its ease of adoption to many different hardware designs, its effectiveness (i.e., bug-finding success) becomes limited in certain hardware designs such as processors. Overall, compared to a state-of-the-art hardware fuzzer, DirectFuzz covers specified target sites (e.g., modified hardware regions) 2.23× faster. Our third mechanism named ProcessorFuzz relies on novel coverage feedback tailored for processors to increase the effectiveness of fuzzing in processors. 
Specifically, ProcessorFuzz monitors value changes in control and status registers which form the backbone of a processor. ProcessorFuzz addresses several drawbacks of existing works in processor fuzzing. Specifically, existing works can introduce significant instrumentation overhead, result in misleading guidance, and lack support for widely-used hardware languages. ProcessorFuzz revealed 8 new bugs in widely-used open source processors and identified bugs 1.23× faster than a prior work.
[Usenix'22] BRAKTOOTH: Causing Havoc on Bluetooth Link Manager via Directed Fuzzing
[paper] [project]
Click to see the abstract!
In this paper we propose, design and evaluate a systematic directed fuzzing framework to automatically discover implementation bugs in arbitrary Bluetooth Classic (BT) devices. The core of our fuzzer is the first over-the-air approach that takes full control of the BT controller baseband from the host. This enables us to intercept and modify arbitrary packets, as well as to inject packets out-of-order in lower layers of closed-source BT stack, i.e., Link Manager Protocol (LMP) and Baseband. To systematically guide our fuzzing process, we propose an extensible and novel rule-based approach to automatically construct the protocol state machine during normal over-the-air communication. In particular, by writing a simple set of rules to identify protocol messages, we can dynamically construct an abstracted protocol state machine, fuzz packets resulting from a state and validate responses from target devices. As of today, we have fuzzed 13 BT devices from 11 vendors and we have discovered a total of 18 unknown implementation flaws, with 24 common vulnerability exposures (CVEs) assigned. Furthermore, our discoveries were awarded with six bug bounties from certain vendors. Finally, to show the broader applicability of our framework beyond BT, we have extended our approach to fuzz other wireless protocols, which additionally revealed 6 unknown bugs in certain Wi-Fi and BLE Host stacks.
[arxiv'22] Multiple Targets Directed Greybox Fuzzing
[paper] [project]
Click to see the abstract!
Directed greybox fuzzing (DGF) can quickly discover or reproduce bugs in programs by seeking to reach a program location or explore some locations in order. However, due to their static stage division and coarse-grained energy scheduling, prior DGF tools perform poorly when facing multiple target locations (targets for short). In this paper, we present multiple targets directed greybox fuzzing which aims to reach multiple program locations in a fuzzing campaign. Specifically, we propose a novel strategy to adaptively coordinate exploration and exploitation stages, and a novel energy scheduling strategy by considering more relations between seeds and target locations. We implement our approaches in a tool called LeoFuzz and evaluate it on crash reproduction, true positives verification, and vulnerability exposure in real-world programs. Experimental results show that LeoFuzz outperforms six state-of-the-art fuzzers, i.e., QSYM, AFLGo, Lolly, Berry, Beacon and WindRanger in terms of effectiveness and efficiency. Moreover, LeoFuzz has detected 23 new vulnerabilities in real-world programs, and 11 of them have been assigned CVE IDs.
[arxiv'22] FishFuzz: Throwing Larger Nets to Catch Deeper Bugs
[paper] [project]
Click to see the abstract!
Greybox fuzzing is the de-facto standard to discover bugs during development. Fuzzers execute many inputs to maximize the amount of reached code. Recently, Directed Greybox Fuzzers (DGFs) propose an alternative strategy that goes beyond “just” coverage: driving testing toward specific code targets by selecting “closer” seeds. DGFs go through different phases: exploration (i.e., reaching interesting locations) and exploitation (i.e., triggering bugs). In practice, DGFs leverage coverage to directly measure exploration, while exploitation is, at best, measured indirectly by alternating between different targets. Specifically, we observe two limitations in existing DGFs: (i) they lack precision in their distance metric, i.e., averaging multiple paths and targets into a single score (to decide which seeds to prioritize), and (ii) they assign energy to seeds in a round-robin fashion without adjusting the priority of the targets (exhaustively explored targets should be dropped).
We propose FishFuzz, which draws inspiration from trawl fishing: first casting a wide net, scraping for high coverage, then slowly pulling it in to maximize the harvest. The core of our fuzzer is a novel seed selection strategy that builds on two concepts: (i) a novel multi-distance metric whose precision is independent of the number of targets, and (ii) a dynamic target ranking to automatically discard exhausted targets. This strategy allows FishFuzz to seamlessly scale to tens of thousands of targets and dynamically alternate between exploration and exploitation phases. We evaluate FishFuzz by leveraging all sanitizer labels as targets. Extensively comparing FishFuzz against modern DGFs and coverage-guided fuzzers shows that FishFuzz reached higher coverage compared to the direct competitors, reproduces existing bugs (70.2% faster), and finally discovers 25 new bugs (18 CVEs) in 44 programs.
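The precision problem with averaged distances mentioned in the abstract above can be shown with a toy comparison. This is only an illustration of the general idea, not FishFuzz's metric: the seed names, target names, and distance values are made up, and `best_by_average` and `best_per_target` are hypothetical helpers.

```python
def best_by_average(seeds):
    """Pick the single seed with the lowest distance averaged over all targets."""
    return min(seeds, key=lambda name: sum(seeds[name].values()) / len(seeds[name]))


def best_per_target(seeds):
    """Pick, for every target separately, the seed closest to that target."""
    targets = sorted({t for distances in seeds.values() for t in distances})
    return {
        t: min((n for n in seeds if t in seeds[n]), key=lambda n: seeds[n][t])
        for t in targets
    }


# Made-up distances: seed -> {target -> distance}.
seeds = {
    "s1": {"t1": 1, "t2": 9},
    "s2": {"t1": 4, "t2": 4},
    "s3": {"t1": 9, "t2": 1},
}
averaged_choice = best_by_average(seeds)
per_target_choice = best_per_target(seeds)
```

Averaging favors `s2`, which is the closest seed to neither target, while the per-target view keeps `s1` for `t1` and `s3` for `t2`.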
[CCS'22] MC2: Rigorous and Efficient Directed Greybox Fuzzing
[paper] [project]
Click to see the abstract!
Directed greybox fuzzing is a popular technique for targeted software testing that seeks to find inputs that reach a set of target sites in a program. Most existing directed greybox fuzzers do not provide any theoretical analysis of their performance or optimality. In this paper, we introduce a complexity-theoretic framework to pose directed greybox fuzzing as an oracle-guided search problem where some feedback about the input space (e.g., how close an input is to the target sites) is received by querying an oracle. Our framework assumes that each oracle query can return arbitrary content with a large but constant amount of information. Therefore, we use the number of oracle queries required by a fuzzing algorithm to find a target-reaching input as the performance metric. Using our framework, we design a randomized directed greybox fuzzing algorithm that makes a logarithmic (wrt. the number of all possible inputs) number of queries in expectation to find a target-reaching input. We further prove that the number of oracle queries required by our algorithm is optimal, i.e., no fuzzing algorithm can improve (i.e., minimize) the query count by more than a constant factor. We implement our approach in MC2 and outperform state-of-the-art directed greybox fuzzers on challenging benchmarks (Magma and Fuzzer Test Suite) by up to two orders of magnitude (i.e., 134×) on average. MC2 also found 15 previously undiscovered bugs that other state-of-the-art directed greybox fuzzers failed to find.
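To see why an oracle-guided search can need only logarithmically many queries, consider an idealized sketch. It is not MC2's algorithm: it assumes a noiseless oracle that reports whether a half-open range of integer inputs contains the target-reaching input, whereas the abstract describes a randomized algorithm. The function name and the `secret` input are hypothetical.

```python
def oracle_guided_search(lo, hi, oracle):
    """Halve the range [lo, hi) until one input is left, counting oracle queries.

    oracle(a, b) must return True if some input in [a, b) reaches the target.
    """
    queries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(lo, mid):
            hi = mid  # the target-reaching input is in the lower half
        else:
            lo = mid  # otherwise it must be in the upper half
    return lo, queries


# Hypothetical setup: exactly one 16-bit input reaches the target.
secret = 41_337
found, queries = oracle_guided_search(0, 2**16, lambda a, b: a <= secret < b)
```

With 2^16 possible inputs the loop asks 16 questions, i.e. log2 of the input-space size, instead of trying inputs one by one.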
[ACSAC'22] One Fuzz Doesn’t Fit All: Optimizing Directed Fuzzing via Target-tailored Program State Restriction
[paper] [project]
Click to see the abstract!
Fuzzing is the de-facto default technique to discover software flaws, randomly testing programs to discover crashing test cases. Yet, a particular scenario may only care about specific code regions (for, e.g., bug reproduction, patch or regression testing)—spurring the adoption of directed fuzzing. Given a set of pre-determined target locations, directed fuzzers drive exploration toward them through distance minimization strategies that (1) isolate the closest-reaching test cases and (2) mutate them stochastically. However, these strategies are applied onto every explored test case—irrespective of whether they ever reach the targets—stalling progress on the paths where targets are unreachable. Accelerating directed fuzzing requires prioritizing target-reachable paths.
To overcome the bottleneck of wasteful exploration in directed fuzzing, we introduce tripwiring: a lightweight technique to preempt and terminate the fuzzing of paths that will never reach target locations. By constraining exploration to only the set of target-reachable program paths, tripwiring curtails directed fuzzers’ search noise while unshackling them from the high-overhead instrumentation and bookkeeping of distance minimization, enabling directed fuzzers to obtain up to 99× higher test case throughput. We implement tripwiring-directed fuzzing as a prototype, SieveFuzz, and evaluate it alongside the state-of-the-art directed fuzzers AFLGo, BEACON and the leading undirected fuzzer AFL++. Overall, across nine benchmarks, SieveFuzz’s tripwiring enables it to trigger bugs on average 47% more consistently and 117% faster than AFLGo, BEACON and AFL++.
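The tripwiring idea can be sketched as a reachability pre-computation followed by early termination. This is a toy model under stated assumptions, not SieveFuzz's implementation: the control-flow graph is a made-up dictionary, execution is replayed from a recorded list of block ids, and `target_reachable` and `run_with_tripwire` are hypothetical names.

```python
def target_reachable(cfg, target):
    """Set of blocks from which `target` can be reached (reverse reachability)."""
    reverse = {}
    for src, dsts in cfg.items():
        for dst in dsts:
            reverse.setdefault(dst, set()).add(src)
    seen, stack = {target}, [target]
    while stack:
        node = stack.pop()
        for pred in reverse.get(node, ()):
            if pred not in seen:
                seen.add(pred)
                stack.append(pred)
    return seen


def run_with_tripwire(trace, allowed, target):
    """Replay a trace, stopping at the first block that cannot reach the target."""
    executed = []
    for block in trace:
        if block not in allowed:
            return executed, "tripped"  # terminate early, no further work
        executed.append(block)
        if block == target:
            return executed, "reached"
    return executed, "finished"


# Toy CFG (an assumption for illustration): block id -> successor block ids.
cfg = {
    "entry": ["parse", "usage"],
    "parse": ["decode", "error"],
    "decode": ["target"],
    "usage": [],
    "error": [],
    "target": [],
}
allowed = target_reachable(cfg, "target")
wasted = run_with_tripwire(["entry", "usage"], allowed, "target")
useful = run_with_tripwire(["entry", "parse", "decode", "target"], allowed, "target")
```

The first run is cut off as soon as it enters `usage`, so no time is spent on a path that cannot reach the target, and no per-block distance bookkeeping is needed.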
[S&P'23] SELECTFUZZ: Efficient Directed Fuzzing with Selective Path Exploration
[paper] [project]
Click to see the abstract!
Directed grey-box fuzzers specialize in testing specific target code. They have been applied to many security applications such as reproducing known crashes and detecting vulnerabilities caused by incomplete patches. However, existing directed fuzzers favor the inputs discovering new code regardless of whether the newly uncovered code is relevant to the target code or not. As a result, the fuzzers would extensively explore irrelevant code and suffer from low efficiency. In this paper, we distinguish relevant code in the target program from the irrelevant one that does not help trigger the vulnerabilities in target code. We present SELECTFUZZ, a new directed fuzzer that selectively explores relevant program paths for efficient crash reproduction and vulnerability detection. It identifies two types of relevant code, path-divergent code and data-dependent code, which respectively capture the control- and data-dependency with the target code. It then selectively instruments and explores only the relevant code blocks. We also propose a new distance metric that accurately measures the reaching probability of different program paths and inputs. We evaluated SELECTFUZZ with real-world vulnerabilities in sets of diverse programs. SELECTFUZZ significantly outperformed a baseline directed fuzzer by up to 46.31×, and performed the best in the Google Fuzzer Test Suite. Our experiments also demonstrated that SELECTFUZZ and the existing techniques such as path pruning are complementary. Finally, with SELECTFUZZ, we detected 14 previously unknown vulnerabilities (including 6 new CVE IDs) in well-tested real-world software. Our report has led to the fix of 11 vulnerabilities.
[TDSC'23] G-Fuzz: A Directed Fuzzing Framework for gVisor
[paper] [project]
Click to see the abstract!
gVisor is a Google-published application-level kernel for containers. As gVisor is lightweight and has sound isolation, it has been widely used in many IT enterprises. When a new vulnerability of the upstream gVisor is found, it is important for the downstream developers to test the corresponding code to maintain the security. To achieve this aim, directed fuzzing is promising. Nevertheless, there are many challenges in applying existing directed fuzzing methods for gVisor. The core reason is that existing directed fuzzers are mainly for general C/C++ applications, while gVisor is an OS kernel written in the Go language. To address the above challenges, we propose G-Fuzz, a directed fuzzing framework for gVisor. There are three core methods in G-Fuzz, including lightweight and fine-grained distance calculation, target related syscall inference and utilization, and exploration and exploitation dynamic switch. Note that the methods of G-Fuzz are general and can be transferred to other OS kernels. We conduct extensive experiments to evaluate the performance of G-Fuzz. Compared to Syzkaller, the state-of-the-art kernel fuzzer, G-Fuzz outperforms it significantly. Furthermore, we have rigorously evaluated the importance for each core method of G-Fuzz. G-Fuzz has been deployed in industry and has detected multiple serious vulnerabilities.
[arxiv'23] Directed Greybox Fuzzing with Stepwise Constraint Focusing
[paper]
Click to see the abstract!
Dynamic data flow analysis has been widely used to guide greybox fuzzing. However, traditional dynamic data flow analysis tends to go astray in the massive path tracking and requires processing a large volume of data, resulting in low efficiency in reaching the target location. In this paper, we propose a directed greybox fuzzer based on dynamic constraint filtering and focusing (CONFF). First, all path constraints are tracked, and those with high priority are filtered as the next solution targets. Next, focusing on a single path constraint to be satisfied, we obtain its data condition and probe the mapping relationship between it and the input bytes through multi-byte mapping and single-byte mapping. Finally, various mutation strategies are utilized to solve the path constraint currently focused on, and the target location of the program is gradually approached through path selection. The CONFF fuzzer can reach a specific location faster in the target program, thus efficiently triggering the crash.
We designed and implemented a prototype of the CONFF fuzzer and evaluated it with the LAVA-1 dataset and some real-world vulnerabilities. The results show that the CONFF fuzzer can reproduce crashes on the LAVA-1 dataset and most of the real-world vulnerabilities. For most vulnerabilities, the CONFF fuzzer reproduced the crashes with significantly reduced time compared to state-of-the-art fuzzers. On average, the CONFF fuzzer was 23.7x faster than the state-of-the-art code coverage-based fuzzer Angora and 27.3x faster than the classical directed greybox fuzzer AFLGo.
[S&P'23] ODDFUZZ: Discovering Java Deserialization Vulnerabilities via Structure-Aware Directed Greybox Fuzzing
[paper] [project]
Click to see the abstract!
Java deserialization vulnerability is a severe threat in practice. Researchers have proposed static analysis solutions to locate candidate vulnerabilities and fuzzing solutions to generate proof-of-concept (PoC) serialized objects to trigger them. However, existing solutions have limited effectiveness and efficiency. In this paper, we propose a novel hybrid solution ODDFUZZ to efficiently discover Java deserialization vulnerabilities. First, ODDFUZZ performs lightweight static taint analysis to identify candidate gadget chains that may cause deserialization vulnerabilities. In this step, ODDFUZZ tries to locate all candidates and avoid false negatives. Then, ODDFUZZ performs directed greybox fuzzing (DGF) to explore those candidates and generate PoC testcases to mitigate false positives. Specifically, ODDFUZZ applies a structure-aware seed generation method to guarantee the validity of the testcases, and adopts a novel hybrid feedback and a step-forward strategy to guide the directed fuzzing.
We implemented a prototype of ODDFUZZ and evaluated it on the popular Java deserialization repository ysoserial. Results show that, ODDFUZZ could discover 16 out of 34 known gadget chains, while two state-of-the-art baselines only identify three of them. In addition, we evaluated ODDFUZZ on real-world applications including Oracle WebLogic Server, Apache Dubbo, Sonatype Nexus, and protostuff, and found six previously unreported exploitable gadget chains with five CVEs assigned.
[IET Software'23] CIDFuzz: Fuzz testing for continuous integration
[paper]
Click to see the abstract!
As agile software development and extreme programming have become increasingly popular, continuous integration (CI) has become a widely used collaborative work method. However, it is common to make changes frequently to a project during CI. If existing testing methods are applied to CI directly, it will be difficult to make testing resources focus on changes generated by CI, which results in insufficient testing for changes. To solve this problem, we propose a fuzz testing method for CI. First, differential analysis is performed to determine the change points generated during CI, change points are added to the taint source set, and static analysis is conducted to calculate the distances between each basic block and the taint sources. Then, the project under test is instrumented according to the distances. During fuzz testing, testing resources are allocated based on seed coverage to test the change points effectively. Using the proposed methods, we implement CIDFuzz as a prototype tool, and experiments are conducted on four open‐source projects that use CI. Experimental results show that, compared with AFL and AFLGo, CIDFuzz can reduce the time costs of covering change points up to 39.59% and 41.64%, respectively. Also, CIDFuzz can reduce the time costs of reproducing vulnerabilities up to 34.78% and 25.55%.
[EuroS&P'23] Hunting for Truth: Analyzing Explanation Methods in Learning-based Vulnerability Discovery
[paper]
Click to see the abstract!
Recent research has developed a series of methods for finding vulnerabilities in software using machine learning. While the proposed methods provide a remarkable performance in controlled experiments, their practical application is hampered by their black-box nature: A security practitioner cannot tell how these methods arrive at a decision and what code structures contribute to a reported security flaw. Explanation methods for machine learning may overcome this problem and guide the practitioner to relevant code. However, there exist a variety of competing explanation methods, each highlighting different code regions when given the same finding. So far, this inconsistency has made it impossible to select a suitable explanation method for practical use.
In this paper, we address this problem and develop a method for analyzing and comparing explanations for learning-based vulnerability discovery. Given a predicted vulnerability, our approach uses directed fuzzing to create local ground-truth around code regions marked as relevant by an explanation method. This local ground-truth enables us to assess the veracity of the explanation. As a result, we can qualitatively compare different explanation methods and determine the most accurate one for a particular learning setup. In an empirical evaluation with different discovery and explanation methods, we demonstrate the utility of this approach and its capabilities in making learning-based vulnerability discovery more transparent.
[ISSTA'23] 1dFuzz: Reproduce 1-day Vulnerabilities with Directed Differential Fuzzing
[paper]
Click to see the abstract!
1-day vulnerabilities are common in practice and have posed severe threats to end users, as adversaries could learn from released patches to find them and exploit them. Reproducing 1-day vulnerabilities is also crucial for defenders, e.g., to block attack traffic against 1-day vulnerabilities. A core question that affects the effectiveness of recognizing and triggering 1-day vulnerabilities is what is the unique feature of a security patch. After conducting a large-scale empirical study, we point out that a common and unique feature of patches is the trailing call sequence (TCS) and present a novel directed differential fuzzing solution 1dFuzz to efficiently reproduce 1-day vulnerabilities in this paper. Based on the TCS feature, we present a locator 1dLoc able to find candidate patch locations via static analysis, a novel TCS-based distance metric for directed fuzzing, and a novel sanitizer 1dSan able to catch PoCs for 1-day vulnerabilities during fuzzing. We have systematically evaluated 1dFuzz on a set of real-world software vulnerabilities in 11 different settings. Results show that 1dFuzz significantly outperforms state-of-the-art (SOTA) baselines and could find up to 2.26x more 1-day vulnerabilities with a 43% shorter time.
[Usenix'23] FishFuzz: Catch Deeper Bugs by Throwing Larger Nets
[paper] [project] [artifact]
Click to see the abstract!
Fuzzers effectively explore programs to discover bugs. Greybox fuzzers mutate seed inputs and observe their execution. Whenever a seed reaches new behavior (e.g., new code or higher execution frequency), it is stored for further mutation. Greybox fuzzers directly measure exploration and, by repeating execution of the same targets with large amounts of mutated seeds, passively exploit any lingering bugs. Directed greybox fuzzers (DGFs) narrow the search to a few code locations but so far generalize distance to all targets into a single score and do not prioritize targets dynamically.
FISHFUZZ introduces an input prioritization strategy that builds on three concepts: (i) a novel multi-distance metric whose precision is independent of the number of targets, (ii) a dynamic target ranking to automatically discard exhausted targets, and (iii) a smart queue culling algorithm, based on hyperparameters, that alternates between exploration and exploitation. FISHFUZZ enables fuzzers to seamlessly scale among thousands of targets and prioritize seeds toward interesting locations, thus achieving more comprehensive program testing. To demonstrate generality, we implement FISHFUZZ over two well-established greybox fuzzers (AFL and AFL++). We evaluate FISHFUZZ by leveraging all sanitizer labels as targets. In comparison to modern DGFs and state-of-the-art coverage guided fuzzers, FISHFUZZ reaches higher coverage compared to the direct competitors, finds up to 2.8x more bugs compared with the baseline and reproduces 68.3% existing bugs faster. FISHFUZZ also discovers 56 new bugs (38 CVEs) in 47 programs.
[arxiv'23] FGo: A Directed Grey-box Fuzzer with Probabilistic Exponential cut-the-loss Strategies
[paper] [project]
Click to see the abstract!
Traditional coverage grey-box fuzzers perform a breadth-first search of the state space of Program Under Test (PUT). This aimlessness wastes a lot of computing resources. Directed grey-box fuzzing focuses on the target of PUT and becomes one of the most popular topics of software testing. The early termination of unreachable test cases is a method to improve directed grey-box fuzzing. However, existing solutions have two problems: firstly, reachability analysis needs to introduce extra technologies (e.g., static analysis); secondly, the performance of reachability analysis and auxiliary technologies lack versatility.
We propose FGo, a probabilistic exponential cut-the-loss directed grey-box fuzzer. FGo terminates unreachable test cases early with exponentially increasing probability. Compared to other technologies, FGo makes full use of the unreachable information contained in iCFG and doesn't generate any additional overhead caused by reachability analysis. Moreover, it is easy to generalize to all PUT. This strategy based on probability is perfectly adapted to the randomness of fuzzing.
The experiment results show that FGo is 106% faster than AFLGo in reproducing crashes. We compare multiple parameters of probabilistic exponential cut-the-loss algorithm and analyze them in detail. In addition, for enhancing the interpretability of FGo, this paper discusses the difference between the theoretical performance and the practical performance of probabilistic exponential cut-the-loss algorithm.
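To make the idea of early termination "with exponentially increasing probability" concrete, here is a minimal Python sketch. It is not FGo's implementation: the abstract does not give the formula, so the growth rule `p0 * base**(k - 1)`, the parameter values, and the function names are all assumptions chosen for illustration.

```python
import random

def termination_probability(k, p0=0.01, base=2.0):
    """Chance of aborting after k blocks that cannot reach the target.

    Assumed rule (not from the paper): start at p0 and multiply by
    `base` for each further unreachable block, capped at 1.0.
    """
    if k <= 0:
        return 0.0
    return min(1.0, p0 * base ** (k - 1))

def execute_with_cut_the_loss(trace, reachable, rng, p0=0.01, base=2.0):
    """Replay a trace of basic blocks and return how many were executed.

    `reachable` is the set of blocks from which the target can still be
    reached (hypothetically derived from the iCFG).
    """
    unreachable_seen = 0
    for executed, block in enumerate(trace, start=1):
        if block in reachable:
            continue
        unreachable_seen += 1
        # The longer the run stays off-target, the likelier it is cut.
        if rng.random() < termination_probability(unreachable_seen, p0, base):
            return executed
    return len(trace)

rng = random.Random(0)
print(execute_with_cut_the_loss(["u"] * 20, set(), rng))
```

With these assumed defaults the probability reaches 1.0 at the eighth unreachable block, so a run that never returns toward the target is always cut within eight blocks, while a run that stays on reachable blocks is never interrupted.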
[Usenix'23] DAFL: Directed Grey-box Fuzzing Guided by Data Dependency
[paper] [project] [artifact] [slides]
Click to see the abstract!
Despite growing research interest, existing directed greybox fuzzers do not scale well with program complexity. In this paper, we identify two major scalability challenges for current directed grey-box fuzzing. Particularly, we find that traditional coverage feedback does not always provide meaningful guidance for reaching the target program point(s), and the existing seed distance mechanism does not operate well with programs with complex control structures. To address these problems, we present a novel fuzzer, named DAFL. DAFL selects code parts that are relevant to the target location and obtains coverage feedback only from those parts. Furthermore, it computes precise seed distances considering the data-flow semantics of program executions. The results are promising. Out of 41 real-world bugs, DAFL was able to find 4, 6, 9, and 5 more bugs within the given time, compared to AFL, AFLGo, WindRanger, and Beacon, respectively. Furthermore, among the cases where all fuzzers produced a median TTE, DAFL was at least 4.99 times faster on average compared to 3 state-of-the-art directed fuzzers including AFLGo, WindRanger, and Beacon.
[EuroS&PW'23] Guiding Directed Fuzzing with Feasibility
[paper]
Click to see the abstract!
Directed fuzzing is a useful technique that can confirm bugs found by static analysis, reproduce existing bugs, and efficiently test code changes. A general mechanism in directed fuzzing is to calculate the distance between the current progress and the target, and use that as feedback to guide the directed fuzzing. A fundamental problem with existing distance calculation is that it is feasibility-unaware. For instance, it always assumes that the two branches of an if statement have equal feasibility, which is likely not true in real-world programs and would inevitably incur significant biases in the directed fuzzing. In this work, we propose feasibility-aware directed fuzzing, AFLGopher. Our new feasibility-aware distance calculation provides precise feedback to guide directed fuzzing to reach targets efficiently. We propose new techniques to address challenges of the feasibility prediction. Our new classification method allows predicting the feasibility of all branches based on limited traces, and our runtime feasibility-updating mechanism gradually improves the prediction precision. We implement AFLGopher, and the evaluation results show that AFLGopher uses less time to reach the bugs, compared to the state of the art.
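The contrast between feasibility-unaware and feasibility-aware distance can be illustrated with a small Python sketch. It is not AFLGopher's algorithm: the toy control-flow graph, the branch probabilities, and the edge cost `1 / p` (the expected number of attempts needed to take a branch with feasibility `p`) are assumptions made purely for illustration.

```python
import heapq

def shortest_distance(cfg, source, target, cost):
    """Dijkstra over a CFG whose edges carry a branch feasibility in (0, 1]."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        if node == target:
            return d
        for succ, prob in cfg.get(node, []):
            nd = d + cost(prob)
            if nd < dist.get(succ, float("inf")):
                dist[succ] = nd
                heapq.heappush(heap, (nd, succ))
    return float("inf")

def feasibility_cost(prob):
    """Assumed cost model: hard-to-take branches count as longer edges."""
    return float("inf") if prob <= 0 else 1.0 / prob

# Hypothetical CFG: the short route through B hides behind a rarely taken branch.
cfg = {
    "entry": [("A", 0.9), ("B", 0.1)],
    "A": [("X", 1.0)],
    "X": [("target", 1.0)],
    "B": [("target", 1.0)],
}

hop_distance = shortest_distance(cfg, "entry", "target", lambda p: 1.0)
aware_distance = shortest_distance(cfg, "entry", "target", feasibility_cost)
print(hop_distance, aware_distance)
```

Counting hops alone favors the two-edge route through `B`, whereas the assumed feasibility-weighted cost favors the three-edge route through `A`, because the branch into `B` is taken only one time in ten.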
[CCS'23] HyperGo: Probability-based Directed Hybrid Fuzzing
[paper] [project]
Click to see the abstract!
Directed grey-box fuzzing (DGF) is a target-guided fuzzing intended for testing specific targets (e.g., the potential buggy code). Despite numerous techniques proposed to enhance directedness, the existing DGF techniques still face challenges, such as taking into account the difficulty of reaching different basic blocks when designing the fitness metric, and promoting the effectiveness of symbolic execution (SE) when solving the complex constraints in the path to the target. In this paper, we propose a directed hybrid fuzzer called HyperGo. To address the challenges, we introduce the concept of path probability and combine the probability with distance to form an adaptive fitness metric called probability-based distance. By combining the two factors, probability-based distance can adaptively guide DGF toward paths that are closer to the target and have more easy-to-satisfy path constraints. Then, we put forward an Optimized Symbolic Execution Complementary (OSEC) scheme to combine DGF and SE in a complementary manner. The OSEC would prune the unreachable branches and unsolvable branches, and prioritize symbolic execution of the seeds whose paths are closer to the target and have more branches that are difficult to be covered by DGF. We evaluated HyperGo on 2 benchmarks consisting of 21 programs with a total of 100 target sites. The experimental results show that HyperGo achieves 38.47×, 30.89×, 28.52×, 106.09× and 143.22× speedup compared to AFLGo, AFLGoSy, BEACON, WindRanger, and ParmeSan, respectively in reaching target sites, and 3.44×, 3.63×, 4.10×, 3.26×, and 3.00× speedup in exposing known vulnerabilities. Moreover, HyperGo discovered 37 undisclosed vulnerabilities from 7 real-world programs.
[CCS'23] SyzDirect: Directed Greybox Fuzzing for Linux Kernel
Click to see the abstract!
Bug reports and patch commits are dramatically increasing for OS kernels, incentivizing a critical need for kernel-level bug reproduction and patch testing. Directed greybox fuzzing (DGF), aiming to stress-test a specific part of code, is a promising approach for bug reproduction and patch testing. However, the existing DGF methods exclusively target user-space applications, presenting intrinsic limitations in handling OS kernels. In particular, these methods cannot pinpoint the appropriate system calls and the needed syscall parameter values to reach the target location, resulting in low efficiency and waste of resources.
In this paper, we present SyzDirect, a DGF solution for the Linux kernel. With a novel, scalable static analysis of the Linux kernel, SyzDirect identifies valuable information such as correct system calls and conditions on their arguments to reach the target location. During fuzzing, SyzDirect utilizes the static analysis results to guide the generation and mutation of test cases, followed by leveraging distance-based feedback for seed prioritization and power scheduling. We evaluated SyzDirect on upstream Linux kernels for bug reproduction and patch testing. The results show that SyzDirect can reproduce 320% more bugs and reach 25.6% more target patches than generic kernel fuzzers. It also improves the speed of bug reproduction and patch reaching by a factor of 154.3 and 680.9, respectively.
[MS Thesis'23] Hybrid Testing: Combining Static Analysis and Directed Fuzzing
[paper]
Click to see the abstract!
New CVEs are discovered each year and their underlying bugs leave applications vulnerable to exploitation. Software is still frequently written in bug prone languages, e.g. C and C++, and a single missed check during manual testing can result in vulnerabilities. Existing automated testing tools such as fuzzing are limited in scope or in the case of static analysis, have a high false positive rate. Without improved automated testing, it can be challenging for developers to debug large, complex codebases. In this paper, Hybrid Testing is presented as a solution. Hybrid Testing combines static and dynamic analyses, leveraging static analysis to perform complex reasoning about logic, memory management, and concurrency. It creates a novel orchestration system which allows us to automatically verify the output of static analysis tools using directed fuzzing. Hybrid Testing is the first vulnerability detection technique with full codebase coverage and no false positives. It can be seamlessly integrated into the development cycle and scales well to large codebases. This work details the design and implementation of Hybrid Testing and evaluates its performance across a corpus of open-source C and C++ applications in the Magma benchmark. Hybrid Testing aims to promote more secure software through rigorous testing, making it easier for developers to detect security issues. We demonstrate Hybrid Testing can find vulnerabilities up to 25% faster with 17% higher accuracy (when detecting additional bugs) than current automated testing strategies.
[Usenix'23] DDRace: Finding Concurrency UAF Vulnerabilities in Linux Drivers with Directed Fuzzing
[paper] [slides] [project]
Click to see the abstract!
Concurrency use-after-free (UAF) vulnerabilities account for a large portion of UAF vulnerabilities in Linux drivers. Many solutions have been proposed to find either concurrency bugs or UAF vulnerabilities, but few of them can be directly applied to efficiently find concurrency UAF vulnerabilities. In this paper, we propose the first concurrency directed greybox fuzzing solution DDRace to discover concurrency UAF vulnerabilities efficiently in Linux drivers. First, we identify candidate use-after-free locations as target sites and extract the relevant concurrency elements to reduce the exploration space of directed fuzzing. Second, we design a novel vulnerability-related distance metric and an interleaving priority scheme to guide the fuzzer to better explore UAF vulnerabilities and thread interleavings. Lastly, to make test cases reproducible, we design an adaptive kernel state migration scheme to assist continuous fuzzing. We have implemented a prototype of DDRace, and evaluated it on upstream Linux drivers. Results show that DDRace is effective at discovering concurrency use-after-free vulnerabilities. It finds 4 unknown vulnerabilities and 8 known ones, which is more effective than other state-of-the-art solutions.
[arxiv'23] TOPr: Enhanced Static Code Pruning for Fast and Precise Directed Fuzzing
[paper]
Click to see the abstract!
Directed fuzzing is a dynamic testing technique that focuses exploration on specific, pre-targeted program locations. Like other types of fuzzers, directed fuzzers are most effective when maximizing testing speed and precision. To this end, recent directed fuzzers have begun leveraging path pruning: preventing the wasteful testing of program paths deemed irrelevant to reaching a desired target location. Yet, despite code pruning’s substantial speedup, current approaches are imprecise (failing to capture indirect control flow), requiring additional dynamic analyses that diminish directed fuzzers’ speeds. Thus, without code pruning that is both fast and precise, directed fuzzers’ effectiveness will continue to remain limited.
This paper aims to tackle the challenge of upholding both speed and precision in pruning-based directed fuzzing. We show that existing pruning approaches fail to recover common-case indirect control flow; and identify opportunities to enhance them with lightweight heuristics (namely, function signature matching), enabling them to maximize precision without the burden of dynamic analysis. We implement our enhanced pruning as a prototype, TOPr (Target Oriented Pruning), and evaluate it against the leading pruning-based and pruning-agnostic directed fuzzers SieveFuzz and AFLGo. We show that TOPr’s enhanced pruning outperforms these fuzzers in (1) speed (achieving 222% and 73% higher test case throughput, respectively); (2) reachability (achieving 149% and 9% more target-relevant coverage, respectively); and (3) bug discovery time (triggering bugs 85% and 8% faster, respectively). Furthermore, TOPr’s balance of speed and precision enables it to find 24 new bugs in 5 open-source applications, with 18 confirmed by developers, 12 bugs labelled as “Priority - 1. High”, and 12 bugs fixed, underscoring the effectiveness of our framework.
[S&P'24] Titan: Efficient Multi-target Directed Greybox Fuzzing
[paper] [project]
Click to see the abstract!
Modern directed fuzzing often faces scalability issues when analyzing multiple targets in a program simultaneously. We observe that the root cause is that directed fuzzers are unaware of the correlations among the targets, and could thereby degenerate into a target-undirected method. As a result, directed fuzzing suffers severely in efficiency when reproducing multiple targets. This paper presents Titan, which enables fuzzers to distinguish correlations among various targets in the program and, thus, optimizes the input generation to reproduce multiple targets effectively. Leveraging these correlations, Titan differentiates seeds’ potential of reaching each target for the scheduling and identifies bytes that can be changed simultaneously for the mutation. We compare our approach to eight state-of-the-art (directed) fuzzers. The evaluation demonstrates that Titan outperforms existing approaches by efficiently detecting multiple targets, achieving a 21.4x speedup and requiring 95.0% fewer executions. In addition, Titan detects ten incomplete fixes, which cannot be detected by other directed fuzzers, in the latest versions of the benchmark programs with two CVE IDs assigned.
[arxiv'23] Toward Unbiased Multiple-Target Fuzzing with Path Diversity
[paper]
Click to see the abstract!
Directed fuzzing is an advanced software testing approach that systematically guides the fuzzing campaign toward user-defined target sites, enabling efficient discovery of vulnerabilities related to these sites. However, we have observed that some complex vulnerabilities remain undetected by directed fuzzers even when the flawed target sites are frequently tested by the generated test cases, because triggering these bugs often requires the execution of additional code in related program locations. Furthermore, when fuzzing multiple targets, the existing energy assignment in directed fuzzing lacks precision and does not ensure the fairness across targets, which leads to insufficient fuzzing effort spent on some deeper targets.
In this paper, we propose a novel directed fuzzing solution named AFLRUN, which features a target path-diversity metric and unbiased energy assignment. Firstly, we develop a new coverage metric by maintaining an extra virgin map for each covered target to track the coverage status of seeds that hit the target. This approach enables the storage of waypoints into the corpus that hit a target through an interesting path, thus enriching the path diversity for each target. Additionally, we propose a corpus-level energy assignment strategy that guarantees fairness for each target. AFLRUN starts with uniform target weight and propagates this weight to seeds to get a desired seed weight distribution. By assigning energy to each seed in the corpus according to such desired distribution, a precise and unbiased energy assignment can be achieved. We built a prototype system and assessed its performance using a standard benchmark and several extensively fuzzed real-world applications. The evaluation results demonstrate that AFLRUN outperforms state-of-the-art fuzzers in terms of vulnerability detection, both in quantity and speed. Moreover, AFLRUN uncovers 29 previously unidentified vulnerabilities, including 8 CVEs, across four distinct programs.
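The corpus-level energy assignment described above (uniform target weight, propagated to seeds) can be sketched in a few lines of Python. This is an illustration, not AFLRUN's actual propagation: the abstract does not specify how weight flows from targets to seeds, so the equal split among the seeds that reach each target, the function name, and the example corpus are all assumptions.

```python
def assign_energy(coverage, total_energy):
    """Split a fuzzing budget so that every target gets the same share.

    `coverage` maps a seed name to the set of targets it reaches.
    Assumed rule: each target starts with weight 1.0 and divides it
    equally among the seeds that reach it.
    """
    seeds_by_target = {}
    for seed, targets in coverage.items():
        for target in targets:
            seeds_by_target.setdefault(target, []).append(seed)

    seed_weight = {seed: 0.0 for seed in coverage}
    for seeds in seeds_by_target.values():
        share = 1.0 / len(seeds)
        for seed in seeds:
            seed_weight[seed] += share

    total = sum(seed_weight.values())
    if total == 0:
        return {seed: 0.0 for seed in coverage}
    return {seed: total_energy * (w / total) for seed, w in seed_weight.items()}

# Hypothetical corpus: two seeds reach target t1, one seed reaches t2.
energy = assign_energy({"s1": {"t1"}, "s2": {"t1"}, "s3": {"t2"}}, 100.0)
print(energy)
```

In this example a per-seed uniform split would give `t1` two thirds of the budget simply because more seeds reach it. Under the assumed per-target split, `s1` and `s2` share half of the budget and `s3` receives the other half, so both targets get equal effort.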
[S&P'24] Predecessor-aware Directed Greybox Fuzzing
[paper] [project]
Click to see the abstract!
Directed Greybox Fuzzing (DGF) is a target-oriented fuzzing technique that can reproduce or discover software vulnerabilities. The goal is commonly achieved through two phases: static analysis which obtains program structural information beforehand, and dynamic execution that guides fuzzing towards target sites. However, existing DGF methods still incur heavyweight and incomplete issues. The former comes from extra efforts on identifying and approaching the target sites, while the latter refers to the incompleteness of testing on the target sites due to indirect calls or insufficient paths that recent DGF can cover. In this paper, we propose a Predecessor-aware Directed Greybox Fuzzing (PDGF) method and regard DGF as a path-searching problem. PDGF divides a given program into predecessor and non-predecessor areas, and maintains a set of predecessors by lightweight program analysis initially and augmented during the dynamic execution thereafter. Meanwhile, PDGF introduces a novel fitness metric called regional maturity to indicate the coverage rate of predecessors, and contains a simulated annealing-based power scheduling technique together with seed selection and mutation, to cover the predecessor area efficiently and extensively. We evaluate the proposed PDGF on a benchmark that contains 30 real-world program target sites, and conduct extensive comparisons with state-of-the-art DGF tools. Experimental results reveal that PDGF outperforms competitors in terms of Time-To-Exposure, path diversity, and bug finding. Besides, PDGF discovered nine new vulnerabilities, six of which have been assigned CVEs.
[OOPSLA'23] A Cocktail Approach to Practical Call Graph Construction
[paper]
Click to see the abstract!
After decades of research, constructing call graphs for modern C-based software remains either imprecise or inefficient when scaling up to the ever-growing complexity. The main culprit is the difficulty of resolving function pointers, as precise pointer analyses are cubic in nature and become exponential when considering calling contexts. This paper takes a practical stance by first conducting a comprehensive empirical study of function pointer manipulations in the wild. By investigating 5355 indirect calls in five popular open-source systems, we conclude that, instead of the past uniform treatments for function pointers, a cocktail approach can be more effective in “squeezing” the number of difficult pointers to a minimum using a potpourri of cheap methods. In particular, we decompose the costs of constructing highly precise call graphs of big code by tailoring several increasingly precise algorithms and synergizing them into a concerted workflow. As a result, many indirect calls can be precisely resolved in an efficient and principled fashion, thereby reducing the final, expensive refinements. This is, in spirit, similar to the well-known cocktail medical therapy.
The results are encouraging — our implemented prototype called Coral can achieve similar precision versus the previous field-, flow-, and context-sensitive Andersen-style call graph construction, yet scale up to millions of lines of code for the first time, to the best of our knowledge. Moreover, by evaluating the produced call graphs through the lens of downstream clients (i.e., use-after-free detection, thin slicing, and directed grey-box fuzzing), the results show that Coral can dramatically improve their effectiveness for better vulnerability hunting, understanding, and reproduction. More excitingly, we found twelve confirmed bugs (six impacted by indirect calls) in popular systems (e.g., MariaDB), spreading across multiple historical versions.
[ICCAD'23] SurgeFuzz: Surge-Aware Directed Fuzzing for CPU Designs
[paper] [project]
Click to see the abstract!
Various verification methods have been proposed for bug detection in central processing unit (CPU) designs, yet their effectiveness remains insufficient. We have observed that such CPU bugs often occur in exceptional handling, such as pipeline stalls and flushes. We found that corner cases in such exceptional handling can be effectively verified through situations we term a 'surge'. A surge refers to a situation where events leading to exceptional handling occur frequently over a short period of time. For instance, a surge caused by frequent queue insertions can eventually fill the capacity, triggering exceptional handling such as a pipeline stall. We propose a novel fuzzing method for CPU designs, named SurgeFuzz, that intentionally generates surges. SurgeFuzz mutates input instruction sequences based on annotations to increase the occurrence of surges. This results in a higher density of event occurrences, thereby enabling efficient verification of corner cases in exceptional handling. We evaluated SurgeFuzz on a large processor design and found several unknown hardware bugs that are difficult to find with existing methods.
[APSEC'23] On the Effectiveness of Synthetic Benchmarks for Evaluating Directed Grey-box Fuzzers
[paper]
Click to see the abstract!
Directed grey-box fuzzing is difficult to rigorously evaluate for several reasons. First, directed grey-box fuzzers are more prone to overfitting than undirected grey-box fuzzers as they are designed to explore specific paths in the program under test. Furthermore, existing benchmarks are mainly designed for evaluating undirected fuzzers. Hence, they do not provide any information about bug locations, and the difficulty of triggering bugs can substantially vary across different benchmarks. In this paper, we argue that one can address these challenges by automatically generating benchmarks with a bug synthesis technique. Notably, Fuzzle, a state-of-the-art bug synthesis tool, enables generation of arbitrarily many benchmarks, thereby preventing the overfitting problem. It is also well suited for evaluating directed grey-box fuzzers as it provides the exact location of the target bug in the generated benchmark with a guarantee that the bug is lurking deep in the program. With Fuzzle, we systematically evaluate existing state-of-the-art directed fuzzers and study their strengths and weaknesses, which would be otherwise difficult to obtain with traditional benchmarks. To our knowledge, this is the first attempt to adopt a bug synthesis technique for evaluating directed fuzzers.
[ICSE'24-SEIP] MicroFuzz: An Efficient Fuzzing Framework for Microservices
[paper]
Click to see the abstract!
Fuzzing is a widely adopted technique in the software industry to enhance security and software quality. However, most existing fuzzers are specifically designed for monolithic software architectures and face significant limitations when it comes to serving distributed Microservices applications (Apps). These limitations primarily revolve around issues of inconsistency, communication, and applicability which arise due to the differences in monolithic and distributed software architecture. This paper presents a novel fuzzing framework, called MicroFuzz, specifically designed for Microservices. Mocking-Assisted Seed Execution, Distributed Tracing, Seed Refresh and Pipeline Parallelism approaches are adopted to address the environmental complexities and dynamics of Microservices and improve the efficiency of fuzzing. MicroFuzz has been successfully implemented and deployed in AntGroup, a prominent FinTech company. Its performance has been evaluated in three distinct industrial scenarios: normalized fuzzing, iteration testing, and taint verification. Throughout five months of operation, MicroFuzz has diligently analyzed a substantial codebase, consisting of 261 Apps with over 74.6 million lines of code (LOC). The framework’s effectiveness is evident in its detection of 5,718 potential quality or security risks, with 1,764 of them confirmed and fixed as actual security threats by software specialists. Moreover, MicroFuzz significantly increased line coverage by 12.24% and detected new paths by 38.42% in the iteration testing.
[S&P'24] Everything is Good for Something: Counterexample-Guided Directed Fuzzing via Likely Invariant Inference
[paper]
Click to see the abstract!
Directed fuzzing demonstrates the potential to reproduce bug reports, verify patches, and debug vulnerabilities. State-of-the-art directed fuzzers prioritize inputs that are more likely to trigger the target vulnerability or filter irrelevant inputs unrelated to the targets. Despite these efforts, existing approaches struggle to reproduce specific vulnerabilities as most generated inputs are irrelevant. For instance, in the Magma benchmark, more than 94% of generated inputs miss the target vulnerability. We call this challenge the indirect input generation problem.
We propose to increase the yield of inputs that reach the target location by restraining input generation. Our key insight is to infer likely invariants from both reachable and unreachable executed inputs to constrain the search space of the subsequent input generation and produce more reachable inputs. Moreover, we propose two selection strategies to minimize the fraction of unnecessary inputs for efficient invariant inference and deprioritize imprecise invariants for effective input generation. Halo, our prototype implementation, outperforms state-of-the-art directed fuzzers with a 15.3x speedup in reproducing target vulnerabilities by generating 6.2x more reachable inputs. During our evaluation, we also detected ten previously unknown bugs involving seven incomplete fixes in the latest versions of well-fuzzed targets.
[S&P'24] LABRADOR: Response Guided Directed Fuzzing for Black-box IoT Devices
Click to see the abstract!
Fuzzing is a popular solution to finding vulnerabilities in software including IoT firmware. However, due to the challenges of emulating or rehosting firmware, some IoT devices (e.g., enterprise-level devices) can only be fuzzed in a black-box manner, which makes fuzzers blind and inefficient due to missing feedback (e.g., code coverage or distance). In this paper, we present a novel response guided directed fuzzing solution LABRADOR, able to test black-box IoT devices efficiently. First, we leverage the network response to infer the execution trace of firmware and deduce the code coverage of testing. Second, we leverage the test case (i.e., request) and its response to estimate the distance to the target sensitive code (i.e., sink). Lastly, we further leverage the distance to guide test case mutation, which efficiently drives directed fuzzing toward candidate vulnerable code. We have implemented a prototype of LABRADOR and evaluated it on 14 different enterprise-level IoT devices. Results showed that LABRADOR significantly outperforms state-of-the-art (SOTA) solutions. It finds 44X more vulnerabilities than SNIPUZZ, BOOFUZZ and FIRM-AFL and 8.57X more vulnerabilities than SaTC. In total, it discovered 79 unknown vulnerabilities, of which 61 were assigned CVEs.
[Usenix'24] SDFUZZ: Target States Driven Directed Fuzzing
[paper]
Click to see the abstract!
Directed fuzzers often unnecessarily explore program code and paths that cannot trigger the target vulnerabilities. We observe that the major application scenarios of directed fuzzing provide detailed vulnerability descriptions, from which highly-valuable program states (i.e., target states) can be derived, e.g., call traces when a vulnerability gets triggered. By driving to expose such target states, directed fuzzers can exclude massive unnecessary exploration. Inspired by the observation, we present SDFUZZ, an efficient directed fuzzing tool driven by target states. SDFUZZ first automatically extracts target states in vulnerability reports and static analysis results. SDFUZZ employs a selective instrumentation technique to reduce the fuzzing scope to the required code for reaching target states. SDFUZZ then terminates the execution of a test case early once it determines that the remaining execution cannot reach the target states. It further uses a new target state feedback and refines the prior imprecise distance metric into a two-dimensional feedback mechanism to proactively drive the exploration towards the target states. We thoroughly evaluated SDFUZZ on known vulnerabilities and compared it to related works. The results show that SDFUZZ could improve vulnerability exposure capability with more vulnerabilities triggered and less time used, outperforming the state-of-the-art solutions. SDFUZZ could significantly improve the fuzzing throughput. Our application of SDFUZZ to automatically validate the static analysis results successfully discovered four new vulnerabilities in well-tested applications. Three of them have been acknowledged by developers.
[NDSS'24] DeepGo: Predictive Directed Greybox Fuzzing
[paper] [project]
Click to see the abstract!
Directed Greybox Fuzzing (DGF) is an effective approach designed to strengthen testing vulnerable code areas via predefined target sites. The state-of-the-art DGF techniques redefine and optimize the fitness metric to reach the target sites precisely and quickly. However, optimizations for fitness metrics are mainly based on heuristic algorithms, which usually rely on historical execution information and lack foresight on paths that have not been exercised yet. Thus, those hard-to-execute paths with complex constraints would hinder DGF from reaching the targets, making DGF less efficient.
In this paper, we propose DeepGo, a predictive directed greybox fuzzer that can combine historical and predicted information to steer DGF to reach the target site via an optimal path. We first propose the path transition model, which models DGF as a process of reaching the target site through specific path transition sequences. The new seed generated by mutation would cause the path transition, and the path corresponding to the high-reward path transition sequence indicates a high likelihood of reaching the target site through it. Then, to predict the path transitions and the corresponding rewards, we use deep neural networks to construct a Virtual Ensemble Environment (VEE), which gradually imitates the path transition model and predicts the rewards of path transitions that have not been taken yet. To determine the optimal path, we develop a Reinforcement Learning for Fuzzing (RLF) model to generate the transition sequences with the highest sequence rewards. The RLF model can combine historical and predicted path transitions to generate the optimal path transition sequences, along with the policy to guide the mutation strategy of fuzzing. Finally, to exercise the high-reward path transition sequence, we propose the concept of an action group, which comprehensively optimizes the critical steps of fuzzing to realize the optimal path to reach the target efficiently. We evaluated DeepGo on 2 benchmarks consisting of 25 programs with a total of 100 target sites. The experimental results show that DeepGo achieves 3.23×, 1.72×, 1.81×, and 4.83× speedup compared to AFLGo, BEACON, WindRanger, and ParmeSan, respectively in reaching target sites, and 2.61×, 3.32×, 2.43× and 2.53× speedup in exposing known vulnerabilities.
[FSE'24] Evaluating Directed Fuzzers: Are We Heading in the Right Direction?
[paper]
Click to see the abstract!
Directed fuzzing recently has gained significant attention due to its ability to reconstruct proof-of-concept (PoC) test cases for target code such as buggy lines or functions. Surprisingly, however, there has been no in-depth study on the way to properly evaluate directed fuzzers despite much progress in the field. In this paper, we present the first systematic study on the evaluation of directed fuzzers. In particular, we analyze common pitfalls in evaluating directed fuzzers with extensive experiments on five state-of-the-art tools, which amount to 30 CPU-years of computational effort, in order to confirm that different choices made at each step of the evaluation process can significantly impact the results. For example, we find that a small change in the crash triage logic can substantially affect the measured performance of a directed fuzzer, while the majority of the papers we studied do not fully disclose their crash triage scripts. We argue that disclosing the whole evaluation process is essential for reproducing research and facilitating future work in the field of directed fuzzing. In addition, our study reveals that several common evaluation practices in the current directed fuzzing literature can mislead the overall assessments. Thus, we identify such mistakes in previous papers and propose guidelines for evaluating directed fuzzers.
[SBFT'24] TuneFuzz: adaptively exploring target programs
[paper]
Click to see the abstract!
The trade-off between exploration and exploitation stages poses a major challenge to Greybox fuzzing. TUNEFUZZ addresses this challenge through a novel input prioritization algorithm that maximizes the reached and triggered sanitizer labels. Our multi-distance metric and dynamic target ranking improve both exploration and exploitation. TUNEFUZZ found 56 new bugs (38 CVEs) in well-tested open source software.