Question About Initial Input Seeds for Fuzzing Libraries in OSS-Fuzz #12422

fedecerno · 2024-08-29T09:54:34Z

I have some questions regarding the initial input seeds used for fuzzing these libraries.
The libraries in question are binutils, cairo, libzip, llvm, mupdf, and sqlite3.
I would like to know:

Are the initial input seeds used by OSS-Fuzz manually created by humans, or are they generated through fuzzing campaigns?
If there are human-made initial input seeds, can the most recent versions be accessed?
Alternatively, if only seeds from fuzzing campaigns are available, could I obtain the oldest initial input seeds that have undergone the fewest fuzzing iterations?

maflcko · 2024-08-29T10:48:33Z

The initial seeds are usually added in the build.sh of that project. For example:

oss-fuzz/projects/llvm/build.sh

Lines 187 to 189 in 802a321

    zip -j "${OUT}/clang-objc-fuzzer_seed_corpus.zip" $SRC/$LLVM/../clang/tools/clang-fuzzer/corpus_examples/objc/*
    zip -j "${OUT}/clangd-fuzzer_seed_corpus.zip" $SRC/$LLVM/../clang-tools-extra/clangd/test/*
    zip -j "${OUT}/clang-fuzzer_seed_corpus.zip" $SRC/llvm-project/clang/test/Parser/*.cpp

If you want to go back in time, you'll have to follow the git history of the build.sh, or the corresponding source of the inputs.
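
As a rough illustration of the point above, the seed-corpus lines of a project's build.sh can be pulled out with a plain grep. This is only a sketch: `list_seed_corpus_lines` is a hypothetical helper name, not something OSS-Fuzz ships, and it assumes the seeds are packaged with the `<fuzzer>_seed_corpus.zip` naming seen in the llvm snippet.

```shell
#!/bin/sh
# Hypothetical helper (not part of OSS-Fuzz): print every line of a build
# script that mentions a *_seed_corpus.zip archive, prefixed with its
# line number in that script.
list_seed_corpus_lines() {
    grep -n '_seed_corpus\.zip' "$1"
}

# Example usage, assuming a local clone of google/oss-fuzz in ./oss-fuzz:
#   list_seed_corpus_lines oss-fuzz/projects/llvm/build.sh
#
# To see how those lines changed over time, follow the file's history:
#   git -C oss-fuzz log --follow -p -- projects/llvm/build.sh
```

The grep only shows where the seeds come from. The seed files themselves live in the upstream source tree that each zip command points at, so their history has to be followed in that repository.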

DavidKorczynski · 2024-09-04T10:49:58Z

Could you clarify what you mean by "created by humans"? I think a human has been involved in setting up the seeds for every OSS-Fuzz project, but there is a spectrum of involvement: whether the seed files already existed and were simply copied into the harness corpus folder; whether someone put in some effort, e.g. finding relevant pre-existing images that can be used as seeds; whether a human actively assembled the seeds programmatically, e.g. through structured generation; or whether a human wrote a given seed file manually, byte by byte.

I don't think there are any cases of the last kind, but there are many variations of the first three -- do they constitute "created by humans" though?

There are no initial input seeds used by OSS-Fuzz that are "generated through fuzzing campaigns", at least not campaigns run by OSS-Fuzz itself -- it may be that a developer has run a fuzzer locally and uploaded the results, and that's not something OSS-Fuzz maintainers would be keeping track of.

fedecerno · 2024-09-05T12:47:24Z

By "created by humans," I mean that there was human involvement in creating the seeds, as opposed to the seeds simply resulting from a series of fuzzing campaigns where the "best" seeds from one campaign are taken as input for the next.

I think the last part of your message has cleared up my doubts: OSS-Fuzz uses seeds uploaded by developers rather than seeds generated through automatically run fuzzing campaigns.
That said, it's possible that the input seeds currently in use came out of prior fuzzing campaigns, but there's no way to know for sure.

DavidKorczynski · 2024-09-05T13:51:12Z

> as opposed to the seeds simply resulting from a series of fuzzing campaigns where the "best" seeds from one campaign are taken as input for the next.

OSS-Fuzz naturally saves the generated corpus and carries it forward across iterations, which as far as I can tell is what you describe here. OSS-Fuzz also does corpus minimization to "narrow down the corpus to a set of optimal inputs" -- but that is all done by ClusterFuzz: https://github.com/google/clusterfuzz
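
To get a feel for what corpus minimization means in practice, here is a minimal local sketch using libFuzzer's merge mode (`-merge=1`), which copies into the first directory only those inputs from the second that add coverage. The thread does not say how ClusterFuzz performs this step internally, so treat this as an illustration of the idea rather than a description of its pipeline. `minimize_corpus` and the paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: minimize_corpus FUZZER FULL_CORPUS_DIR MINIMIZED_DIR
# Runs a libFuzzer-built binary in merge mode so that MINIMIZED_DIR ends up
# holding only the inputs from FULL_CORPUS_DIR that contribute new coverage.
minimize_corpus() {
    fuzzer=$1
    full_corpus=$2
    minimized=$3
    # libFuzzer writes the merged result into the first directory argument.
    mkdir -p "$minimized" || return 1
    "$fuzzer" -merge=1 "$minimized" "$full_corpus"
}

# Example usage, assuming a locally built libFuzzer target:
#   minimize_corpus ./clang-fuzzer ./corpus_full ./corpus_min
```

A corpus that has been carried forward and minimized this way no longer separates the original seeds from fuzzer-generated inputs, which is why the seeds referenced in build.sh are the more reliable record of what was supplied at the start.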
