wip: Split fuzz binary #29010

dergoegge · 2023-12-06T13:56:22Z

Split each harness into its own file
Support compiling individual harness through CPPFLAGS="-DFUZZ_HARNESS=<harness name>"
Build individual binaries

The cumulative size of the individual binaries (compiled with LTO) is 3.6GB vs 14GB when search and replacing std::getenv("FUZZ") (like we do in oss-fuzz).

TODOs:

Introduce option for building individual binaries
Simplify building individual binaries (i.e. don't hard code each binary in Makefile.test.include)
Deal with test/fuzz/{tx_pool, tx_package_eval, deserialize}.cpp
Introduce a "one harness per file" linter

DrahtBot · 2023-12-06T13:56:25Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

dergoegge · 2023-12-06T13:59:28Z

cc @maflcko

turns out we can actually keep the map if we just don't add all the harness functions into it

maflcko · 2023-12-06T14:39:20Z

Nice, but I am not sure if we want to go down to route to put everyone back into the makefile hell, because this will just make writing new fuzz tests bloaty and impossible to maintain/review (Who is going to read those repetitive and ugly 1k LOC of build code). (Unrelated: Not even sure how this will interact with cmake)

It seems easier to just put a one-line bash command into the readme to achieve the same, for anyone that needs it (oss-fuzz, fuzz-introspector, afl-lto, etc ...).

Happy to provide more feedback, but for now it would be good to explain why each target needs to be in a separate file.

In C++ it should be possible to have more than one function in the same file. Also, given that FuzzFrameworkRegisterTarget isn't called for the function pointer, LTO should see this and nuke the function?

dergoegge · 2023-12-06T14:46:05Z

I split the harnesses into individual files so that we can use the file list in src/test/fuzz as the harness list. It should be possible to use that to have a loop (?) in the makefile to build each binary instead of the hard-coded list I have here (I haven't figured out how to do that yet). If that works, adding a new fuzz harness becomes easier as no makefile would need to be edited.

maflcko · 2023-12-06T15:06:22Z

turns out we can actually keep the map if we just don't add all the harness functions into it

Are you sure on this? I don't know how afl or fuzz-introspector detect reachable code paths, but I'd be surprised if the static analysis can follow a function pointer hidden in a map.

DrahtBot · 2023-12-06T15:55:28Z

🐙 This pull request conflicts with the target branch and needs rebase.

dergoegge · 2023-12-06T15:59:34Z

Are you sure on this? I don't know how afl or fuzz-introspector detect reachable code paths, but I'd be surprised if the static analysis can follow a function pointer hidden in a map.

Yea maybe not... should be quite easy to avoid the map.

I'll try to setup a fuzz-introspector instance at some point, I have a feeling it'll still be a while before we can use oss-fuzz's instance.

ajtowns · 2023-12-07T00:36:06Z

The cumulative size of the individual binaries (compiled with LTO) is 3.6GB vs 14GB when search and replacing std::getenv("FUZZ") (like we do in oss-fuzz).

What does this mean? I'm seeing file sizes of 448MB (with fuzzing enabled) and 146MB (with a normal build) for test/fuzz/fuzz.

This seems like it's reverting #20560?

maflcko · 2023-12-07T07:45:21Z

This seems like it's reverting #20560?

yeah, in the current form this is reverting that pull, so I don't think it can be merged. Though, I guess the goal is to somehow come up with magic makefile code (which can optionally be enabled) to extend the build code from the file names automatically.

maflcko · 2023-12-07T08:45:52Z

I guess the "one fuzz target per file" makes the "one fuzz binary per message/rpc type" idea a bit harder to implement: #28015 (comment) ?

dergoegge · 2023-12-07T10:50:19Z

What does this mean? I'm seeing file sizes of 448MB (with fuzzing enabled) and 146MB (with a normal build) for test/fuzz/fuzz.

If you build this pull request you should see a binary per fuzz harness in src/test/fuzz (e.g. src/test/fuzz/process_message), as well as the usual fuzz binary src/test/fuzz/fuzz. If you sum up the size of the individual per harness binaries (assuming you compile with LTO) you should end up with roughly 3.6GB (maybe this depends on compiler version etc.). The 14GB is the size of src/test/fuzz/fuzz multiplied by the number harnesses we have (in this PR 127 harnesses), since that is what we have in oss-fuzz at the moment.

How I compiled this PR to get these stats:

CC=afl-clang-lto CXX=afl-clang-lto++ ./configure --enable-fuzz
make

maflcko · 2023-12-07T10:57:42Z

src/test/fuzz/fuzz.h

+        name##_Before_Main()                                                           \
+        {                                                                              \
+            if constexpr (should_compile_harness(#name)) {                             \
+                FuzzFrameworkRegisterTarget(#name, name##_fuzz_target, {__VA_ARGS__}); \


Is the change in this header file needed for your approach? If there is only one fuzz target per file, and only one file is compiled, you wouldn't need to check whether it needs to be compiled, no?

Right, I think once building all the individual binaries is optional, this could be used to build just one of the individual binaries, e.g.:

CPPFLAGS="-DFUZZ_HARNESS=process_message" ./configure --enable-fuzz make

which would produce src/test/fuzz/fuzz that only has the process_message harness. Building and linking just one harness is faster than all of them, maybe useful when reproducing testcases?

Maybe there is a better way of accomplishing this?

Building and linking just one harness is faster than all of them, maybe useful when reproducing testcases?

Not sure. This isn't supported for any other tests (bench, unit, ...), so maybe leave as a follow-up.

If building takes a long time, you can use more CPUs or a populated ccache.

Also, it could be done easier inside the makefile by just skipping over the other files, if needed?

ajtowns · 2023-12-07T22:26:25Z

What does this mean? I'm seeing file sizes of 448MB (with fuzzing enabled) and 146MB (with a normal build) for test/fuzz/fuzz.

If you build this pull request you should see a binary per fuzz harness in src/test/fuzz (e.g. src/test/fuzz/process_message), as well as the usual fuzz binary src/test/fuzz/fuzz. If you sum up the size of the individual per harness binaries (assuming you compile with LTO) you should end up with roughly 3.6GB (maybe this depends on compiler version etc.).

Hmm, if I switch to this PR and try compiling, I get bunches of linker errors about multiple definitions, eg

/usr/bin/ld: test/fuzz/fuzz-addrman_serdeser.o: in function `std::iterator_traits<std::byte const*>::difference_type std::__distance<std::byte const*>(std::byte const*, std::byte const*, std::random_access_iterator_tag)':

./src/./tinyformat.h:666: multiple definition of `MakeV1Transport(long)'; test/fuzz/fuzz-addrman.o:./src/./tinyformat.h:666: first defined here

Dropping from 14GB to 3.6GB seems worthwhile for oss-fuzz (and similar), but not worthwhile otherwise to me -- it's not compiling so I can't say for sure, but 450MB bloating out to 3.6GB sounds terrible? Also LTO seems pretty slow, and not amenable to ccache...

Shouldn't it be possible to just setup test/fuzz/fuzz.cpp so it (alone) can be recompiled with FUZZ_HARNESS defined, and rely on LTO for everything else (rather than should_compile_harness)? Something like: https://github.com/ajtowns/bitcoin/tree/202312-fuzz-many-exes ?

You could maybe use something like:

$ find -name '*.cpp' | sort | while read a; do grep ^FUZZ_TARGET $a | sed 's/^FUZZ_TARGET[_A-Z]*[(] *\([^,) ]*\) *[,)].*/\1/'; done

to get a list of fuzz targets without having to compile the code first, or separate things into different files.

dergoegge · 2023-12-11T12:37:15Z

Hmm, if I switch to this PR and try compiling, I get bunches of linker errors about multiple definitions, eg

Yea weird, it compiles for me with afl-clang-lto but not with regular clang.

Dropping from 14GB to 3.6GB seems worthwhile for oss-fuzz (and similar), but not worthwhile otherwise to me

I listed more benefits in #28971. It is not gonna have a noticeable effect for contributors that aren't regularly fuzzing.

to get a list of fuzz targets without having to compile the code first, or separate things into different files.

The splitting is super straight forward though and the more maintainable thing IMO (your script does not work for e.g. test/fuzz/deserialize.cpp). If we are going to continue employing sed hacks, we can just keep doing that in oss-fuzz.

I guess the goal is to somehow come up with magic makefile code (which can optionally be enabled) to extend the build code from the file names automatically.

After consulting our build system gods, it looks like this is not possible with automake. It is apparently very easy to do with cmake though, so I guess we can consider this blocked until cmake is in.

dergoegge · 2024-02-01T11:55:58Z

I'll re-open this once we switched to cmake.

I've rebased the commits on the current cmake staging branch and building individual binaries using cmake is quite easy (assuming there is one harness per file):

  file(GLOB fuzz_harness_file_list RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "./*.cpp")
  # Remove fuzz.cpp and util.cpp as they don't contain fuzz harnesses
  list(REMOVE_ITEM fuzz_harness_file_list "fuzz.cpp")
  list(REMOVE_ITEM fuzz_harness_file_list "util.cpp")

  foreach(harness_file ${fuzz_harness_file_list})
    get_filename_component(harness ${harness_file} NAME_WLE)
    add_executable(fuzz_${harness} ${harness_file})
    target_compile_definitions(fuzz_${harness} PUBLIC SINGLE_FUZZ_HARNESS)
    target_link_libraries(fuzz_${harness}
      bitcoin_crypto # TODO ??? memory_cleanse undefined errors
      core_interface
      test_fuzz
      bitcoin_cli
      bitcoin_common
      minisketch
      leveldb
      univalue
      secp256k1
      Boost::headers
      libevent::libevent
      bitcoin_consensus
    )
  endforeach()

dergoegge added 17 commits December 5, 2023 14:07

refactor, fuzz: Rename fuzz harness files

8266eca

refactor, fuzz: Split mini_miner harnesses into separate files

f2d3d3b

refactor, fuzz: Split out addrman utils

2facf07

refactor, fuzz: Split addrman harnesses into separate files

ff94c48

refactor, fuzz: Split key fuzz harnesses into separate files

c26f1a0

refactor, fuzz: Move SimualtionTest to fuzz/util

2efafc0

refactor, fuzz: Split p2p transport harnesses into separate files

2e05eb4

refactor, fuzz: Split miniscript utils from harness

0867a2d

refactor, fuzz: Split miniscript harnesses into separate files

9776fc4

refactor, fuzz: Split ChaCha20SplitFuzz to fuzz crypto utils

897a65e

refactor, fuzz: Split chacha20 harnesses into separate files

36eec55

refactor, fuzz: Move TestDescriptor to descriptor fuzz utils

037470b

refactor, fuzz: Split descriptor parse harnesses into separate files

14263e0

refactor, fuzz: Split poly1305 harnesses into separate files

071bb3c

NO MERGE: Remove mempool and deser harnesses

0d13e49

fuzz: Support compiling only one harness

ff31cc1

NO MERGE: build individual fuzz binaries

8118b24

DrahtBot added the Needs rebase label Dec 6, 2023

maflcko reviewed Dec 7, 2023

View reviewed changes

dergoegge closed this Feb 1, 2024

dergoegge mentioned this pull request Sep 12, 2024

wip: Split fuzz binary (take 2) #30882

Closed

3 tasks

bitcoin locked and limited conversation to collaborators Jan 31, 2025

wip: Split fuzz binary #29010

wip: Split fuzz binary #29010

Uh oh!

Conversation

dergoegge commented Dec 6, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DrahtBot commented Dec 6, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Coverage

Reviews

Uh oh!

dergoegge commented Dec 6, 2023

Uh oh!

maflcko commented Dec 6, 2023

Uh oh!

dergoegge commented Dec 6, 2023

Uh oh!

maflcko commented Dec 6, 2023

Uh oh!

DrahtBot commented Dec 6, 2023

Uh oh!

dergoegge commented Dec 6, 2023

Uh oh!

ajtowns commented Dec 7, 2023

Uh oh!

maflcko commented Dec 7, 2023

Uh oh!

maflcko commented Dec 7, 2023

Uh oh!

dergoegge commented Dec 7, 2023

Uh oh!

maflcko Dec 7, 2023

Choose a reason for hiding this comment

Uh oh!

dergoegge Dec 7, 2023

Choose a reason for hiding this comment

Uh oh!

maflcko Dec 7, 2023

Choose a reason for hiding this comment

Uh oh!

ajtowns commented Dec 7, 2023

Uh oh!

dergoegge commented Dec 11, 2023

Uh oh!

dergoegge commented Feb 1, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

dergoegge commented Dec 6, 2023 •

edited

Loading

DrahtBot commented Dec 6, 2023 •

edited

Loading