Skip to content

Conversation

@dergoegge
Copy link
Member

@dergoegge dergoegge commented Dec 6, 2023

Closes #28971

  • Split each harness into its own file
  • Support compiling individual harness through CPPFLAGS="-DFUZZ_HARNESS=<harness name>"
  • Build individual binaries

The cumulative size of the individual binaries (compiled with LTO) is 3.6GB vs 14GB when search and replacing std::getenv("FUZZ") (like we do in oss-fuzz).

TODOs:

  • Introduce option for building individual binaries
  • Simplify building individual binaries (i.e. don't hard code each binary in Makefile.test.include)
  • Deal with test/fuzz/{tx_pool, tx_package_eval, deserialize}.cpp
  • Introduce a "one harness per file" linter

@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 6, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

@dergoegge
Copy link
Member Author

cc @maflcko

turns out we can actually keep the map if we just don't add all the harness functions into it

@maflcko
Copy link
Member

maflcko commented Dec 6, 2023

Nice, but I am not sure if we want to go down to route to put everyone back into the makefile hell, because this will just make writing new fuzz tests bloaty and impossible to maintain/review (Who is going to read those repetitive and ugly 1k LOC of build code). (Unrelated: Not even sure how this will interact with cmake)

It seems easier to just put a one-line bash command into the readme to achieve the same, for anyone that needs it (oss-fuzz, fuzz-introspector, afl-lto, etc ...).

Happy to provide more feedback, but for now it would be good to explain why each target needs to be in a separate file.

In C++ it should be possible to have more than one function in the same file. Also, given that FuzzFrameworkRegisterTarget isn't called for the function pointer, LTO should see this and nuke the function?

@dergoegge
Copy link
Member Author

I split the harnesses into individual files so that we can use the file list in src/test/fuzz as the harness list. It should be possible to use that to have a loop (?) in the makefile to build each binary instead of the hard-coded list I have here (I haven't figured out how to do that yet). If that works, adding a new fuzz harness becomes easier as no makefile would need to be edited.

@maflcko
Copy link
Member

maflcko commented Dec 6, 2023

turns out we can actually keep the map if we just don't add all the harness functions into it

Are you sure on this? I don't know how afl or fuzz-introspector detect reachable code paths, but I'd be surprised if the static analysis can follow a function pointer hidden in a map.

@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 6, 2023

🐙 This pull request conflicts with the target branch and needs rebase.

@dergoegge
Copy link
Member Author

Are you sure on this? I don't know how afl or fuzz-introspector detect reachable code paths, but I'd be surprised if the static analysis can follow a function pointer hidden in a map.

Yea maybe not... should be quite easy to avoid the map.

I'll try to setup a fuzz-introspector instance at some point, I have a feeling it'll still be a while before we can use oss-fuzz's instance.

@ajtowns
Copy link
Contributor

ajtowns commented Dec 7, 2023

The cumulative size of the individual binaries (compiled with LTO) is 3.6GB vs 14GB when search and replacing std::getenv("FUZZ") (like we do in oss-fuzz).

What does this mean? I'm seeing file sizes of 448MB (with fuzzing enabled) and 146MB (with a normal build) for test/fuzz/fuzz.

This seems like it's reverting #20560?

@maflcko
Copy link
Member

maflcko commented Dec 7, 2023

This seems like it's reverting #20560?

yeah, in the current form this is reverting that pull, so I don't think it can be merged. Though, I guess the goal is to somehow come up with magic makefile code (which can optionally be enabled) to extend the build code from the file names automatically.

@maflcko
Copy link
Member

maflcko commented Dec 7, 2023

I guess the "one fuzz target per file" makes the "one fuzz binary per message/rpc type" idea a bit harder to implement: #28015 (comment) ?

@dergoegge
Copy link
Member Author

What does this mean? I'm seeing file sizes of 448MB (with fuzzing enabled) and 146MB (with a normal build) for test/fuzz/fuzz.

If you build this pull request you should see a binary per fuzz harness in src/test/fuzz (e.g. src/test/fuzz/process_message), as well as the usual fuzz binary src/test/fuzz/fuzz. If you sum up the size of the individual per harness binaries (assuming you compile with LTO) you should end up with roughly 3.6GB (maybe this depends on compiler version etc.). The 14GB is the size of src/test/fuzz/fuzz multiplied by the number harnesses we have (in this PR 127 harnesses), since that is what we have in oss-fuzz at the moment.

How I compiled this PR to get these stats:

CC=afl-clang-lto CXX=afl-clang-lto++ ./configure --enable-fuzz
make

name##_Before_Main() \
{ \
if constexpr (should_compile_harness(#name)) { \
FuzzFrameworkRegisterTarget(#name, name##_fuzz_target, {__VA_ARGS__}); \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the change in this header file needed for your approach? If there is only one fuzz target per file, and only one file is compiled, you wouldn't need to check whether it needs to be compiled, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think once building all the individual binaries is optional, this could be used to build just one of the individual binaries, e.g.:

CPPFLAGS="-DFUZZ_HARNESS=process_message"  ./configure --enable-fuzz
make

which would produce src/test/fuzz/fuzz that only has the process_message harness. Building and linking just one harness is faster than all of them, maybe useful when reproducing testcases?

Maybe there is a better way of accomplishing this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building and linking just one harness is faster than all of them, maybe useful when reproducing testcases?

Not sure. This isn't supported for any other tests (bench, unit, ...), so maybe leave as a follow-up.

If building takes a long time, you can use more CPUs or a populated ccache.

Also, it could be done easier inside the makefile by just skipping over the other files, if needed?

@ajtowns
Copy link
Contributor

ajtowns commented Dec 7, 2023

What does this mean? I'm seeing file sizes of 448MB (with fuzzing enabled) and 146MB (with a normal build) for test/fuzz/fuzz.

If you build this pull request you should see a binary per fuzz harness in src/test/fuzz (e.g. src/test/fuzz/process_message), as well as the usual fuzz binary src/test/fuzz/fuzz. If you sum up the size of the individual per harness binaries (assuming you compile with LTO) you should end up with roughly 3.6GB (maybe this depends on compiler version etc.).

Hmm, if I switch to this PR and try compiling, I get bunches of linker errors about multiple definitions, eg

/usr/bin/ld: test/fuzz/fuzz-addrman_serdeser.o: in function `std::iterator_traits<std::byte const*>::difference_type std::__distance<std::byte const*>(std::byte const*, std::byte const*, std::random_access_iterator_tag)':

./src/./tinyformat.h:666: multiple definition of `MakeV1Transport(long)'; test/fuzz/fuzz-addrman.o:./src/./tinyformat.h:666: first defined here

Dropping from 14GB to 3.6GB seems worthwhile for oss-fuzz (and similar), but not worthwhile otherwise to me -- it's not compiling so I can't say for sure, but 450MB bloating out to 3.6GB sounds terrible? Also LTO seems pretty slow, and not amenable to ccache...

Shouldn't it be possible to just setup test/fuzz/fuzz.cpp so it (alone) can be recompiled with FUZZ_HARNESS defined, and rely on LTO for everything else (rather than should_compile_harness)? Something like: https://github.com/ajtowns/bitcoin/tree/202312-fuzz-many-exes ?

You could maybe use something like:

$ find -name '*.cpp' | sort | while read a; do grep ^FUZZ_TARGET $a | sed 's/^FUZZ_TARGET[_A-Z]*[(] *\([^,) ]*\) *[,)].*/\1/'; done

to get a list of fuzz targets without having to compile the code first, or separate things into different files.

@dergoegge
Copy link
Member Author

Hmm, if I switch to this PR and try compiling, I get bunches of linker errors about multiple definitions, eg

Yea weird, it compiles for me with afl-clang-lto but not with regular clang.

Dropping from 14GB to 3.6GB seems worthwhile for oss-fuzz (and similar), but not worthwhile otherwise to me

I listed more benefits in #28971. It is not gonna have a noticeable effect for contributors that aren't regularly fuzzing.

to get a list of fuzz targets without having to compile the code first, or separate things into different files.

The splitting is super straight forward though and the more maintainable thing IMO (your script does not work for e.g. test/fuzz/deserialize.cpp). If we are going to continue employing sed hacks, we can just keep doing that in oss-fuzz.

I guess the goal is to somehow come up with magic makefile code (which can optionally be enabled) to extend the build code from the file names automatically.

After consulting our build system gods, it looks like this is not possible with automake. It is apparently very easy to do with cmake though, so I guess we can consider this blocked until cmake is in.

@dergoegge
Copy link
Member Author

I'll re-open this once we switched to cmake.

I've rebased the commits on the current cmake staging branch and building individual binaries using cmake is quite easy (assuming there is one harness per file):

  file(GLOB fuzz_harness_file_list RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "./*.cpp")
  # Remove fuzz.cpp and util.cpp as they don't contain fuzz harnesses
  list(REMOVE_ITEM fuzz_harness_file_list "fuzz.cpp")
  list(REMOVE_ITEM fuzz_harness_file_list "util.cpp")

  foreach(harness_file ${fuzz_harness_file_list})
    get_filename_component(harness ${harness_file} NAME_WLE)
    add_executable(fuzz_${harness} ${harness_file})
    target_compile_definitions(fuzz_${harness} PUBLIC SINGLE_FUZZ_HARNESS)
    target_link_libraries(fuzz_${harness}
      bitcoin_crypto # TODO ??? memory_cleanse undefined errors
      core_interface
      test_fuzz
      bitcoin_cli
      bitcoin_common
      minisketch
      leveldb
      univalue
      secp256k1
      Boost::headers
      libevent::libevent
      bitcoin_consensus
    )
  endforeach()

@dergoegge dergoegge closed this Feb 1, 2024
@dergoegge dergoegge mentioned this pull request Sep 12, 2024
3 tasks
@bitcoin bitcoin locked and limited conversation to collaborators Jan 31, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fuzz, brainstorm: Individual binaries per harness

4 participants