Skip to content

Conversation

@dergoegge
Copy link
Member

FuzzedSock has a few issues that block a fuzzer from making progress. See commit messages for details.

…k::Recv

See comment on FuzzedDataProvider::ConsumeRandomLengthString.
@DrahtBot
Copy link
Contributor

DrahtBot commented May 31, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK marcofleon, brunoerg
Approach ACK vasild

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

@DrahtBot DrahtBot added the Tests label May 31, 2024
@dergoegge
Copy link
Member Author

These issues were discovered by Marco (@marcofleon) and me while he was working on #28803. He will have a PR for that up soon and this PR is a prerequisite (although it also helps with other harnesses that use FuzzedSock).

cc @vasild perhaps you are interested in reviewing this.

@brunoerg
Copy link
Contributor

brunoerg commented Jun 1, 2024

Concept ACK

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach ACK 617d43e656816a532ca275707dffeb6e2a2e1fc7

Just one blocker below (about the overflow).

Thank you for the improvements!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the commit message:

FuzzedSock only supports peeking at one byte at a time, which is not
fuzzer friendly when trying to receive long data.

I am not sure how likely is this to happen, but at least it is not ruled out by the recv(2) documentation: a peek read of 10 bytes returns just 1 byte and repeating the peek read of 10 bytes returns the same 1 byte over and over again. The app can never peek past the 1 byte without actually consuming/reading that byte. I guess it must be ready for this behavior from the OS or from a mocked fuzzed socket.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could only find the following in the docs and based the changes here on that.

This flag causes the receive operation to return data from
the beginning of the receive queue without removing that
data from the queue.  Thus, a subsequent receive call will
return the same data.

We could make it so that a peek read only returns one byte and a subsequent normal read return N bytes where the first byte corresponds to the peek byte. But I'm not sure if that would change much for our use case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make it so that a peek read only returns one byte and a subsequent normal read return N bytes where the first byte corresponds to the peek byte.

This was the case before this PR, right?

But I'm not sure if that would change much for our use case.

Where can I see that use case? Is there some code that does not work without the changes to FuzzedSock::Recv() done in this PR?

The doc you cited, in my understanding, means that what I described in the first comment of this thread is not ruled out and might happen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the case before this PR, right?

No, a peek read before this PR would return 1 byte and a normal read following that would also only return that one byte (with the exception of sometimes padding with a bunch of zeros). This is what causes the fuzz input to never be interpreted as one continuous sequence of data, which is why mutations that insert pieces of data (e.g. from a dictionary) are ineffective.

Where can I see that use case? Is there some code that does not work without the changes to FuzzedSock::Recv() done in this PR?

Yes the i2p harness is unable to cover most of the target code without these changes: #30230.

@dergoegge dergoegge force-pushed the 2024-05-fix-fuzzsock branch from 617d43e to 1296e3e Compare June 3, 2024 09:31
dergoegge and others added 2 commits June 3, 2024 10:32
FuzzedSock only supports peeking at one byte at a time, which is not
fuzzer friendly when trying to receive long data.

Fix this by supporting peek data of arbitrary length instead of only one
byte.
Currently, when the FuzzedDataProvider of a FuzzedSock runs out of data,
FuzzedSock::Wait and WaitMany will simulate endless waiting as the
requested events are never simulated as occured.

Fix this by simulating event occurence when ConsumeBool() returns false
(e.g. when the data provider runs out).

Co-authored-by: dergoegge <[email protected]>
@dergoegge dergoegge force-pushed the 2024-05-fix-fuzzsock branch from 1296e3e to 22d0f1a Compare June 3, 2024 09:33
Copy link
Contributor

@marcofleon marcofleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested ACK 22d0f1a
In i2p the private key needs to be at least 387 bytes. When this key is generated (using FuzzedSock) the bytes were being broken up such that the length of the saved key was always less than 387. Having Recv return the length of a byte vector, instead of a single byte, when MSG_PEEK is enabled fixes this and allows the fuzzer to continue through the I2P target (#28803).

@DrahtBot DrahtBot requested review from brunoerg and vasild June 4, 2024 12:30
Copy link
Contributor

@brunoerg brunoerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 22d0f1a

@fanquake fanquake merged commit d39f15a into bitcoin:master Jun 4, 2024
Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was merged prematurely.

* `Recv()` call we must return the same data, thus we remember it here.
*/
mutable std::optional<uint8_t> m_peek_data;
mutable std::optional<std::vector<uint8_t>> m_peek_data;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::vector<uint8_t> can be used instead of std::optional<std::vector<uint8_t>> and if the vector is empty that is the same as the optional without a value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to open a follow up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the optional in #30273.

Comment on lines +218 to +222
// `random_bytes` might exceed the size of `buf` if e.g. Recv is called with
// len=N and MSG_PEEK first and afterwards called with len=M (M < N) and
// without MSG_PEEK.
size_t recv_len{std::min(random_bytes.size(), len)};
std::memcpy(buf, random_bytes.data(), recv_len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it will not overflow buf but it will lose the trailing N - M bytes from the peek.

To fix this, when assigning random_bytes earlier, it should only take/pop the first len bytes from m_peek_data.

I think all this may be unnecessary if the app has to work with peek only returning 1 byte, as discussed in #30211 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes were very much necessary for FuzzedSock to be fuzzer friendly. Feel free to open a follow up, if you have suggestion how further improve things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... will lose the trailing N - M bytes from the peek.

Fixed in #30273

fanquake added a commit that referenced this pull request Jun 12, 2024
193c748 fuzz: add I2P harness (marcofleon)

Pull request description:

  Addresses #28803. This updated harness sets mock time at the beginning of each iteration and deletes the private key file at the end of each iteration. Mock time is used to make the fuzz test more stable, as `GetTime` is called at points in `i2p`. Deleting the private key file ensures that each iteration is independent from the last. Now, a new key is generated in `i2p` every time, so the fuzzer can eventually make progress through the target code.

  Re-working this harness also led me and dergoegge to resolve a couple of issues in `FuzzedSock`, which allows for full coverage of the `i2p` code. Those changes can be seen in #30211.

  The SAM protocol for interacting with I2P requires some specifc inputs so it's best to use a dictionary when running this harness.

  <details>
  <summary>I2P dict</summary>

  ```
  "HELLO VERSION"
  "HELLO REPLY RESULT=OK VERSION="
  "HELLO REPLY RESULT=NOVERSION"
  "HELLO REPLY RESULT=I2P_ERROR"
  "SESSION CREATE"
  "SESSION STATUS RESULT=OK DESTINATION="
  "SESSION STATUS RESULT=DUPLICATED_ID"
  "SESSION STATUS RESULT=DUPLICATED_DEST"
  "SESSION STATUS RESULT=INVALID_ID"
  "SESSION STATUS RESULT=INVALID_KEY"
  "SESSION STATUS RESULT=I2P_ERROR MESSAGE="
  "SESSION ADD"
  "SESSION REMOVE"
  "STREAM CONNECT"
  "STREAM STATUS RESULT=OK"
  "STREAM STATUS RESULT=INVALID_ID"
  "STREAM STATUS RESULT=INVALID_KEY"
  "STREAM STATUS RESULT=CANT_REACH_PEER"
  "STREAM STATUS RESULT=I2P_ERROR MESSAGE="
  "STREAM ACCEPT"
  "STREAM FORWARD"
  "DATAGRAM SEND"
  "RAW SEND"
  "DEST GENERATE"
  "DEST REPLY PUB= PRIV="
  "DEST REPLY RESULT=I2P_ERROR"
  "NAMING LOOKUP"
  "NAMING REPLY RESULT=OK NAME= VALUE="
  "DATAGRAM RECEIVED DESTINATION= SIZE="
  "RAW RECEIVED SIZE="
  "NAMING REPLY RESULT=INVALID_KEY NAME="
  "NAMING REPLY RESULT=KEY_NOT_FOUND NAME="
  "MIN"
  "MAX"
  "STYLE"
  "ID"
  "SILENT"
  "DESTINATION"
  "NAME"
  "SIGNATURE_TYPE"
  "CRYPTO_TYPE"
  "SIZE"
  "HOST"
  "PORT"
  "FROM_PORT"
  "TRANSIENT"
  "STREAM"
  "DATAGRAM"
  "RAW"
  "MASTER"
  "true"
  "false"
  ```

  </details>

  I'll add this dict to qa-assets later on.

ACKs for top commit:
  dergoegge:
    tACK 193c748
  brunoerg:
    ACK 193c748
  vasild:
    ACK 193c748

Tree-SHA512: 09ae4b3fa0738aa6f159f4d920493bdbce786b489bc8148e7a135a881e9dba93d727b40f5400c9510e218dd2cfdccc7ce2d3ac9450654fb29c78aac59af92ec3
fanquake added a commit that referenced this pull request Jul 1, 2024
… read

4d81b4d fuzz: FuzzedSock::Recv() don't lose bytes from MSG_PEEK read (Vasil Dimov)
b51d75e fuzz: simplify FuzzedSock::m_peek_data (Vasil Dimov)

Pull request description:

  Problem:

  If `FuzzedSock::Recv(N, MSG_PEEK)` is called then `N` bytes would be
  retrieved from the fuzz provider, saved in `m_peek_data` and returned
  to the caller (ok).

  If after this `FuzzedSock::Recv(M, 0)` is called where `M < N`
  then the first `M` bytes from `m_peek_data` would be returned
  to the caller (ok), but the remaining `N - M` bytes in `m_peek_data`
  would be discarded/lost (not ok). They must be returned by a subsequent
  `Recv()`.

  To resolve this, only remove the head `N` bytes from `m_peek_data`.

  ---

  This is a followup to #30211, more specifically:

  #30211 (comment)
  #30211 (comment)

ACKs for top commit:
  marcofleon:
    ACK 4d81b4d. Tested this with the I2P fuzz target and there's no loss in coverage. I think overall this is an improvement in the robustness of `Recv` in `FuzzedSock`.
  dergoegge:
    Code review ACK 4d81b4d
  brunoerg:
    utACK 4d81b4d

Tree-SHA512: 73b5cb396784652447874998850e45899e8cba49dcd2cc96b2d1f63be78e48201ab88a76cf1c3cb880abac57af07f2c65d673a1021ee1a577d0496c3a4b0c5dd
PastaPastaPasta pushed a commit to DashCoreAutoGuix/dash that referenced this pull request May 16, 2025
193c748 fuzz: add I2P harness (marcofleon)

Pull request description:

  Addresses bitcoin#28803. This updated harness sets mock time at the beginning of each iteration and deletes the private key file at the end of each iteration. Mock time is used to make the fuzz test more stable, as `GetTime` is called at points in `i2p`. Deleting the private key file ensures that each iteration is independent from the last. Now, a new key is generated in `i2p` every time, so the fuzzer can eventually make progress through the target code.

  Re-working this harness also led me and dergoegge to resolve a couple of issues in `FuzzedSock`, which allows for full coverage of the `i2p` code. Those changes can be seen in bitcoin#30211.

  The SAM protocol for interacting with I2P requires some specifc inputs so it's best to use a dictionary when running this harness.

  <details>
  <summary>I2P dict</summary>

  ```
  "HELLO VERSION"
  "HELLO REPLY RESULT=OK VERSION="
  "HELLO REPLY RESULT=NOVERSION"
  "HELLO REPLY RESULT=I2P_ERROR"
  "SESSION CREATE"
  "SESSION STATUS RESULT=OK DESTINATION="
  "SESSION STATUS RESULT=DUPLICATED_ID"
  "SESSION STATUS RESULT=DUPLICATED_DEST"
  "SESSION STATUS RESULT=INVALID_ID"
  "SESSION STATUS RESULT=INVALID_KEY"
  "SESSION STATUS RESULT=I2P_ERROR MESSAGE="
  "SESSION ADD"
  "SESSION REMOVE"
  "STREAM CONNECT"
  "STREAM STATUS RESULT=OK"
  "STREAM STATUS RESULT=INVALID_ID"
  "STREAM STATUS RESULT=INVALID_KEY"
  "STREAM STATUS RESULT=CANT_REACH_PEER"
  "STREAM STATUS RESULT=I2P_ERROR MESSAGE="
  "STREAM ACCEPT"
  "STREAM FORWARD"
  "DATAGRAM SEND"
  "RAW SEND"
  "DEST GENERATE"
  "DEST REPLY PUB= PRIV="
  "DEST REPLY RESULT=I2P_ERROR"
  "NAMING LOOKUP"
  "NAMING REPLY RESULT=OK NAME= VALUE="
  "DATAGRAM RECEIVED DESTINATION= SIZE="
  "RAW RECEIVED SIZE="
  "NAMING REPLY RESULT=INVALID_KEY NAME="
  "NAMING REPLY RESULT=KEY_NOT_FOUND NAME="
  "MIN"
  "MAX"
  "STYLE"
  "ID"
  "SILENT"
  "DESTINATION"
  "NAME"
  "SIGNATURE_TYPE"
  "CRYPTO_TYPE"
  "SIZE"
  "HOST"
  "PORT"
  "FROM_PORT"
  "TRANSIENT"
  "STREAM"
  "DATAGRAM"
  "RAW"
  "MASTER"
  "true"
  "false"
  ```

  </details>

  I'll add this dict to qa-assets later on.

ACKs for top commit:
  dergoegge:
    tACK 193c748
  brunoerg:
    ACK 193c748
  vasild:
    ACK 193c748

Tree-SHA512: 09ae4b3fa0738aa6f159f4d920493bdbce786b489bc8148e7a135a881e9dba93d727b40f5400c9510e218dd2cfdccc7ce2d3ac9450654fb29c78aac59af92ec3
@bitcoin bitcoin locked and limited conversation to collaborators Jun 12, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants