-
Notifications
You must be signed in to change notification settings - Fork 38.7k
fuzz: Make FuzzedSock fuzz friendlier #30211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…k::Recv See comment on FuzzedDataProvider::ConsumeRandomLengthString.
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. |
|
These issues were discovered by Marco (@marcofleon) and me while he was working on #28803. He will have a PR for that up soon and this PR is a prerequisite (although it also helps with other harnesses that use cc @vasild perhaps you are interested in reviewing this. |
|
Concept ACK |
vasild
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK 617d43e656816a532ca275707dffeb6e2a2e1fc7
Just one blocker below (about the overflow).
Thank you for the improvements!
src/test/fuzz/util/net.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from the commit message:
FuzzedSock only supports peeking at one byte at a time, which is not
fuzzer friendly when trying to receive long data.
I am not sure how likely is this to happen, but at least it is not ruled out by the recv(2) documentation: a peek read of 10 bytes returns just 1 byte and repeating the peek read of 10 bytes returns the same 1 byte over and over again. The app can never peek past the 1 byte without actually consuming/reading that byte. I guess it must be ready for this behavior from the OS or from a mocked fuzzed socket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could only find the following in the docs and based the changes here on that.
This flag causes the receive operation to return data from
the beginning of the receive queue without removing that
data from the queue. Thus, a subsequent receive call will
return the same data.
We could make it so that a peek read only returns one byte and a subsequent normal read return N bytes where the first byte corresponds to the peek byte. But I'm not sure if that would change much for our use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make it so that a peek read only returns one byte and a subsequent normal read return N bytes where the first byte corresponds to the peek byte.
This was the case before this PR, right?
But I'm not sure if that would change much for our use case.
Where can I see that use case? Is there some code that does not work without the changes to FuzzedSock::Recv() done in this PR?
The doc you cited, in my understanding, means that what I described in the first comment of this thread is not ruled out and might happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the case before this PR, right?
No, a peek read before this PR would return 1 byte and a normal read following that would also only return that one byte (with the exception of sometimes padding with a bunch of zeros). This is what causes the fuzz input to never be interpreted as one continuous sequence of data, which is why mutations that insert pieces of data (e.g. from a dictionary) are ineffective.
Where can I see that use case? Is there some code that does not work without the changes to FuzzedSock::Recv() done in this PR?
Yes the i2p harness is unable to cover most of the target code without these changes: #30230.
617d43e to
1296e3e
Compare
FuzzedSock only supports peeking at one byte at a time, which is not fuzzer friendly when trying to receive long data. Fix this by supporting peek data of arbitrary length instead of only one byte.
Currently, when the FuzzedDataProvider of a FuzzedSock runs out of data, FuzzedSock::Wait and WaitMany will simulate endless waiting as the requested events are never simulated as occured. Fix this by simulating event occurence when ConsumeBool() returns false (e.g. when the data provider runs out). Co-authored-by: dergoegge <[email protected]>
1296e3e to
22d0f1a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested ACK 22d0f1a
In i2p the private key needs to be at least 387 bytes. When this key is generated (using FuzzedSock) the bytes were being broken up such that the length of the saved key was always less than 387. Having Recv return the length of a byte vector, instead of a single byte, when MSG_PEEK is enabled fixes this and allows the fuzzer to continue through the I2P target (#28803).
brunoerg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 22d0f1a
vasild
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was merged prematurely.
| * `Recv()` call we must return the same data, thus we remember it here. | ||
| */ | ||
| mutable std::optional<uint8_t> m_peek_data; | ||
| mutable std::optional<std::vector<uint8_t>> m_peek_data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::vector<uint8_t> can be used instead of std::optional<std::vector<uint8_t>> and if the vector is empty that is the same as the optional without a value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to open a follow up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the optional in #30273.
| // `random_bytes` might exceed the size of `buf` if e.g. Recv is called with | ||
| // len=N and MSG_PEEK first and afterwards called with len=M (M < N) and | ||
| // without MSG_PEEK. | ||
| size_t recv_len{std::min(random_bytes.size(), len)}; | ||
| std::memcpy(buf, random_bytes.data(), recv_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it will not overflow buf but it will lose the trailing N - M bytes from the peek.
To fix this, when assigning random_bytes earlier, it should only take/pop the first len bytes from m_peek_data.
I think all this may be unnecessary if the app has to work with peek only returning 1 byte, as discussed in #30211 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes were very much necessary for FuzzedSock to be fuzzer friendly. Feel free to open a follow up, if you have suggestion how further improve things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... will lose the trailing
N - Mbytes from the peek.
Fixed in #30273
193c748 fuzz: add I2P harness (marcofleon) Pull request description: Addresses #28803. This updated harness sets mock time at the beginning of each iteration and deletes the private key file at the end of each iteration. Mock time is used to make the fuzz test more stable, as `GetTime` is called at points in `i2p`. Deleting the private key file ensures that each iteration is independent from the last. Now, a new key is generated in `i2p` every time, so the fuzzer can eventually make progress through the target code. Re-working this harness also led me and dergoegge to resolve a couple of issues in `FuzzedSock`, which allows for full coverage of the `i2p` code. Those changes can be seen in #30211. The SAM protocol for interacting with I2P requires some specifc inputs so it's best to use a dictionary when running this harness. <details> <summary>I2P dict</summary> ``` "HELLO VERSION" "HELLO REPLY RESULT=OK VERSION=" "HELLO REPLY RESULT=NOVERSION" "HELLO REPLY RESULT=I2P_ERROR" "SESSION CREATE" "SESSION STATUS RESULT=OK DESTINATION=" "SESSION STATUS RESULT=DUPLICATED_ID" "SESSION STATUS RESULT=DUPLICATED_DEST" "SESSION STATUS RESULT=INVALID_ID" "SESSION STATUS RESULT=INVALID_KEY" "SESSION STATUS RESULT=I2P_ERROR MESSAGE=" "SESSION ADD" "SESSION REMOVE" "STREAM CONNECT" "STREAM STATUS RESULT=OK" "STREAM STATUS RESULT=INVALID_ID" "STREAM STATUS RESULT=INVALID_KEY" "STREAM STATUS RESULT=CANT_REACH_PEER" "STREAM STATUS RESULT=I2P_ERROR MESSAGE=" "STREAM ACCEPT" "STREAM FORWARD" "DATAGRAM SEND" "RAW SEND" "DEST GENERATE" "DEST REPLY PUB= PRIV=" "DEST REPLY RESULT=I2P_ERROR" "NAMING LOOKUP" "NAMING REPLY RESULT=OK NAME= VALUE=" "DATAGRAM RECEIVED DESTINATION= SIZE=" "RAW RECEIVED SIZE=" "NAMING REPLY RESULT=INVALID_KEY NAME=" "NAMING REPLY RESULT=KEY_NOT_FOUND NAME=" "MIN" "MAX" "STYLE" "ID" "SILENT" "DESTINATION" "NAME" "SIGNATURE_TYPE" "CRYPTO_TYPE" "SIZE" "HOST" "PORT" "FROM_PORT" "TRANSIENT" "STREAM" "DATAGRAM" "RAW" "MASTER" "true" "false" ``` </details> I'll add this dict to qa-assets later on. ACKs for top commit: dergoegge: tACK 193c748 brunoerg: ACK 193c748 vasild: ACK 193c748 Tree-SHA512: 09ae4b3fa0738aa6f159f4d920493bdbce786b489bc8148e7a135a881e9dba93d727b40f5400c9510e218dd2cfdccc7ce2d3ac9450654fb29c78aac59af92ec3
… read 4d81b4d fuzz: FuzzedSock::Recv() don't lose bytes from MSG_PEEK read (Vasil Dimov) b51d75e fuzz: simplify FuzzedSock::m_peek_data (Vasil Dimov) Pull request description: Problem: If `FuzzedSock::Recv(N, MSG_PEEK)` is called then `N` bytes would be retrieved from the fuzz provider, saved in `m_peek_data` and returned to the caller (ok). If after this `FuzzedSock::Recv(M, 0)` is called where `M < N` then the first `M` bytes from `m_peek_data` would be returned to the caller (ok), but the remaining `N - M` bytes in `m_peek_data` would be discarded/lost (not ok). They must be returned by a subsequent `Recv()`. To resolve this, only remove the head `N` bytes from `m_peek_data`. --- This is a followup to #30211, more specifically: #30211 (comment) #30211 (comment) ACKs for top commit: marcofleon: ACK 4d81b4d. Tested this with the I2P fuzz target and there's no loss in coverage. I think overall this is an improvement in the robustness of `Recv` in `FuzzedSock`. dergoegge: Code review ACK 4d81b4d brunoerg: utACK 4d81b4d Tree-SHA512: 73b5cb396784652447874998850e45899e8cba49dcd2cc96b2d1f63be78e48201ab88a76cf1c3cb880abac57af07f2c65d673a1021ee1a577d0496c3a4b0c5dd
193c748 fuzz: add I2P harness (marcofleon) Pull request description: Addresses bitcoin#28803. This updated harness sets mock time at the beginning of each iteration and deletes the private key file at the end of each iteration. Mock time is used to make the fuzz test more stable, as `GetTime` is called at points in `i2p`. Deleting the private key file ensures that each iteration is independent from the last. Now, a new key is generated in `i2p` every time, so the fuzzer can eventually make progress through the target code. Re-working this harness also led me and dergoegge to resolve a couple of issues in `FuzzedSock`, which allows for full coverage of the `i2p` code. Those changes can be seen in bitcoin#30211. The SAM protocol for interacting with I2P requires some specifc inputs so it's best to use a dictionary when running this harness. <details> <summary>I2P dict</summary> ``` "HELLO VERSION" "HELLO REPLY RESULT=OK VERSION=" "HELLO REPLY RESULT=NOVERSION" "HELLO REPLY RESULT=I2P_ERROR" "SESSION CREATE" "SESSION STATUS RESULT=OK DESTINATION=" "SESSION STATUS RESULT=DUPLICATED_ID" "SESSION STATUS RESULT=DUPLICATED_DEST" "SESSION STATUS RESULT=INVALID_ID" "SESSION STATUS RESULT=INVALID_KEY" "SESSION STATUS RESULT=I2P_ERROR MESSAGE=" "SESSION ADD" "SESSION REMOVE" "STREAM CONNECT" "STREAM STATUS RESULT=OK" "STREAM STATUS RESULT=INVALID_ID" "STREAM STATUS RESULT=INVALID_KEY" "STREAM STATUS RESULT=CANT_REACH_PEER" "STREAM STATUS RESULT=I2P_ERROR MESSAGE=" "STREAM ACCEPT" "STREAM FORWARD" "DATAGRAM SEND" "RAW SEND" "DEST GENERATE" "DEST REPLY PUB= PRIV=" "DEST REPLY RESULT=I2P_ERROR" "NAMING LOOKUP" "NAMING REPLY RESULT=OK NAME= VALUE=" "DATAGRAM RECEIVED DESTINATION= SIZE=" "RAW RECEIVED SIZE=" "NAMING REPLY RESULT=INVALID_KEY NAME=" "NAMING REPLY RESULT=KEY_NOT_FOUND NAME=" "MIN" "MAX" "STYLE" "ID" "SILENT" "DESTINATION" "NAME" "SIGNATURE_TYPE" "CRYPTO_TYPE" "SIZE" "HOST" "PORT" "FROM_PORT" "TRANSIENT" "STREAM" "DATAGRAM" "RAW" "MASTER" "true" "false" ``` </details> I'll add this dict to qa-assets later on. ACKs for top commit: dergoegge: tACK 193c748 brunoerg: ACK 193c748 vasild: ACK 193c748 Tree-SHA512: 09ae4b3fa0738aa6f159f4d920493bdbce786b489bc8148e7a135a881e9dba93d727b40f5400c9510e218dd2cfdccc7ce2d3ac9450654fb29c78aac59af92ec3
FuzzedSockhas a few issues that block a fuzzer from making progress. See commit messages for details.