modules/zstd: add frame header parser#1168
Conversation
f1417d6 to
63bb702
Compare
|
@proppy we updated the code, please take a look:
C++ tests generate 250 test cases with pseudorandom valid FrameHeaders. Test cases run in parallel through Please do note that this branch is based on #1167 and it also includes commits cherry-picked from #1166. Please ignore those when reviewing this PR. EDIT: I also added next batch of parallelized tests. Those generate random vectors of bytes of length from 0 to 32 bytes and then run those vectors through |
3ae186d to
e6dc8d2
Compare
| strip_prefix = "zstd-1.4.7", | ||
| urls = ["https://github.com/facebook/zstd/releases/download/v1.4.7/zstd-1.4.7.tar.gz"], | ||
| build_file = "@//dependency_support/com_github_facebook_zstd:bundled.BUILD.bazel", | ||
| patches = ["@com_google_xls//dependency_support/com_github_facebook_zstd:decodecorpus.patch"], |
There was a problem hiding this comment.
can you add a comment about the patch? has it been submitted upstream?
There was a problem hiding this comment.
Added the comment.
Changes were not submitted upstream yet. The patch modifies decodecorpus utility from zstd library. Unmodified tool allows generating valid zstd frames with randomized contents and size. With our modifications it is possibile to generate only some parts of the zstd frame. In this case we modify decodecorpus so that it allows generating only frame headers.
| args[3] = "-p" + std::string(output_path); | ||
|
|
||
| XLS_ASSIGN_OR_RETURN(auto result, CallDecodecorpus(args)); | ||
| auto raw_data = ReadFileAsRawData(output_path); |
There was a problem hiding this comment.
any reason we need to use a real file, rather than handling in memory buffer returned by libstd?
There was a problem hiding this comment.
This was not possible in original decodecorpus and we wanted to provide the minimal changes required for generating only frame headers.
There was a problem hiding this comment.
Would it be a "big change" to patch decodecorpus further so that it exposes its intermediate functions (IIRC they are currently static)?
We could remove the main from the XLS build and directly call them and consume their outputs (rather than dealing with processes and files).
There was a problem hiding this comment.
I believe this was discussed with you on one of the meetings. We settled on leaving decodecorpus as it is.
|
|
||
| TEST_P(ZSTDFrameHeaderSeededTest, ParseMultipleFrameHeaders) { | ||
| auto seed = GetParam(); | ||
| auto frame_header = zstd::GenerateFrameHeader(seed, false); |
There was a problem hiding this comment.
is this redundant to what we're doing in GenerateRandomFrameHeader or does it have more coverage?
There was a problem hiding this comment.
GenerateFrameHeader() calls decodecorpus in order to generate frame header which is valid in case of zstd frame header specification. The contents and length of those frame headers are randomized but the header is always valid - it should always be correctly parsed both by libzstd and by our decoder (frame header parser).
GenerateRandomFrameHeader() generates completely random vector of bytes of randomly picked size from range 0 to arbitrary max size of 32 bytes. With this generator we are able to test more 'negative' test cases, meaning:
- Not enough data in buffer for finishing parsing
- Corrupted header - e.g.
reservedbit inFrame_Header_Descriptoris set - Handling of buffers larger than bare minimum for parsing frame header
There was a problem hiding this comment.
Make sense, thanks for the explanation!
Would it makes sense to add more docstring to the tests and helpers so that it's more obvious to future reader?
| true, true); | ||
| } | ||
|
|
||
| std::vector<uint8_t> GenerateRandomFrameHeader() { |
There was a problem hiding this comment.
I wonder if #476 would help doing this kind of things?
There was a problem hiding this comment.
This kind of testing approach would be great if we didn't have decodecorpus which always generate valid zstd frames/frame headers. We could say that decodecorpus has embedded in itself the properties that describe valid frame header.
Without it we would have to manually specify those properties in form of multiple RC_ASSERT() statements in rapidcheck test cases in order to generate valid test data. I think that when it comes to 'positive' tests (comparing against valid frame headers) it is better to use decodecorpus.
However, I'm not sure how would it look like for 'negative' tests (generating potentially invalid frame header data and comparing parsing through libzstd against simulation of our decoder).
Frame Header parsing should fail basically only in 2 situations:
- there was not enough data to finish the parsing
- reserved bit in frame header descriptor was set
Currently we generate random test vectors in order to check if decoder fails when those situations occur.
There was a problem hiding this comment.
maybe something like https://github.com/google/fuzztest could be useful; I'd be curious to know what @ericastor as a migrated a lot of "random generation tests" to fuzztest in 1cfc4cc.
There was a problem hiding this comment.
I agree that using GoogleFuzzTest here would probably result in better coverage - though it might end up discovering some valid headers!
There was a problem hiding this comment.
Thanks for the suggestion, I've replaced generating random frame headers with FuzzTest. It is true that it can discover valid headers. We can work on constraining the tested domain further to precisely test only invalid frame headers but I believe it is better to do a general check with valid and invalid headers.
600d73f to
95685e8
Compare
95685e8 to
d7e203c
Compare
77a2ce2 to
c114b98
Compare
|
Fixed errors in fuzz tests caused by differences in priority of reporting errors between |
c114b98 to
ce6194d
Compare
ce6194d to
1236ea3
Compare
|
I've noticed a failure in one of the CI runs caused by Here is the failure log: I've already investigated the issue and managed to fix it. It was caused by not discarding ZSTD frames with EDIT: Updated the PR with fixes |
1236ea3 to
5c44314
Compare
fb2294d to
070089f
Compare
070089f to
bdacd4d
Compare
This commit adds a DSLX Buffer library that provides the Buffer struct, and helper functions that can be used to operate on it. The Buffer is meant to be a storage for data coming from the channel. It acts like a FIFO, allowing data of any length to be put in or popped out of it. Provided DSLX tests verify the correct behaviour of the library. Internal-tag: [#50221] Signed-off-by: Robert Winkler <[email protected]>
This commit adds a simple test that shows, how one can use the Buffer struct inside a Proc. Internal-tag: [#50221] Signed-off-by: Robert Winkler <[email protected]>
Signed-off-by: Pawel Czarnecki <[email protected]>
This commit adds the library with functions for parsing a magic number and tests that verify its correctness. Internal-tag: [#50221] Signed-off-by: Robert Winkler <[email protected]>
This commit adds the library with functions for parsing a frame header. The provided tests verify the correcness of the library. Internal-tag: [#49967] Co-authored-by: Roman Dobrodii <[email protected]> Co-authored-by: Pawel Czarnecki <[email protected]> Signed-off-by: Robert Winkler <[email protected]> Signed-off-by: Pawel Czarnecki <[email protected]>
Internal-tag: [#53329] Signed-off-by: Pawel Czarnecki <[email protected]>
Required for expected_status inference in C++ tests for ZSTD decoder components Internal-tag: [#53465] Signed-off-by: Pawel Czarnecki <[email protected]>
Internal-tag: [#50967] Signed-off-by: Robert Winkler <[email protected]>
This commit adds a binary that calls decoding to generate data and loads it into a vector of bytes. Internal-tag: [#50967] Signed-off-by: Robert Winkler <[email protected]>
Internal-tag: [#50967] Co-authored-by: Pawel Czarnecki <[email protected]> Signed-off-by: Robert Winkler <[email protected]> Signed-off-by: Pawel Czarnecki <[email protected]>
bdacd4d to
e25cbe1
Compare
google#1168 modules/zstd: Add library for parsing magic number This commit adds the library with functions for parsing a magic number and tests that verify its correctness. Internal-tag: [#50221] Signed-off-by: Robert Winkler <[email protected]> modules/zstd: Add library for parsing frame header This commit adds the library with functions for parsing a frame header. The provided tests verify the correcness of the library. Internal-tag: [#49967] Co-authored-by: Roman Dobrodii <[email protected]> Co-authored-by: Pawel Czarnecki <[email protected]> Signed-off-by: Robert Winkler <[email protected]> Signed-off-by: Pawel Czarnecki <[email protected]> modules/zstd/frame_header: Add benchmarking rules Internal-tag: [#53329] Signed-off-by: Pawel Czarnecki <[email protected]> dependency_support/libzstd: Make zstd_errors.h public Required for expected_status inference in C++ tests for ZSTD decoder components Internal-tag: [#53465] Signed-off-by: Pawel Czarnecki <[email protected]> dependency_support: Add decodecorpus binary Internal-tag: [#50967] Signed-off-by: Robert Winkler <[email protected]> modules/zstd: Add data generator library This commit adds a binary that calls decoding to generate data and loads it into a vector of bytes. Internal-tag: [#50967] Signed-off-by: Robert Winkler <[email protected]> modules/zstd: Add zstd frame header tests Internal-tag: [#50967] Co-authored-by: Pawel Czarnecki <[email protected]> Signed-off-by: Robert Winkler <[email protected]> Signed-off-by: Pawel Czarnecki <[email protected]>
google#1168 modules/zstd: Add library for parsing magic number This commit adds the library with functions for parsing a magic number and tests that verify its correctness. Internal-tag: [#50221] Signed-off-by: Robert Winkler <[email protected]> modules/zstd: Add library for parsing frame header This commit adds the library with functions for parsing a frame header. The provided tests verify the correcness of the library. Internal-tag: [#49967] Co-authored-by: Roman Dobrodii <[email protected]> Co-authored-by: Pawel Czarnecki <[email protected]> Signed-off-by: Robert Winkler <[email protected]> Signed-off-by: Pawel Czarnecki <[email protected]> modules/zstd/frame_header: Add benchmarking rules Internal-tag: [#53329] Signed-off-by: Pawel Czarnecki <[email protected]> dependency_support/libzstd: Make zstd_errors.h public Required for expected_status inference in C++ tests for ZSTD decoder components Internal-tag: [#53465] Signed-off-by: Pawel Czarnecki <[email protected]> dependency_support: Add decodecorpus binary Internal-tag: [#50967] Signed-off-by: Robert Winkler <[email protected]> modules/zstd: Add data generator library This commit adds a binary that calls decoding to generate data and loads it into a vector of bytes. Internal-tag: [#50967] Signed-off-by: Robert Winkler <[email protected]> modules/zstd: Add zstd frame header tests Internal-tag: [#50967] Co-authored-by: Pawel Czarnecki <[email protected]> Signed-off-by: Robert Winkler <[email protected]> Signed-off-by: Pawel Czarnecki <[email protected]>
|
should we close this and focus on reviewing #1315 ? |
This PR adds Zstd frame header parsing machinery written in DSLX + tests for it.
NOTE: this is based on #1167 , please ignore commits from that branch when reviewing.