[bug-fix] Fix rare corruption bug affecting the block splitter#3517
Merged
terrelln merged 1 commit intofacebook:devfrom Feb 23, 2023
Merged
[bug-fix] Fix rare corruption bug affecting the block splitter#3517terrelln merged 1 commit intofacebook:devfrom
terrelln merged 1 commit intofacebook:devfrom
Conversation
8e13002 to
cbeb60f
Compare
embg
reviewed
Feb 22, 2023
Contributor
|
Thanks a lot, am I right it was introduced with zstd 1.5.0 when blocksplitter was released? |
Contributor
Author
Correct, the first affected version was zstd v1.5.0. |
The block splitter confuses sequences with literal length == 65536 that use a repeat offset code. It interprets this as literal length == 0 when deciding the meaning of the repeat offset, and corrupts the repeat offset history. This is benign, merely causing suboptimal compression performance, if the confused history is flushed before the end of the block, e.g. if there are 3 consecutive non-repeat code sequences after the mistake. It also is only triggered if the block splitter decided to split the block. All that to say: This is a rare bug, and requires quite a few conditions to trigger. However, the good news is that if you have a way to validate that the decompressed data is correct, e.g. you've enabled zstd's checksum or have a checksum elsewhere, the original data is very likely recoverable. So if you were affected by this bug please reach out. The fix is to remind the block splitter that the literal length is actually 64K. The test case is a bit tricky to set up, but I've managed to reproduce the issue. Thanks to @danlark1 for alerting us to the issue and providing us a reproducer!
cbeb60f to
07ae766
Compare
embg
approved these changes
Feb 23, 2023
Contributor
|
@terrelln, could you tag a release version with this fix? |
Contributor
|
A |
|
@terrelln could you please share with us a snippet that reproduces this bug? I'd like to use it to check whether another zstd implementation (https://github.com/klauspost/compress/tree/master/zstd to be specific) has the same issue. |
ligurio
added a commit
to ligurio/nanodata
that referenced
this pull request
Mar 2, 2023
Updated third_party/zstd submodule from v1.5.2 to pre-1.5.5 version. The new version fixes a rare bug that was introduced in 1.5.0 [1]. A v1.5.5 release version will be produced in March. 1. facebook/zstd#3517 Fixes tarantool#8391 NO_DOC=build NO_TEST=build
locker
pushed a commit
to ligurio/nanodata
that referenced
this pull request
Mar 2, 2023
Updated third_party/zstd submodule from v1.5.2 to pre-1.5.5 version. The new version fixes a rare bug that was introduced in 1.5.0 [1]. A v1.5.5 release version will be produced in March. 1. facebook/zstd#3517 Fixes tarantool#8391 NO_DOC=build NO_TEST=build
locker
pushed a commit
to ligurio/nanodata
that referenced
this pull request
Mar 2, 2023
Updated third_party/zstd submodule from v1.5.2 to pre-1.5.5 version. The new version fixes a rare bug that was introduced in 1.5.0 [1]. A v1.5.5 release version will be produced in March. 1. facebook/zstd#3517 Fixes tarantool#8391 NO_DOC=build NO_TEST=build
ligurio
added a commit
to ligurio/nanodata
that referenced
this pull request
Mar 2, 2023
Updated third_party/zstd submodule from v1.5.2 to pre-1.5.5 version. The new version fixes a rare bug that was introduced in 1.5.0 [1]. A v1.5.5 release version will be produced in March. 1. facebook/zstd#3517 Fixes tarantool#8391 NO_DOC=build NO_TEST=build
locker
pushed a commit
to tarantool/tarantool
that referenced
this pull request
Mar 3, 2023
Updated third_party/zstd submodule from v1.5.2 to pre-1.5.5 version. The new version fixes a rare bug that was introduced in 1.5.0 [1]. A v1.5.5 release version will be produced in March. 1. facebook/zstd#3517 Fixes #8391 NO_DOC=build NO_TEST=build
locker
pushed a commit
to tarantool/tarantool
that referenced
this pull request
Mar 3, 2023
Updated third_party/zstd submodule from v1.5.2 to pre-1.5.5 version. The new version fixes a rare bug that was introduced in 1.5.0 [1]. A v1.5.5 release version will be produced in March. 1. facebook/zstd#3517 Fixes #8391 NO_DOC=build NO_TEST=build (cherry picked from commit ecbbc92)
james-rms
added a commit
to ros2/rosbag2
that referenced
this pull request
Apr 5, 2023
Facebook recently published a patch update to fix a data corruption bug in their zstd compression library. See the release notes here: https://github.com/facebook/zstd/releases/tag/v1.5.5 See also the bug-fix patch here: facebook/zstd#3517 Signed-off-by: James Smith <[email protected]>
herbygillot
added a commit
to macports/macports-ports
that referenced
this pull request
Apr 5, 2023
- remove rare corruption patch, as fix has been merged upstream (facebook/zstd#3517)
bob-beck
pushed a commit
to openbsd/ports
that referenced
this pull request
Apr 5, 2023
This release corrects a corruption bug in high compression mode. More information can be found at facebook/zstd#3517. Overview on other changes can be found at https://github.com/facebook/zstd/releases/tag/v1.5.5. Minor of SHLIB has been bumped because of addition of new symbols.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
TLDR: This is a very rare bug affecting the block splitter. Only levels using the optimal parser (levels 13 and above depending on the source size), or users who explicitly enable the block splitter, are affected.
The block splitter confuses sequences with literal length == 65536 that use a repeat offset code. It interprets this as literal length == 0 when deciding the meaning of the repeat offset, and corrupts the repeat offset history. This is benign, merely causing suboptimal compression performance, if the confused history is flushed before the end of the block, e.g. if there are 3 consecutive non-repeat code sequences after the mistake. It also is only triggered if the block splitter decided to split the block.
All that to say: This is a rare bug, and requires quite a few conditions to trigger. However, the good news is that if you have a way to validate that the decompressed data is correct, e.g. you've enabled zstd's checksum or have a checksum elsewhere, the original data is very likely recoverable. So if you were affected by this bug please reach out.
The fix is to remind the block splitter that the literal length is actually 64K. The test case is a bit tricky to set up, but I've managed to reproduce the issue.
Thanks to @danlark1 for alerting us to the issue and providing us a reproducer!