Merged
Use a 5-byte hash instead of a 4-byte hash.

This improves compression in most cases and will also yield faster decompression. Little to no performance impact.

Before/after (two rows per level: the first is before, the second is after):

```
file                   out   level  insize      outsize    millis  mb/s
nyc-taxi-data-10M.csv  gzkp  1      3325605752  922273214  14065   225.49
nyc-taxi-data-10M.csv  gzkp  1      3325605752  846471964  14564   217.76
nyc-taxi-data-10M.csv  gzkp  2      3325605752  883782053  15683   202.22
nyc-taxi-data-10M.csv  gzkp  2      3325605752  815766227  15057   210.63
nyc-taxi-data-10M.csv  gzkp  3      3325605752  878726683  17308   183.24
nyc-taxi-data-10M.csv  gzkp  3      3325605752  807241782  17184   184.56
nyc-taxi-data-10M.csv  gzkp  4      3325605752  789447233  20651   153.57
nyc-taxi-data-10M.csv  gzkp  4      3325605752  789447233  20862   152.02

file                   out   level  insize      outsize    millis  mb/s
enwik9                 gzkp  1      1000000000  382781160  5713    166.90
enwik9                 gzkp  1      1000000000  374131553  5926    160.90
enwik9                 gzkp  2      1000000000  371351753  6131    155.55
enwik9                 gzkp  2      1000000000  361881529  6007    158.74
enwik9                 gzkp  3      1000000000  364881746  6891    138.39
enwik9                 gzkp  3      1000000000  355065173  7043    135.39
enwik9                 gzkp  4      1000000000  342732211  8339    114.36
enwik9                 gzkp  4      1000000000  342732211  8327    114.52
```
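
For readers unfamiliar with the idea, here is a minimal sketch of what hashing 5 bytes instead of 4 can look like. It is not the code from this change: the multiplicative hash, the constants `prime4bytes` and `prime5bytes`, the `tableBits` value, and the function names `hash4` and `hash5` are all assumptions chosen for illustration. The point it shows is that two positions sharing only their first 4 bytes land in the same bucket with a 4-byte hash, but in different buckets once the fifth byte is mixed in.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	tableBits   = 16           // hypothetical: hash table with 1<<16 entries
	prime4bytes = 2654435761   // illustrative multiplier for 4-byte input
	prime5bytes = 889523592379 // illustrative multiplier for 5-byte input
)

// hash4 hashes the lowest 4 bytes of u into tableBits bits.
func hash4(u uint64) uint32 {
	return (uint32(u) * prime4bytes) >> (32 - tableBits)
}

// hash5 hashes the lowest 5 bytes of u into tableBits bits.
// Shifting left by 24 discards the upper 3 bytes of the 8-byte load.
func hash5(u uint64) uint32 {
	return uint32(((u << (64 - 40)) * prime5bytes) >> (64 - tableBits))
}

func main() {
	// Two inputs that agree on the first 4 bytes and differ in the 5th.
	a := binary.LittleEndian.Uint64([]byte("abcdX___"))
	b := binary.LittleEndian.Uint64([]byte("abcdY___"))

	fmt.Println("hash4 collides:", hash4(a) == hash4(b))
	fmt.Println("hash5 collides:", hash5(a) == hash5(b))
}
```

With a 4-byte hash, a candidate found in the table is only known to be a likely 4-byte match; with a 5-byte hash, candidates that match on 4 bytes but not the fifth are filtered out earlier, which is one plausible reason the output favors longer matches.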