-
Notifications
You must be signed in to change notification settings - Fork 1.4k
use LZ4 as default compression mode
#7797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on using the most efficient compression mode as default.
I have some concern on changing the default to L4Z_LENGTH_PREFIXED because it is recently added and can potentially cause problem if segment creator gets the new version first.
Let's wait after 0.9 is released and add this in 0.10.0. We should also consider changing the default writer version to 4 which is much more efficient than 2
The V4 chunk writer automatically upgrades |
647353b to
6b06c65
Compare
LZ4_LENGTH_PREFIXED as default compression modeLZ4 as default compression mode
…st coverage where SNAPPY was assumed previously
6b06c65 to
7d65dfb
Compare
Codecov Report
@@ Coverage Diff @@
## master #7797 +/- ##
============================================
- Coverage 71.70% 65.20% -6.51%
- Complexity 4078 4083 +5
============================================
Files 1578 1533 -45
Lines 80777 78907 -1870
Branches 12001 11798 -203
============================================
- Hits 57925 51449 -6476
- Misses 18961 23797 +4836
+ Partials 3891 3661 -230
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
|
Nice. When we had added support for ZSTD and LZ4 compression types in #6804, our perf testing also had similar results. LZ4 gives nearly same (and even better sometimes) the compression ratio compared to SNAPPY and the time to compress is much faster. ZSTD on the other hand as highest compression ratio (from our tests) but compression/decompression time was higher as well Here are the results for reference (I think these were addd to JMH benchmarks as well) https://docs.google.com/document/d/1ie4JmVofk2o6ryCX7T4aD7GsRPKS651mroXhDk41wys/edit |
This is consistent with my experience, but it’s worth noting LZ4 has 16 compression levels which trade speed for ratio, and we default to level 1, whereas beyond level 3 a high compression mode is used - if ratio is really important the level could be exposed as a config parameter. But right now the motivation is faster decompression by default, rather than smaller file sizes. |
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
LZ4everywhere, except when constructing readers for v1 chunks whereSNAPPYwas implicitSNAPPYwas assumed to make sure things still work with other compression settings.LZ4#7795