Skip to content

Conversation

@richardstartin
Copy link
Member

@richardstartin richardstartin commented Nov 18, 2021

  • adds a simple benchmark to compare compression modes (and chunk format v3 vs v4)
  • defaults to LZ4 everywhere, except when constructing readers for v1 chunks where SNAPPY was implicit
  • Parameterises a few tests where SNAPPY was assumed to make sure things still work with other compression settings.
  • See Change default raw index compression format to LZ4 #7795

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on using the most efficient compression mode as default.

I have some concern on changing the default to L4Z_LENGTH_PREFIXED because it is recently added and can potentially cause problem if segment creator gets the new version first.
Let's wait after 0.9 is released and add this in 0.10.0. We should also consider changing the default writer version to 4 which is much more efficient than 2

@richardstartin
Copy link
Member Author

I have some concern on changing the default to L4Z_LENGTH_PREFIXED because it is recently added and can potentially cause problem if segment creator gets the new version first.
Let's wait after 0.9 is released and add this in 0.10.0. We should also consider changing the default writer version to 4 which is much more efficient than 2

The V4 chunk writer automatically upgrades LZ4 to LZ4_LENGTH_PREFIXED so I could just change it LZ4 instead

@richardstartin richardstartin changed the title use LZ4_LENGTH_PREFIXED as default compression mode use LZ4 as default compression mode Nov 18, 2021
…st coverage where SNAPPY was assumed previously
@codecov-commenter
Copy link

codecov-commenter commented Nov 18, 2021

Codecov Report

Merging #7797 (7d65dfb) into master (b809043) will decrease coverage by 6.50%.
The diff coverage is 50.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #7797      +/-   ##
============================================
- Coverage     71.70%   65.20%   -6.51%     
- Complexity     4078     4083       +5     
============================================
  Files          1578     1533      -45     
  Lines         80777    78907    -1870     
  Branches      12001    11798     -203     
============================================
- Hits          57925    51449    -6476     
- Misses        18961    23797    +4836     
+ Partials       3891     3661     -230     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 68.68% <50.00%> (+0.05%) ⬆️
unittests2 14.56% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...rg/apache/pinot/core/minion/RawIndexConverter.java 0.00% <0.00%> (-56.61%) ⬇️
...ment/creator/impl/SegmentColumnarIndexCreator.java 84.72% <100.00%> (ø)
...a/org/apache/pinot/common/metrics/MinionMeter.java 0.00% <0.00%> (-100.00%) ⬇️
...g/apache/pinot/common/metrics/ControllerMeter.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/BrokerQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/MinionQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
...he/pinot/common/messages/SegmentReloadMessage.java 0.00% <0.00%> (-100.00%) ⬇️
...pinot/core/data/manager/realtime/TimerService.java 0.00% <0.00%> (-100.00%) ⬇️
...pinot/minion/exception/TaskCancelledException.java 0.00% <0.00%> (-100.00%) ⬇️
...not/common/exception/HttpErrorStatusException.java 0.00% <0.00%> (-100.00%) ⬇️
... and 360 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b809043...7d65dfb. Read the comment docs.

@siddharthteotia
Copy link
Contributor

siddharthteotia commented Nov 19, 2021

Nice. When we had added support for ZSTD and LZ4 compression types in #6804, our perf testing also had similar results.

LZ4 gives nearly same (and even better sometimes) the compression ratio compared to SNAPPY and the time to compress is much faster.

ZSTD on the other hand as highest compression ratio (from our tests) but compression/decompression time was higher as well

Here are the results for reference (I think these were addd to JMH benchmarks as well) https://docs.google.com/document/d/1ie4JmVofk2o6ryCX7T4aD7GsRPKS651mroXhDk41wys/edit

@richardstartin
Copy link
Member Author

richardstartin commented Nov 19, 2021

LZ4 gives nearly same (and even better sometimes) the compression ratio compared to SNAPPY and the time to compress is much faster.

ZSTD on the other hand as highest compression ratio (from our tests) but compression/decompression time was higher as well

This is consistent with my experience, but it’s worth noting LZ4 has 16 compression levels which trade speed for ratio, and we default to level 1, whereas beyond level 3 a high compression mode is used - if ratio is really important the level could be exposed as a config parameter. But right now the motivation is faster decompression by default, rather than smaller file sizes.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Jackie-Jiang Jackie-Jiang merged commit 6b33448 into apache:master Nov 20, 2021
kriti-sc pushed a commit to kriti-sc/incubator-pinot that referenced this pull request Dec 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants