Skip to content

Add prefetchCDictTables CCtxParam (+10-20% cold dict compression speed)#3177

Merged
embg merged 5 commits intofacebook:devfrom
embg:dms_prefetch2
Jun 24, 2022
Merged

Add prefetchCDictTables CCtxParam (+10-20% cold dict compression speed)#3177
embg merged 5 commits intofacebook:devfrom
embg:dms_prefetch2

Conversation

@embg
Copy link
Contributor

@embg embg commented Jun 23, 2022

Summary

  • Adds a new CCtxParam (prefetchCDictTables)
  • Exposes it in the largeNbDicts benchmark
  • Adds it to fuzzers

Description of the optimization

In some situations, zstd uses CDict tables in-place rather than copying them into the working context. (See docs on ZSTD_dictAttachPref_e for details). In such situations, compression speed is seriously impacted when CDict tables are "cold" (outside CPU cache).

This PR adds a CCtxParam (prefetchCDictTables) which instructs zstd to prefetch CDict tables when they are used in-place (specifically in level 1-4 dictMatchState matchfinders). For sufficiently small inputs, the cost of the prefetch will outweigh the benefit. For sufficiently large inputs, zstd will by default memcpy() CDict tables into the working context, so there is no need to prefetch. This parameter is targeted at a middle range of input sizes, where a prefetch is cheap enough to be useful but memcpy() is too expensive.

The exact range of input sizes where this makes sense is best determined by careful experimentation (see below for measurements on one particular machine / dataset which demonstrate 10-20% wins for a particular working set size and input size). Rather than enabling this param for all inputs, the code which calls ZSTD_compress2() should use a size cutoff (tuned via experimentation) to select the best prefetch strategy for each input.

Measurements

I measured the effect of this param on the HTML dataset. I benchmarked on a Intel(R) Xeon(R) D-2191A CPU @ 1.60GHz machine with core isolation and turbo disabled.

newplot (18)

We can see that the param is harmful for level3 even in the cold CDict scenario if the inputs are small enough (0-8K). For larger inputs (8-16K) at the same level, we see up to 20% wins. This demonstrates the need for selective application of this param.

Copy link
Contributor

@felixhandte felixhandte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a couple nits.

Copy link
Contributor

@felixhandte felixhandte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming tests pass, ship it!

@embg embg merged commit e9d6fc8 into facebook:dev Jun 24, 2022
@embg embg deleted the dms_prefetch2 branch June 24, 2022 15:24
@Cyan4973 Cyan4973 mentioned this pull request Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants