
Row hash tag space initialization speed regression #3528

@yoniko


This issue tracks the work started in #2971 and #3426, as well as any follow-up work.

Context
Row hash is a fast, SIMD-based hash used by several compression strategies in Zstd.
In addition to the regular hash entries, it requires extra space for tags: small hash-derived values that allow the entries in a bucket to be filtered further before they are compared.
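A minimal scalar sketch of how tag filtering within a row might work, assuming a row of 16 entries and a one-byte tag taken from the low 8 bits of the hash. `ROW_ENTRIES`, `TAG_BITS`, `row_index`, `tag_of` and `match_mask` are hypothetical names chosen for illustration, not zstd's API. Only entries whose stored tag equals the tag of the current position become candidates for a full match comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for illustration only (not zstd's actual constants):
 * each row (bucket) holds ROW_ENTRIES positions, and each position has a
 * one-byte tag taken from the low TAG_BITS bits of its hash. */
#define ROW_ENTRIES 16
#define TAG_BITS    8

/* The candidate mask must fit in a uint32_t. */
static_assert(ROW_ENTRIES <= 32, "row mask must fit in 32 bits");

/* The high hash bits select the row... */
static uint32_t row_index(uint32_t hash) { return hash >> TAG_BITS; }

/* ...and the low bits become the tag stored alongside the entry. */
static uint8_t tag_of(uint32_t hash)
{
    return (uint8_t)(hash & ((1u << TAG_BITS) - 1u));
}

/* Returns a bitmask with bit i set when tags[i] == tag, i.e. the entries
 * worth comparing against the input. A SIMD implementation can compute this
 * mask with a single vector compare; this scalar loop gives the same result. */
static uint32_t match_mask(const uint8_t tags[ROW_ENTRIES], uint8_t tag)
{
    uint32_t mask = 0;
    for (int i = 0; i < ROW_ENTRIES; i++) {
        if (tags[i] == tag)
            mask |= 1u << i;
    }
    return mask;
}
```

For example, with tag 0x5A stored at positions 3 and 9 of a row, `match_mask(tags, 0x5A)` returns 0x208, so only those two entries go on to a full comparison.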

When streaming data of unknown size (for example, with ZSTD_compressStream), there is no good way to choose a hashLog, so a large one is picked. This, in turn, means a large tag space has to be initialized.
For small inputs, this initialization cost is paid in full no matter how little data is actually compressed, which causes a noticeable speed regression.
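A back-of-the-envelope sketch of why this hurts small inputs, assuming one tag byte per hash-table entry (the per-entry tag size is an assumption here, and `tag_space_bytes` and `init_bytes_per_input_byte` are hypothetical helpers). The tag space grows as 2^hashLog while the input may be tiny, so the ratio below is the initialization work paid per byte actually compressed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model: assumes one tag byte per hash-table entry,
 * so the tag space is 2^hashLog bytes. */
static size_t tag_space_bytes(unsigned hashLog)
{
    return (size_t)1 << hashLog;
}

/* Bytes of tag space that must be initialized per byte of input. */
static double init_bytes_per_input_byte(unsigned hashLog, size_t inputSize)
{
    assert(inputSize > 0);
    return (double)tag_space_bytes(hashLog) / (double)inputSize;
}
```

With a hypothetical hashLog of 20, the tag space is 1 MiB; compressing a 1 KiB input then means initializing 1024 bytes of tags per input byte, while a hashLog of 10 would bring that ratio down to 1.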

A few attempts have been made to fix this. #2971 simply removes the initialization, but this is problematic under Valgrind (which flags reads of uninitialized memory) and might introduce another regression: consecutive compressions could get "false positives" from tags left behind by previous compressions.

#3426 builds on #2971: it introduces memory regions that have been initialized at least once (so Valgrind is not triggered) and salts the hash to avoid collisions with tags from previous compressions. However, it was rather complex.
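A minimal sketch of one way these two ideas could fit together, under stated assumptions: a high-water mark (`initialized`) records how much of the reused tag space has ever been written, so only never-touched bytes are cleared, and a per-compression salt is mixed into the hash before the tag is derived, so tags left over from an earlier compression are unlikely to match. `TagSpace`, `tagspace_prepare` and `salted_tag` are hypothetical, and the mixing constant 2654435761 (a common multiplicative-hashing choice) is an illustrative pick, not necessarily what #3426 uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state for a tag space that is reused across compressions. */
typedef struct {
    uint8_t *tags;        /* tag bytes, kept between compressions        */
    size_t   initialized; /* bytes of `tags` written at least once       */
    uint32_t salt;        /* expected to change for every compression    */
} TagSpace;

/* Clear only the bytes that have never been written, so every byte in use
 * has been initialized at least once (keeping Valgrind's uninitialized-read
 * checks quiet) without clearing the whole table on every compression.
 * Bytes below the high-water mark keep stale tags; the salt handles those. */
static void tagspace_prepare(TagSpace *ts, size_t needed, uint32_t newSalt)
{
    assert(ts->tags != NULL);
    if (needed > ts->initialized) {
        memset(ts->tags + ts->initialized, 0, needed - ts->initialized);
        ts->initialized = needed;
    }
    ts->salt = newSalt;
}

/* Mix the salt into the hash before taking the tag from the top bits, so the
 * same input hash yields a different tag under a different salt (in general
 * not guaranteed, but likely). */
static uint8_t salted_tag(uint32_t hash, uint32_t salt)
{
    uint32_t h = (hash ^ salt) * 2654435761u;
    return (uint8_t)(h >> 24);
}
```

For example, `salted_tag(0x12345678, 1)` and `salted_tag(0x12345678, 2)` differ, so a tag written during a compression with salt 1 is not mistaken for the current position's tag once the salt changes to 2.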

Intended solution
Break #3426 down into multiple smaller PRs, possibly dropping some of the functionality it introduced.
This is an umbrella issue that ties the resulting PRs together.
