Zstd multithreaded output can depend on number of threads #2327

@terrelln

Describe the bug
As reported by @animalize in Issue #2238:

When the ZSTD_e_end directive is used and the output buffer size is >= ZSTD_compressBound(), the number of jobs is calculated by the ZSTDMT_computeNbJobs() function. This function produces a different number of jobs depending on nbWorkers:

    static unsigned
    ZSTDMT_computeNbJobs(const ZSTD_CCtx_params* params, size_t srcSize, unsigned nbWorkers)
    {
        assert(nbWorkers>0);
        {   size_t const jobSizeTarget = (size_t)1 << ZSTDMT_computeTargetJobLog(params);
            size_t const jobMaxSize = jobSizeTarget << 2;
            size_t const passSizeMax = jobMaxSize * nbWorkers;
            unsigned const multiplier = (unsigned)(srcSize / passSizeMax) + 1;
            unsigned const nbJobsLarge = multiplier * nbWorkers;
            unsigned const nbJobsMax = (unsigned)(srcSize / jobSizeTarget) + 1;
            unsigned const nbJobsSmall = MIN(nbJobsMax, nbWorkers);
            return (multiplier>1) ? nbJobsLarge : nbJobsSmall;
    }   }
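
To make the dependence concrete, here is a standalone sketch of the same arithmetic. It is not the library function: `nb_jobs_sketch` is a hypothetical name, and the target job log is passed in directly instead of being derived from `params` via ZSTDMT_computeTargetJobLog(). The value 20 (a 1 MiB job target) used in the note below is only an assumed example.

```c
#include <assert.h>
#include <stddef.h>

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Hypothetical stand-in for ZSTDMT_computeNbJobs(): same arithmetic,
 * but targetJobLog is an explicit argument (assumption) rather than
 * being computed from the compression parameters. */
static unsigned nb_jobs_sketch(unsigned targetJobLog, size_t srcSize, unsigned nbWorkers)
{
    assert(nbWorkers > 0);
    {   size_t const jobSizeTarget = (size_t)1 << targetJobLog;
        size_t const jobMaxSize = jobSizeTarget << 2;
        /* nbWorkers enters the computation here ... */
        size_t const passSizeMax = jobMaxSize * nbWorkers;
        unsigned const multiplier = (unsigned)(srcSize / passSizeMax) + 1;
        unsigned const nbJobsLarge = multiplier * nbWorkers;
        unsigned const nbJobsMax = (unsigned)(srcSize / jobSizeTarget) + 1;
        /* ... and again here, so the result changes with the thread count. */
        unsigned const nbJobsSmall = MIN(nbJobsMax, nbWorkers);
        return (multiplier > 1) ? nbJobsLarge : nbJobsSmall;
    }
}
```

Under those assumptions, a 10 MiB input yields 3 jobs with 1 worker, 4 jobs with 2 or 4 workers, and 8 jobs with 8 workers, so the input is split differently depending on the thread count.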

Expected behavior
The output of zstd multithreaded compression must be independent of the number of threads.

Fix

  • Make ZSTDMT_computeNbJobs() independent of nbWorkers.
  • Add a fuzz test that checks that the output of multithreaded zstd is always independent of the number of threads.

Workaround
If you need to work around this bug, don't start your streaming job with ZSTD_e_end. Pass at least one byte of input with ZSTD_e_continue before calling ZSTD_e_end, or ensure your output buffer is < ZSTD_compressBound(inputSize).
