Closed
Description
Describe the bug
As reported by @animalize in Issue #2238:
When compression starts with the ZSTD_e_end directive and the output buffer size is >= ZSTD_compressBound(), the number of jobs is computed by ZSTDMT_computeNbJobs(). This function returns a different number of jobs depending on nbWorkers:
zstd/lib/compress/zstdmt_compress.c, lines 1243 to 1255 in b706286:

    static unsigned
    ZSTDMT_computeNbJobs(const ZSTD_CCtx_params* params, size_t srcSize, unsigned nbWorkers)
    {
        assert(nbWorkers>0);
        {   size_t const jobSizeTarget = (size_t)1 << ZSTDMT_computeTargetJobLog(params);
            size_t const jobMaxSize = jobSizeTarget << 2;
            size_t const passSizeMax = jobMaxSize * nbWorkers;
            unsigned const multiplier = (unsigned)(srcSize / passSizeMax) + 1;
            unsigned const nbJobsLarge = multiplier * nbWorkers;
            unsigned const nbJobsMax = (unsigned)(srcSize / jobSizeTarget) + 1;
            unsigned const nbJobsSmall = MIN(nbJobsMax, nbWorkers);
            return (multiplier>1) ? nbJobsLarge : nbJobsSmall;
    }   }
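
To make the dependency on nbWorkers concrete, here is a minimal standalone sketch of the same arithmetic. It is not the library function: `nb_jobs_sketch` is a hypothetical name, and the target job log is passed in directly as an assumption instead of being derived from the compression parameters via ZSTDMT_computeTargetJobLog().

```c
#include <assert.h>
#include <stddef.h>

#define MIN(a,b) ((a) < (b) ? (a) : (b))

/* Hypothetical standalone copy of the job-count arithmetic quoted above.
 * jobLog stands in for the value ZSTDMT_computeTargetJobLog(params) would
 * return; it is an assumed input here, not computed from real parameters. */
static unsigned nb_jobs_sketch(unsigned jobLog, size_t srcSize, unsigned nbWorkers)
{
    assert(nbWorkers > 0);
    {   size_t const jobSizeTarget = (size_t)1 << jobLog;
        size_t const jobMaxSize = jobSizeTarget << 2;
        size_t const passSizeMax = jobMaxSize * nbWorkers;
        unsigned const multiplier = (unsigned)(srcSize / passSizeMax) + 1;
        unsigned const nbJobsLarge = multiplier * nbWorkers;
        unsigned const nbJobsMax = (unsigned)(srcSize / jobSizeTarget) + 1;
        unsigned const nbJobsSmall = MIN(nbJobsMax, nbWorkers);
        return (multiplier > 1) ? nbJobsLarge : nbJobsSmall;
    }
}
```

Assuming a job log of 20 (1 MiB target jobs) and a 10 MiB input, this sketch gives 3 jobs for 1 worker, 4 jobs for 2 workers, 4 jobs for 4 workers, and 8 jobs for 8 workers. Since the input is split across jobs, a different job count means a different split of the same input.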
Expected behavior
The output of zstd multithreaded compression must be independent of the number of threads.
Fix
- Make ZSTDMT_computeNbJobs() independent of nbWorkers.
- Add a fuzz test that checks that the output of multithreaded zstd is always independent of the number of threads.
Workaround
If you need to work around this bug, don't start your streaming job with ZSTD_e_end. Pass at least one byte of input with ZSTD_e_continue before calling ZSTD_e_end, or ensure your output buffer is < ZSTD_compressBound(inputSize).