Describe the bug
When training on a large sample set, zstd first attempts to load everything into memory, then notices that memory is insufficient and loads only part of the samples, and then fails anyway because the loaded sample size still exceeds 4 GB.
```
> zstd --train-cover split -r --maxdict 262144 -o split-zstd-dict/dictionary
Not enough memory; training on 14562 MB only...
Trying 82 different sets of parameters
Total samples size is too large (14562 MB), maximum size is 4095 MB
Failed to initialize context
dictionary training failed : Src size is incorrect
```
To Reproduce
- Sample set: ~1.5M files, ~32 KB per file
- Run the command shown in the listing above
Expected behavior
It would make much more sense to load at most 4 GB worth of samples, print a warning, ignore the rest, and train on those.
(Ideally, training could proceed incrementally and not require loading all samples in memory at the same time. This would obviously also make the problem described here go away)
Desktop (please complete the following information):
- OS: Mac (x86-64)
- Version: zstd 1.5.0 (via Homebrew)