
@amueller
Contributor

amueller commented Dec 3, 2019

Fixes #891.
Most of the large datasets in cc-18 are actually "image" datasets with entries from 0 to 255. Storing them as uint8 drastically reduces the storage space required, as well as the storing and loading times.
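A minimal sketch of the idea, assuming the data arrive as a NumPy float array. `maybe_downcast_to_uint8` is a hypothetical helper name used only for illustration; it is not the code this patch adds to `openml/datasets/dataset.py`.

```python
import numpy as np


def maybe_downcast_to_uint8(X: np.ndarray) -> np.ndarray:
    """Return X as uint8 if every entry is a whole number in [0, 255], else X unchanged.

    Hypothetical helper, for illustration only.
    """
    if X.size == 0:
        return X
    # Missing values (NaN) cannot be represented in uint8, so keep the original dtype.
    if np.issubdtype(X.dtype, np.floating) and np.isnan(X).any():
        return X
    # Every value must be a whole number inside the 0-255 pixel range.
    is_integral = np.all(np.mod(X, 1) == 0)
    in_range = X.min() >= 0 and X.max() <= 255
    if is_integral and in_range:
        return X.astype(np.uint8)
    return X


# Synthetic "image" data: pixel values 0-255 stored as float64.
X = np.random.default_rng(0).integers(0, 256, size=(1000, 3072)).astype(np.float64)
X_small = maybe_downcast_to_uint8(X)
print(X.nbytes, X_small.nbytes)  # 24576000 3072000
```

Checking both integrality and range before casting matters: `astype(np.uint8)` would silently truncate fractional values and wrap values above 255.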

This is special-casing, but I think it's a common and important special case. With this patch, the CIFAR-10 dataset pickle went from 1.4 GB to 176 MB.
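As a rough sanity check on those numbers, assume (this is not stated in the PR) that the pickle previously held the 60,000 CIFAR-10 images as float64, with 32 × 32 × 3 = 3072 pixel values per image and the label column ignored. Going from 8 bytes to 1 byte per value gives the factor of about 8:

```python
# Assumed CIFAR-10 shape: 60,000 images, 32 * 32 * 3 = 3072 pixel values each.
n_rows = 60_000
n_features = 32 * 32 * 3

float64_bytes = n_rows * n_features * 8  # 8 bytes per float64 value
uint8_bytes = n_rows * n_features * 1    # 1 byte per uint8 value

print(f"float64: {float64_bytes / 2**30:.2f} GiB")  # float64: 1.37 GiB
print(f"uint8:   {uint8_bytes / 2**20:.0f} MiB")    # uint8:   176 MiB
```

This ignores pickle overhead and column metadata, so it only approximates the sizes reported above.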

@amueller
Contributor Author

amueller commented Dec 3, 2019

Is there a small dataset to test this or should I mock something? Or any other good ideas for tests?
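One way to avoid downloading a large dataset in a test is to build a small synthetic array in memory and compare pickle sizes before and after the conversion. This is a sketch under assumptions: the test name is hypothetical, it uses plain `pickle` rather than the package's own caching code, it does not contact the OpenML server, and it is not the test that was eventually written for this PR.

```python
import pickle

import numpy as np


def test_uint8_pickle_is_smaller_and_lossless():
    # Synthetic "image" data: integer pixel values 0-255 stored as float64,
    # standing in for a large dataset such as CIFAR-10 or SVHN.
    rng = np.random.default_rng(42)
    X = rng.integers(0, 256, size=(200, 64)).astype(np.float64)

    X_uint8 = X.astype(np.uint8)

    size_float = len(pickle.dumps(X, protocol=pickle.HIGHEST_PROTOCOL))
    size_uint8 = len(pickle.dumps(X_uint8, protocol=pickle.HIGHEST_PROTOCOL))

    # The raw buffer shrinks by 8x; allow generous slack for pickle overhead.
    assert size_uint8 * 4 < size_float
    # Round-tripping through pickle must preserve every value.
    restored = pickle.loads(pickle.dumps(X_uint8))
    assert np.array_equal(restored, X)


test_uint8_pickle_is_smaller_and_lossless()
```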

@codecov-io

codecov-io commented Dec 3, 2019

Codecov Report

Merging #892 into develop will decrease coverage by 0.25%.
The diff coverage is 50%.


```diff
@@             Coverage Diff             @@
##           develop     #892      +/-   ##
===========================================
- Coverage    88.57%   88.31%   -0.26%
===========================================
  Files           37       37
  Lines         4324     4332       +8
===========================================
- Hits          3830     3826       -4
- Misses         494      506      +12
```

| Impacted Files | Coverage Δ |
|---|---|
| openml/datasets/dataset.py | 86.19% <50%> (-0.84%) ⬇️ |
| openml/exceptions.py | 83.87% <0%> (-9.68%) ⬇️ |
| openml/_api_calls.py | 87.93% <0%> (-2.59%) ⬇️ |
| openml/runs/functions.py | 82.56% <0%> (-0.55%) ⬇️ |

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 371911f...3225588. Read the comment docs.

Collaborator

mfeurer left a comment

  • Is 20 small enough for a test? Or are you using dataset 1 for that now?
  • Could you please have a look at the failing test?

@mfeurer
Collaborator

mfeurer commented Oct 29, 2020

Superseded by #983.

@mfeurer mfeurer closed this Oct 29, 2020

Development

Successfully merging this pull request may close these issues.

pickle for SVHN bigger than arff file

4 participants