Download all files #1188

PGijsbers · 2022-11-25T12:54:40Z

This PR adds an experimental feature that downloads all files found in the MinIO bucket where the dataset's parquet file is stored. You can turn it on by passing download_all_files=True to get_dataset. Example:

import openml
openml.datasets.get_dataset(44312, download_all_files=True)

In addition to the dataset itself, this call will download an additional archive and unzip it.
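The unzip step can be pictured with a minimal sketch. This is not the code from this PR: the helper name `unzip_archives` and the choice to extract next to the archive are assumptions made purely for illustration, using only the standard library.

```python
import zipfile
from pathlib import Path
from typing import List


def unzip_archives(directory: Path) -> List[Path]:
    """Hypothetical helper: extract every .zip file found directly in `directory`.

    Contents are extracted into `directory` itself; the archives are left in place.
    Returns the list of archives that were extracted.
    """
    extracted = []
    for archive in sorted(directory.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(directory)
        extracted.append(archive)
    return extracted
```

Running such a helper over the dataset's cache directory after the download would leave both the archive and its extracted contents on disk.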


* Towards downloading buckets
* Download entire bucket instead of dataset file
* Dont download arff, skip files already cached
* Automatically unzip any downloaded archives
* Make downloading the bucket optional. Additionally, rename old cached files to the new filename format.
* Allow users to download the full bucket when pq is already cached. Otherwise the only way would be to delete the cache.
* Add unit test stub
* Remove redundant try/catch
* Remove commented out print statement
* Still download arff
* ADD: download all files from minio bucket
* Add note for openml#1184
* Fix pre-commit issues (mypy, flake)

Co-authored-by: Matthias Feurer

mfeurer and others added 29 commits October 25, 2020 20:00

Merge pull request openml#969 from openml/develop

bc87333

Prepare 11.0 release

Merge pull request openml#1043 from openml/develop

4a20d12

Release 0.12

Merge pull request openml#1055 from openml/develop

ee6ef60

Create release 0.12.1

Merge pull request openml#1087 from openml/develop

d2ccfe9

Release 0.12.2

Towards downloading buckets

7e06508

Download entire bucket instead of dataset file

b3544f3

Dont download arff, skip files already cached

28a3e4e

Automatically unzip any downloaded archives

702f87f

Make downloading the bucket optional

e528848

Additionally, rename old cached files to the new filename format.

Allow users to download the full bucket when pq is already cached

e668249

Otherwise the only way would be to delete the cache.

Add unit test stub

373bb3f

Remove redundant try/catch

1e1d0cc

Remove commented out print statement

dd60252

Still download arff

4f956bf

Towards downloading buckets

0775d21

Download entire bucket instead of dataset file

e08f95c

Dont download arff, skip files already cached

b0244d2

Automatically unzip any downloaded archives

ad4c7f8

Make downloading the bucket optional

b79206a

Additionally, rename old cached files to the new filename format.

Allow users to download the full bucket when pq is already cached

55ea151

Otherwise the only way would be to delete the cache.

Add unit test stub

5cb4479

Remove redundant try/catch

d7359b1

Remove commented out print statement

40574c8

Still download arff

574c0a5

Merge branch 'download_all_files' of https://github.com/PGijsbers/ope…

e9401cb

…nml-python into download_all_files

Merge branch 'develop' into download_all_files

31b8ae0

ADD: download all files from minio bucket

e9cdd80

Merge branch 'develop' into download_all_files

744f420

Add note for openml#1184

f6f9c49

PGijsbers requested a review from joaquinvanschoren November 25, 2022 13:01

Fix pre-commit issues (mypy, flake)

d9362cf

joaquinvanschoren approved these changes Nov 25, 2022

PGijsbers merged commit 580b536 into openml:develop Nov 25, 2022

PGijsbers added a commit that referenced this pull request Feb 20, 2023

Change the cached file to reflect new standard #1188

7ad8d92

In #1188 we changed the standard cache file convention from dataset.pq to dataset_{did}.pq. See also #1188.

PGijsbers mentioned this pull request Feb 20, 2023

Change the cached file to reflect new standard #1188 #1203

Merged

mfeurer pushed a commit that referenced this pull request Feb 21, 2023

Change the cached file to reflect new standard #1188 (#1203)

7d069a9

In #1188 we changed the standard cache file convention from dataset.pq to dataset_{did}.pq. See also #1188.
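The renaming of old cached files mentioned above can be sketched as follows. This is an illustrative assumption, not the code merged in this PR: the function name `migrate_cached_parquet` and the flat cache directory layout are hypothetical; only the two filename conventions (dataset.pq and dataset_{did}.pq) come from the commit message.

```python
from pathlib import Path


def migrate_cached_parquet(cache_dir: Path, did: int) -> Path:
    """Hypothetical helper: rename an old-style cache file to the new convention.

    Old convention: dataset.pq
    New convention: dataset_{did}.pq
    Returns the new-style path, whether or not a rename took place.
    """
    old_path = cache_dir / "dataset.pq"
    new_path = cache_dir / f"dataset_{did}.pq"
    # Only rename when an old file exists and would not overwrite a new one.
    if old_path.exists() and not new_path.exists():
        old_path.rename(new_path)
    return new_path
```

Renaming in place means an existing cache stays usable, so users do not have to delete it and download the dataset again.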
