Release 0.13.1 #1232

mfeurer · 2023-03-22T09:52:49Z

No description provided.

* Add Windows to GitHub Actions CI matrix
* Fix syntax, disable Ubuntu tests
  Ubuntu tests are only temporarily disabled for this PR, to avoid unnecessary computational cost and time.
* Fix syntax for skip on install Python step
* Explicitly add the OS to includes
* Disable check for files left behind for Windows
  The check is a bash script, which means it fails on a Windows machine.
* Re-enable Ubuntu tests
* Replace Appveyor with GitHub Actions for Windows CI

* Update to latest versions
* Updated Black formatting
  Black was bumped from 19.10b0 to 22.6.0. Changes in the files are reduced to:
  - No whitespace at the start and end of a docstring.
  - All comma separated "lists" (for example in function calls) are now one item per line, regardless of whether they would fit on one line.
* Update error code for "print"
  Changed in flake8-print 5.0.0: https://pypi.org/project/flake8-print/
* Shorten comment to observe line length codestyle
* Install stubs for requests for mypy
* Add dependency for mypy dateutil type stubs
* Resolve mypy warnings
* Add update pre-commit dependencies notice

* Add scikit-learn 1.0 and 1.1 values for test
  DecisionTree and RandomForestRegressor have one fewer default hyperparameter: `min_impurity_split`
* Remove min_impurity_split requirements for >=1.0
* Update KMeans checks for scikit-learn 1.0 and 1.1

The previous solution had two test conditions (strict and not strict) and several scikit-learn versions, because of two distinct changes within scikit-learn (the removal of min_impurity_split in 1.0, and the restructuring of public/private models in 0.24). I refactored out the separate test cases to greatly simplify the individual tests, and I added a test case for scikit-learn>=1.0, which was previously not covered.

It looks like the predictions loaded from an ARFF file are read as floats by the ARFF reader, which results in a different type (float vs. int). Because "equality" of values is already checked, I figured dtype is not as important. That said, I am not sure why there are so many redundant comparisons in the first place. Anyway, the difference should be due to pandas inference behavior, and if that is what we want to test, then we should make a small isolated test case instead of integrating it into every prediction unit test. Finally, over the next year we should move away from ARFF.

* Towards downloading buckets
* Download entire bucket instead of dataset file
* Don't download arff, skip files already cached
* Automatically unzip any downloaded archives
* Make downloading the bucket optional
  Additionally, rename old cached files to the new filename format.
* Allow users to download the full bucket when pq is already cached
  Otherwise the only way would be to delete the cache.
* Add unit test stub
* Remove redundant try/catch
* Remove commented out print statement
* Still download arff
* ADD: download all files from minio bucket
* Add note for #1184
* Fix pre-commit issues (mypy, flake)

Co-authored-by: Matthias Feurer <[email protected]>
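
The "skip files already cached" and "automatically unzip any downloaded archives" steps above could be sketched roughly as follows. This is an illustration only, not the library's code: `fetch_bucket_files`, its `download` callback, and the file names are all hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def fetch_bucket_files(remote_files, cache_dir, download):
    """Download files missing from the cache and unzip any archives.

    Hypothetical helper: `download(name, target)` is assumed to write the
    remote file `name` to the local path `target`.
    """
    fetched = []
    for name in remote_files:
        target = cache_dir / name
        if target.exists():
            continue  # already cached, skip the download
        download(name, target)
        fetched.append(name)
        if zipfile.is_zipfile(target):
            # Extract archives next to the other cached files.
            with zipfile.ZipFile(target) as archive:
                archive.extractall(cache_dir)
    return fetched


def fake_download(name, target):
    """Stand-in for a real bucket download, used only for this example."""
    if name.endswith(".zip"):
        with zipfile.ZipFile(target, "w") as archive:
            archive.writestr("images/a.txt", "hello")
    else:
        target.write_bytes(b"data")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    (cache / "dataset_1.pq").write_bytes(b"cached")  # pretend this is cached
    fetched = fetch_bucket_files(["dataset_1.pq", "extra.zip"], cache, fake_download)
    unzipped = (cache / "images" / "a.txt").read_text()
```

Only `extra.zip` is fetched in this run, because `dataset_1.pq` already exists in the cache directory.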

Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 3.

- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v2...v3)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add sklearn marker
* Mark tests that use scikit-learn
* Only run scikit-learn tests multiple times
  The generic tests that don't use scikit-learn should only be tested once (per platform).
* Rename for correct variable
* Add sklearn mark for filesystem test
* Remove quotes around sklearn
* Instead include sklearn in the matrix definition
* Update jobnames
* Add explicit false to jobname
* Remove space
* Add function inside of expression?
* Do string testing instead
* Add missing ${{
* Add explicit true to old sklearn tests
* Add instruction to add pytest marker for sklearn tests

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 2 to 4.

- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v2...v4)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…n run.data_content (#1209)

* add test and fix for switch of ground truth and predictions
* undo import optimization
* fix bug with model passing to function
* fix order in other tests
* update progress.rst
* new unit test for run consistency and bug fixed
* clarify new assert
* minor loop refactor
* refactor default to None
* directly test prediction data equal
* Update tests/test_runs/test_run.py
  Co-authored-by: Pieter Gijsbers <[email protected]>
* Mark sklearn tests (#1202)
  * Add sklearn marker
  * Mark tests that use scikit-learn
  * Only run scikit-learn tests multiple times
    The generic tests that don't use scikit-learn should only be tested once (per platform).
  * Rename for correct variable
  * Add sklearn mark for filesystem test
  * Remove quotes around sklearn
  * Instead include sklearn in the matrix definition
  * Update jobnames
  * Add explicit false to jobname
  * Remove space
  * Add function inside of expression?
  * Do string testing instead
  * Add missing ${{
  * Add explicit true to old sklearn tests
  * Add instruction to add pytest marker for sklearn tests
* fix mask error resulting from rebase
* make dummy classifier strategy consistent to avoid problems as a result of the random state problems for sklearn < 0.24

Co-authored-by: Pieter Gijsbers <[email protected]>

Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 1 to 2.

- [Release notes](https://github.com/docker/setup-buildx-action/releases)
- [Commits](docker/setup-buildx-action@v1...v2)

---
updated-dependencies:
- dependency-name: docker/setup-buildx-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* added additional task agnostic local result to print of run
* add PR to progress.rst
* fix comment typo
* Update openml/runs/run.py
  Co-authored-by: Matthias Feurer <[email protected]>
* add a function to list available estimation procedures
* refactor print to only work for supported task types and local measures
* add test for printout and update progress
* Fix CI Python 3.6 (#1218)
  * Try Ubuntu 20.04 for Python 3.6
  * use old ubuntu for python 3.6
* Bump docker/setup-buildx-action from 1 to 2 (#1221)
  Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 1 to 2.
  - [Release notes](https://github.com/docker/setup-buildx-action/releases)
  - [Commits](docker/setup-buildx-action@v1...v2)
  ---
  updated-dependencies:
  - dependency-name: docker/setup-buildx-action
    dependency-type: direct:production
    update-type: version-update:semver-major
  ...
* Update run.py (#1194)
  * Update run.py
  * Update run.py
    updated description to not contain duplicate information.
  * Update run.py
* add type hint for new function
* update add description
* Refactor if-statements (#1219)
  * Refactor if-statements
  * Add explicit names to conditional expression
  * Add 'dependencies' to better mimic OpenMLFlow
* Ci python 38 (#1220)
  * Install custom numpy version for specific combination of Python3.8 and numpy
  * Debug output
  * Change syntax
  * move to coverage action v3
  * Remove test output
* fix run doc string

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Matthias Feurer <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Vishal Parmar <[email protected]>
Co-authored-by: Pieter Gijsbers <[email protected]>

* add better error handling for checksum when downloading a file
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* update usage of __is_checksum_equal
* Update openml/_api_calls.py
  Co-authored-by: Pieter Gijsbers <[email protected]>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pieter Gijsbers <[email protected]>
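
To illustrate what checksum handling on download can look like, here is a minimal sketch. It is not the code in `openml/_api_calls.py`: `checksum_matches`, `verify_download`, and `ChecksumMismatchError` are hypothetical names, and the use of MD5 is an assumption made for the example.

```python
import hashlib
from typing import Optional


class ChecksumMismatchError(Exception):
    """Hypothetical error for a download whose checksum does not match."""


def checksum_matches(content: bytes, expected_md5: Optional[str]) -> bool:
    """Compare the MD5 digest of `content` with the expected hex digest."""
    if expected_md5 is None:
        return True  # nothing to verify against
    return hashlib.md5(content).hexdigest() == expected_md5.lower()


def verify_download(content: bytes, expected_md5: Optional[str], url: str) -> bytes:
    """Return `content` if its checksum matches, else raise a clear error."""
    if not checksum_matches(content, expected_md5):
        raise ChecksumMismatchError(
            f"Checksum of the file downloaded from {url} does not match "
            "the expected checksum."
        )
    return content
```

Raising a dedicated exception with the URL in the message lets callers tell a corrupted download apart from other failures.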


codecov-commenter · 2023-03-22T10:27:33Z

Codecov Report

Patch coverage: 80.93% and project coverage change: +0.09 🎉

Comparison is base (d2ccfe9) 85.14% compared to head (bb3793d) 85.24%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1232      +/-   ##
==========================================
+ Coverage   85.14%   85.24%   +0.09%     
==========================================
  Files          38       38              
  Lines        5008     5008              
==========================================
+ Hits         4264     4269       +5     
+ Misses        744      739       -5

Impacted Files	Coverage Δ
openml/base.py	`90.90% <ø> (ø)`
openml/cli.py	`0.00% <ø> (ø)`
openml/datasets/__init__.py	`100.00% <ø> (ø)`
openml/extensions/extension_interface.py	`91.66% <ø> (ø)`
openml/flows/flow.py	`92.71% <ø> (ø)`
openml/runs/__init__.py	`100.00% <ø> (ø)`
openml/study/study.py	`72.00% <ø> (ø)`
openml/tasks/__init__.py	`100.00% <ø> (ø)`
openml/tasks/split.py	`94.50% <ø> (ø)`
openml/evaluations/functions.py	`83.47% <8.33%> (ø)`
... and 19 more

Main

mfeurer and others added 30 commits May 31, 2021 11:30

minor fixes to usage.rst (#1090)

f16ba08

Add ChunkedError to list of retry exception (#1118)

2984403

This error can stem from connectivity issues and might not occur on a retry.
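
A minimal sketch of the retry idea, assuming a tuple of exceptions that are considered transient. `ChunkedError` below is a hypothetical stand-in class and `call_with_retries` a hypothetical helper; neither is the library's actual implementation.

```python
import time


class ChunkedError(Exception):
    """Hypothetical stand-in for a transient, connection-related error."""


# Exceptions assumed to be transient and therefore worth retrying.
RETRYABLE = (ConnectionError, ChunkedError)


def call_with_retries(func, n_retries=3, delay=0.0):
    """Call `func`, retrying on transient errors; re-raise the last failure."""
    for attempt in range(1, n_retries + 1):
        try:
            return func()
        except RETRYABLE:
            if attempt == n_retries:
                raise
            time.sleep(delay)


# Usage: a flaky callable that fails once, then succeeds.
calls = {"count": 0}


def flaky_download():
    calls["count"] += 1
    if calls["count"] == 1:
        raise ChunkedError("connection broken mid-response")
    return "payload"


result = call_with_retries(flaky_download)
```

Adding an exception type to `RETRYABLE` is all it takes to make it eligible for a retry in this sketch.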

Always ignore MaxRetryError but log with warning (#1119)

a6c0576

Currently parquet files are completely optional, so under no circumstance should the inability to download one raise an error to the user. Instead we log a warning and proceed without the parquet file.
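
The "warn and proceed" behavior can be sketched as below. The `MaxRetryError` class here is a local stand-in, and `download_parquet_or_none` and its `download` callback are hypothetical names chosen for the example.

```python
import logging

logger = logging.getLogger(__name__)


class MaxRetryError(Exception):
    """Stand-in for the error raised once retries are exhausted."""


def download_parquet_or_none(download, url):
    """Return the downloaded path, or None if the optional download fails."""
    try:
        return download(url)
    except MaxRetryError as e:
        # The parquet file is optional: warn and let the caller fall back.
        logger.warning("Could not download parquet file from %s: %s", url, e)
        return None


def failing_download(url):
    """Example download callback that always exhausts its retries."""
    raise MaxRetryError("too many retries")


path = download_parquet_or_none(failing_download, "https://example.org/dataset.pq")
```

A caller can then check for `None` and continue with whatever other format is available.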

Fix/1110 (#1117)

b4c868a

Update function signatures for create_study|suite and allow for empty studies (i.e. with no runs).

Add AttributeError as suspect for dependency issue (#1121)

aed5010

* Add AttributeError as suspect for dependency issue
  Happens, for example, when loading a dataframe created with pandas 1.3 in an environment that has pandas 1.0.
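
As a rough illustration of treating `AttributeError` as a sign of a dependency mismatch, the sketch below wraps unpickling and re-raises with a hint. `load_cached_object` is a hypothetical helper, and that the cached data is a pickle is an assumption of this example.

```python
import pickle


def load_cached_object(raw_bytes):
    """Unpickle cached data, hinting at version mismatches on failure."""
    try:
        return pickle.loads(raw_bytes)
    except (AttributeError, ModuleNotFoundError, ImportError) as e:
        # These errors often mean the pickle was written with a different
        # version of a dependency than the one currently installed.
        raise RuntimeError(
            "Could not load cached object; this may be caused by a "
            "dependency version mismatch. Try clearing the cache."
        ) from e
```

Chaining with `from e` keeps the original `AttributeError` visible in the traceback.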

Add CITATION.cff (#1120)

db7bb9a

Some ORCIDs are missing because I could not with certainty determine the ORCID of some co-authors.

Precommit update (#1129)

493511a

* Correctly use regex to specify files
* Add type hint
* Add note of fixing pre-commit hook #1129

Predictions (#1128)

99a62f6

* Add easy way to retrieve run predictions
* Log addition of ``predictions`` (#1103)

Use GET instead of POST for flow exist (#1147)

c911d6d

Replace removed file with new target for download test (#1158)

a8d96d5

Fix outdated docstring for list_tasks function (#1149)

ccb3e8e

Improve the error message on out-of-sync flow ids (#1171)

9ce2a6b

* Improve the error message on out-of-sync flow ids
* Add more meaningful messages on test fail

Update Pipeline description for >=1.0 (#1170)

2fde8d5

Update URL to reflect new endpoint (#1172)

2ddae0f

Remove tests which only test scikit-learn functionality (#1169)

c17704e

We should only test code that we write.

fix nonetype error during print for tasks without class labels (#1148)

953f84e

* fix nonetype error during print for tasks without class labels
* fix #1100/#1058 nonetype error

Co-authored-by: Pieter Gijsbers <[email protected]>
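
The kind of guard that avoids such a `NoneType` error can be sketched like this. `describe_class_labels` and its output strings are hypothetical and only illustrate the pattern of checking for `None` before calling `len`.

```python
def describe_class_labels(class_labels):
    """Format the number of class labels, tolerating tasks that have none."""
    if class_labels is None:
        # Assumed example: tasks without a class target carry no labels.
        return "# of Classes: not applicable"
    return f"# of Classes: {len(class_labels)}"
```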

Flow exists GET is deprecated, use POST (#1173)

6da0aac

Test get_parquet on production server (#1174)

22ee9cd

The test server has minio URLs disabled, because we do not yet have a setup that represents the live server in a test environment. So, we download from the production server instead.

Provide clearer error when server provides bad data description XML (#1178)

e6250fa

Update more sklearn tests (#1175)

75fed8a

* n_iter is now keyword-only
* Standardize sklearn pipeline description lookups
* `priors` is no longer positional, and wasn't used in the first place
* Remove loss=kneighbours from the complex pipeline

feat(minio): Allow for proxies (#1184)

a909a0c

* feat(minio): Allow for proxies
* fix: Declared proxy_client as None
* refactor(proxy): Change to `str | None` with "auto"
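
One way a `str | None` setting with an "auto" value could be interpreted is sketched below. The semantics are an assumption made for illustration (the commit message does not spell them out), and `resolve_proxies` is a hypothetical helper, not the library's function.

```python
import urllib.request
from typing import Dict, Optional


def resolve_proxies(proxy: Optional[str] = "auto") -> Optional[Dict[str, str]]:
    """Map a `str | None` proxy setting to a proxies dict (hypothetical)."""
    if proxy is None:
        return None  # explicitly no proxy
    if proxy == "auto":
        # Assumption: "auto" means picking up proxies from the environment
        # (e.g. the HTTPS_PROXY variable), if any are configured.
        return urllib.request.getproxies() or None
    # Any other string is treated as an explicit proxy URL.
    return {"http": proxy, "https": proxy}
```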

Update __version__.py (#1189)

1dfe398

Skip tests that use arff reading optimization for typecheck (#1185)

5eb84ce

Those types changed in the switch to parquet, and we need to update the server parquet files and/or test expectations.

Update configs (#1199)

467f6eb

* Update flake8 repo from gitlab to github
* Exclude `venv`
* Numpy scalar aliases are removed in 1.24
  Fix numpy for future 0.13 releases, then fix and bump as needed

PGijsbers and others added 23 commits February 20, 2023 13:25

Update tests for sklearn 1.2, server issue (#1200)

dd62f2b

* Relax error checking
* Skip unit test due to server issue openml/OpenML#1180
* Account for rename parameter `base_estimator` to `estimator` in sk 1.2
* Update n_init parameter for sklearn 1.2
* Test for more specific exceptions
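
Tests that must pass across scikit-learn versions can pick the parameter name based on the installed version. The sketch below is a hypothetical helper that takes the version as a plain string and assumes a numeric `major.minor` prefix.

```python
def ensemble_estimator_param(sklearn_version: str) -> str:
    """Return the ensemble parameter name expected for a scikit-learn version."""
    # Assumes the version string starts with numeric "major.minor".
    major, minor = (int(part) for part in sklearn_version.split(".")[:2])
    # `base_estimator` was renamed to `estimator` in scikit-learn 1.2.
    return "estimator" if (major, minor) >= (1, 2) else "base_estimator"
```

In a test this could be called with `sklearn.__version__` to build the expected parameter dictionary.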

Version bump to dev and add changelog stub (#1190)

2a7ab17

Add: dependabot checks for workflow versions (#1155)

5f72e2e

Change the cached file to reflect new standard #1188 (#1203)

7d069a9

In #1188 we changed the standard cache file convention from dataset.pq to dataset_{did}.pq.
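
A small sketch of moving a cached file from the old name to the new convention. `migrate_cached_parquet` is a hypothetical helper and the dataset id `61` is an arbitrary example value.

```python
import tempfile
from pathlib import Path


def migrate_cached_parquet(cache_dir: Path, did: int) -> Path:
    """Rename a legacy `dataset.pq` to `dataset_{did}.pq` if needed."""
    old_path = cache_dir / "dataset.pq"
    new_path = cache_dir / f"dataset_{did}.pq"
    if old_path.exists() and not new_path.exists():
        old_path.rename(new_path)
    return new_path


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    (cache / "dataset.pq").write_bytes(b"parquet bytes")  # legacy file name
    migrated = migrate_cached_parquet(cache, did=61)
    migrated_name = migrated.name
    migrated_exists = migrated.exists()
```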

Update docker actions (#1211)

603fe60

* Update docker actions
* Fix context
* Specify tag for docker container to use strict python version (3.10)
* Load OpenML in Docker file
* load correct image
* load correct image
* Remove loading python again

Support new numpy (#1215)

17ff086

* Drop upper bound on numpy version
* Update changelog

Allow unknown task types on the server (#1216)

d9850be

* Allow unknown task types on the server
* Applied black to openml/tasks/functions.py
* Some more fixes
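
Tolerating task types the client does not know can be sketched as a lookup with a fallback. The `TaskType` enum below is an illustrative subset with assumed ids, and `parse_task_type` is a hypothetical helper.

```python
import logging
from enum import Enum
from typing import Optional

logger = logging.getLogger(__name__)


class TaskType(Enum):
    """Illustrative subset of task types; the ids are assumptions."""

    SUPERVISED_CLASSIFICATION = 1
    SUPERVISED_REGRESSION = 2


def parse_task_type(task_type_id: int) -> Optional[TaskType]:
    """Return the matching TaskType, or None for ids unknown to the client."""
    try:
        return TaskType(task_type_id)
    except ValueError:
        logger.warning("Skipping task with unknown task type id %s.", task_type_id)
        return None


# A server listing that contains one task type this client does not know.
listing = [{"tid": 1, "ttid": 1}, {"tid": 2, "ttid": 999}]
known = [t for t in listing if parse_task_type(t["ttid"]) is not None]
```

Returning `None` instead of raising lets a listing call succeed even when the server introduces new task types.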

Make OpenMLTraceIteration a dataclass (#1201)

c590b3a

It provides a better repr and is less verbose.
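
To show what a dataclass provides here, the sketch below uses a simplified, hypothetical `TraceIteration` class; its fields are assumptions and do not necessarily match `OpenMLTraceIteration`.

```python
from dataclasses import dataclass


@dataclass
class TraceIteration:
    """Hypothetical, simplified stand-in for a trace iteration record."""

    repeat: int
    fold: int
    iteration: int
    evaluation: float
    selected: bool


it = TraceIteration(repeat=0, fold=1, iteration=2, evaluation=0.93, selected=True)
print(it)
# TraceIteration(repeat=0, fold=1, iteration=2, evaluation=0.93, selected=True)
```

The `__init__`, `__repr__`, and `__eq__` methods are generated from the field declarations, which is where the reduced verbosity comes from.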

Fix documentation building (#1217)

b84536a

* Fix documentation building
* Fix numpy version
* Fix two links

Fix CI Python 3.6 (#1218)

5730669

* Try Ubuntu 20.04 for Python 3.6
* use old ubuntu for python 3.6

Update run.py (#1194)

5dcb7a3

* Update run.py
* Update run.py
  updated description to not contain duplicate information.
* Update run.py

Refactor if-statements (#1219)

687a0f1

* Refactor if-statements
* Add explicit names to conditional expression
* Add 'dependencies' to better mimic OpenMLFlow

Ci python 38 (#1220)

c0a75bd

* Install custom numpy version for specific combination of Python3.8 and numpy
* Debug output
* Change syntax
* move to coverage action v3
* Remove test output

Fix coverage (#1226)

24cbc5e

* Correctly only clean up tests/files/
* Log to console for pytest invocation

Issue 1028: public delete functions for run, task, flow and database (#1060)

3c00d7b

Update changelog and version number for new release (#1230)

7127e9c

mfeurer requested a review from PGijsbers March 22, 2023 09:52

PGijsbers approved these changes Mar 22, 2023

PGijsbers mentioned this pull request Mar 22, 2023

Main #1233

Merged

Merge pull request #1233 from openml/main

bb3793d

Main

mfeurer merged commit 3380bbb into main Mar 22, 2023

github-actions bot pushed a commit that referenced this pull request Mar 22, 2023

Matthias Feurer: Merge pull request #1232 from openml/develop

f819812


10 participants
