Conversation

@mfeurer
Collaborator

@mfeurer mfeurer commented Mar 22, 2023

No description provided.

mfeurer and others added 30 commits May 31, 2021 11:30
* Add Windows to GitHub Actions CI matrix

* Fix syntax, disable Ubuntu tests

Ubuntu tests only temporarily disabled for this PR, to avoid
unnecessary computational costs/time.

* Fix syntax for skip on install Python step

* Explicitly add the OS to includes

* Disable check for files left behind for Windows

The check is a bash script, which means it fails on a Windows machine.

* Re-enable Ubuntu tests

* Replace Appveyor with GitHub Actions for Windows CI
Since it can stem from connectivity issues and it might not occur on a
retry.
Currently parquet files are completely optional, so under no
circumstance should the inability to download it raise an error to the
user. Instead we log a warning and proceed without the parquet file.
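
The warn-and-continue behaviour described above could look roughly like the sketch below. It is only an illustration: `try_download_parquet` and the `download` callable are hypothetical names invented here, not functions from openml-python.

```python
import logging
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def try_download_parquet(
    download: Callable[[str], Path], url: str
) -> Optional[Path]:
    """Return the local parquet path, or None if the download fails.

    `download` is a hypothetical stand-in for the real download routine.
    """
    try:
        return download(url)
    except OSError as e:
        # ConnectionError and TimeoutError are subclasses of OSError.
        # The parquet file is optional: warn and proceed without it.
        logger.warning("Could not download parquet file from %s: %s", url, e)
        return None
```

A caller would then treat `None` as "no parquet file available" instead of surfacing an error to the user.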
Update function signatures for create_study|suite and allow for empty studies (i.e. with no runs).
* Add AttributeError as suspect for dependency issue

Happens for example when loading a 1.3 dataframe with a 1.0 pandas.
Some ORCIDs are missing because I could not with certainty determine the ORCID of some co-authors.
* Correctly use regex to specify files

* Add type hint

* Add note of fixing pre-commit hook #1129
* Add easy way to retrieve run predictions

* Log addition of ``predictions`` (#1103)
* Update to latest versions

* Updated Black formatting

Black was bumped from 19.10b0 to 22.6.0. Changes in the files are
reduced to:
 - No whitespace at the start and end of a docstring.
 - All comma separated "lists" (for example in function calls) are now
   one item per line, regardless if they would fit on one line.

* Update error code for "print"

Changed in flake8-print 5.0.0: https://pypi.org/project/flake8-print/

* Shorten comment to observe line length codestyle

* Install stubs for requests for mypy

* Add dependency for mypy dateutil type stubs

* Resolve mypy warnings

* Add update pre-commit dependencies notice
* Improve the error message on out-of-sync flow ids

* Add more meaningful messages on test fail
* Add scikit-learn 1.0 and 1.1 values for test

DecisionTree and RandomForestRegressor have one less default
hyperparameter: `min_impurity_split`

* Remove min_impurity_split requirements for >=1.0

* Update KMeans checks for scikit-learn 1.0 and 1.1
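
One way to express such version-dependent test expectations is sketched below. The helper names are made up for illustration and the dictionary is a small illustrative subset of decision-tree defaults, not the values used in the actual test suite.

```python
from typing import Dict, Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a plain version string such as '1.1.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def expected_tree_defaults(sklearn_version: str) -> Dict[str, object]:
    """Build the expected defaults for a given scikit-learn version."""
    # Illustrative subset only; the real tests check many more parameters.
    expected: Dict[str, object] = {"max_depth": None, "min_samples_split": 2}
    if parse_version(sklearn_version) < (1, 0):
        # `min_impurity_split` is no longer a default hyperparameter in >= 1.0.
        expected["min_impurity_split"] = None
    return expected
```

The sketch assumes plain numeric version strings; pre-release suffixes in the minor component (for example `1.0rc1`) would need a real version parser.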
* fix nonetype error during print for tasks without class labels

* fix #1100/#1058 nonetype error

Co-authored-by: Pieter Gijsbers <[email protected]>
The test server has minio urls disabled. This is because we currently
do not have a setup that represents the live server in a test
environment yet. So, we download from the production server instead.
The previous solution had two test conditions (strict and not strict)
and several scikit-learn versions, because of two distinct changes
within scikit-learn (the removal of min_impurity_split in 1.0, and the
restructuring of public/private models in 0.24).
I refactored out the separate test cases to greatly simplify the
individual tests, and I added a test case for scikit-learn>=1.0,
which was previously not covered.
* n_iter is now keyword-only

* Standardize sklearn pipeline description lookups

* `priors` is no longer positional, and wasn't used in the first place

* Remove loss=kneighbours from the complex pipeline
It looks like the predictions loaded from an arff file are read as
floats by the arff reader, which results in a different type
(float v int). Because "equality" of values is already checked,
I figured dtype is not as important. That said, I am not sure why
there are so many redundant comparisons in the first place?
Anyway, the difference should be due to pandas inference behavior,
and if that is what we want to test, then we should make a small
isolated test case instead of integrating it into every prediction
unit test. Finally, over the next year we should move away from ARFF.
* feat(minio): Allow for proxies

* fix: Declared proxy_client as None

* refactor(proxy): Change to `str | None` with "auto"
* Towards downloading buckets

* Download entire bucket instead of dataset file

* Don't download arff, skip files already cached

* Automatically unzip any downloaded archives

* Make downloading the bucket optional

Additionally, rename old cached files to the new filename format.

* Allow users to download the full bucket when pq is already cached

Otherwise the only way would be to delete the cache.

* Add unit test stub

* Remove redundant try/catch

* Remove commented out print statement

* Still download arff

* ADD: download all files from minio bucket

* Add note for #1184

* Fix pre-commit issues (mypy, flake)

Co-authored-by: Matthias Feurer <[email protected]>
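
The "skip files already cached" and "automatically unzip" steps listed above might be sketched as follows. This is not the code from the PR: `fetch_bucket` and the `fetch(name, destination)` callable are hypothetical stand-ins for the real object-store client.

```python
import zipfile
from pathlib import Path
from typing import Callable, Iterable, List


def fetch_bucket(
    object_names: Iterable[str],
    fetch: Callable[[str, Path], None],
    cache_dir: Path,
) -> List[Path]:
    """Download every object that is not cached yet; unzip zip archives.

    `fetch(name, destination)` is a hypothetical download callable.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for name in object_names:
        destination = cache_dir / name
        if destination.exists():
            continue  # already cached, nothing to do
        destination.parent.mkdir(parents=True, exist_ok=True)
        fetch(name, destination)
        if zipfile.is_zipfile(destination):
            # Extract next to the archive so the contents sit in the cache.
            with zipfile.ZipFile(destination) as archive:
                archive.extractall(destination.parent)
        downloaded.append(destination)
    return downloaded
```

Checking for existing files first means a repeated call only fetches what is missing, which matches the intent of not forcing users to delete the cache.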
Those types changed in the switch to parquet, and we need to
update the server parquet files and/or test expectations.
* Update flake8 repo from gitlab to github

* Exclude `venv`

* Numpy scalar aliases are removed in 1.24

Fix numpy for future 0.13 releases, then fix and bump as needed
PGijsbers and others added 23 commits February 20, 2023 13:25
* Relax error checking

* Skip unit test due to server issue openml/OpenML#1180

* Account for rename parameter `base_estimator` to `estimator` in sk 1.2

* Update n_init parameter for sklearn 1.2

* Test for more specific exceptions
In #1188 we changed the standard cache file convention from
dataset.pq to dataset_{did}.pq.
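
A minimal sketch of moving an old cache entry to the new naming convention is shown below. The function name `migrate_cached_parquet` is hypothetical, and the real cache layout may differ from the single flat directory assumed here.

```python
from pathlib import Path


def migrate_cached_parquet(cache_dir: Path, did: int) -> Path:
    """Rename dataset.pq to dataset_{did}.pq if only the old name exists."""
    old = cache_dir / "dataset.pq"
    new = cache_dir / f"dataset_{did}.pq"
    if old.exists() and not new.exists():
        old.rename(new)
    # Return the new-style path whether or not a file was renamed.
    return new
```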
Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 3.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v2...v3)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update docker actions

* Fix context

* Specify tag for docker container to use strict python version (3.10)

* Load OpenML in Docker file

* load correct image

* load correct image

* Remove loading python again
* Drop upper bound on numpy version

* Update changelog
* Allow unknown task types on the server

* Applied black to openml/tasks/functions.py

* Some more fixes
* Add sklearn marker

* Mark tests that use scikit-learn

* Only run scikit-learn tests multiple times

The generic tests that don't use scikit-learn should only be tested once
(per platform).

* Rename for correct variable

* Add sklearn mark for filesystem test

* Remove quotes around sklearn

* Instead include sklearn in the matrix definition

* Update jobnames

* Add explicit false to jobname

* Remove space

* Add function inside of expression?

* Do string testing instead

* Add missing ${{

* Add explicit true to old sklearn tests

* Add instruction to add pytest marker for sklearn tests
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 2 to 4.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v2...v4)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
It provides a better repr and is less verbose.
…n run.data_content (#1209)

* add test and fix for switch of ground truth and predictions

* undo import optimization

* fix bug with model passing to function

* fix order in other tests

* update progress.rst

* new unit test for run consistency and bug fixed

* clarify new assert

* minor loop refactor

* refactor default to None

* directly test prediction data equal

* Update tests/test_runs/test_run.py

Co-authored-by: Pieter Gijsbers <[email protected]>

* Mark sklearn tests (#1202)

* Add sklearn marker

* Mark tests that use scikit-learn

* Only run scikit-learn tests multiple times

The generic tests that don't use scikit-learn should only be tested once
(per platform).

* Rename for correct variable

* Add sklearn mark for filesystem test

* Remove quotes around sklearn

* Instead include sklearn in the matrix definition

* Update jobnames

* Add explicit false to jobname

* Remove space

* Add function inside of expression?

* Do string testing instead

* Add missing ${{

* Add explicit true to old sklearn tests

* Add instruction to add pytest marker for sklearn tests

* add test and fix for switch of ground truth and predictions

* undo import optimization

* fix mask error resulting from rebase

* make dummy classifier strategy consistent to avoid problems as a result of the random state problems for sklearn < 0.24

---------

Co-authored-by: Pieter Gijsbers <[email protected]>
* Fix documentation building

* Fix numpy version

* Fix two links
* Try Ubuntu 20.04 for Python 3.6

* use old ubuntu for python 3.6
Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 1 to 2.
- [Release notes](https://github.com/docker/setup-buildx-action/releases)
- [Commits](docker/setup-buildx-action@v1...v2)

---
updated-dependencies:
- dependency-name: docker/setup-buildx-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update run.py

* Update run.py

updated description to not contain duplicate information.

* Update run.py
* Refactor if-statements

* Add explicit names to conditional expression

* Add 'dependencies' to better mimic OpenMLFlow
* Install custom numpy version for specific combination of Python3.8 and numpy

* Debug output

* Change syntax

* move to coverage action v3

* Remove test output
* added additional task agnostic local result to print of run

* add PR to progress.rst

* fix comment typo

* Update openml/runs/run.py

Co-authored-by: Matthias Feurer <[email protected]>

* add a function to list available estimation procedures

* refactor print to only work for supported task types and local measures

* add test for printout and update progress

* Fix CI Python 3.6 (#1218)

* Try Ubuntu 20.04 for Python 3.6

* use old ubuntu for python 3.6

* Bump docker/setup-buildx-action from 1 to 2 (#1221)

Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 1 to 2.
- [Release notes](https://github.com/docker/setup-buildx-action/releases)
- [Commits](docker/setup-buildx-action@v1...v2)

---
updated-dependencies:
- dependency-name: docker/setup-buildx-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update run.py (#1194)

* Update run.py

* Update run.py

updated description to not contain duplicate information.

* Update run.py

* add type hint for new function

* update add description

* Refactor if-statements (#1219)

* Refactor if-statements

* Add explicit names to conditional expression

* Add 'dependencies' to better mimic OpenMLFlow

* Ci python 38 (#1220)

* Install custom numpy version for specific combination of Python3.8 and numpy

* Debug output

* Change syntax

* move to coverage action v3

* Remove test output

* added additional task agnostic local result to print of run

* add PR to progress.rst

* fix comment typo

* Update openml/runs/run.py

Co-authored-by: Matthias Feurer <[email protected]>

* add a function to list available estimation procedures

* refactor print to only work for supported task types and local measures

* add test for printout and update progress

* added additional task agnostic local result to print of run

* add PR to progress.rst

* add type hint for new function

* update add description

* fix run doc string

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Matthias Feurer <[email protected]>
Co-authored-by: Matthias Feurer <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Vishal Parmar <[email protected]>
Co-authored-by: Pieter Gijsbers <[email protected]>
* add better error handling for checksum when downloading a file

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update usage of __is_checksum_equal

* Update openml/_api_calls.py

Co-authored-by: Pieter Gijsbers <[email protected]>

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pieter Gijsbers <[email protected]>
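
For readers unfamiliar with the idea, checksum verification with a descriptive error could be sketched like this. It assumes an MD5 hex digest and uses a made-up exception class and function name; it is not the `__is_checksum_equal` implementation from `openml/_api_calls.py`.

```python
import hashlib


class ChecksumMismatchError(Exception):
    """Hypothetical error type used only in this sketch."""


def verify_checksum(content: bytes, expected_md5: str, source: str) -> None:
    """Raise a descriptive error if the content does not match the checksum."""
    # Assumption: the server publishes an MD5 hex digest for each file.
    actual = hashlib.md5(content).hexdigest()
    if actual != expected_md5.lower():
        raise ChecksumMismatchError(
            f"Checksum of the file downloaded from {source} is {actual}, "
            f"expected {expected_md5}. The download may be corrupted; "
            "retrying may help."
        )
```

Including both digests and the source in the message gives the user something actionable instead of a bare failure.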
* Correctly only clean up tests/files/

* Log to console for pytest invocation
@mfeurer mfeurer requested a review from PGijsbers March 22, 2023 09:52
@codecov-commenter

codecov-commenter commented Mar 22, 2023

Codecov Report

Patch coverage: 80.93% and project coverage change: +0.09 🎉

Comparison is base (d2ccfe9) 85.14% compared to head (bb3793d) 85.24%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1232      +/-   ##
==========================================
+ Coverage   85.14%   85.24%   +0.09%     
==========================================
  Files          38       38              
  Lines        5008     5008              
==========================================
+ Hits         4264     4269       +5     
+ Misses        744      739       -5     
Impacted Files                              Coverage Δ
openml/base.py                              90.90% <ø> (ø)
openml/cli.py                               0.00% <ø> (ø)
openml/datasets/__init__.py                 100.00% <ø> (ø)
openml/extensions/extension_interface.py    91.66% <ø> (ø)
openml/flows/flow.py                        92.71% <ø> (ø)
openml/runs/__init__.py                     100.00% <ø> (ø)
openml/study/study.py                       72.00% <ø> (ø)
openml/tasks/__init__.py                    100.00% <ø> (ø)
openml/tasks/split.py                       94.50% <ø> (ø)
openml/evaluations/functions.py             83.47% <8.33%> (ø)
... and 19 more

@PGijsbers PGijsbers mentioned this pull request Mar 22, 2023
@mfeurer mfeurer merged commit 3380bbb into main Mar 22, 2023
10 participants