Comparing changes

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.1.13 → v0.1.14](astral-sh/ruff-pre-commit@v0.1.13...v0.1.14)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* read file in read mode
* cast parameters to expected types
* Following PGijsbers proposal to ensure that avoid_duplicate_runs is a boolean after reading it from config_file
* Add a test, move parsing of avoid_duplicate_runs

---------

Co-authored-by: PGijsbers <[email protected]>
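
The boolean cast described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the `settings` section name, the helper names `parse_bool` and `load_settings`, the accepted spellings, and the fallback value of `"True"` are all assumptions made for the example. Only the `avoid_duplicate_runs` key comes from the commit message.

```python
import configparser

# Accepted spellings are an assumption for this sketch.
_TRUE_STRINGS = {"1", "true", "yes", "on"}
_FALSE_STRINGS = {"0", "false", "no", "off"}


def parse_bool(value):
    """Coerce a config value (str or bool) into a real bool."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in _TRUE_STRINGS:
        return True
    if text in _FALSE_STRINGS:
        return False
    raise ValueError(f"Cannot interpret {value!r} as a boolean")


def load_settings(config_text):
    """Read a config string and return typed values (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Config files only store strings, so the raw value needs an explicit cast.
    raw = parser.get("settings", "avoid_duplicate_runs", fallback="True")
    return {"avoid_duplicate_runs": parse_bool(raw)}
```

Without the cast, the string `"False"` would be truthy, which is the kind of bug a regression test for boolean parsing is meant to catch.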

* Update 'sparse' parameter for OHE for sklearn >= 1.4
* Add compatibility or skips for sklearn >= 1.4
* Change 'auto' to 'sqrt' for sklearn>1.3 as 'auto' is deprecated
* Skip flaky test
  It is unclear how a condition where the test is supposed to pass is created. Even after running the test suite 2-3 times, it does not yet seem to pass.
* Fix typo
* Ignore description comparison for newer scikit-learn
  There are some minor changes to the docstrings. I do not know that it is useful to keep testing it this way, so for now I will disable the test on newer versions.
* Adjust for scikit-learn 1.3
  The loss has been renamed. The performance of the model also seems to have changed slightly for the same seed. So I decided to compare with the lower fidelity that was already used on Windows systems.
* Remove timeout and reruns to better investigate CI failures
* Fix typo in parameter name
* Add jobs for more recent scikit-learns
* Expand the matrix with all scikit-learn 1.x versions
* Fix for numpy2.0 compatibility (#1341)
  Numpy2.0 cleaned up their namespace.
* Rewrite matrix and update numpy compatibility
* Move comment in-line
* Stringify name of new step to see if that prevented the action
* Fix unspecified os for included jobs
* Fix typo in version pinning for numpy
* Fix version specification for sklearn skips
* Output final list of installed packages for debugging purposes
* Cap scipy version for older versions of scikit-learn
  There is a breaking change to the way 'mode' works, that breaks scikit-learn internals.
* Update parameter base_estimator to estimator for sklearn>=1.4
* Account for changes to sklearn interface in 1.4 and 1.5
* Non-strict reinstantiation requires different scikit-learn version
* Parameters were already changed in 1.4
* Fix race condition (I think)
  It seems to me that run.evaluations is set only when the run is fetched. Whether it has evaluations depends on server state. So if the server has resolved the traces between the initial fetch and the trace-check, you could be checking len(run.evaluations) where evaluations is None.
* Use latest patch version of each minor release
* Convert numpy types back to builtin types
  Scikit-learn or numpy changed the typing of the parameters (seen in a masked array, not sure if also outside of that). Convert these values back to Python builtins.
* Specify versions with * instead to allow for specific patches
* Flow_exists does not return None but False if the flow does not exist
* Update new version definitions also installation step
* Fix bug introduced in refactoring for np.generic support
  We don't want to serialize as the value np.nan, we want to include the nan directly. It is an indication that the parameter was left unset.
* Add back the single-test timeout of 600s
* [skip ci] Add note to changelog
* Check that evaluations are present with None-check instead
  The default behavior if no evaluation is present is for it to be None. So it makes sense to check for that instead. As far as I can tell, run.evaluations should always contain some items if it is not None. But I added an assert just in case.
* Remove timeouts again
  I suspect they "crash" workers. This of course introduces the risk of hanging processes... But I cannot reproduce the issue locally.
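
The None-check mentioned in the commit message above might look roughly like the sketch below. `FakeRun` is a hypothetical stand-in for a fetched run object and `has_evaluations` is an invented helper name; only the `evaluations` attribute and the "None means no evaluations yet" convention come from the message itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FakeRun:
    """Hypothetical stand-in for a run fetched from the server."""

    evaluations: Optional[Dict[str, float]] = None


def has_evaluations(run) -> bool:
    """Return True if the run carries evaluations, without calling len() on None."""
    if run.evaluations is None:
        # Nothing has been processed on the server side yet.
        return False
    # Per the commit message, a non-None value is expected to be non-empty.
    assert len(run.evaluations) > 0, "evaluations should not be empty when set"
    return True
```

Checking `is None` first avoids the `TypeError` that `len(run.evaluations)` would raise when the attribute has not been populated.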

* Add HTTP headers to all requests
  This allows us to better understand the traffic we see to our API. It is not identifiable to a person.
* Update unit test to pass even with user-agent in header
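
As an illustration of attaching a header to every request, the sketch below builds requests through one helper using only the standard library. The user-agent string, the `build_request` helper, and the example URL are assumptions for this example; the actual client may use a different HTTP library and header value.

```python
import urllib.request

# Hypothetical value: identifies the client software, not a person.
USER_AGENT = "example-client/0.0.0"


def build_request(url, extra_headers=None):
    """Create a request object that always carries the client's User-Agent."""
    headers = {"User-Agent": USER_AGENT}
    headers.update(extra_headers or {})
    return urllib.request.Request(url, headers=headers)


# No network traffic happens here; the request is only constructed.
request = build_request("https://example.org/api", {"Accept": "application/json"})
```

Routing all calls through one helper keeps the header consistent, and a unit test can inspect the constructed request instead of comparing against a hard-coded header dictionary.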

* Add packaging dependency
* Change use of distutils to packaging
* Update missed usage of distutils to packaging
* Inline comparison to clear up confusion

* Prefer parquet over arff, do not load arff if not needed
* Only download arff if needed
* Test arff file is not set when downloading parquet from prod

* Add progress bar to downloading minio files
* Do not redownload cached files
  There is now a way to force a cache clear, so always redownloading is not useful anymore.
* Set typed values on dictionary to avoid TypeError from Config
* Add regression test for parsing booleans

* Towards lazy-by-default for dataset loading
* Isolate lazy behavior to pytest function outside of class
* Solve concurrency issue where test would use same cache
* Ensure metadata is downloaded to verify dataset is processed
* Clean up to reflect new defaults and tests
* Fix oversight from 1335
* Download data as was 0.14 behavior
* Restore test
* Formatting
* Test obsolete, replaced by test_get_dataset_lazy_behavior

Sometime between Python 3.9 and 3.12 the stringification of ordered dicts changed from using a list of tuples to a dictionary.
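
One way to keep a test independent of this change is to compare contents rather than string representations. The sketch below is only an illustration of that idea; the helper name `same_ordered_content` and the sample values are invented for the example.

```python
from collections import OrderedDict

expected = OrderedDict([("a", 1), ("b", 2)])
actual = OrderedDict(a=1, b=2)


def same_ordered_content(left, right):
    """Compare keys, values, and order without relying on repr()."""
    return list(left.items()) == list(right.items())


# The repr differs by interpreter version, so it is a poor basis for assertions:
#   older form: OrderedDict([('a', 1), ('b', 2)])
#   newer form: OrderedDict({'a': 1, 'b': 2})
print(repr(expected))
print(same_ordered_content(expected, actual))
```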

* Remove archive after it is extracted to save disk space
* Leave a marker after removing archive to avoid redownload
* Automatic refresh if expected marker is absent
* Be consistent about syntax use for path construction
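
The extract, delete, and mark flow listed above could be sketched as follows. The marker file name `.extracted`, the `ensure_extracted` function, and the `download` callback are hypothetical names chosen for this example, and zip is assumed as the archive format; the real code may differ on all of these.

```python
import zipfile
from pathlib import Path

# Hypothetical marker name; its presence means "already extracted".
MARKER_NAME = ".extracted"


def ensure_extracted(archive: Path, target: Path, download) -> Path:
    """Extract `archive` into `target` once, then drop the archive.

    `download` is a callable that writes the archive to the given path.
    """
    marker = target / MARKER_NAME
    if marker.exists():
        # Already extracted earlier: no download and no extraction needed.
        return target
    if not archive.exists():
        # Marker absent, so refresh automatically by fetching the archive.
        download(archive)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    archive.unlink()  # Free disk space once the contents are unpacked.
    marker.touch()  # Record success so later calls skip the download.
    return target
```

Writing the marker only after extraction succeeds means an interrupted run leaves no marker behind, so the next call starts over instead of trusting a partial directory.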

* Pass kwargs through task to ```get_dataset```
  This allows users to follow the directions in the warning ```Starting from Version 0.15 `download_data`, `download_qualities`, and `download_features_meta_data` will all be ``False`` instead of ``True`` by default to enable lazy loading.```
* docs: explain that ```task.get_dataset``` passes kwargs
* Update openml/tasks/task.py
  Remove Py3.8+ feature for backwards compatibility

---------

Co-authored-by: Pieter Gijsbers <[email protected]>
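
The keyword pass-through can be illustrated with the small sketch below. Both `get_dataset` and `Task` here are simplified, hypothetical stand-ins written for this example, not the library's real function or class; only the three `download_*` parameter names and their lazy `False` defaults are taken from the warning text quoted above.

```python
def get_dataset(dataset_id, download_data=False, download_qualities=False,
                download_features_meta_data=False):
    """Hypothetical stand-in that just reports which parts would be fetched."""
    return {
        "dataset_id": dataset_id,
        "download_data": download_data,
        "download_qualities": download_qualities,
        "download_features_meta_data": download_features_meta_data,
    }


class Task:
    """Hypothetical task that knows which dataset it belongs to."""

    def __init__(self, dataset_id):
        self.dataset_id = dataset_id

    def get_dataset(self, **kwargs):
        # Forward any keyword arguments unchanged to the dataset loader.
        return get_dataset(self.dataset_id, **kwargs)


task = Task(dataset_id=61)
lazy = task.get_dataset()
eager = task.get_dataset(download_data=True)
```

Forwarding `**kwargs` means the task method does not need to be updated each time the loader gains a new option, at the cost of a less explicit signature.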

* Change defaults for `get_task`
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Fix linting errors
* Add missing type annotation

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Expand 0.15.0 changelog with other PRs not yet added
* Bump version number
* Add newer Python versions since we are compatible
* Revert "Add newer Python versions since we are compatible"
  This reverts commit 5088c80.
* Add newer compatible versions of Python

Commits on Jan 22, 2024

Commits on May 15, 2024

Commits on Jul 5, 2024

Commits on Jul 10, 2024

Commits on Sep 16, 2024

Commits on Sep 22, 2024

Commits on Sep 27, 2024

Commits on Sep 29, 2024

Commits on Oct 1, 2024

Commits on Oct 4, 2024
