add sklearn version to external version in sklearn flows, #742

amueller · 2019-07-22T18:51:54Z

closes #734.

explicitly handle extension in flow creation

mfeurer · 2019-07-23T08:46:36Z

Yep, that should solve it, thanks for the PR. However, it seems like you forgot to add the MyLR class.

amueller · 2019-07-23T16:40:15Z

damn, should be MyDummy now... I'll fix after lunch.

amueller · 2019-07-23T17:32:10Z

openml/flows/flow.py

        self.flow_id = flow_id
-
-        self._extension = get_extension_by_flow(self)
+        if extension is None:


Do you want me to remove this? Would work without it but I found it more explicit to directly give the extension.

This is totally fine by me.
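
For readers following the diff above, here is a minimal sketch of the pattern under discussion: accept an optional `extension` argument and fall back to a lookup only when none is given. The `Flow` and `Extension` classes and the body of `get_extension_by_flow` are simplified stand-ins written for this illustration, and the `else` branch is an assumption because the diff excerpt stops at the `if` line.

```python
class Extension:
    """Hypothetical stand-in for an extension object."""

    def __init__(self, name):
        self.name = name


def get_extension_by_flow(flow):
    """Stand-in for the lookup named in the diff; the real logic is not shown there."""
    return Extension("looked-up")


class Flow:
    """Simplified stand-in for the flow class; only the relevant lines are kept."""

    def __init__(self, flow_id=None, extension=None):
        self.flow_id = flow_id
        if extension is None:
            # No extension passed in: fall back to the lookup, as before the change.
            self._extension = get_extension_by_flow(self)
        else:
            # Assumed branch: use the extension the caller supplied explicitly.
            self._extension = extension


implicit_flow = Flow(flow_id=1)
explicit_extension = Extension("explicit")
explicit_flow = Flow(flow_id=2, extension=explicit_extension)
```

Passing the extension explicitly skips the lookup, which is the "more explicit" behaviour the comment above refers to.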

amueller · 2019-07-23T17:33:56Z

probably fine now ;)

mfeurer

Thanks, this looks good now. However, I'm afraid that your unit test fails for scikit-learn 0.18.X:

self = <tests.test_extensions.test_sklearn_extension.test_sklearn_extension.TestSklearnExtensionRunFunctions testMethod=test_run_model_on_task>
    def test_run_model_on_task(self):
        class MyDummy(sklearn.dummy.DummyClassifier):
            pass
        task = openml.tasks.get_task(1)
>       openml.runs.run_model_on_task(MyDummy(), task)
/home/travis/build/openml/openml-python/tests/test_extensions/test_sklearn_extension/test_sklearn_extension.py:1226: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/home/travis/build/openml/openml-python/openml/runs/functions.py:106: in run_model_on_task
    upload_flow=upload_flow,
/home/travis/build/openml/openml-python/openml/runs/functions.py:222: in run_flow_on_task
    add_local_measures=add_local_measures,
/home/travis/build/openml/openml-python/openml/runs/functions.py:446: in _run_task_get_arffcontent
    X_test=test_x,
/home/travis/build/openml/openml-python/openml/extensions/sklearn/extension.py:1248: in _run_model_on_fold
    pred_y = model_copy.predict(X_test)
/home/travis/miniconda/envs/testenv/lib/python3.6/site-packages/sklearn/dummy.py:174: in predict
    X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
/home/travis/miniconda/envs/testenv/lib/python3.6/site-packages/sklearn/utils/validation.py:407: in check_array
    _assert_all_finite(array)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
X = array([[nan,  0.,  1., ...,  0.,  0., nan],
       [ 8.,  0.,  1., ..., nan,  0., nan],
       [ 8.,  0.,  1., ..., na...nan,  0., nan],
       [nan,  0., nan, ..., nan,  0., nan],
       [ 8.,  0.,  0., ..., nan,  0., nan]], dtype=float32)
    def _assert_all_finite(X):
        """Like assert_all_finite, but only for ndarray."""
        X = np.asanyarray(X)
        # First try an O(n) time, O(1) space solution for the common case that
        # everything is finite; fall back to O(n) space np.isfinite to prevent
        # false positives from overflow in sum method.
        if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
                and not np.isfinite(X).all()):
            raise ValueError("Input contains NaN, infinity"
>                            " or a value too large for %r." % X.dtype)
E           ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
/home/travis/miniconda/envs/testenv/lib/python3.6/site-packages/sklearn/utils/validation.py:58: ValueError

Could you please check that?
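
One way to avoid this kind of failure is to impute missing values before the classifier sees the data, while still using a user-defined class at the top level. The sketch below assumes a recent scikit-learn (`SimpleImputer` exists from 0.20 on; older releases shipped `sklearn.preprocessing.Imputer` instead). `MyPipeline` and the toy data are hypothetical and are not taken from the test suite.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.impute import SimpleImputer  # scikit-learn >= 0.20
from sklearn.pipeline import Pipeline


class MyPipeline(Pipeline):
    """Hypothetical user-defined subclass, i.e. a class defined outside sklearn."""


# Toy data with NaNs, loosely mirroring the array in the traceback above.
X = np.array(
    [[np.nan, 0.0, 1.0],
     [8.0, 0.0, np.nan],
     [8.0, 1.0, 0.0],
     [np.nan, 0.0, 1.0]],
    dtype=np.float32,
)
y = np.array([0, 1, 1, 1])

model = MyPipeline([
    # Replace NaNs with the column median so later steps only see finite values.
    ("imputer", SimpleImputer(strategy="median")),
    # Always predict the most frequent training label (deterministic).
    ("classifier", DummyClassifier(strategy="most_frequent")),
])
model.fit(X, y)
predictions = model.predict(X)
```

Because the imputer runs first, the finiteness check in `check_array` is never reached with NaNs, whichever scikit-learn version performs it.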

amueller · 2019-07-24T18:34:33Z

I think the failures now are unrelated?

amueller · 2019-07-26T15:07:09Z

ping @PGijsbers ;)
This one is what prevents us from pushing flows with non-sklearn components
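
To illustrate the idea in the PR title, here is a rough sketch of building an external version string that always contains the sklearn version, even when the top-level component comes from another package. The comma-joined `name==version` format, the function names, and the package names are assumptions made for this example, not the library's confirmed implementation.

```python
def format_external_version(name, version):
    """Format one entry; the 'name==version' layout is an assumption."""
    return "%s==%s" % (name, version)


def build_external_version(component_versions, sklearn_version):
    """Join component versions and always include sklearn (hypothetical helper)."""
    versions = {
        format_external_version(name, version)
        for name, version in component_versions.items()
    }
    # Add sklearn unconditionally so the flow can still be recognised as an
    # sklearn flow when its top-level class lives in another package.
    versions.add(format_external_version("sklearn", sklearn_version))
    return ",".join(sorted(versions))


# 'mypackage' is a made-up package name used only for illustration.
external_version = build_external_version({"mypackage": "1.0"}, "0.21.2")
```

Using a set keeps the sklearn entry from being duplicated when one of the components already reports it.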

add sklearn version to external version in sklearn flows,

dcd8a35

explicitly handle extension in flow creation

amueller requested review from PGijsbers and mfeurer and removed request for mfeurer July 22, 2019 18:52

use dummy classifier instead of linear regression

17f7242

amueller added 2 commits July 23, 2019 13:30

use MyDummy instead of MyLR

f0e8214

typo aaah

6eccbb5

amueller commented Jul 23, 2019

all the typos

50065fa

mfeurer requested changes Jul 24, 2019

use custom pipeline instead of dummy class because sklearn 0.18 can't handle NaNs

32f58bf

mfeurer approved these changes Jul 26, 2019

mfeurer merged commit 4c71d1d into openml:develop Jul 26, 2019

amueller mentioned this pull request Oct 15, 2019

Still not able to use non-sklearn estimators without wrapping them in a pipeline #734

Closed
