[MRG+1] fix _BaseComposition._set_params with nested parameters by amueller · Pull Request #9945 · scikit-learn/scikit-learn

amueller · 2017-10-17T21:12:50Z

This seemed the simplest fix.

This kind of means that the protection in _BaseComposition._set_params is not enough.
We could make that method work by grouping the parameters in BaseEstimator.set_params according to their prefixes and calling set_params only once per prefix. That seems like a slightly cleaner solution, but I'm not sure it's worth the effort.

This will go away in Python 3.6, as iteration is guaranteed to be ordered then.
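
Editorial sketch of the prefix-grouping idea mentioned above. This is not the code from the PR. `group_by_prefix` is a hypothetical helper, and the string `"Lasso()"` and the `verbose` key are stand-ins chosen for illustration. It only assumes the `name__sub` naming convention used throughout this thread.

```python
from collections import defaultdict


def group_by_prefix(params):
    """Split flat ``name__sub`` keys into local values and per-child dicts.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    local = {}
    nested = defaultdict(dict)
    for key, value in params.items():
        # Split on the first "__" only, so deeper nesting stays in sub_key.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested[name][sub_key] = value
        else:
            local[name] = value
    return local, dict(nested)


# Placeholder values stand in for real estimator objects.
local, nested = group_by_prefix(
    {"a__b__alpha": 0.001, "a__b": "Lasso()", "verbose": True}
)
```

With the parameters grouped this way, a container would need only one delegated `set_params` call per child name, regardless of the order in which the keys arrive.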

jnothman · 2017-10-17T22:25:05Z

I had intended this case to be handled by _BaseComposition._set_params in the case of Pipeline. Why isn't that working?? But this fix should be helpful for metaestimators that don't need to define their own set_params. Please add a test for that case...

jnothman · 2017-10-17T22:46:57Z

I've tried tracing the _BaseComposition._set_params logic, but still can't see why this bug should occur, or why this patch should fix it. Running the snippet, what's passed to set_params is {'a__b', 'a__b__alpha'}; then {'a__b'} is handled by _BaseComposition._set_params; then set_params is asked to set 'b__alpha' alone (and then 'alpha').

jnothman · 2017-10-18T01:47:49Z

I pushed a more extensive fix to your branch. I hope you don't mind.

codecov · 2017-10-18T02:52:56Z

Codecov Report

Merging #9945 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #9945      +/-   ##
==========================================
+ Coverage   96.17%   96.17%   +<.01%     
==========================================
  Files         336      336              
  Lines       62533    62540       +7     
==========================================
+ Hits        60138    60145       +7     
  Misses       2395     2395

| Impacted Files | Coverage Δ | |
|---|---|---|
| sklearn/tests/test_pipeline.py | `99.64% <100%> (ø)` | ⬆️ |
| sklearn/feature_selection/base.py | `94.82% <0%> (+0.06%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0c6483c...dd1b792.

jnothman

~~@amueller, let me know if you find this use of groupby illegible. Using defaultdict may be neater.~~

lesteve

Not an expert on the Pipeline code, but this looks good to me, with a small comment.

I double-checked that the tests were failing on master with Python 3.6.

lesteve · 2017-10-18T09:28:43Z

sklearn/tests/test_pipeline.py

+            ('b', DummyRegressor())
+        ]))
+    ])
    params = {

Should we use a collections.OrderedDict to make this test more future-proof?

jnothman

Shrug. I think it distorts the purpose of the test somewhat if we only test with an OrderedDict.

lesteve

I think this is a bit clearer (without some of the subtlety of Python version-specific dict ordering):

estimator.set_params(a__b__alpha=0.001, a__b=Lasso())

lesteve · 2017-10-18T10:01:37Z

I renamed the PR title because it no longer seemed appropriate. Feel free to use a better one.

lesteve · 2017-10-18T13:18:08Z

Pushed a change that should fix the flake8 error.

amueller · 2017-10-18T18:46:41Z

@jnothman you understood the problem, I guess, because you pushed the other fix? As I said, that fix might be a bit cleaner.

Just for the record:
The problem is that these are two pipelines nested inside each other. The logic for handling this lives in _BaseComposition._set_params. The outer pipeline applies that logic, but then iterates over the parameters it delegates in arbitrary order. As a result, set_params is called on the inner pipeline multiple times, with one parameter each time. So the loop in the outer pipeline determines the order in which the parameters are set, and the inner pipeline has no control over it.
@jnothman's fix (which was my second suggested fix) is to pass all parameters to the inner pipeline at once, so that it can handle the ordering itself, which allows the logic in _BaseComposition._set_params to kick in.
My fix was to set the parameters in sorted order: in lexicographic order, any prefix of a string comes before the string itself.
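
Editorial sketch of the "pass all parameters with the same prefix at once" approach described above. `Leaf` and `Composite` are toy classes invented for this illustration. They only mimic the `name__sub` convention and are not scikit-learn's Pipeline or `_BaseComposition`.

```python
class Leaf:
    """Toy estimator that accepts only the parameter names it was built with."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        for key, value in params.items():
            if key not in self.params:
                raise ValueError(f"invalid parameter {key!r}")
            self.params[key] = value
        return self


class Composite:
    """Toy container; child names play the role of pipeline step names."""

    def __init__(self, **children):
        self.children = dict(children)

    def set_params(self, **params):
        # Replace whole children first, collecting nested keys per child,
        # then forward each child's nested parameters in a single call.
        nested = {}
        for key, value in params.items():
            name, delim, sub_key = key.partition("__")
            if delim:
                nested.setdefault(name, {})[sub_key] = value
            else:
                self.children[name] = value
        for name, sub_params in nested.items():
            self.children[name].set_params(**sub_params)
        return self


# The old "b" has no alpha; it is replaced and reconfigured in one call.
outer = Composite(a=Composite(b=Leaf(strategy="mean")))
outer.set_params(a__b__alpha=0.001, a__b=Leaf(alpha=1.0))
result = outer.children["a"].children["b"].params
```

Because the inner container receives both `b` and `b__alpha` together, it can swap in the new child before configuring it, whichever order the keys were passed in.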

amueller · 2017-10-18T18:49:07Z

LGTM. This behavior is more natural.

jnothman · 2017-10-18T21:18:46Z

Your fix is not sufficient if, say, the user sets steps=[('a', Lasso())], a__alpha=1
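
Editorial sketch of why sorting alone falls short in this case. `application_order` is a hypothetical helper that only shows the order a sort-the-keys strategy would visit parameters in, and the string `"Lasso()"` is a placeholder for a real estimator.

```python
def application_order(params):
    """Order in which keys would be applied if they were simply sorted."""
    return sorted(params)


# A prefix sorts before the longer key, so this case works out:
nested_case = application_order({"a__b__alpha": 0.001, "a__b": "Lasso()"})

# "steps" defines step "a", yet sorts after "a__alpha":
steps_case = application_order({"steps": [("a", "Lasso()")], "a__alpha": 1})
```

In the second case the nested parameter would be applied before the step it belongs to has been installed.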

jnothman · 2017-10-18T21:31:38Z

I'd missed your original comment about this potential solution!

jnothman · 2017-10-18T21:33:10Z

I'm happy to merge when Travis says my new assertion is okay.

marcus-voss · 2017-10-25T09:02:38Z

Hey, can you have a look at my issue here:
https://stackoverflow.com/questions/46915855/scikit-learn-set-param-for-custom-estimator-sets-nested-parameter-before-comp

This very similar case still fails.

jnothman · 2017-10-25T09:08:47Z

Yes, you've diagnosed it well. Thanks, I'll try to work on a solution.

marcus-voss · 2017-10-25T09:52:32Z

Thanks for that very quick fix! Indeed, your fix also works for my original SO minimal example. To close the question there: would you like to add an answer referring to your fix? Then I could accept it and close the question.

iterate over parameters in sorted order in set_params

16526cc

jnothman mentioned this pull request Oct 18, 2017

Pipeline in Pipeline seems to not work well with setting of parameters using .set_params #9944

Closed

Pass all params with same prefix together

a3b2bb8

Test that nested estimators get passed all params at once

dd1b792

jnothman reviewed Oct 18, 2017

jnothman force-pushed the sorted_param_setting branch from b2e5a7b to 5129522 Compare October 18, 2017 03:04

Improve legibility

1c07283

jnothman force-pushed the sorted_param_setting branch from 5129522 to 1c07283 Compare October 18, 2017 03:05

Remove debug code

0588ed5

jnothman added this to the 0.19.1 milestone Oct 18, 2017

Simplify

b7b4464

jnothman force-pushed the sorted_param_setting branch from 28a3235 to b7b4464 Compare October 18, 2017 03:11

jnothman added 2 commits October 18, 2017 15:31

Fix error message

7c9c453

Unify error messages

3497f8c

You can't determine even local parameters with only class name

lesteve reviewed Oct 18, 2017

lesteve changed the title ~~MRG iterate over parameters in sorted order in set_params~~ [MRG+1] iterate over parameters in sorted order in set_params Oct 18, 2017

lesteve changed the title ~~[MRG+1] iterate over parameters in sorted order in set_params~~ [MRG+1] fix _BaseComposition._set_params with nested parameters Oct 18, 2017

jnothman and others added 2 commits October 18, 2017 22:50

Perhaps clearer test

8f66c29

Removed unused parameter

b20b284

jnothman added 2 commits October 19, 2017 08:23

A test that would have failed Andy's patch

40094b3

Merge branch 'sorted_param_setting' of https://github.com/amueller/sc…

5c308d5

…ikit-learn into sorted_param_setting

jnothman merged commit 75763cf into scikit-learn:master Oct 18, 2017

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Oct 18, 2017

FIX _BaseComposition._set_params with nested parameters (scikit-learn…

1b7d370

…#9945)

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Oct 25, 2017

FIX bug in nested set_params usage

c890fd1

Issue where estimator is changed as well as its parameter: scikit-learn#9945 (comment)

jnothman mentioned this pull request Oct 25, 2017

[MRG] FIX bug in nested set_params usage #9999

Merged

maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

FIX _BaseComposition._set_params with nested parameters (scikit-learn…

e364831

…#9945)

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

FIX _BaseComposition._set_params with nested parameters (scikit-learn…

1e34ef3

…#9945)

This was referenced Jan 23, 2018

BaseEstimator.set_params(**params) is not truly recursive #10524

Closed

set_params fn for pipeline to maintain order of parameters #10439

Closed
