
Conversation

@tguillemot
Contributor

I open a PR to let people talk about sample_props.

@amueller
Member

amueller commented Jun 7, 2017

@tguillemot
Contributor Author

Add alternative proposition at the end of the document.

| Method | sample_props |
|:----------------|:-----------------------------------------------------------------------------------------------|
| grid.fit | `{'weights': weights, 'cv__groups': cv_groups, 'split_groups': gs_groups}` |
| grid.score | `{'weights': weights, 'cv__groups': cv_groups, 'split_groups': gs_groups}` |
| grid.split | `{'weights': weights, 'groups': gs_groups, 'cv__groups': cv_groups, 'split_groups': gs_groups}` |
Member

grid.split isn't a function..?

Member

@jnothman jnothman left a comment

I had originally conceived of the routing idea as telling a meta-estimator where to pass each provided property, i.e. {'cv': ['sample_weight', 'groups']}, etc. But it seems instead we are replicating the data for each routing target. I've not yet thought of problems with this. IMO, it is a move towards deprecating sample_weight, which I think is a reasonable idea (although we should do it slowly, or in version 1.0). And it is somewhat consistent with Pipeline.fit kwargs semantics. I think I like it.

I've not read thoroughly enough to know if these are covered:

  • If sample_props is not provided, we will not attempt to pass a value for it.
  • If a sample_props key is unused/unwanted by an estimator, should it raise an error (like passing sample_weight when unwanted currently does)? If so, this behaviour must be enforced by common tests.
  • We seem to be creating naming conflicts with this design: we can no longer have a pipeline step named cv or score (should it be scoring?) in a grid search. Would we handle that with a warning?

I think this approach is worth trying. But I suggest that we keep the names of keys singular: 'weight', 'group'; not 'weights', 'groups'. In a similar vein, we need to choose the name sample_props, props, prop, etc. carefully.

@jnothman
Member

jnothman commented Jun 8, 2017

I've just taken a quick look at the Google Doc, not that it's easy to read. An approach where routing is handled by the meta-estimator is IMO more explicit and resilient to bugs / forwards-compatibility issues than one where the routing is handled by the base estimator / CV object / etc. (akin to dependency injection). It might be easier to write code in the latter style, though.

@jnothman
Member

jnothman commented Jun 8, 2017 via email

@amueller
Member

amueller commented Jun 8, 2017 via email

@jnothman
Member

jnothman commented Jun 8, 2017 via email

@tguillemot
Contributor Author

@jnothman It's not great, but I tried to summarize the discussion from the Google Doc in section 4, "Alternative propositions for sample_props" (06.17.17). Not sure, but maybe it can help.

@amueller
Member

amueller commented Jun 8, 2017

Basically because we would need kwargs in all of our fit methods, which is not great and can swallow all kinds of errors. It can also cause problems for dependent projects that use fit args.

@jnothman
Member

jnothman commented Jun 8, 2017 via email

@amueller
Member

amueller commented Jun 8, 2017

hm... and we'd use the same / a similar routing mechanism? I think I like that.... and without routing it's passed to everything?

@jnothman
Member

jnothman commented Jun 8, 2017 via email

@amueller
Member

amueller commented Jun 8, 2017

well pipeline also has a default routing now with fit_params, right?

@jnothman
Member

jnothman commented Jun 8, 2017 via email

@ogrisel
Member

ogrisel commented Jun 10, 2017

We also need to take RidgeCV, LogisticRegressionCV, LassoCV... into account: those need a way to route sample props info about the grouping to be used for leave-p-groups-out CV to the cv object itself (as is the case for GridSearchCV et al.).

@glemaitre
Member

Speaking with @ogrisel IRL, TransformTargetRegressor could be considered as a use case. It takes a transformer and a regressor. At fit, one has to choose how to handle sample_weight: two different sample_weights could be passed, one for each estimator, a single one for both, etc. For the moment, I could not find a transformer with a sample_weight, but I assume it could turn out to be useful.

@jnothman
Member

jnothman commented Jun 14, 2017 via email

@jnothman
Member

jnothman commented Aug 8, 2017

I've decided this is a/the chief API priority and will try to put some focussed energy into it soon...

@jnothman
Member

jnothman commented Aug 9, 2017

For the moment, I could not find a transformer with a sample_weight but I assume that it could happen to be useful.

It'd be easy to invent one from, say, SelectFromModel. Yes, it supports arbitrary fit_params.

@jnothman
Member

jnothman commented Aug 9, 2017

The document here focuses on the solution without clearly analysing the problem. Our classic case here is sample_weight. The problems are then that:

  • we treat sample_weight as special, while there are other sample-aligned properties that should be potentially passed around by meta-estimators
  • implementing support for it can be a hassle or can be forgotten and added later, creating maintenance overhead
  • we still have some cases of kwargs to methods that are not sample aligned (e.g. return_std; and more in "scikit-learn-compatible" libraries)
  • we often create conditional structures like `do_something(X, y, sample_weight=sample_weight) if sample_weight is not None else do_something(X, y)`, which can be avoided with `**props` where `props = {'sample_weight': sample_weight}`.
  • we don't know where to route it and have diverse specifications for routing. Sometimes we even use has_fit_parameter which I think should be outlawed (because it does not support meta-estimators readily).
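To make the fourth point concrete, here is a minimal sketch of the pattern being criticised and its `**props` replacement. `do_something` is a hypothetical stand-in for any fit-like function that optionally accepts `sample_weight`:

```python
def do_something(X, y, sample_weight=None):
    # Toy weighted sum of the targets; weights default to all-ones.
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    return sum(w * t for w, t in zip(sample_weight, y))

X, y, sample_weight = [[0], [1]], [2.0, 3.0], None

# The conditional style the comment argues against:
result_branchy = (do_something(X, y, sample_weight=sample_weight)
                  if sample_weight is not None else do_something(X, y))

# The equivalent **props style: the dict is empty when no weight was
# given, so the call collapses to do_something(X, y) automatically.
props = {} if sample_weight is None else {'sample_weight': sample_weight}
result_props = do_something(X, y, **props)
```

With more than one optional prop, the `**props` form avoids a combinatorial explosion of branches.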

There's some notion raised here that we need a dataframe-like object, but I contend that we've got almost no problems due to sample_weight being a kwarg, except perhaps that:

  • the method needs to protect against the possibility that sample_weight isn't passed when it should be, and vice versa.
  • sometimes it has been passed as a positional arg (not sure this is a problem for backwards compatibility)
  • if there were more props, it might take some effort to make sure everything is bundled up correctly and passed around if they are kwargs.

Advantages of sticking to kwargs include that:

  • most methods won't accept (many) props, and Python will deal with this automatically
  • backwards compatibility (and no deprecation)
  • don't need to support multiple dataframe-like formats
  • can still be operated with **my_dataframe

So, IMO the main problem is routing. Will come back with more on that soon. Starting by analysing what we have so far.

@jnothman
Member

jnothman commented Aug 9, 2017

I should state the broader problem:
Our meta-estimation framework (basically pipeline + grid search) breaks for some kinds of problems (e.g. weighted scoring). When it breaks, users write their own, and we stop offering the benefits of a consistent interface, compatibility, etc., as well as no longer protecting users from errors such as data leakage and from non-best-practice cross-validation.

@jnothman
Member

jnothman commented Aug 9, 2017

Existing sample props:

  • y: fit-like methods are required to support y as the second positional argument. Nonetheless, we sometimes have code paths like that for sample_weight where we do_something(X, y) if y is not None else do_something(X), as in TransformerMixin.fit_transform, cross-val fitting, BaseSearchCV refitting. We also have code which validates y as either being None or of the same length as X. If passed where unwanted, no error will be raised.
  • sample_weight: accepted (and always optional) essentially only in fit-style functions (including fit_predict, fit_transform and partial_fit) and scoring functions (including gradient boosting errors). It also forms part of a utils.dataset.*Dataset. If passed where unwanted an error will often be raised.
  • groups may be passed into *SearchCV.fit where it is validated, and then passed to the CV splitter mandatorily (i.e. all CV splitters must accept groups as a positional argument). If passed where unwanted, no error is raised. If not passed when wanted, an error is raised. No splitters currently use groups without requiring it.

Handling of **kwargs in ways that do not just pass through:

  • *SearchCV has fit_params attribute. Corresponding data is sliced up and passed to fit in cross validation.
  • parameters passed to Pipeline.fit{,_transform,_predict} are interpreted as using a double underscore to divide the step name from the kwarg name. They are only passed to the respective steps' fit/fit_transform. Any name with an unknown step-name prefix results in a KeyError. The step estimator has the responsibility to deal with unknown names.
  • FeatureUnion.fit does not support kwargs, but FeatureUnion.fit_transform does. It passes all to the constituent transformers' fit or fit_transform methods (no use of double-underscore prefixing).
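The double-underscore convention described above can be sketched as follows. This is a simplified mimic of the routing behaviour, not scikit-learn's actual implementation:

```python
def route_pipeline_fit_params(step_names, **fit_params):
    """Split Pipeline-style kwargs into per-step fit-param dicts.

    Names are divided at the first double underscore into
    (step name, parameter name); each parameter is forwarded only to
    that step. Unknown step prefixes raise KeyError, mirroring the
    behaviour described above.
    """
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        step, _, param = key.partition('__')
        if step not in routed:
            raise KeyError(step)
        routed[step][param] = value
    return routed

routed = route_pipeline_fit_params(
    ['scaler', 'clf'], clf__sample_weight=[1, 2, 3])
# routed == {'scaler': {}, 'clf': {'sample_weight': [1, 2, 3]}}
```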

Principles of routing:

  • All meta-estimators (where fit or fit_* on a wrapped estimator is called by a parent estimator) should be able to support forwarding any props to the downstream fit.
  • Meta-estimators where another method (transform, predict, etc) is called should perhaps be able to pass props to those methods. In the case of something like Pipeline this would need to be done during Pipeline.fit and hence either needs to be passed to all steps in the pipeline, or the user must be able to control which steps (or the step can choose/indicate its acceptance or rejection of some prop).
  • It should be possible to supply sample_weight to a composite metaestimator, such as a Pipeline or FeatureUnion, but for only some of the downstream estimators to receive it (but this could be controlled on the recipient's side if we wish).
  • All estimators using cv.split must be able to receive groups in their fit and pass them on to cv.split.
  • In turn, this means that any meta-estimator using cv.split must be able to pass on groups to a downstream fit, despite the fact that many downstream estimators would not accept groups. This suggests the need for a switch to say whether or not groups should be passed on. Or maybe this should be something that the wrapped estimator indicates.
  • Similarly, all estimators using scoring must be able to receive sample_weight in their fit and pass it on to scoring. In turn, this means that any meta-estimators must be able to pass on sample_weight to a downstream scoring, but this may not always be desired behaviour. Similarly, we may want to support passing weights to scoring, but not to the downstream fit.
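The fourth principle (fit receives `groups` and forwards it to `cv.split`) can be illustrated with a self-contained sketch; the class names here are hypothetical, not scikit-learn APIs:

```python
class LeaveOneGroupOutSketch:
    """Toy splitter: each distinct group value becomes one test fold."""
    def split(self, X, y, groups):
        for g in sorted(set(groups)):
            train = [i for i, gi in enumerate(groups) if gi != g]
            test = [i for i, gi in enumerate(groups) if gi == g]
            yield train, test

class MetaEstimatorSketch:
    """Toy meta-estimator: accepts groups in fit, routes it to cv.split."""
    def __init__(self, cv):
        self.cv = cv

    def fit(self, X, y, groups=None):
        # groups is consumed here and passed to cv.split; a downstream
        # estimator's fit would never see it.
        self.splits_ = list(self.cv.split(X, y, groups))
        return self

meta = MetaEstimatorSketch(LeaveOneGroupOutSketch())
meta.fit([[0]] * 4, [0, 1, 0, 1], groups=[1, 1, 2, 2])
# meta.splits_ == [([2, 3], [0, 1]), ([0, 1], [2, 3])]
```

The awkwardness arises when `MetaEstimatorSketch` wraps another meta-estimator: the outer fit must accept and forward `groups` even though most wrapped estimators would reject it.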

@jnothman
Member

jnothman commented Aug 9, 2017

Further design issues in master:

  • prop may need to be repeated under multiple names to be passed into Pipeline.fit
  • contradictory/incompatible approach to props in FeatureUnion and Pipeline

Questions:

  • do we want to assume "one sample weight at a time" or, say, should we be allowed to have a different sample weight for fit and for score? How much does this cost us in usability?
  • should user prescribe routing (whether/where a particular prop should be passed) at: (a) meta-estimator construction time; (b) meta-estimator fit time, as in Pipeline now; (c) base estimator construction time? I feel (c) is a bad idea and I can try to expand on why if someone asks.

Solution alternatives:

  1. pass data and routing information (probably via parameter naming) to fit
  2. pass data to fit and routing information to meta-estimator constructor.

Within 2, there are a few further orthogonal design decisions:

  • DIRECTION: routing is specified:
    a. as a mapping from each source fit param name to a list of sinks; or
    b. as a mapping from each sink path to a single source fit param
  • RENAME: the routing sink:
    a. includes the parameter name that should be used in the destination, allowing parameters to be renamed in routing
    b. just specifies the sink and the parameter is not renamed (i.e. sample_weight has to be passed as sample_weight, and it is impossible to specify a different weight for different parts of a pipeline or for fit vs score)

If DIRECTION:=b then there are further questions such as whether routing is specified as a single constructor parameter, or multiple / in a specialised way depending on the meta-estimator.
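The two DIRECTION options (ignoring RENAME for simplicity) might look as follows; both dict shapes are hypothetical, not implemented APIs, and they carry the same information:

```python
# (a) each source fit param maps to a list of sinks it is routed to.
routing_a = {
    'sample_weight': ['estimator', 'scoring'],
    'groups': ['cv'],
}

# (b) each sink maps to the single source fit param it receives.
routing_b = {
    'estimator': 'sample_weight',
    'scoring': 'sample_weight',
    'cv': 'groups',
}

def invert(sink_to_source):
    """Convert shape (b) to shape (a)."""
    source_to_sinks = {}
    for sink, source in sink_to_source.items():
        source_to_sinks.setdefault(source, []).append(sink)
    return source_to_sinks
```

Note that shape (b) forces exactly one source per sink, which is what makes RENAME variant (b) unable to express two different weights for fit vs score.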

I'll try to make these designs, and their benefits, more concrete with example syntaxes at some point soon.

Member

@jnothman jnothman left a comment

So after looking over parts of the proposal here, my main critiques are:

  • I don't think we benefit from having sample_props as a separate dict from kwargs.
  • I don't think we usually benefit from letting an estimator receive a sample property that it does not explicitly support or pass on, so we may not need seterr
  • I think we are over-using underscores, when routing paths can use other string delimiters (e.g. :), or rely on (named)tuples instead.

One challenge which I don't think is addressed so far here, is passing props to predict/transform, if we want to support that case. It's easy to imagine the use of some grouping information for a ranking or a time series problem... although those things could perhaps have been passed in as part of X if need be, or each sample could represent a group. Do we want to support props in predict? This becomes tricky in part because scorers call predict.

@amueller
Member

I solemnly swear I'll catch up to this soon ;)

@jnothman
Member

jnothman commented Aug 14, 2017 via email

@jnothman
Member

I see the syntactic elegance of annotating an estimator with what it should receive, but it:

  • won't work for methods other than fit
  • won't work for scorers or cv defined as strings/numbers
  • may be hard to maintain backwards compatibility with.

My approach adds a prop_routing parameter to metaestimators. Default behaviour is to maintain backwards compatibility. The metaestimator defines a set of destinations and a router function (either determined from user specification, or the default) translates the set of destinations and the dict of props into a dict for each destination.

How to specify routing is still open. Let's imagine we want to route parameter foo to fit(..) of all steps of a pipeline, and bar only to the step named me, where it is to be received with the name baz. In all cases we call pipeline.fit(X, y, foo=[1,2,3], bar=[4,5,6])

  • {src_prop: [(dst_pattern, dst_prop), ...]} is what I've currently almost implemented at scikit-learn/scikit-learn@master...jnothman:route. E.g. Pipeline(prop_routing={'foo': [('*', '*')], 'bar': [('me', 'baz')]})
  • {dst_pattern: {src_prop: dst_prop}} may be nicer to work with. E.g. Pipeline(prop_routing={'*': {'foo': 'foo'}, 'me': {'bar': 'baz'}})
  • variants of the latter, like [(dst_pattern, {src_prop: dst_prop}), ...] and [(dst_pattern, src_prop, dst_prop), ...] (e.g. Pipeline()) are also options
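A rough sketch of the router function described above, using the first proposed shape {src_prop: [(dst_pattern, dst_prop), ...]} with glob-style destination patterns. This is a hypothetical illustration, not the code on the linked branch; the convention that dst_prop '*' means "keep the source name" is also an assumption:

```python
from fnmatch import fnmatch

def route_props(prop_routing, destinations, props):
    """Translate a routing spec and a props dict into one dict per
    destination, matching destination names with fnmatch patterns."""
    routed = {dst: {} for dst in destinations}
    for src_prop, targets in prop_routing.items():
        for dst_pattern, dst_prop in targets:
            for dst in destinations:
                if fnmatch(dst, dst_pattern):
                    # '*' as dst_prop keeps the source name.
                    name = src_prop if dst_prop == '*' else dst_prop
                    routed[dst][name] = props[src_prop]
    return routed

# foo goes to every step under its own name; bar only to step 'me',
# renamed to 'baz' — matching the example above.
routed = route_props(
    {'foo': [('*', '*')], 'bar': [('me', 'baz')]},
    destinations=['me', 'other'],
    props={'foo': [1, 2, 3], 'bar': [4, 5, 6]},
)
# routed == {'me': {'foo': [1, 2, 3], 'baz': [4, 5, 6]},
#            'other': {'foo': [1, 2, 3]}}
```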

While I knew it was the case for Pipeline, I've belatedly realised that none of these allows us to neatly express the current routing policy for GridSearch, which is "pass everything but groups to estimator.fit; pass groups to cv".

One problem I have is forward compatibility if we allow destinations to be matched by patterns rather than exact name matches. If new destination possibilities (such as passing to transform in Pipeline) get added, this changes the meaning of the routing. This is one major advantage of annotating the destination with what it should receive. Alternatively, maybe string matching over destinations like '*' isn't what we want, but rather destination aliases defined by the router, e.g. such that destination '*' in a pipeline means "fit for all steps".

Question: Cases where we should raise errors:

  • where a destination pattern does not match one known: I think so.
  • where some props are not routed anywhere: not sure (but yes in Pipeline).

@jnothman
Member

I'm off work sick today and seem to have produced a reasonable implementation of a variant at scikit-learn/scikit-learn#9566

@amueller
Member

I think it would be great to have a list of application examples and then implement them in each of the alternative proposals. I tried to do that in the Google Doc, with increasingly complex examples, e.g. nested cross-validation, pipelines where sample_weights are passed to some steps or to score, etc.

@jnothman
Member

Yeah, I've been thinking more of the "deep" solution where there is no routing, just kwarg naming conventions. The main problems are that it's verbose (but easy to read) and hard to maintain backwards compatibility, particularly in FeatureUnion and in a small way through making reserved prefixes in cross validation.

@jnothman
Member

Those things aren't issues if we don't use kwargs, but I really think we should stick to kwargs.

@jnothman
Member

And yes, demonstrations are a good idea. I just won't have time for any of this until closer to November, I expect.

@amueller
Member

I think we can wait another 2 months with this ;)
