Pipeline should provide a method to apply its transformations to an arbitrary dataset without applying the final classifier step.
Use case:
Boosted tree models like XGBoost and LightGBM use a validation set for early stopping.
We can trivially apply the pipeline to the train and test sets via fit and predict, but not to the validation set.
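A minimal sketch of the gap, using a plain scikit-learn pipeline (LogisticRegression stands in for a booster here): fit and predict work through the pipeline, but preprocessing the validation set, e.g. to build an eval_set for an early-stopping booster, currently requires looping over the steps by hand.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression())])
pipe.fit(X_train, y_train)   # fit: works
pipe.predict(X_valid)        # predict: works
# pipe.transform(X_valid) fails: LogisticRegression has no transform.

# Workaround: push the validation set through every step but the last,
# so it matches the representation the final estimator was trained on.
X_valid_t = X_valid
for _, step in pipe.steps[:-1]:
    X_valid_t = step.transform(X_valid_t)
```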
After raising the issue and proposing two ideas at LightGBM (microsoft/LightGBM#299) and XGBoost (dmlc/xgboost#2039), I believe it should be handled at the scikit-learn level.
Idea 1, have a dummy transform method in XGBClassifier and LGBMClassifier
The transform method for pipelines/classifiers is already extremely inconsistent:
- Failure because the classifier step does not implement transform
- Deprecated feature importance extraction for trees ensemble
- NN features proposition for MLPClassifier (transform method in MLPClassifier #8291)
- Decision path proposition for tree ensembles (transform method of tree ensembles should return the decision_path #7907)
Furthermore, the issue will pop up again if the last step is an ensemble of multiple models.
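Idea 1 can be sketched with a toy classifier whose transform is the identity; DummyTransformMixin and ToyClassifier are hypothetical names, not part of scikit-learn, XGBoost, or LightGBM.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class DummyTransformMixin:
    """Hypothetical mixin giving a classifier a no-op transform."""

    def transform(self, X):
        # Identity: pass features through untouched, so that
        # Pipeline.transform can run end to end.
        return X

class ToyClassifier(DummyTransformMixin, BaseEstimator, ClassifierMixin):
    """Stand-in for XGBClassifier/LGBMClassifier in this sketch."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        # Trivially predicts the first class; a real booster would
        # of course learn from the data.
        return np.full(len(X), self.classes_[0])
```

With such a mixin, Pipeline.transform would run through every step, but as noted above, this overloads transform with yet another meaning alongside the feature-importance and decision-path propositions.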
Idea 2, Implement a validation_split parameter for early stopping
Early stopping in KerasClassifier is controlled by a validation_split parameter.
At first I thought that could be used in XGBClassifier and LGBMClassifier and everything else that would need a validation set for early stopping.
The issue here is that the user has no control over the validation set or how it is split. Furthermore, if validation problems need deeper inspection, I suppose it would be non-trivial to extract the validation data from the classifier or to provide an API for it.
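A sketch of Idea 2 and its drawback: a hypothetical estimator that carves a validation set out of the training data inside fit, KerasClassifier-style. The class name is an assumption; only the parameter name validation_split follows Keras.

```python
import numpy as np
from sklearn.model_selection import train_test_split

class EarlyStoppingEstimator:
    """Hypothetical estimator that splits off its own validation set."""

    def __init__(self, validation_split=0.2):
        self.validation_split = validation_split

    def fit(self, X, y):
        # The split happens inside fit, *after* any pipeline
        # transformers have already run, so the user never sees or
        # controls the validation data directly.
        X_tr, X_val, y_tr, y_val = train_test_split(
            X, y, test_size=self.validation_split, random_state=0)
        # A real implementation would fit with early stopping against
        # (X_val, y_val); here we only keep the data to show that
        # exposing it would require a dedicated attribute or API.
        self.validation_data_ = (X_val, y_val)
        return self

X = np.random.RandomState(0).randn(100, 5)
y = (X[:, 0] > 0).astype(int)
est = EarlyStoppingEstimator(validation_split=0.2).fit(X, y)
```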
Hence I think scikit-learn needs a method, or a parameter on transform, to ignore the last step or the last n steps.
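One possible shape for that proposal, written as a free function; the function name and the skip_last parameter are assumptions for illustration, not an existing scikit-learn API.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def transform_skipping_last(pipeline, X, skip_last=1):
    """Apply a fitted pipeline's steps to X, ignoring the last skip_last."""
    steps = pipeline.steps[:-skip_last] if skip_last > 0 else pipeline.steps
    Xt = X
    for _, step in steps:
        Xt = step.transform(Xt)
    return Xt

X, y = make_classification(n_samples=100, n_features=10, random_state=0)
pipe = Pipeline([("scale", StandardScaler()),
                 ("pca", PCA(n_components=3)),
                 ("clf", LogisticRegression())])
pipe.fit(X, y)

X_pre = transform_skipping_last(pipe, X)        # scaler + PCA
X_scaled = transform_skipping_last(pipe, X, 2)  # scaler only
```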
If needed, I can raise a related issue about having a consistent transform method for classifiers, and keep this one focused on applying transform without classification on arbitrary data.