_check_feature_names raising a false positive when fitting a GBDT and n_iter_no_change is not None #21618

@ArturoAmorQ

_check_feature_names is raising a false positive when fitting a GBDT and n_iter_no_change is not None. The following code

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

data, target = fetch_california_housing(return_X_y=True, as_frame=True)
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0, test_size=0.5)

from sklearn.ensemble import GradientBoostingRegressor

gbdt = GradientBoostingRegressor(n_iter_no_change=5)
_ = gbdt.fit(data_train, target_train)

emits the following warning:

/home/arturoamor/miniforge3/envs/scikit-learn-course/lib/python3.9/site-packages/sklearn/base.py:445: UserWarning: X does not have valid feature names, but GradientBoostingRegressor was fitted with feature names.

My guess is that the problem lies in the call to _check_feature_names inside the _validate_data function.
Somehow n_iter_no_change seems to be interpreted as fitted_feature_names.
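
To confirm that the warning is emitted during fit itself rather than by some later call, the fit can be wrapped so that warnings are recorded instead of printed. This is a minimal sketch using only the standard-library warnings module; the helper name collect_matching_warnings and its needle parameter are hypothetical and not part of scikit-learn.

```python
import warnings


def collect_matching_warnings(func, needle="feature names"):
    """Call func() and return the messages of warnings containing `needle`.

    Hypothetical helper for illustration; not a scikit-learn API.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones that would normally be
        # shown only once per location.
        warnings.simplefilter("always")
        func()
    return [str(w.message) for w in caught if needle in str(w.message)]
```

With the snippet from this report, calling collect_matching_warnings(lambda: gbdt.fit(data_train, target_train)) should return a non-empty list on an affected version, and an empty list if the warning is not raised during fit.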

Originally posted by @ArturoAmorQ in #21599 (comment)
