Handle np.nan / missing values in SplineTransformer #26793
Description
I think it would be quite natural to add an option to SplineTransformer to accept inputs with missing values as follows:
- handle_missing="error": the default (keep current behavior),
- handle_missing="zero" / "constant": encode missing values by setting all output features for that input column to 0 (or some other constant, see discussion below),
- handle_missing="indicator": append an extra binary feature as missingness indicator and encode missing values as 0 on the remaining output features.
Note that handle_missing="indicator" would be different and statistically more meaningful than SimpleImputer(strategy="mean", add_indicator=True) with SplineTransformer and furthermore would make for leaner ML pipelines (better UX).
I am not sure if we need to add the handle_missing="zero" option. It would break the property that the output values of a given SplineTransformer encoding always sum to 1, while handle_missing="indicator" would preserve this property (in addition to making missingness more explicit to the downstream model in case missingness is informative one way or another).
If we want to preserve the "sum to 1" property while not adding an explicit missingness indicator feature, maybe we could instead provide handle_missing="constant" (not sure about the name) that would encode missing values as 1 / n_outputs. I am not entirely sure whether this would result in a more interesting prior than the zero encoding.