sklearn.compose.ColumnTransformer does not keep transformers' desired output dtype #24182

Hi,

We are using ColumnTransformer as our unified preprocessor to transform the data. We have the following transformers:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, FunctionTransformer

# Numeric features: impute with the mean, or with a constant 1.0 / 0.0.
mean_imputer_pipline = Pipeline(steps=[('imputer', SimpleImputer(strategy='mean'))])
constan_one_imputer_pipline = Pipeline(steps=[('imputer', SimpleImputer(strategy='constant', fill_value=1.0))])
constan_zero_imputer_pipline = Pipeline(steps=[('imputer', SimpleImputer(strategy='constant', fill_value=0.0))])

# Nominal categorical features: fill missing values, then one-hot encode as int8.
categorical_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
        ('onehot', OneHotEncoder(handle_unknown='ignore', dtype=np.int8))])

# Ordinal features: `determine_mapping` and `ordinal_features` are defined elsewhere in our code.
ordinal_categories = [determine_mapping(ordinal_feature) for ordinal_feature in ordinal_features]

ordinal_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
        ('ordinal', OrdinalEncoder(categories=ordinal_categories, dtype=np.int8))])
```

From these we build our ColumnTransformer:

```python
from sklearn.compose import ColumnTransformer

# The *_features variables are lists of column names defined elsewhere in our code.
preprocessor = ColumnTransformer(
        transformers=[
            ('num_mean', mean_imputer_pipline, mean_imputer_features),
            ('num_constant_one', constan_one_imputer_pipline, constant_one_imputer_features),
            ('num_constant_zero', constan_zero_imputer_pipline, constant_zero_imputer_features),
            ('cat', categorical_transformer, categorical_features),
            ('ordinal', ordinal_transformer, ordinal_features)],
        remainder="drop")
```

When we call `transform`, the output dtype we requested for `categorical_transformer` and `ordinal_transformer` (`np.int8`) is not preserved: we get back a NumPy ndarray whose dtype is `np.float64`.

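We debugged the code and found that, to join the outputs of the different transformers, ColumnTransformer uses `np.hstack`, which produces a single ndarray with a single dtype. Under NumPy's type promotion that dtype has to represent every block, so the `int8` outputs are upcast to `float64` to match the output of the numeric imputers.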
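The promotion itself can be seen with NumPy alone. In this sketch the three blocks are hypothetical stand-ins for the imputed numeric features (`float64`) and the one-hot and ordinal codes (`int8`); `np.result_type` reports the common dtype that stacking has to use.

```python
import numpy as np

# Hypothetical blocks shaped like the transformers' outputs.
numeric_block = np.array([[1.0], [2.5], [3.0]])                    # float64
onehot_block = np.array([[1, 0], [0, 1], [1, 0]], dtype=np.int8)   # int8
ordinal_block = np.array([[0], [2], [1]], dtype=np.int8)           # int8

# Stacking all blocks yields one array with one dtype: the int8 blocks are upcast.
stacked = np.hstack([numeric_block, onehot_block, ordinal_block])
print(stacked.dtype)  # float64

# np.result_type reports the common dtype chosen for these inputs.
print(np.result_type(numeric_block, onehot_block, ordinal_block))  # float64

# Without a float block, int8 survives stacking.
print(np.hstack([onehot_block, ordinal_block]).dtype)  # int8
```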

Is there a solution?
If not:
1. Please add a warning to the ColumnTransformer documentation that the concatenated result will not use the output dtypes requested by the individual transformers.
2. Please add an option to return the results without concatenation, as a list of ndarrays.
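Until such an option exists, something close to the second request can be approximated on the user side by calling each fitted sub-transformer stored in the ColumnTransformer's `transformers_` attribute. The helper `transform_separately` below is hypothetical, not a scikit-learn API; it is a sketch that assumes `X` is a DataFrame and that every column selector is a list of column names (as in this issue), and it skips `"drop"` entries such as `remainder="drop"`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder


def transform_separately(ct, X):
    """Hypothetical helper: return {name: output} for a fitted ColumnTransformer
    without concatenating, so each block keeps the dtype its transformer produced.

    Assumes X is a DataFrame and column selectors are lists of column names.
    """
    outputs = {}
    for name, transformer, columns in ct.transformers_:
        if isinstance(transformer, str) and transformer == "drop":
            continue  # dropped columns (including remainder="drop") produce no output
        if len(columns) == 0:
            continue  # nothing was selected for this transformer
        if isinstance(transformer, str) and transformer == "passthrough":
            outputs[name] = X[columns].to_numpy()
        else:
            outputs[name] = transformer.transform(X[columns])
    return outputs


# Usage on hypothetical toy data.
X = pd.DataFrame({"num": [1.0, np.nan, 3.0], "size": ["S", "L", "M"]})
ct = ColumnTransformer(
    transformers=[
        ("num_mean", SimpleImputer(strategy="mean"), ["num"]),
        ("ordinal", OrdinalEncoder(categories=[["S", "M", "L"]], dtype=np.int8), ["size"]),
    ],
    remainder="drop",
).fit(X)

blocks = transform_separately(ct, X)
for name, block in blocks.items():
    print(name, block.dtype)
# num_mean float64
# ordinal int8
```

This re-runs each sub-transformer's `transform` outside the ColumnTransformer, so it duplicates the column selection logic rather than reusing the library's internals; a built-in option would avoid that.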
