DictVectorizer doesn't raise NotFittedError when using transform without prior fitting #24816

@LoHertel

Describe the bug

Calling the transform method of an unfitted DictVectorizer instance raises an AttributeError instead of a NotFittedError.

Other transformers, such as StandardScaler, make use of check_is_fitted to raise a NotFittedError.
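For comparison, here is a minimal sketch of the behaviour described for StandardScaler. The input values are arbitrary placeholders, and the `raised` flag is only there to make the outcome easy to inspect.

```python
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()  # deliberately not fitted

raised = False
try:
    # transform() checks the fitted state before touching the data
    scaler.transform([[0.0], [1.0]])
except NotFittedError:
    raised = True
    print("StandardScaler is not fitted yet.")
```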

I'm willing to open a PR that adds check_is_fitted to DictVectorizer. Please let me know if that's OK.
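A rough sketch of what such a check could look like, written as a subclass so it can be tried without patching scikit-learn. The class name `CheckedDictVectorizer` is hypothetical, and checking `feature_names_` and `vocabulary_` is an assumption based on the attributes that `_transform` reads in the traceback below. An actual PR would presumably place the check inside `DictVectorizer.transform` itself.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction import DictVectorizer
from sklearn.utils.validation import check_is_fitted


class CheckedDictVectorizer(DictVectorizer):
    """Hypothetical subclass that validates the fitted state in transform."""

    def transform(self, X):
        # Assumption: these two attributes are what fit() sets and
        # what _transform() reads when fitting=False.
        check_is_fitted(self, ["feature_names_", "vocabulary_"])
        return super().transform(X)


feat_dict = [{'col1': 'a', 'col2': 'x'}, {'col1': 'b', 'col2': 'y'}]
dv = CheckedDictVectorizer()

raised = False
try:
    dv.transform(feat_dict)
except NotFittedError:
    raised = True
    print("DictVectorizer is not fitted yet.")

# After fitting, transform works as usual.
X = dv.fit(feat_dict).transform(feat_dict)
```

With this in place, the reproduction snippet prints the message instead of surfacing an AttributeError, and the fitted path is unchanged.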

Steps/Code to Reproduce

from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction import DictVectorizer

feat_dict = [{'col1': 'a', 'col2': 'x'},{'col1': 'b', 'col2': 'y'}]
dv = DictVectorizer()

try:
    dv.transform(feat_dict)
except NotFittedError as e:
    print("DictVectorizer is not fitted yet.")

Expected Results

A NotFittedError is raised because the transform method was called without prior fitting.

Actual Results

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In [4], line 8
      5 dv = DictVectorizer()
      7 try:
----> 8     dv.transform(feat_dict)
      9 except NotFittedError as e:
     10     print("DictVectorizer is not fitted yet.")

File ~/../lib/python3.11/site-packages/sklearn/feature_extraction/_dict_vectorizer.py:373, in DictVectorizer.transform(self, X)
    356 def transform(self, X):
    357     """Transform feature->value dicts to array or sparse matrix.
    358 
    359     Named features not encountered during fit or fit_transform will be
   (...)
    371         Feature vectors; always 2-d.
    372     """
--> 373     return self._transform(X, fitting=False)

File ~/.../lib/python3.11/site-packages/sklearn/feature_extraction/_dict_vectorizer.py:207, in DictVectorizer._transform(self, X, fitting)
    205     vocab = {}
    206 else:
--> 207     feature_names = self.feature_names_
    208     vocab = self.vocabulary_
    210 transforming = True

AttributeError: 'DictVectorizer' object has no attribute 'feature_names_'
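Until the check is in place, one possible workaround is to catch AttributeError. NotFittedError inherits from both ValueError and AttributeError, so the same handler should cover the current behaviour and the expected one. This is a sketch, and the variable name `caught` is only illustrative.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction import DictVectorizer

# NotFittedError is a subclass of AttributeError (and of ValueError).
is_subclass = issubclass(NotFittedError, AttributeError)

feat_dict = [{'col1': 'a', 'col2': 'x'}, {'col1': 'b', 'col2': 'y'}]
dv = DictVectorizer()

caught = None
try:
    dv.transform(feat_dict)
except AttributeError as e:
    # On 1.1.3 (as reported above) this is a plain AttributeError;
    # a version with the fitted check would raise NotFittedError here.
    caught = e
    print("DictVectorizer is not fitted yet.")
```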

Versions

System:
    python: 3.11.0 (main, Oct 24 2022, 19:56:13) [GCC 11.2.0]
executable: /home/user/.cache/pypoetry/virtualenvs/sklean-dev-SgMhZBSc-py3.11/bin/python
   machine: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.1.3
          pip: 22.3
   setuptools: 65.5.0
        numpy: 1.23.4
        scipy: 1.9.3
       Cython: None
       pandas: 1.5.1
   matplotlib: 3.6.0
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True
