Describe the workflow you want to enable
Currently, DictVectorizer's _transform function stores the indices of the resulting sparse matrix as 32-bit signed integers. This caps those indices at 2**31 - 1 (about 2.1 billion), which limits the resultant matrix to roughly 2B rows/cols.
This issue proposes moving these indices to 64-bit integers so that DictVectorizer can be used on larger datasets.
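The ~2B figure comes from the 32-bit integer limit. A signed 32-bit integer tops out at 2**31 - 1, and Python's array module refuses to store a larger value in an "i" (C int) buffer. This short sketch only demonstrates that limit. It does not call DictVectorizer itself, and it assumes a platform where C int is 32 bits, which covers all mainstream ones.

```python
from array import array

import numpy as np

# Largest value a signed 32-bit index can hold: 2**31 - 1.
INT32_MAX = int(np.iinfo(np.int32).max)
print(INT32_MAX)  # 2147483647

# array("i") stores C ints (assumed 32 bits here), so recording a position
# past INT32_MAX raises OverflowError instead of silently wrapping around.
positions = array("i")
overflowed = False
try:
    positions.append(INT32_MAX + 1)
except OverflowError as exc:
    overflowed = True
    print("OverflowError:", exc)

# A 64-bit buffer ("q" = C long long) accepts the same value.
positions64 = array("q", [INT32_MAX + 1])
print(positions64[0])  # 2147483648
```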
Describe your proposed solution
Update the index dtypes in _dict_vectorizer.py as follows:
- np.int32 to np.int64
- np.intc to np.int_
- array("i") to array("l"), to get a 64-bit signed long instead of a signed int
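The proposed change could look roughly like the sketch below. It is not scikit-learn's actual _transform code. The helper name build_csr_int64 and its arguments are hypothetical. The helper collects column indices in a 64-bit buffer and stores row pointers as np.int64, then passes them to scipy.sparse.csr_matrix.

The sketch uses typecode "q" rather than "l" in case portability matters. "l" is a C long, which is 64 bits on Linux and macOS but only 32 bits on Windows, while "q" (C long long) is 64 bits everywhere. A similar caveat applies to np.int_, which is 32-bit on Windows with NumPy versions before 2.0.

```python
from array import array

import numpy as np
import scipy.sparse as sp


def build_csr_int64(rows, vocabulary):
    """Hypothetical sketch of DictVectorizer-style vectorization with 64-bit
    index buffers (not scikit-learn's implementation).

    rows: iterable of {feature_name: value} dicts
    vocabulary: {feature_name: column_index}
    """
    indices = array("q")  # C long long: 64 bits on every platform
    indptr = [0]
    values = []
    n_rows = 0
    for row in rows:
        for name, value in row.items():
            if name in vocabulary:  # unknown features are skipped
                indices.append(vocabulary[name])
                values.append(value)
        # Running count of stored entries. This is the value that would
        # overflow a 32-bit buffer past 2**31 - 1 entries.
        indptr.append(len(indices))
        n_rows += 1

    indices = np.frombuffer(indices, dtype=np.int64)
    indptr = np.asarray(indptr, dtype=np.int64)
    shape = (n_rows, len(vocabulary))
    # SciPy may still pick a smaller index dtype if every value fits.
    return sp.csr_matrix((values, indices, indptr), shape=shape)


vocab = {"a": 0, "b": 1}
X = build_csr_int64([{"a": 1.0, "b": 2.0}, {"b": 3.0, "c": 4.0}], vocab)
print(X.toarray())
# [[1. 2.]
#  [0. 3.]]
```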