
to_datetime removes _meta.index.name but does not remove the partition index names #4904

@bolliger32


I noticed this problem when calling to_parquet, which raised a ValueError saying the columns do not match the metadata. It is similar to #3003, but I believe that behavior was already fixed for assign (?)

Example:

in.csv:

ix,dates
0,1900-01-01T00:00:00.000000000
1,2019-01-07T00:00:00.000000000
2,2012-04-18T00:00:00.000000000
3,2004-07-01T00:00:00.000000000
4,2010-05-12T00:00:00.000000000
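
import pandas as pd
import dask.dataframe as ddf

this_ddf = ddf.from_pandas(pd.read_csv('in.csv', index_col=0), npartitions=1)
# Convert the string column in place; the fractional seconds need .%f under strict format matching
this_ddf['dates'] = ddf.to_datetime(this_ddf['dates'], format='%Y-%m-%dT%H:%M:%S.%f')
# Compare the index name recorded in the metadata with the one on a computed partition
print((this_ddf._meta.index.name, this_ddf.head().index.name))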

The printed value is: (None, 'ix'). Expected result: ('ix', 'ix')
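For comparison, here is a minimal pandas-only sketch of the conversion that each dask partition performs. It assumes the CSV header is ix,dates (inferred from the index name and the column name used in the example) and uses a format string ending in .%f so the nanosecond fraction parses. pandas.to_datetime applied to a Series returns a Series with the same index, so the name 'ix' survives here. This suggests the name is dropped only when dask builds the _meta for the result, not in the per-partition computation.

```python
import io

import pandas as pd

# Same rows as in.csv; the "ix,dates" header is an assumption inferred from the report.
CSV = """ix,dates
0,1900-01-01T00:00:00.000000000
1,2019-01-07T00:00:00.000000000
2,2012-04-18T00:00:00.000000000
3,2004-07-01T00:00:00.000000000
4,2010-05-12T00:00:00.000000000
"""

df = pd.read_csv(io.StringIO(CSV), index_col=0)

# Plain pandas: to_datetime on a Series keeps that Series' index, including its name.
converted = pd.to_datetime(df["dates"], format="%Y-%m-%dT%H:%M:%S.%f")
df["dates"] = converted

print(converted.index.name)  # ix
print(df.index.name)         # ix
```

If the dask _meta agreed with this pandas result, the example above would print ('ix', 'ix').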
