Keep original filenames in dask.dataframe.read_csv #2802

For the data I am reading, the path (directory name) is an important trait, and it would be useful to be able to access it, possibly as an additional column (e.g. `path_as_column=True`), or at the very least through the collection of delayed objects.
```
import dask.dataframe as dd
dd.read_csv('s3://bucket_name/*/*/*.csv', collection=True)
```

![image](https://user-images.githubusercontent.com/116120/31818992-d702cd1a-b59a-11e7-9636-dfca278ea320.png)

Requesting the list of delayed objects instead of a dataframe (`collection=False`) comes closer:

```
# collection=False returns a list of dask.delayed objects instead of a dataframe
all_dfs = dd.read_csv('s3://bucket_name/*/*/*.csv', collection=False)
print('key:', all_dfs[0].key)
print('value:', all_dfs[0].compute())
```
but it returns an internal identifier as the key, and the path (s3://bucket_name/actual_folder/subfolder/fancyfile.csv) does not seem to be available anywhere:
```
key: pandas_read_text-35c2999796309c2c92e6438c0ebcbba4
value:    Unnamed: 0   T   N   M   count
0           0  T1  N0  M0  0.4454
1           5  T2  N0  M0  0.4076
2           6  T2  N0  M1  0.0666
3           1  T1  N0  M1  0.0612
4          10  T3  N0  M0  0.0054
```






