-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Closed
Milestone
Description
Not sure if this is intended, but in the documentation it is mentioned the behaviour of bag.to_textfiles() should be
Write bag to disk, one filename per partition, one line per element
However, as reproduced in the following example, it actually stores everything in a line
import dask.bag as db
db.from_sequence(range(10), 10) \
.map(str) \
.to_textfiles('foo.*.txt')
and the output file reads
0123456789
After some tinkering, replacing this loop in the write() function (in dask/bag/core.py) from
for line in data:
f.write(line.encode(encoding))
to
for i, line in enumerate(data):
f.write((line if i == 0 else '\n{}'.format(line)).encode(encoding))
seems to work as mentioned in the documentation
Metadata
Metadata
Assignees
Labels
No labels