Hi,
not sure if this is an issue.
I am trying to use Dask to read a large CSV file (10 GB) stored in a private S3 bucket. Strangely, Dask hangs when I run the following:
```python
import dask.bag as dbg

bag = dbg.from_s3(s3_path, aws_access_key=aws_access_key, aws_secret_key=aws_secret_key)
bag.take(0)
```
whereas with pandas' `read_csv` I can access the rows I want almost instantly:
```python
import pandas as pd

df = pd.read_csv(s3_path, chunksize=chunksize)  # chunksize: number of rows per chunk
df.get_chunk(1)
```
with `s3_path = "s3://<bucket_name>/<key_name>"`.
Any idea what I am doing wrong?
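
For comparison, here is a minimal sketch of how I would expect the same read to work through `dask.dataframe`, passing the credentials via `storage_options` (this is an assumption on my part: the `storage_options` keyword and the s3fs-backed `s3://` paths come from more recent Dask versions and may not match the version where the hang occurs):

```python
import dask.dataframe as dd

# Hypothetical sketch: lazily read the same CSV from S3, forwarding the
# credentials to the filesystem layer via storage_options instead of the
# from_s3 keyword arguments used above.
df = dd.read_csv(
    s3_path,
    storage_options={
        "key": aws_access_key,     # same credentials as in the bag example
        "secret": aws_secret_key,
    },
)
print(df.head())  # computes only the first few rows, not the whole 10 GB file
```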