
from_s3 with a private bucket + take hangs #1115

@updiversity

Description


Hi,

I'm not sure whether this is a bug.

I am trying to use dask to read a big CSV file (10 GB) stored in a private S3 bucket. Strangely, dask hangs when I run the following:

import dask.bag as dbg
bag = dbg.from_s3(s3_path, aws_access_key=aws_access_key, aws_secret_key=aws_secret_key)
bag.take(0)  # hangs here and never returns

while with plain pandas read_csv I can access the rows I want almost instantly:

import pandas as pd
df = pd.read_csv(s3_path, chunksize=chunksize)  # lazy chunked reader
df.get_chunk(1)  # the first row comes back immediately

with

s3_path = "s3://<bucket_name>/<key_name>"

Any idea what I am doing wrong?
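
For reference, a dask-independent way to rule out credential or connectivity problems is to fetch a byte range of the object directly with boto. This is only a minimal sketch, assuming boto 2 and the same placeholder bucket/key names as above:

import boto

# Connect with the same credentials passed to from_s3.
conn = boto.connect_s3(aws_access_key_id=aws_access_key,
                       aws_secret_access_key=aws_secret_key)
bucket = conn.get_bucket("<bucket_name>")  # raises S3ResponseError if unreachable
key = bucket.get_key("<key_name>")         # None if the key does not exist
print(key.size)  # object size in bytes

# Read only the first kilobyte instead of the whole 10 GB object.
print(key.get_contents_as_string(headers={"Range": "bytes=0-1023"}))

If this returns quickly, the credentials and the object are fine and the hang is on dask's side.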
