Lock on bcolz

Hello,

Still playing with this package.

I have a question about the use of threads/locks on dataframe imports and bcolz. 

Take this section of code:

```
def locked_df_from_ctable(*args, **kwargs):
    with lock:
        result = dataframe_from_ctable(*args, **kwargs)
    return result
```

I _think_ what's happening here is that the lock prevents the dask threads from reading more than one bcolz file at a time. In my particular case, I have a file of about 7 million records in a bcolz dataset. When I got rid of the lock (i.e., removed the `with lock` statement and un-indented `result = ...`), the processing time was cut from around 90 seconds to 50 seconds. CPU usage never exceeded 50%, so I presume the bottleneck was hard drive read speed.
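To show what I mean by "one read at a time", here is a minimal sketch that is not the dask code itself. `fake_read` is a hypothetical stand-in for `dataframe_from_ctable` that just sleeps to mimic disk I/O, and `peak_concurrency` is a helper I made up to count how many reads overlap. It assumes the real `lock` is an ordinary `threading.Lock`, which I have not confirmed.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

lock = threading.Lock()            # assumed stand-in for the module-level lock
_counter_guard = threading.Lock()  # protects the bookkeeping counters below
_active = 0                        # reads currently in progress
_peak = 0                          # highest number of overlapping reads seen


def fake_read(delay=0.05):
    """Hypothetical stand-in for dataframe_from_ctable; sleeps to mimic I/O."""
    global _active, _peak
    with _counter_guard:
        _active += 1
        _peak = max(_peak, _active)
    time.sleep(delay)
    with _counter_guard:
        _active -= 1
    return delay


def locked_fake_read(delay=0.05):
    """Same shape as locked_df_from_ctable: the read happens under the lock."""
    with lock:
        result = fake_read(delay)
    return result


def peak_concurrency(func, n_tasks=4):
    """Run func in n_tasks threads and report how many reads overlapped."""
    global _active, _peak
    _active = _peak = 0
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        list(pool.map(lambda _: func(), range(n_tasks)))
    return _peak


if __name__ == "__main__":
    print("with lock:   ", peak_concurrency(locked_fake_read))
    print("without lock:", peak_concurrency(fake_read))
```

With the lock, the reads should never overlap; without it, several run at once, which would be consistent with the speedup I measured. This says nothing about whether concurrent reads are actually safe in bcolz, which is what I'm asking about.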

So what's the purpose of this lock? Is it necessary in some legacy packages? I ask because I'd like to make a PR to get rid of it. In my test, I was able to just drop it and I got the same result.

Thanks. 


Lock on bcolz #1033
