ARROW-1417: [Python] Allow more generic filesystem objects to be passed to ParquetDataset #1032
fjetter wants to merge 3 commits into apache:master from
Conversation
Great! I opened https://issues.apache.org/jira/browse/ARROW-1455 so we can set up some kind of testing arrangement to validate that we don't break our integrations with Dask before Arrow and Dask releases. This would be a trunk-vs-trunk test, so I'm not comfortable adding it directly to Arrow's CI in case transient instability causes broken builds. If we get really energetic, we could set up an automated build to report on the state of the integration tests.
I'll make sure the S3 tests pass locally before merging this (using the
Good idea to add these tests, although I would prefer to have them run automatically in CI. We could add them to Travis and allow build failures for these tests only; c.f. https://docs.travis-ci.com/user/customizing-the-build#Rows-that-are-Allowed-to-Fail
If we can add them to Travis and it does not inflate build times significantly, then I am all for it. Ideally we would run trunk-to-trunk. We are doing that with parquet-cpp, which does introduce occasional brittleness (but it has definitely been worth it).
Rebased |
wesm left a comment
+1. I verified that the S3 tests still pass. Will merge once the builds complete.
This way, `ParquetDataset` accepts both `S3FileSystem` and `LocalFileSystem` objects as they are used in dask. By using `issubclass`, external libraries may write their own FS wrappers by inheriting from the Arrow FS. I tested the integration with dask and this will fix the issue blocking dask/dask#2527