Issue 2393: Support glob patterns for files#2394
Issue 2393: Support glob patterns for files#2394tustvold merged 27 commits intoapache:masterfrom timvw:ISSUE-2393
Conversation
|
Resolved failing clippy, lint and cargo fmt errors |
|
Not had time to look in detail, but I think the code now mixes blocking IO, as performed by glob, and non-blocking IO as performed by tokio::fs. I think we should consistently use one or the other... Perhaps we could use tokio::spawn_blocking if a runtime is available, and then use blocking IO within the closure (this is what tokio::fs does under the hood anyway). Perhaps checkout ParquetExec for an example of this? |
|
Had a look at the implementation of tokio fs operations, and copied their (non-public) asyncify method to run the globbing on... |
tustvold
left a comment
There was a problem hiding this comment.
Liking where this is headed 👍
|
Afaik this should be good to go now. |
|
Tests pass now on my windows machine as well :) |
|
Fixed clippy issues + documented the checks that are executed during a PR build (+ added script to execute them). |
|
Perhaps we could split out the formatting and CI changes into a separate PR, as I think they're someone parallel to the changes in this PR? |
…h_suffix list_file_with_suffix
data-access/src/object_store/mod.rs
Outdated
| /// Filters the file_stream to only contain files that end with suffix | ||
| fn filter_suffix(file_stream: FileMetaStream, suffix: &str) -> Result<FileMetaStream> { | ||
| let suffix = suffix.to_owned(); | ||
| Ok(Box::pin(file_stream.filter(move |fr| { |
There was a problem hiding this comment.
There was a problem hiding this comment.
This is the code that was already in place.
When I update the code to use try_filter, it will not pass-through the errors anymore (not sure if anyone was actually doing something with them though..)
There was a problem hiding this comment.
Rebased to handle the merge conflict
There was a problem hiding this comment.
it will not pass-through the errors anymore
It should do? "All errors are passed through without filtering in this combinator."
Co-authored-by: Raphael Taylor-Davies <[email protected]>
|
Thanks again 👍 |
Which issue does this PR close?
Closes #2393.
What changes are included in this PR?
When the provided prefix is not a valid path (file/directory), we will now attempt to "glob" over the prefix.
Eg: when the path is
/testdata/blah.txt -> return /testdata/blah.txt (as before)
/testdata -> return all files that exist in /testdata (as before)
/testdata/x_z -> return all files that match the given pattern (assuming that there is no file/directory '/testdata/x_z')