Add ObjectStore::get_ranges (#2293)#2336
Conversation
alamb
left a comment
There was a problem hiding this comment.
Code looks good to me -- I had some suggestion about coalscing in the default impl and some additional test cases that I think would be valuable to consider
cc @Ted-Jiang and @thinkharderdev
object_store/src/lib.rs
Outdated
| let mut out = Vec::with_capacity(ranges.len()); | ||
| for range in ranges { | ||
| out.push(self.get_range(location, range.clone()).await?) | ||
| } | ||
| Ok(out) |
There was a problem hiding this comment.
Why not use coalsce_ranges here as the default implementation? That would likely perform better for user supplied object store implementations, and this PR could likely get smaller too as AWS, GCP, etc would not have to be modified
| let mut out = Vec::with_capacity(ranges.len()); | |
| for range in ranges { | |
| out.push(self.get_range(location, range.clone()).await?) | |
| } | |
| Ok(out) | |
| coalesce_ranges( | |
| ranges, | |
| |range| self.get_range(location, range), | |
| OBJECT_STORE_COALESCE_DEFAULT, | |
| ) | |
| .await |
There was a problem hiding this comment.
Yea, coalsce_ranges looks should outperform current default implementation. Looks good to use it as default one.
There was a problem hiding this comment.
I guess I was concerned that some implementations may not benefit from this, but then again perhaps those cases should override this 🤔
object_store/src/util.rs
Outdated
| } | ||
|
|
||
| /// Range requests with a gap less than or equal to this, | ||
| /// will be coalesced into a single request |
There was a problem hiding this comment.
| /// will be coalesced into a single request | |
| /// will be coalesced into a single request by [`coalesce_ranges`] |
object_store/src/util.rs
Outdated
| /// Range requests with a gap less than or equal to this, | ||
| /// will be coalesced into a single request | ||
| #[cfg(any(feature = "azure", feature = "aws", feature = "gcp"))] | ||
| pub const OBJECT_STORE_COALESCE_DEFAULT: usize = 10 * 1024; |
There was a problem hiding this comment.
- As suggested above, I think this would be useful for all implementations rather than just the built in ones
azure,awsandgcp(and thus we could avoid this feature flag) - I am not sure if there is any value to exposing this constant -- I would recommend initially keeping it
pub(crate)
There was a problem hiding this comment.
agree, like 'HDFS' usually fetch 64MB as one batch.
There was a problem hiding this comment.
Perhaps I should bump this to 1MB?
| let fetches = do_fetch(vec![0..1, 56..72, 73..75], 1).await; | ||
| assert_eq!(fetches, vec![0..1, 56..75]); | ||
|
|
||
| let fetches = do_fetch(vec![0..1, 5..6, 7..9, 2..3, 4..6], 1).await; |
| fetches | ||
| }; | ||
|
|
||
| let fetches = do_fetch(vec![0..1, 1..2], 0).await; |
There was a problem hiding this comment.
I recommend also testing:
ranges (do_fetch(vec![]))
"off by one" coalsce":
fetches, vec![0..2, 3..5]), // no coalscefetches, vec![0..2, 2..5]), // no coalsce
|
Benchmark runs are scheduled for baseline = 0c828a9 and contender = 9a630a1. 9a630a1 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
* Add ObjectStore::get_ranges (#2293) * Review feedback
Which issue does this PR close?
Closes apache/arrow-rs-object-store#239
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?