Closed
Labels
enhancement: Any new improvement worthy of an entry in the changelog
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While implementing apache/datafusion#4512, I found that `AsyncFileReader` (which DataFusion uses) provides no way to get a specific `RowGroupReader`.
If I could get a `RowGroupReader`, calling `get_column_bloom_filter` on it would return the bloom filter.
arrow-rs/parquet/src/file/reader.rs
Lines 77 to 94 in f78a9be
    pub trait FileReader: Send + Sync {
        /// Get metadata information about this file.
        fn metadata(&self) -> &ParquetMetaData;

        /// Get the total number of row groups for this file.
        fn num_row_groups(&self) -> usize;

        /// Get the `i`th row group reader. Note this doesn't do bound check.
        fn get_row_group(&self, i: usize) -> Result<Box<dyn RowGroupReader + '_>>;

        /// Get full iterator of `Row`s from a file (over all row groups).
        ///
        /// Iterator will automatically load the next row group to advance.
        ///
        /// Projected schema can be a subset of or equal to the file schema, when it is None,
        /// full file schema is assumed.
        fn get_row_iter(&self, projection: Option<SchemaType>) -> Result<RowIter>;
    }
async version:
arrow-rs/parquet/src/arrow/async_reader/mod.rs
Lines 128 to 148 in de9f826
    pub trait AsyncFileReader: Send {
        /// Retrieve the bytes in `range`
        fn get_bytes(&mut self, range: Range<usize>) -> BoxFuture<'_, Result<Bytes>>;

        /// Retrieve multiple byte ranges. The default implementation will call `get_bytes` sequentially
        fn get_byte_ranges(
            &mut self,
            ranges: Vec<Range<usize>>,
        ) -> BoxFuture<'_, Result<Vec<Bytes>>> {
            async move {
                let mut result = Vec::with_capacity(ranges.len());

                for range in ranges.into_iter() {
                    let data = self.get_bytes(range).await?;
                    result.push(data);
                }

                Ok(result)
            }
            .boxed()
        }
I think the two traits should be consistent. Is there a reason the async reader does not support this?
Describe the solution you'd like
I am trying to create a new struct, `AsyncRowGroupReader`, to fill this gap.
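To make the idea concrete, here is a rough, self-contained sketch of how an `AsyncRowGroupReader` might layer a bloom filter lookup on top of a `get_bytes`-style method. It is not the arrow-rs API: `ByteSource`, `BloomFilterLocation`, `InMemory`, `block_on`, and `demo` are hypothetical names made up for illustration, the error type is a plain `String`, and the filter is returned as raw bytes rather than being parsed. It uses only the standard library, so the `BoxFuture` alias and the tiny executor stand in for what the `futures` crate would normally provide.

```rust
use std::future::Future;
use std::ops::Range;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Local stand-in for `futures::future::BoxFuture`.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Hypothetical stand-in for the byte-fetching part of `AsyncFileReader`.
trait ByteSource: Send {
    fn get_bytes(&mut self, range: Range<usize>) -> BoxFuture<'_, Result<Vec<u8>, String>>;
}

/// Hypothetical: where one column chunk's bloom filter sits in the file.
#[derive(Clone, Copy)]
struct BloomFilterLocation {
    offset: usize,
    length: usize,
}

/// Sketch of the proposed reader: one row group, backed by an async byte source.
struct AsyncRowGroupReader<'a, S: ByteSource> {
    source: &'a mut S,
    /// One entry per column; `None` means the column has no bloom filter.
    bloom_filters: Vec<Option<BloomFilterLocation>>,
}

impl<'a, S: ByteSource> AsyncRowGroupReader<'a, S> {
    /// Fetch the raw bloom filter bytes for column `i`, if the column has one.
    async fn get_column_bloom_filter(&mut self, i: usize) -> Result<Option<Vec<u8>>, String> {
        let loc = match self.bloom_filters.get(i) {
            Some(Some(loc)) => *loc,
            Some(None) => return Ok(None),
            None => return Err(format!("column {} out of range", i)),
        };
        let bytes = self.source.get_bytes(loc.offset..loc.offset + loc.length).await?;
        Ok(Some(bytes))
    }
}

/// In-memory byte source used only to exercise the sketch.
struct InMemory(Vec<u8>);

impl ByteSource for InMemory {
    fn get_bytes(&mut self, range: Range<usize>) -> BoxFuture<'_, Result<Vec<u8>, String>> {
        Box::pin(async move {
            self.0
                .get(range)
                .map(|slice| slice.to_vec())
                .ok_or_else(|| "range out of bounds".to_string())
        })
    }
}

/// Waker that does nothing; enough for futures that never actually suspend.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for illustration only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// Read the bloom filter of column 0 from a 32-byte in-memory "file".
fn demo() -> Result<Option<Vec<u8>>, String> {
    let mut source = InMemory((0u8..32).collect());
    let mut row_group = AsyncRowGroupReader {
        source: &mut source,
        bloom_filters: vec![Some(BloomFilterLocation { offset: 4, length: 3 }), None],
    };
    let result = block_on(row_group.get_column_bloom_filter(0));
    result
}
```

Calling `demo()` from a `main` function exercises the whole path. In a real implementation the filter location would presumably come from the column chunk metadata, and the fetched bytes would still need to be decoded into a bloom filter structure.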
Describe alternatives you've considered
Additional context