
Support get_row_group in AsyncFileReader #3851

@Ted-Jiang

Description


Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While implementing apache/datafusion#4512, I found that `AsyncFileReader` (the reader DataFusion uses) cannot return a specific `RowGroupReader`.

With a `RowGroupReader`, I could call `get_column_bloom_filter` to get the column's bloom filter. The sync `FileReader` trait provides this through `get_row_group`:

```rust
use parquet::errors::Result;
use parquet::file::metadata::ParquetMetaData;
use parquet::file::reader::RowGroupReader;
use parquet::record::reader::RowIter;
use parquet::schema::types::Type as SchemaType;

pub trait FileReader: Send + Sync {
    /// Get metadata information about this file.
    fn metadata(&self) -> &ParquetMetaData;
    /// Get the total number of row groups for this file.
    fn num_row_groups(&self) -> usize;
    /// Get the `i`th row group reader. Note this doesn't do bound check.
    fn get_row_group(&self, i: usize) -> Result<Box<dyn RowGroupReader + '_>>;
    /// Get full iterator of `Row`s from a file (over all row groups).
    ///
    /// Iterator will automatically load the next row group to advance.
    ///
    /// Projected schema can be a subset of or equal to the file schema, when it is None,
    /// full file schema is assumed.
    fn get_row_iter(&self, projection: Option<SchemaType>) -> Result<RowIter>;
}
```
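To make the sync access pattern concrete, here is a std-only sketch of how a caller could use `get_row_group` plus `get_column_bloom_filter` to skip row groups. The traits are trimmed, hypothetical mirrors of the parquet crate's shape (the real `get_row_group` returns `Result<Box<dyn RowGroupReader + '_>>`), and `ToyBloom`, `InMemoryFile` and `prune_row_groups` are made-up names. `ToyBloom` is a plain k-hash bloom filter, not the split block bloom filter format Parquet actually specifies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy k-hash bloom filter (hypothetical stand-in, NOT Parquet's split block format).
struct ToyBloom {
    bits: Vec<bool>,
    k: u64,
}

impl ToyBloom {
    fn new(num_bits: usize, k: u64) -> Self {
        ToyBloom { bits: vec![false; num_bits], k }
    }

    /// Bit position for the `i`-th probe of `value` (probe number is mixed into the hash).
    fn index<T: Hash>(&self, value: &T, i: u64) -> usize {
        let mut h = DefaultHasher::new();
        i.hash(&mut h);
        value.hash(&mut h);
        (h.finish() % self.bits.len() as u64) as usize
    }

    fn insert<T: Hash>(&mut self, value: &T) {
        for i in 0..self.k {
            let idx = self.index(value, i);
            self.bits[idx] = true;
        }
    }

    /// `false` = definitely absent, `true` = possibly present.
    fn check<T: Hash>(&self, value: &T) -> bool {
        (0..self.k).all(|i| self.bits[self.index(value, i)])
    }
}

/// Hypothetical mirror of the sync shape: file -> row group -> column bloom filter.
trait RowGroupReader {
    fn get_column_bloom_filter(&self, i: usize) -> Option<&ToyBloom>;
}

trait FileReader {
    fn num_row_groups(&self) -> usize;
    fn get_row_group(&self, i: usize) -> Option<&dyn RowGroupReader>;
}

struct InMemoryRowGroup {
    blooms: Vec<Option<ToyBloom>>, // one optional filter per column
}

impl RowGroupReader for InMemoryRowGroup {
    fn get_column_bloom_filter(&self, i: usize) -> Option<&ToyBloom> {
        self.blooms.get(i).and_then(|b| b.as_ref())
    }
}

struct InMemoryFile {
    row_groups: Vec<InMemoryRowGroup>,
}

impl FileReader for InMemoryFile {
    fn num_row_groups(&self) -> usize {
        self.row_groups.len()
    }
    fn get_row_group(&self, i: usize) -> Option<&dyn RowGroupReader> {
        self.row_groups.get(i).map(|rg| rg as &dyn RowGroupReader)
    }
}

/// Indices of row groups that may contain `value` in column `col`.
/// Row groups without a filter cannot be ruled out, so they are kept.
fn prune_row_groups<T: Hash>(file: &dyn FileReader, col: usize, value: &T) -> Vec<usize> {
    (0..file.num_row_groups())
        .filter(|&i| match file.get_row_group(i) {
            Some(rg) => match rg.get_column_bloom_filter(col) {
                Some(bloom) => bloom.check(value),
                None => true,
            },
            None => false,
        })
        .collect()
}

fn main() {
    let mut bloom = ToyBloom::new(1024, 3);
    bloom.insert(&"apple");
    let file = InMemoryFile {
        row_groups: vec![
            InMemoryRowGroup { blooms: vec![Some(bloom)] },                  // contains "apple"
            InMemoryRowGroup { blooms: vec![Some(ToyBloom::new(1024, 3))] }, // empty filter
            InMemoryRowGroup { blooms: vec![None] },                         // no filter written
        ],
    };
    // prints [0, 2]
    println!("{:?}", prune_row_groups(&file, 0, &"apple"));
}
```

A filter answering `false` is a definite miss, so that row group can be skipped without reading its pages; this is the kind of pruning the DataFusion work needs.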

The async version, `AsyncFileReader`, has no equivalent; it works in terms of byte ranges:
```rust
use std::ops::Range;

use bytes::Bytes;
use futures::future::{BoxFuture, FutureExt};
use parquet::errors::Result;

pub trait AsyncFileReader: Send {
    /// Retrieve the bytes in `range`
    fn get_bytes(&mut self, range: Range<usize>) -> BoxFuture<'_, Result<Bytes>>;
    /// Retrieve multiple byte ranges. The default implementation will call `get_bytes` sequentially
    fn get_byte_ranges(
        &mut self,
        ranges: Vec<Range<usize>>,
    ) -> BoxFuture<'_, Result<Vec<Bytes>>> {
        async move {
            let mut result = Vec::with_capacity(ranges.len());
            for range in ranges.into_iter() {
                let data = self.get_bytes(range).await?;
                result.push(data);
            }
            Ok(result)
        }
        .boxed()
    }
    // ... (remaining methods omitted from this excerpt)
}
```

I think the two traits should be consistent. Is there a reason this is not supported?
Describe the solution you'd like

So I am trying to create a new struct, `AsyncRowGroupReader`.
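A minimal sketch of what such an `AsyncRowGroupReader` might look like, assuming it borrows the `AsyncFileReader` together with per-column metadata and fetches bloom filter bytes through `get_bytes`. Everything here is hypothetical and std-only: `ColumnChunkMeta` with a precomputed `bloom_filter_range` is an assumption (a real implementation would derive the range from the column chunk metadata), `BoxFuture` is a local alias rather than the `futures` type, and `block_on` is a toy executor standing in for tokio.

```rust
use std::future::Future;
use std::ops::Range;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Local stand-in for `futures::future::BoxFuture`.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;
type Result<T> = std::result::Result<T, String>;

/// Trimmed, hypothetical mirror of `AsyncFileReader`: byte-range access only.
trait AsyncFileReader: Send {
    fn get_bytes(&mut self, range: Range<usize>) -> BoxFuture<'_, Result<Vec<u8>>>;
}

/// Hypothetical per-column metadata: where the column's bloom filter lives, if any.
struct ColumnChunkMeta {
    bloom_filter_range: Option<Range<usize>>,
}

/// Hypothetical `AsyncRowGroupReader`: borrows the reader plus one row group's metadata.
struct AsyncRowGroupReader<'a, R: AsyncFileReader> {
    input: &'a mut R,
    columns: &'a [ColumnChunkMeta],
}

impl<'a, R: AsyncFileReader> AsyncRowGroupReader<'a, R> {
    /// Fetch the raw bloom filter bytes for column `i`, or `None` if it has no filter.
    async fn get_column_bloom_filter(&mut self, i: usize) -> Result<Option<Vec<u8>>> {
        let columns = self.columns; // copy the shared slice reference
        let meta = columns.get(i).ok_or_else(|| format!("no column {i} in this row group"))?;
        match &meta.bloom_filter_range {
            Some(range) => {
                let bytes = self.input.get_bytes(range.clone()).await?;
                Ok(Some(bytes))
            }
            None => Ok(None),
        }
    }
}

/// In-memory file standing in for an object-store-backed reader.
struct InMemoryReader {
    data: Vec<u8>,
}

impl AsyncFileReader for InMemoryReader {
    fn get_bytes(&mut self, range: Range<usize>) -> BoxFuture<'_, Result<Vec<u8>>> {
        let result = self
            .data
            .get(range.clone())
            .map(|s| s.to_vec())
            .ok_or_else(|| format!("range {range:?} out of bounds"));
        Box::pin(async move { result })
    }
}

// No-op waker so the toy executor below needs no runtime crate.
fn noop_clone(_: *const ()) -> RawWaker {
    noop_raw_waker()
}
fn noop(_: *const ()) {}
static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);
fn noop_raw_waker() -> RawWaker {
    RawWaker::new(std::ptr::null(), &NOOP_VTABLE)
}

/// Toy executor: busy-polls; only suitable for futures that become ready on their own.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    // Bytes 4..8 hold a fake bloom filter for column 0; column 1 has none.
    let mut file = InMemoryReader { data: vec![0, 0, 0, 0, 0xAB, 0xCD, 0xEF, 0x01] };
    let columns = vec![
        ColumnChunkMeta { bloom_filter_range: Some(4..8) },
        ColumnChunkMeta { bloom_filter_range: None },
    ];
    let mut rg = AsyncRowGroupReader { input: &mut file, columns: &columns };
    let col0 = block_on(rg.get_column_bloom_filter(0)).unwrap();
    let col1 = block_on(rg.get_column_bloom_filter(1)).unwrap();
    println!("{col0:?} {col1:?}");
}
```

Returning raw bytes keeps the sketch small; a real version would still have to parse them into a bloom filter before calling a membership check.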

Describe alternatives you've considered

Additional context

Metadata

Assignees: No one assigned

Labels: enhancement (Any new improvement worthy of an entry in the changelog)