Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In DataFusion, @devinjdangelo is using the append_column API to write Parquet files in parallel (apache/datafusion#7562).
However, passing an existing RowGroupMetaData to that API in order to carry over its bloom filters, page offsets, and other metadata is awkward.
Describe the solution you'd like
I would like a way to call the append_column API given a RowGroupMetaData object from the existing file.
Ideally there would be either an API that produces a ColumnCloseResult from a RowGroupMetaData, or a convenience API that takes a reader plus the RowGroupMetaData from another file and does the necessary copy.
Perhaps something like:
impl SerializedRowGroupWriter {
...
/// appends an entire RowGroup from the specified reader, including all
/// metadata, to the in progress parquet file.
pub fn append_row_group(&mut self, rg: Box<dyn RowGroupReader>) -> Result<...> {
...
}
}
Describe alternatives you've considered
Additional context