Open
Description
Describe the enhancement requested
Currently, the GetRecordBatchReader API accepts row_group_indices and column_indices. It would be nice to extend it with one more parameter, row_ranges, indicating the subset of rows to retrieve. Given row_ranges, the RecordBatchReader can skip unnecessary pages (by comparing row_ranges with the page index, when one exists) as well as unwanted rows.
- original:
::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices,
                                     const std::vector<int>& column_indices,
                                     std::shared_ptr<::arrow::RecordBatchReader>* out);
- proposal:
::arrow::Status GetRecordBatchReader(
    const std::vector<int>& row_group_indices, const std::vector<int>& column_indices,
    const std::shared_ptr<std::map<int, RowRangesPtr>>& row_ranges_map,  // one row_ranges per row group
    std::shared_ptr<::arrow::RecordBatchReader>* out);
API clients can query the page index or other kinds of index (e.g. an external secondary index) to construct the row_ranges.
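To make the page-skipping idea concrete, here is a minimal sketch of how requested row ranges could be compared against per-page row boundaries such as the first row index that the Parquet offset index records for each page. It is not Arrow code: `RowRange` and `PagesToRead` are hypothetical names invented for illustration, and the closed-interval representation and the sorted, non-overlapping input are assumptions, since the issue does not define what `RowRangesPtr` points to.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type (not part of Arrow): a closed interval [first, last]
// of row indices, relative to the start of one row group.
struct RowRange {
  int64_t first;
  int64_t last;
};

// Returns the indices of the pages that hold at least one requested row.
// Assumptions: page_first_rows[i] is the first row index of page i, in
// ascending order; num_rows is the row count of the row group; ranges are
// sorted by position and do not overlap.
std::vector<size_t> PagesToRead(const std::vector<int64_t>& page_first_rows,
                                int64_t num_rows,
                                const std::vector<RowRange>& ranges) {
  std::vector<size_t> selected;
  size_t r = 0;  // index of the first range that may still overlap a page
  for (size_t p = 0; p < page_first_rows.size(); ++p) {
    const int64_t page_first = page_first_rows[p];
    // A page ends one row before the next page starts (or at the last row).
    const int64_t page_last =
        (p + 1 < page_first_rows.size() ? page_first_rows[p + 1] : num_rows) - 1;
    // Discard ranges that end before this page begins.
    while (r < ranges.size() && ranges[r].last < page_first) ++r;
    if (r == ranges.size()) break;  // no requested rows remain
    // The current range ends at or after page_first, so it overlaps this
    // page exactly when it starts at or before page_last.
    if (ranges[r].first <= page_last) selected.push_back(p);
  }
  return selected;
}
```

For a row group of 400 rows split into pages starting at rows 0, 100, 200, and 300, requesting rows [50, 120] and [310, 320] selects pages 0, 1, and 3, so the third page would never need to be read. Rows outside the ranges inside the selected pages would still have to be dropped after decoding.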
Component(s)
C++