Description
Is your feature request related to a problem?
We've recently added the read_parquet() SQL function, which allows reading external Apache Parquet files:
https://questdb.io/docs/reference/function/parquet/#read_parquet
The limitation is that the function is backed by a single-threaded ReadParquetRecordCursorFactory, while we want read_parquet() queries to run in parallel.
To make this happen, we can rewrite ReadParquetRecordCursorFactory to implement page frame cursors (see PageFrameRecordCursorFactory). This way, the rest of our query engine will treat it as a "normal" table with Parquet partitions (sans time order and, hence, without time intrinsics), and all parallel factories, such as the filter and GROUP BY ones, will kick in automatically. As a result, read_parquet() queries will run in parallel and benefit from everything our query engine already supports.
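A minimal Java sketch of why a frame-based cursor makes parallelism "free" is shown below. It is not QuestDB code: RowGroupFrame, RowGroupFrameCursor and ParallelFrameFilter are hypothetical names, and the sketch assumes each Parquet row group maps to one frame holding a single in-memory long column. Once the source hands out independent frames, a generic parallel filter can dispatch each frame to a worker thread and merge the per-frame results without knowing that the data came from a Parquet file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongPredicate;

// Demo entry point: three hypothetical row groups filtered in parallel.
final class ParallelReadParquetSketch {
    public static void main(String[] args) {
        List<long[]> rowGroups = List.of(
                new long[]{1, 5, 12},
                new long[]{20, 3},
                new long[]{15, 16, 2});
        long matches = ParallelFrameFilter.count(new RowGroupFrameCursor(rowGroups), v -> v > 10, 3);
        System.out.println("rows with value > 10: " + matches);
    }
}

// Hypothetical stand-in for a page frame: one Parquet row group,
// modelled here as a single in-memory long column.
final class RowGroupFrame {
    final int rowGroupIndex;
    final long[] values;

    RowGroupFrame(int rowGroupIndex, long[] values) {
        this.rowGroupIndex = rowGroupIndex;
        this.values = values;
    }
}

// Hypothetical page frame cursor: yields one frame per row group, then null.
// Frames carry no timestamp ordering, so nothing here relies on time intrinsics.
final class RowGroupFrameCursor {
    private final List<long[]> rowGroups;
    private int next = 0;

    RowGroupFrameCursor(List<long[]> rowGroups) {
        this.rowGroups = rowGroups;
    }

    RowGroupFrame nextFrame() {
        if (next >= rowGroups.size()) {
            return null;
        }
        int index = next++;
        return new RowGroupFrame(index, rowGroups.get(index));
    }
}

// A frame-aware "parallel filter": the cursor is drained on the calling thread,
// and each frame is filtered independently on a worker thread.
final class ParallelFrameFilter {
    static long count(RowGroupFrameCursor cursor, LongPredicate filter, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> perFrame = new ArrayList<>();
            for (RowGroupFrame frame = cursor.nextFrame(); frame != null; frame = cursor.nextFrame()) {
                final RowGroupFrame f = frame;
                perFrame.add(pool.submit(() -> {
                    long matches = 0;
                    for (long v : f.values) {
                        if (filter.test(v)) {
                            matches++;
                        }
                    }
                    return matches;
                }));
            }
            // Merge per-frame results, as a parallel factory would reduce worker output.
            long total = 0;
            for (Future<Long> result : perFrame) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while filtering frames", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("frame filter task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Running main prints `rows with value > 10: 4`. The design point is that ParallelFrameFilter depends only on the frame cursor abstraction, which mirrors the idea in this issue: make read_parquet() produce frames, and the existing parallel factories apply unchanged.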
Describe the solution you'd like.
No response
Describe alternatives you've considered.
No response
Full Name:
Andrei Pechkurov
Affiliation:
QuestDB
Additional context
No response