
feat(sql): multi-threaded read_parquet() execution#5256

Merged
bluestreak01 merged 13 commits into master from puzpuzpuz_parallel_read_parquet
Jan 15, 2025

@puzpuzpuz
Contributor

@puzpuzpuz puzpuzpuz commented Dec 17, 2024

Closes #5250

The read_parquet() SQL function now supports parallel execution. The parallel filtering and aggregation provided by QuestDB's query engine now apply to read_parquet() as if it were a native table.
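To make "parallel filtering and aggregation" concrete, here is a minimal conceptual sketch in plain Java. It is not QuestDB code: the class name `FrameScanSketch`, its methods, and the fixed-size "frames" are hypothetical stand-ins for the idea of splitting a column into chunks, filtering and reducing each chunk on a worker thread, and merging the partial results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Conceptual sketch only: FrameScanSketch and its methods are hypothetical
// names and do not correspond to QuestDB classes.
class FrameScanSketch {

    // Filters and sums one frame, i.e. the slice [lo, hi) of the column.
    static long sumFrame(long[] column, int lo, int hi, long threshold) {
        long sum = 0;
        for (int i = lo; i < hi; i++) {
            if (column[i] > threshold) {
                sum += column[i];
            }
        }
        return sum;
    }

    // Splits the column into frames, reduces each frame on a worker thread,
    // then merges the partial sums on the calling thread.
    static long parallelFilteredSum(long[] column, int frameSize, int workers, long threshold) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (int lo = 0; lo < column.length; lo += frameSize) {
                final int from = lo;
                final int to = Math.min(lo + frameSize, column.length);
                partials.add(pool.submit(() -> sumFrame(column, from, to, threshold)));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] prices = new long[1000];
        for (int i = 0; i < prices.length; i++) {
            prices[i] = i;
        }
        // Sum of all values greater than 499, computed frame by frame.
        System.out.println(parallelFilteredSum(prices, 128, 4, 499));
    }
}
```

The merge step is trivial here because a sum is associative; the sketch makes no claim about how the engine schedules or merges work internally.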

Note: projection is not supported for virtual table functions like read_parquet(). If a Parquet file has, say, 1000 columns and the query touches only one of them, all 1000 columns are still decoded. This is something to improve in the future. Tracked in #5280
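The cost of missing projection can be illustrated with a toy model. The sketch below is hypothetical throughout: `ProjectionSketch`, the comma-separated "encoding", and the decode counter are invented for illustration and have nothing to do with the real Parquet decoder. It only shows that, without projection, the number of decoded columns depends on the file, not on the query.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not QuestDB or Parquet API. A "file" maps column names
// to comma-separated text standing in for encoded column data.
class ProjectionSketch {
    // Counts decoded columns; a stand-in for decode cost.
    static int decodedColumns = 0;

    // Pretend decoding: parse comma-separated text into longs.
    static long[] decode(String encoded) {
        decodedColumns++;
        String[] parts = encoded.split(",");
        long[] out = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Long.parseLong(parts[i]);
        }
        return out;
    }

    // No projection: every column in the file is decoded.
    static Map<String, long[]> readAll(Map<String, String> file) {
        Map<String, long[]> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : file.entrySet()) {
            out.put(e.getKey(), decode(e.getValue()));
        }
        return out;
    }

    // With projection: only the requested columns are decoded.
    static Map<String, long[]> readProjected(Map<String, String> file, List<String> columns) {
        Map<String, long[]> out = new LinkedHashMap<>();
        for (String name : columns) {
            out.put(name, decode(file.get(name)));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> file = new LinkedHashMap<>();
        for (String name : Arrays.asList("a", "b", "c")) {
            file.put(name, "1,2,3");
        }
        readAll(file);
        System.out.println("without projection: " + decodedColumns);
        decodedColumns = 0;
        readProjected(file, Collections.singletonList("b"));
        System.out.println("with projection: " + decodedColumns);
    }
}
```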

Parallel read_parquet() execution can be disabled with the new cairo.sql.parallel.read.parquet.enabled configuration property.
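As a rough illustration of how such a switch behaves, the sketch below parses the property from properties-style text using `java.util.Properties`. Only the property key comes from this PR. The class name, the use of `Properties` instead of QuestDB's own configuration classes, and the default of `true` when the key is absent are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustration only: QuestDB reads its configuration through its own
// classes; this hypothetical helper just shows the on/off semantics.
class ParallelReadParquetConfigSketch {
    static final String KEY = "cairo.sql.parallel.read.parquet.enabled";

    // Assumption: parallel execution stays on unless explicitly disabled.
    static boolean isParallelReadParquetEnabled(String confText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(confText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Boolean.parseBoolean(props.getProperty(KEY, "true"));
    }

    public static void main(String[] args) {
        // A configuration line that turns the feature off.
        String conf = "cairo.sql.parallel.read.parquet.enabled=false\n";
        System.out.println(isParallelReadParquetEnabled(conf));
    }
}
```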

Also includes the following:

  • Removes redundant methods from the PageFrameCursor interface
  • Fixes incorrect error handling in parquet-to-native partition conversion

TODOs:

  • Try extracting physical table-only methods from PageFrameCursor into a child TablePageFrameCursor interface and use it where appropriate.
  • Add a prop to enable/disable parallel read_parquet().

@puzpuzpuz puzpuzpuz added the SQL (Issues or changes relating to SQL execution) and Performance (Performance improvements) labels Dec 17, 2024
@puzpuzpuz puzpuzpuz self-assigned this Dec 17, 2024
@nwoolmer nwoolmer self-requested a review December 18, 2024 11:42
@puzpuzpuz puzpuzpuz marked this pull request as ready for review January 2, 2025 09:28
@puzpuzpuz
Contributor Author

@nwoolmer the PR should be ready for review now. PTAL

ideoma previously approved these changes Jan 14, 2025
Member

@bluestreak01 bluestreak01 left a comment

still under review

mtopolnik previously approved these changes Jan 14, 2025
@puzpuzpuz
Contributor Author

@mtopolnik thanks for the review!

bluestreak01 previously approved these changes Jan 15, 2025
@bluestreak01 bluestreak01 dismissed stale reviews from mtopolnik and themself via 2bb4445 January 15, 2025 14:51
@glasstiger
Contributor

[PR Coverage check]

😍 pass : 145 / 164 (88.41%)

file detail

| path | covered line | new line | coverage |
|---|---|---|---|
| 🔵 io/questdb/cairo/vm/MemoryPARWImpl.java | 0 | 6 | 00.00% |
| 🔵 io/questdb/griffin/engine/functions/table/ReadParquetPageFrameRecordCursorFactory.java | 20 | 23 | 86.96% |
| 🔵 io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java | 15 | 17 | 88.24% |
| 🔵 io/questdb/griffin/engine/functions/table/ReadParquetPageFrameCursor.java | 61 | 69 | 88.41% |
| 🔵 io/questdb/cutlass/pgwire/modern/PGPipelineEntry.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/PropServerConfiguration.java | 3 | 3 | 100.00% |
| 🔵 io/questdb/griffin/engine/table/FwdTableReaderPageFrameCursor.java | 4 | 4 | 100.00% |
| 🔵 io/questdb/griffin/engine/functions/table/ReadParquetFunctionFactory.java | 5 | 5 | 100.00% |
| 🔵 io/questdb/cairo/sql/PageFrameMemoryRecord.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/cairo/vm/api/MemoryCR.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/griffin/engine/functions/bind/Long256BindVariable.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/cairo/DefaultCairoConfiguration.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/griffin/engine/functions/cast/CastSymbolToLong256FunctionFactory.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/PropertyKey.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/griffin/engine/table/PageFrameRecordCursorFactory.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/griffin/engine/table/BwdTableReaderPageFrameCursor.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/griffin/engine/functions/conditional/CoalesceFunctionFactory.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/cairo/TableWriter.java | 15 | 15 | 100.00% |
| 🔵 io/questdb/griffin/engine/table/FilterOnExcludedValuesRecordCursorFactory.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/std/Numbers.java | 6 | 6 | 100.00% |
| 🔵 io/questdb/cairo/CairoConfigurationWrapper.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/cutlass/pgwire/PGConnectionContext.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/griffin/SqlUtil.java | 1 | 1 | 100.00% |

@bluestreak01 bluestreak01 merged commit 8b9ab3a into master Jan 15, 2025
@bluestreak01 bluestreak01 deleted the puzpuzpuz_parallel_read_parquet branch January 15, 2025 16:03

Development

Successfully merging this pull request may close these issues.

Parallel read_parquet() SQL function execution

5 participants