feat(sql): support array column type in parquet partitions #5925

bluestreak01 merged 87 commits into master from
Conversation
Walkthrough

Adds Parquet array support end-to-end, including raw array encoding. Introduces schema/type encoding for arrays, read/write paths for arrays, JNI plumbing, a configuration flag, and extensive tests. Updates Parquet writer/reader utilities, schema generation, encoding APIs (hybrid RLE), and metadata handling. Bumps "CreatedBy" to version 9.0 and adjusts the toolchain and minor formatting.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Java as Java caller
    participant PE as PartitionEncoder
    participant JNI as Native (JNI)
    participant PW as ParquetWriter
    participant S as Schema
    participant AW as Array Writer
    Java->>PE: encodeWithOptions(..., statistics, rawArrayEncoding, ...)
    PE->>JNI: encodePartition(..., statistics, rawArrayEncoding, ...)
    JNI->>PW: ParquetWriter::new(...).with_raw_array_encoding(rawArrayEncoding)
    PW->>S: to_parquet_schema(partition, rawArrayEncoding)
    alt rawArrayEncoding = true
        S-->>PW: ByteArray schema for arrays (raw)
        PW->>AW: array_to_raw_page(aux, data, ...)
    else rawArrayEncoding = false
        S-->>PW: Nested LIST schema for arrays
        PW->>AW: array_to_page(primitive, dim, levels, ...)
    end
    PW-->>JNI: pages written
    JNI-->>PE: success
```
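When `rawArrayEncoding` is off, the writer shreds arrays into the standard Parquet LIST layout, which requires definition and repetition levels per the Dremel model. The sketch below (not QuestDB's actual code; function name and simplifications are illustrative) computes the level pairs that a writer like `array_to_page(primitive, dim, levels, ...)` would need for an optional 1-D double-array column with required elements (max definition level 2, max repetition level 1):

```python
def levels_for_double_list_column(rows):
    """Compute (definition, repetition) level pairs for an optional
    LIST<double> column with required elements, following the
    Parquet/Dremel shredding rules.

    Levels for this schema:
      def=0 -> the list itself is null
      def=1 -> the list is present but empty
      def=2 -> an element is present
      rep=0 -> value starts a new row; rep=1 -> continues the same list
    """
    pairs = []
    for row in rows:
        if row is None:            # null list
            pairs.append((0, 0))
        elif len(row) == 0:        # empty list
            pairs.append((1, 0))
        else:
            for i, _ in enumerate(row):
                rep = 0 if i == 0 else 1
                pairs.append((2, rep))
    return pairs

pairs = levels_for_double_list_column([[1.0, 2.0], None, [], [3.0]])
print(pairs)
# -> [(2, 0), (2, 1), (0, 0), (1, 0), (2, 0)]
```

Note how nulls and empty lists consume a level pair but no value in the data page, which is what makes the LIST layout readable by third-party tools.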
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Hey @puzpuzpuz, the review is of the changes made in the PR:
My bad, I was looking into
Removed the unused `encode_data_plain` function that was marked as dead code and kept only for compatibility. The streaming version `encode_data_plain_streaming` is now used instead. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
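For context, Parquet's PLAIN encoding for DOUBLE columns is just the little-endian IEEE 754 representation of each value, concatenated. A minimal sketch of the streaming idea (names and chunking are illustrative; this is not QuestDB's Rust implementation) is to emit fixed-size chunks instead of buffering the whole column:

```python
import struct

def encode_plain_streaming(values, chunk_size=2):
    """Yield PLAIN-encoded (little-endian IEEE 754) byte chunks for a
    stream of DOUBLE values, so the whole column never has to sit in
    one buffer. chunk_size counts values per emitted chunk."""
    buf = []
    for v in values:
        buf.append(struct.pack("<d", v))  # PLAIN DOUBLE = 8 LE bytes
        if len(buf) == chunk_size:
            yield b"".join(buf)
            buf.clear()
    if buf:                               # flush the final partial chunk
        yield b"".join(buf)

chunks = list(encode_plain_streaming([1.0, 2.0, 3.0]))
# concatenating the chunks reproduces the plain encoding of all values
```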
PR Coverage check: 😍 pass: 1402 / 1570 (89.30%)

Adds array column type support for table partitions in Apache Parquet format. This means that tables with array columns can now be converted to/from Parquet format.

By default, arrays are exported as lists of double values; the Parquet field layout implements the spec's requirements for lists. As a more lightweight alternative that is less compatible with third-party software, arrays can be exported in the native binary format, i.e. as byte arrays. To do that, the `cairo.partition.encoder.parquet.raw.array.encoding.enabled=true` config prop should be specified.

Other than that, the PR includes the following:

- the `read_parquet()` SQL function is able to read DuckDB-generated arrays (lists)
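For reference, the raw encoding would be enabled via the server configuration; a minimal sketch (the property name is taken from the PR description, while the `server.conf` location and surrounding keys are assumptions about a standard QuestDB setup):

```
# server.conf — opt in to raw (byte-array) Parquet encoding for array columns.
# Default is false: arrays are written as standard Parquet LIST structures,
# which third-party readers (e.g. DuckDB) understand.
cairo.partition.encoder.parquet.raw.array.encoding.enabled=true
```

The trade-off stated above applies: raw encoding is lighter to write, but the resulting byte-array columns are opaque to third-party Parquet consumers.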