feat(sql): add per-column parquet encoding/compression config #6843

bluestreak01 merged 102 commits into master
Conversation
The cognitive cost for users to understand different encodings and pick the right one per column is high, and it's hard for them to quantify the effect without extensive benchmarking. Ideally, we'd explore an adaptive encoding approach where the engine automatically selects the generally better encoding per column based on data characteristics.
Indeed, we could use a sample of the data to automatically select the right encoding and compression (cc @puzpuzpuz ), but I believe that's a bit out of scope for this PR. Instead, this PR focuses on bringing in the circuitry to enable this feature in the future.

Note that when automatically selecting the encoding/compression, we're making a trade-off between column chunk sizes and decoding throughput. Some columns might be frequently accessed and might benefit from better decoding speed, whereas others might be rarely accessed; for those, stronger compression might be more cost-effective.
[PR Coverage check] 😍 pass: 7001 / 7301 (95.89%)
…384) Tandem PR for questdb/questdb#6843. This pull request introduces comprehensive documentation and configuration support for per-column Parquet encoding and compression in QuestDB, along with a new server property to control Parquet page compression efficiency. The changes add detailed SQL syntax, configuration options, and usage examples for both table creation and schema alteration, and update relevant documentation and diagrams to reflect these enhancements.

**Per-column Parquet encoding and compression support:**

- Added documentation for specifying per-column Parquet `ENCODING` and `COMPRESSION` in `CREATE TABLE` statements, including supported encodings/codecs, syntax diagrams, and usage examples. [[1]](diffhunk://#diff-c9da9f95b272b064bca67e838e1530428257a9518a36089fbb5f3947f301783dR364-R414) [[2]](diffhunk://#diff-527f6eb5b3052d4a7d0eb03d48c95f52e405a72b7b0400de35fbcf76c1a9b28aR382-R384)
- Introduced a new SQL reference page for `ALTER TABLE ALTER COLUMN SET/DROP PARQUET ENCODING/COMPRESSION`, allowing users to modify or reset per-column Parquet settings on existing tables. [[1]](diffhunk://#diff-617fb65a1352e7ddd311473fca86299a9784de206f346444ccf648114bebd672R1-R48) [[2]](diffhunk://#diff-60de99a7b532d59e719f3478da78de81ef2d1457d291c9d824133ccdc8918b7eR267)
- Updated the `SHOW CREATE TABLE` documentation to display per-column Parquet overrides in table definitions.

**Parquet compression configuration:**

- Documented the new `cairo.partition.encoder.parquet.min.compression.ratio` property, which determines whether a compressed Parquet page is stored compressed or uncompressed based on its compression ratio. Provided usage guidance and default values. [[1]](diffhunk://#diff-679d0b511f89caaaba52af97dc2d690639e90855763a244e0239234003ed5eebR187-R215) [[2]](diffhunk://#diff-cfeb04b172b8674ddd93a5b73bd064049049751437f417df1d28897aa7bb3c86R493-R496)

These updates make it easier for users to fine-tune Parquet export behavior and understand the available configuration options.
Tandem PR: https://github.com/questdb/questdb-enterprise/pull/932, questdb/documentation#384
Summary
- Per-column Parquet encoding/compression configurable via `CREATE TABLE` and `ALTER TABLE` SQL syntax
- New `min_compression_ratio` parameter that discards compressed pages when the compression ratio (uncompressed/compressed) falls below a threshold, storing them uncompressed instead
- Varchar dict encoding: switched `HashMap` to `RapidHashMap` and stored indices directly in a `Vec<u32>` instead of fetching them from the hashmap
- Enabled `lto` and stripped `debuginfo` from the qdbr library (~50% file size reduction)

Per-column Parquet encoding and compression
Users can now specify Parquet encoding and compression on a per-column basis. This only applies to Parquet partitions and is ignored for native partitions.
SQL syntax
CREATE TABLE

The syntax is `PARQUET(encoding [, compression[(level)]])`. Both encoding and compression are optional — use `default` for the encoding when specifying compression only. When omitted entirely, the column uses the global defaults (type-based encoding and the server-wide compression codec).

ALTER TABLE

Setting per-column config on existing tables:

SHOW CREATE TABLE

Per-column Parquet config appears in `SHOW CREATE TABLE` output:

Supported encodings

- `plain`
- `rle_dictionary`
- `delta_length_byte_array`
- `delta_binary_packed`

The SQL parser also accepts `byte_stream_split`, but the Rust encoder does not implement it yet — columns configured with this encoding silently fall back to the type's default encoding.

When no encoding is specified, QuestDB picks a type-appropriate default: `rle_dictionary` for SYMBOL and VARCHAR, `delta_length_byte_array` for STRING and BINARY, and `plain` for everything else.

Supported compression codecs

- `uncompressed`
- `snappy`
- `gzip`
- `brotli`
- `zstd`
- `lz4_raw`

When no per-column compression is specified, the column uses the global compression setting (`cairo.partition.encoder.parquet.compression.codec`).
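The type-based defaults described above can be sketched as follows. This is an illustration only: the enum and function names are invented for this example and are not QuestDB's actual API.

```rust
// Illustrative sketch only; QuestDB's real mapping lives in the Rust writer.
#[derive(Debug, PartialEq)]
enum Encoding {
    Plain,
    RleDictionary,
    DeltaLengthByteArray,
}

// Mirrors the documented defaults: rle_dictionary for SYMBOL/VARCHAR,
// delta_length_byte_array for STRING/BINARY, plain for everything else.
fn default_encoding(column_type: &str) -> Encoding {
    match column_type {
        "SYMBOL" | "VARCHAR" => Encoding::RleDictionary,
        "STRING" | "BINARY" => Encoding::DeltaLengthByteArray,
        _ => Encoding::Plain,
    }
}
```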
The per-column encoding config is stored as a packed 32-bit integer in the column metadata:
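The actual bit layout is not reproduced in this export. As a sketch of the packing approach only, assume a hypothetical layout with the encoding id in the low byte, the codec id in the next byte, and the compression level in the upper 16 bits (an assumption for illustration, not the format defined in `TableUtils.java`/`schema.rs`):

```rust
// Hypothetical bit layout, for illustration only:
//   bits 0-7   encoding id
//   bits 8-15  compression codec id
//   bits 16-31 compression level
fn pack_config(encoding: u32, codec: u32, level: u32) -> u32 {
    (encoding & 0xFF) | ((codec & 0xFF) << 8) | ((level & 0xFFFF) << 16)
}

fn unpack_config(packed: u32) -> (u32, u32, u32) {
    (packed & 0xFF, (packed >> 8) & 0xFF, packed >> 16)
}
```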
Minimum compression ratio
A new server configuration property controls whether compressed pages are worth keeping:
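Based on the property name and default documented in this PR, a `server.conf` entry might look like:

```ini
# keep compressed pages only if they achieve at least a 1.2x size reduction
# (1.2 is the documented default)
cairo.partition.encoder.parquet.min.compression.ratio=1.2
```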
Semantics: the ratio is `uncompressed_size / compressed_size`. A threshold of `1.2` means "only keep compressed output if it achieves at least ~17% size reduction." When a compressed column chunk fails to meet this threshold, the encoder discards the compressed output and stores the column chunk as uncompressed instead. The default is `1.2`; a value of `0.0` (or any value <= 1.0) disables the check entirely, always keeping compressed output (backward-compatible behavior for the `CairoConfiguration` interface default).

The ratio check applies to both data pages and dictionary pages, and works with all compression codecs. It runs after compression, so the CPU cost of compression is still incurred -- this setting only avoids the I/O and storage penalty of keeping pages that barely compress.
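A minimal sketch of the check as described; the function name is illustrative, not the actual parquet2 code:

```rust
// Post-compression ratio check: decide whether a compressed page is worth
// keeping or should be stored uncompressed instead.
fn keep_compressed(uncompressed_size: usize, compressed_size: usize, min_ratio: f64) -> bool {
    // A threshold <= 1.0 disables the check: always keep compressed output.
    if min_ratio <= 1.0 {
        return true;
    }
    // ratio = uncompressed_size / compressed_size; keep the compressed page
    // only if it meets the configured minimum.
    (uncompressed_size as f64) / (compressed_size as f64) >= min_ratio
}
```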
The value flows from `server.conf` -> `CairoConfiguration` -> `PartitionEncoder`/`PartitionUpdater` -> JNI -> Rust `WriteOptions` -> the `compress()` function in `parquet2`.

Files changed
Java -- SQL parsing and metadata
- `SqlParser.java` -- Parses `PARQUET ENCODING ... COMPRESSION ...` in column definitions
- `SqlCompilerImpl.java` -- Handles `ALTER TABLE ... ALTER COLUMN ... SET/DROP PARQUET ENCODING/COMPRESSION`
- `SqlKeywords.java` -- Adds `parquet`, `encoding`, `compression` keywords
- `CreateTableColumnModel.java` -- Stores per-column parquet encoding/compression in the column model
- `CreateTableOperationBuilderImpl.java` / `CreateTableOperationImpl.java` -- Threads encoding config through table creation
- `AlterOperation.java` / `AlterOperationBuilder.java` -- New alter operation type for parquet encoding changes
- `ShowCreateTableRecordCursorFactory.java` -- Emits `PARQUET ENCODING ... COMPRESSION ...` in `SHOW CREATE TABLE`
- `ParquetEncoding.java` -- New file: encoding constants and validation (type-compatibility checks)
- `ParquetCompression.java` -- Compression constants, level validation, and codec name resolution

Java -- metadata storage
- `TableColumnMetadata.java` -- New `parquetEncodingConfig` field
- `TableUtils.java` -- Pack/unpack helpers for the 32-bit encoding config, read/write from metadata memory
- `TableWriter.java` / `TableWriterMetadata.java` -- Store and propagate encoding config
- `TableReaderMetadata.java` -- Read encoding config from metadata
- `MetadataService.java` / `MetadataServiceStub.java` -- Interface for setting column encoding config
- `CairoColumn.java` -- Encoding config in column descriptor

Java -- encoder plumbing
- `PartitionEncoder.java` -- New `minCompressionRatio` parameter on `encodeWithOptions()`, `encodePartition()` native, and `createStreamingParquetWriter()` native
- `PartitionUpdater.java` -- New `minCompressionRatio` parameter on `of()` and `create()` native
- `PartitionDescriptor.java` -- Passes per-column encoding config to Rust
- `TableWriter.java` -- Reads `minCompressionRatio` from config and passes to encoder
- `O3PartitionJob.java` -- Same for partition updater path

Java -- config
- `PropertyKey.java` -- New `CAIRO_PARTITION_ENCODER_PARQUET_MIN_COMPRESSION_RATIO` property
- `PropServerConfiguration.java` -- Loads the property (default: `1.2`)
- `CairoConfiguration.java` -- New `getPartitionEncoderParquetMinCompressionRatio()` method (interface default: `0.0`)

Rust -- compression ratio check
- `parquet2/src/write/compression.rs` -- Adds `min_compression_ratio` parameter to `compress()`, `compress_data()`, `compress_dict()`, and the `Compressor` struct. After compressing a page, checks the ratio and falls back to uncompressed if the threshold is not met.

Rust -- per-column encoding/compression
- `src/parquet_write/schema.rs` -- `to_encodings()` and `to_compressions()` extract per-column overrides from the packed config integer. `encoding_from_config()` and `compression_from_config()` decode the packed format. `validate_encoding()` allows RleDictionary for all column types except Boolean and Array.
- `src/parquet_write/file.rs` -- `column_compression()` selects per-column compression when available, falling back to the global setting. `WriteOptions` gains `min_compression_ratio`. `ParquetWriter` gains `with_min_compression_ratio()`. All `compress()`/`Compressor::new()` call sites thread the ratio through. `column_chunk_to_dict_pages()` dispatches dict encoding for all supported types. Multi-partition writes fall back to the type's default encoding for non-Symbol dict columns (to avoid invalid multi-DictPage column chunks).
- `src/parquet_write/update.rs` -- `ParquetUpdater` gains `min_compression_ratio` field.
- `src/parquet_write/jni.rs` -- All three JNI entry points (`encodePartition`, `createStreamingParquetWriter`, `PartitionUpdater_create`) accept the new `min_compression_ratio` parameter.

Rust -- RLE dictionary encoding for all types
The writer now supports RLE dictionary encoding for all column types except Boolean and Array. Previously only Symbol and Varchar had dict encoding. The new encoders build a `RapidHashMap` for value deduplication, emit a DictPage with unique values, and a DataPage with definition levels and RLE-encoded dictionary keys.

- `src/parquet_write/primitive.rs` -- `slice_to_dict_pages_simd()` for i32/i64/f32/f64 SIMD types (Int, Long, Date, Timestamp, Float, Double). `int_slice_to_dict_pages_nullable()` for narrower nullable types (GeoByte/Short/Int/Long, IPv4). `int_slice_to_dict_pages_notnull()` for non-nullable types (Byte, Short, Char). `decimal_slice_to_dict_pages()` for Decimal types with Int32/Int64 physical representation.
- `src/parquet_write/string.rs` -- `string_to_dict_pages()` converts QuestDB's UTF-16 string format to UTF-8, deduplicates, and emits length-prefixed ByteArray dict entries.
- `src/parquet_write/binary.rs` -- `binary_to_dict_pages()` deduplicates binary blobs and emits length-prefixed ByteArray dict entries.
- `src/parquet_write/fixed_len_bytes.rs` -- `bytes_to_dict_pages()` for fixed-length byte arrays (UUID, Long128, Long256, Decimal FLBA types). Handles byte reversal for UUID.
- `src/parquet_write/util.rs` -- `dict_pages_iter()` shared helper that assembles DictPage + DataPage into a `DynIter<Page>`.
- `src/parquet_write/mod.rs` -- Bench re-exports for all new dict functions. Writer-side roundtrip tests for dict encoding (Int, Long, Double, Byte notnull, all-nulls).

Tests
- `SqlParserTest.java` -- Syntax tests for `CREATE TABLE` with per-column encoding/compression
- `AlterTableAlterColumnTest.java` -- Tests for `ALTER TABLE SET/DROP PARQUET ENCODING/COMPRESSION`, including error cases (invalid encoding for column type, invalid codec, invalid level)
- `ShowCreateTableTest.java` -- Tests that `SHOW CREATE TABLE` correctly round-trips encoding config
- `PartitionEncoderTest.java`, `PartitionUpdaterTest.java`, `ReadParquetFunctionTest.java`, `ParallelFilterTest.java` -- Updated call sites
- `O3ParquetPartitionFuzzTest.java`, `WalWriterFuzzTest.java` -- Fuzz tests include per-column encoding operations
- `parquet_write/schema.rs` tests -- Unit tests for packing/unpacking and per-column override logic. Updated RleDictionary validation tests for all supported types.
- `parquet_write/mod.rs` tests -- Writer-side roundtrip tests for dict encoding: Int, Long, Double, Byte (notnull), and all-nulls columns. Verifies DictPage + DataPage structure and correct encoding metadata.
- `benches/encode_page.rs` -- Dict encoding benchmarks for all supported types across 4 cardinalities (10, 100, 256, 1000) and null percentages (0%, 20%). Covers SIMD types, non-nullable int types, nullable int types, String, Binary, fixed-length byte arrays (Long128, UUID, Long256), and Decimal variants.

Varchar dict encoding performance improvements
Changing hasher to RapidHashMap (for varchar dict encoding)
Default `FxHashMap` vs. `RapidHashMap`: (benchmark table not captured in this export)

Storing indices directly in a `Vec<u32>` instead of fetching them from the hashmap

Before vs. after: (benchmark table not captured in this export)
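The dedup-and-index scheme behind these two optimisations can be sketched as follows. The real encoders use `RapidHashMap` and emit Parquet DictPage/DataPage structures; this illustration substitutes std's `HashMap` and returns plain vectors, so the names and shapes here are assumptions for the sketch only.

```rust
use std::collections::HashMap;

// Deduplicate values into a dictionary and record each row's dictionary key.
// Keys are pushed straight into a Vec<u32> as they are assigned, so the
// hashmap is consulted only once per value.
fn build_dict(values: &[i64]) -> (Vec<i64>, Vec<u32>) {
    let mut seen: HashMap<i64, u32> = HashMap::new();
    let mut dict: Vec<i64> = Vec::new();
    let mut keys: Vec<u32> = Vec::with_capacity(values.len());
    for &v in values {
        let next = dict.len() as u32;
        let idx = *seen.entry(v).or_insert(next);
        if idx == next {
            // First time we see this value: it becomes a new dict entry.
            dict.push(v);
        }
        keys.push(idx);
    }
    (dict, keys)
}
```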