feat(sql): add per-column parquet encoding/compression config #6843

bluestreak01 merged 102 commits into master
Conversation
The cognitive cost for users to understand different encodings and pick the right one per column is high, and it's hard for them to quantify the effect without extensive benchmarking. Ideally, we'd explore an adaptive encoding approach where the engine automatically selects the generally better encoding per column based on data characteristics.
Indeed, we could use a sample of the data to automatically select the right encoding and compression (cc @puzpuzpuz ), but I believe that's a bit out of scope for this PR. Instead, this PR focuses on bringing in the circuitry to enable this feature in the future.

Note that when automatically selecting the encoding/compression, we're making a trade-off between column chunk sizes and decoding throughput. Some columns might be frequently accessed and might benefit from better decoding speed, whereas others might be rarely accessed; for those, stronger compression might be more cost-effective.
[PR Coverage check] 😍 pass: 7001 / 7301 (95.89%)
…384) Tandem PR for questdb/questdb#6843. This pull request introduces comprehensive documentation and configuration support for per-column Parquet encoding and compression in QuestDB, along with a new server property to control Parquet page compression efficiency. The changes add detailed SQL syntax, configuration options, and usage examples for both table creation and schema alteration, and update relevant documentation and diagrams to reflect these enhancements.

**Per-column Parquet encoding and compression support:**

- Added documentation for specifying per-column Parquet `ENCODING` and `COMPRESSION` in `CREATE TABLE` statements, including supported encodings/codecs, syntax diagrams, and usage examples. [[1]](diffhunk://#diff-c9da9f95b272b064bca67e838e1530428257a9518a36089fbb5f3947f301783dR364-R414) [[2]](diffhunk://#diff-527f6eb5b3052d4a7d0eb03d48c95f52e405a72b7b0400de35fbcf76c1a9b28aR382-R384)
- Introduced a new SQL reference page for `ALTER TABLE ALTER COLUMN SET/DROP PARQUET ENCODING/COMPRESSION`, allowing users to modify or reset per-column Parquet settings on existing tables. [[1]](diffhunk://#diff-617fb65a1352e7ddd311473fca86299a9784de206f346444ccf648114bebd672R1-R48) [[2]](diffhunk://#diff-60de99a7b532d59e719f3478da78de81ef2d1457d291c9d824133ccdc8918b7eR267)
- Updated the `SHOW CREATE TABLE` documentation to display per-column Parquet overrides in table definitions.

**Parquet compression configuration:**

- Documented the new `cairo.partition.encoder.parquet.min.compression.ratio` property, which determines whether a compressed Parquet page is stored compressed or uncompressed based on its compression ratio. Provided usage guidance and default values. [[1]](diffhunk://#diff-679d0b511f89caaaba52af97dc2d690639e90855763a244e0239234003ed5eebR187-R215) [[2]](diffhunk://#diff-cfeb04b172b8674ddd93a5b73bd064049049751437f417df1d28897aa7bb3c86R493-R496)

These updates make it easier for users to fine-tune Parquet export behavior and understand the available configuration options.
Tandem PR: https://github.com/questdb/questdb-enterprise/pull/932, questdb/documentation#384
Summary
- Per-column Parquet encoding/compression configurable via `CREATE TABLE` and `ALTER TABLE` SQL syntax
- New `min_compression_ratio` parameter that discards compressed pages when the compression ratio (uncompressed/compressed) falls below a threshold, storing them uncompressed instead
- Varchar dict encoding: switched `HashMap` to `RapidHashMap` and stored indices directly in a `Vec<u32>` instead of fetching them from the hashmap
- Enabled `lto` and stripped `debuginfo` from the qdbr library (~50% file size reduction)

Per-column Parquet encoding and compression
Users can now specify Parquet encoding and compression on a per-column basis. This only applies to Parquet partitions and is ignored for native partitions.
SQL syntax
CREATE TABLE

The syntax is `PARQUET(encoding [, compression[(level)]])`. Both encoding and compression are optional — use `default` for the encoding when specifying compression only. When omitted entirely, the column uses the global defaults (type-based encoding and the server-wide compression codec).

ALTER TABLE

Setting per-column config on existing tables:

SHOW CREATE TABLE

Per-column Parquet config appears in `SHOW CREATE TABLE` output:

Supported encodings

- `plain`
- `rle_dictionary`
- `delta_length_byte_array`
- `delta_binary_packed`

The SQL parser also accepts `byte_stream_split`, but the Rust encoder does not implement it yet — columns configured with this encoding silently fall back to the type's default encoding.

When no encoding is specified, QuestDB picks a type-appropriate default: `rle_dictionary` for SYMBOL and VARCHAR, `delta_length_byte_array` for STRING and BINARY, and `plain` for everything else.

Supported compression codecs

- `uncompressed`
- `snappy`
- `gzip`
- `brotli`
- `zstd`
- `lz4_raw`

When no per-column compression is specified, the column uses the global compression setting (`cairo.partition.encoder.parquet.compression.codec`).
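The type-based defaults described above can be sketched as follows. This is an illustration only: the enum and function names are invented for this example and are not QuestDB's actual API.

```rust
// Illustrative sketch only; QuestDB's real mapping lives in the Rust writer.
#[derive(Debug, PartialEq)]
enum Encoding {
    Plain,
    RleDictionary,
    DeltaLengthByteArray,
}

// Mirrors the documented defaults: rle_dictionary for SYMBOL/VARCHAR,
// delta_length_byte_array for STRING/BINARY, plain for everything else.
fn default_encoding(column_type: &str) -> Encoding {
    match column_type {
        "SYMBOL" | "VARCHAR" => Encoding::RleDictionary,
        "STRING" | "BINARY" => Encoding::DeltaLengthByteArray,
        _ => Encoding::Plain,
    }
}
```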
The per-column encoding config is stored as a packed 32-bit integer in the column metadata:
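The actual bit layout is not reproduced in this export. As a sketch of the packing approach only, assume a hypothetical layout with the encoding id in the low byte, the codec id in the next byte, and the compression level in the upper 16 bits (an assumption for illustration, not the format defined in `TableUtils.java`/`schema.rs`):

```rust
// Hypothetical bit layout, for illustration only:
//   bits 0-7   encoding id
//   bits 8-15  compression codec id
//   bits 16-31 compression level
fn pack_config(encoding: u32, codec: u32, level: u32) -> u32 {
    (encoding & 0xFF) | ((codec & 0xFF) << 8) | ((level & 0xFFFF) << 16)
}

fn unpack_config(packed: u32) -> (u32, u32, u32) {
    (packed & 0xFF, (packed >> 8) & 0xFF, packed >> 16)
}
```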
Minimum compression ratio
A new server configuration property controls whether compressed pages are worth keeping:
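Based on the property name and default documented in this PR, a `server.conf` entry might look like:

```ini
# keep compressed pages only if they achieve at least a 1.2x size reduction
# (1.2 is the documented default)
cairo.partition.encoder.parquet.min.compression.ratio=1.2
```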
Semantics: the ratio is `uncompressed_size / compressed_size`. A threshold of `1.2` means "only keep compressed output if it achieves at least ~17% size reduction." When a compressed column chunk fails to meet this threshold, the encoder discards the compressed output and stores the column chunk as uncompressed instead. The default is `1.2`; a value of `0.0` (or any value <= 1.0) disables the check entirely, always keeping compressed output (backward-compatible behavior for the `CairoConfiguration` interface default).

The ratio check applies to both data pages and dictionary pages, and works with all compression codecs. It runs after compression, so the CPU cost of compression is still incurred -- this setting only avoids the I/O and storage penalty of keeping pages that barely compress.
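A minimal sketch of the check as described; the function name is illustrative, not the actual parquet2 code:

```rust
// Post-compression ratio check: decide whether a compressed page is worth
// keeping or should be stored uncompressed instead.
fn keep_compressed(uncompressed_size: usize, compressed_size: usize, min_ratio: f64) -> bool {
    // A threshold <= 1.0 disables the check: always keep compressed output.
    if min_ratio <= 1.0 {
        return true;
    }
    // ratio = uncompressed_size / compressed_size; keep the compressed page
    // only if it meets the configured minimum.
    (uncompressed_size as f64) / (compressed_size as f64) >= min_ratio
}
```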
The value flows from `server.conf` -> `CairoConfiguration` -> `PartitionEncoder`/`PartitionUpdater` -> JNI -> Rust `WriteOptions` -> the `compress()` function in `parquet2`.

Files changed
Java -- SQL parsing and metadata
- `SqlParser.java` -- Parses `PARQUET ENCODING ... COMPRESSION ...` in column definitions
- `SqlCompilerImpl.java` -- Handles `ALTER TABLE ... ALTER COLUMN ... SET/DROP PARQUET ENCODING/COMPRESSION`
- `SqlKeywords.java` -- Adds `parquet`, `encoding`, `compression` keywords
- `CreateTableColumnModel.java` -- Stores per-column parquet encoding/compression in the column model
- `CreateTableOperationBuilderImpl.java` / `CreateTableOperationImpl.java` -- Threads encoding config through table creation
- `AlterOperation.java` / `AlterOperationBuilder.java` -- New alter operation type for parquet encoding changes
- `ShowCreateTableRecordCursorFactory.java` -- Emits `PARQUET ENCODING ... COMPRESSION ...` in `SHOW CREATE TABLE`
- `ParquetEncoding.java` -- New file: encoding constants and validation (type-compatibility checks)
- `ParquetCompression.java` -- Compression constants, level validation, and codec name resolution

Java -- metadata storage
- `TableColumnMetadata.java` -- New `parquetEncodingConfig` field
- `TableUtils.java` -- Pack/unpack helpers for the 32-bit encoding config, read/write from metadata memory
- `TableWriter.java` / `TableWriterMetadata.java` -- Store and propagate encoding config
- `TableReaderMetadata.java` -- Read encoding config from metadata
- `MetadataService.java` / `MetadataServiceStub.java` -- Interface for setting column encoding config
- `CairoColumn.java` -- Encoding config in column descriptor

Java -- encoder plumbing
- `PartitionEncoder.java` -- New `minCompressionRatio` parameter on `encodeWithOptions()`, `encodePartition()` native, and `createStreamingParquetWriter()` native
- `PartitionUpdater.java` -- New `minCompressionRatio` parameter on `of()` and `create()` native
- `PartitionDescriptor.java` -- Passes per-column encoding config to Rust
- `TableWriter.java` -- Reads `minCompressionRatio` from config and passes to encoder
- `O3PartitionJob.java` -- Same for partition updater path

Java -- config
- `PropertyKey.java` -- New `CAIRO_PARTITION_ENCODER_PARQUET_MIN_COMPRESSION_RATIO` property
- `PropServerConfiguration.java` -- Loads the property (default: `1.2`)
- `CairoConfiguration.java` -- New `getPartitionEncoderParquetMinCompressionRatio()` method (interface default: `0.0`)

Rust -- compression ratio check
- `parquet2/src/write/compression.rs` -- Adds `min_compression_ratio` parameter to `compress()`, `compress_data()`, `compress_dict()`, and the `Compressor` struct. After compressing a page, checks the ratio and falls back to uncompressed if the threshold is not met.

Rust -- per-column encoding/compression
- `src/parquet_write/schema.rs` -- `to_encodings()` and `to_compressions()` extract per-column overrides from the packed config integer. `encoding_from_config()` and `compression_from_config()` decode the packed format. `validate_encoding()` allows RleDictionary for all column types except Boolean and Array.
- `src/parquet_write/file.rs` -- `column_compression()` selects per-column compression when available, falling back to the global setting. `WriteOptions` gains `min_compression_ratio`. `ParquetWriter` gains `with_min_compression_ratio()`. All `compress()`/`Compressor::new()` call sites thread the ratio through. `column_chunk_to_dict_pages()` dispatches dict encoding for all supported types. Multi-partition writes fall back to the type's default encoding for non-Symbol dict columns (to avoid invalid multi-DictPage column chunks).
- `src/parquet_write/update.rs` -- `ParquetUpdater` gains `min_compression_ratio` field.
- `src/parquet_write/jni.rs` -- All three JNI entry points (`encodePartition`, `createStreamingParquetWriter`, `PartitionUpdater_create`) accept the new `min_compression_ratio` parameter.

Rust -- RLE dictionary encoding for all types
The writer now supports RLE dictionary encoding for all column types except Boolean and Array. Previously only Symbol and Varchar had dict encoding. The new encoders build a `RapidHashMap` for value deduplication, emit a DictPage with unique values, and a DataPage with definition levels and RLE-encoded dictionary keys.

- `src/parquet_write/primitive.rs` -- `slice_to_dict_pages_simd()` for i32/i64/f32/f64 SIMD types (Int, Long, Date, Timestamp, Float, Double). `int_slice_to_dict_pages_nullable()` for narrower nullable types (GeoByte/Short/Int/Long, IPv4). `int_slice_to_dict_pages_notnull()` for non-nullable types (Byte, Short, Char). `decimal_slice_to_dict_pages()` for Decimal types with Int32/Int64 physical representation.
- `src/parquet_write/string.rs` -- `string_to_dict_pages()` converts QuestDB's UTF-16 string format to UTF-8, deduplicates, and emits length-prefixed ByteArray dict entries.
- `src/parquet_write/binary.rs` -- `binary_to_dict_pages()` deduplicates binary blobs and emits length-prefixed ByteArray dict entries.
- `src/parquet_write/fixed_len_bytes.rs` -- `bytes_to_dict_pages()` for fixed-length byte arrays (UUID, Long128, Long256, Decimal FLBA types). Handles byte reversal for UUID.
- `src/parquet_write/util.rs` -- `dict_pages_iter()` shared helper that assembles DictPage + DataPage into a `DynIter<Page>`.
- `src/parquet_write/mod.rs` -- Bench re-exports for all new dict functions. Writer-side roundtrip tests for dict encoding (Int, Long, Double, Byte notnull, all-nulls).

Tests
- `SqlParserTest.java` -- Syntax tests for `CREATE TABLE` with per-column encoding/compression
- `AlterTableAlterColumnTest.java` -- Tests for `ALTER TABLE SET/DROP PARQUET ENCODING/COMPRESSION`, including error cases (invalid encoding for column type, invalid codec, invalid level)
- `ShowCreateTableTest.java` -- Tests that `SHOW CREATE TABLE` correctly round-trips encoding config
- `PartitionEncoderTest.java`, `PartitionUpdaterTest.java`, `ReadParquetFunctionTest.java`, `ParallelFilterTest.java` -- Updated call sites
- `O3ParquetPartitionFuzzTest.java`, `WalWriterFuzzTest.java` -- Fuzz tests include per-column encoding operations
- `parquet_write/schema.rs` tests -- Unit tests for packing/unpacking and per-column override logic. Updated RleDictionary validation tests for all supported types.
- `parquet_write/mod.rs` tests -- Writer-side roundtrip tests for dict encoding: Int, Long, Double, Byte (notnull), and all-nulls columns. Verifies DictPage + DataPage structure and correct encoding metadata.
- `benches/encode_page.rs` -- Dict encoding benchmarks for all supported types across 4 cardinalities (10, 100, 256, 1000) and null percentages (0%, 20%). Covers SIMD types, non-nullable int types, nullable int types, String, Binary, fixed-length byte arrays (Long128, UUID, Long256), and Decimal variants.

Varchar dict encoding performance improvements
Changing hasher to RapidHashMap (for varchar dict encoding)
Default `FxHashMap` vs. `RapidHashMap`: (benchmark table not captured in this export)

Storing indices directly in a `Vec<u32>` instead of fetching them from the hashmap

Before vs. after: (benchmark table not captured in this export)
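The dedup-and-index scheme behind these two optimisations can be sketched as follows. The real encoders use `RapidHashMap` and emit Parquet DictPage/DataPage structures; this illustration substitutes std's `HashMap` and returns plain vectors, so the names and shapes here are assumptions for the sketch only.

```rust
use std::collections::HashMap;

// Deduplicate values into a dictionary and record each row's dictionary key.
// Keys are pushed straight into a Vec<u32> as they are assigned, so the
// hashmap is consulted only once per value.
fn build_dict(values: &[i64]) -> (Vec<i64>, Vec<u32>) {
    let mut seen: HashMap<i64, u32> = HashMap::new();
    let mut dict: Vec<i64> = Vec::new();
    let mut keys: Vec<u32> = Vec::with_capacity(values.len());
    for &v in values {
        let next = dict.len() as u32;
        let idx = *seen.entry(v).or_insert(next);
        if idx == next {
            // First time we see this value: it becomes a new dict entry.
            dict.push(v);
        }
        keys.push(idx);
    }
    (dict, keys)
}
```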