fix(core): add dedicated batch size config for parquet export #6747

Merged
bluestreak01 merged 7 commits into master from fix/parquet-export-large-symbol-tables
Feb 11, 2026

Collaborator

@ideoma ideoma commented Feb 4, 2026

Problem

When exporting tables with 1M+ distinct symbols to parquet, the large default batch size of 1M rows (from the general CREATE TABLE AS SELECT setting) prevented frequent batch commits. This caused symbol index re-scaling to be deferred, leading to performance degradation as the symbol table grew without capacity adjustments.
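The effect of batch size on commit frequency can be illustrated with simple arithmetic. The following is a minimal sketch, not QuestDB code: the class name, the `commitCount` helper, and the 5M-row table size are hypothetical, and it assumes one commit per full or partial batch.

```java
public class BatchCommitSketch {
    // Number of commits when a commit happens after every batchSize rows.
    // The last batch may be partial and is still committed (ceiling division).
    static long commitCount(long totalRows, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (totalRows + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long totalRows = 5_000_000L; // hypothetical table size, for illustration only

        // Batch size inherited from the general CREATE TABLE AS SELECT setting (1M rows)
        System.out.println(commitCount(totalRows, 1_000_000L)); // prints 5

        // Dedicated parquet export batch size described in this PR (100K rows)
        System.out.println(commitCount(totalRows, 100_000L)); // prints 50
    }
}
```

Under these assumptions the smaller batch size gives ten times as many commit points, and each one is an opportunity for the symbol index re-scaling described above.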

Summary

  • Add new configuration property cairo.parquet.export.batch.size (default: 100K rows)
  • Use the configured batch size for parquet exports instead of the general CREATE TABLE AS SELECT batch size
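As a rough illustration of how a property like this could be resolved with a fallback default, here is a sketch using plain `java.util.Properties`. It is not the code from this PR: the class and method names are hypothetical, and only the property key and the 100K default come from the summary above. How QuestDB itself parses or validates the value is not shown in this conversation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ParquetExportBatchSizeSketch {
    // Property key and default taken from the PR summary.
    static final String KEY = "cairo.parquet.export.batch.size";
    static final long DEFAULT_BATCH_SIZE = 100_000L;

    // Hypothetical helper: returns the configured batch size,
    // or the default when the property is absent or blank.
    static long batchSize(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_BATCH_SIZE;
        }
        long value = Long.parseLong(raw.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(KEY + " must be positive: " + value);
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Assumed config line; the value 50000 is an arbitrary example.
        props.load(new StringReader("cairo.parquet.export.batch.size=50000\n"));

        System.out.println(batchSize(props));            // prints 50000
        System.out.println(batchSize(new Properties())); // prints 100000
    }
}
```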

…bol tables

Reduce default batch size from 1M to 100K rows and ensure parquet exports
use the configured batch size. This allows symbol index re-scaling to occur
more frequently during batch commits, preventing performance degradation
when exporting tables with 1M+ distinct symbols.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
coderabbitai bot commented Feb 4, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.
@ideoma changed the title from "fix(core): improve parquet export and CTAS performance with large symbol tables" to "fix(core): improve parquet export performance for tables with many distinct symbols" Feb 5, 2026
@ideoma changed the title from "fix(core): improve parquet export performance for tables with many distinct symbols" to "fix(core): add dedicated batch size config for parquet export" Feb 5, 2026
@glasstiger
Contributor

[PR Coverage check]

😍 pass : 10 / 11 (90.91%)

file detail

| path | covered line | new line | coverage |
|---|---|---|---|
| 🔵 io/questdb/cutlass/http/processors/ExportQueryProcessor.java | 2 | 3 | 66.67% |
| 🔵 io/questdb/cairo/DefaultCairoConfiguration.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/PropertyKey.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/cairo/CairoConfigurationWrapper.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/PropServerConfiguration.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/cutlass/text/CopyExportContext.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/griffin/engine/ops/CreateTableOperationImpl.java | 2 | 2 | 100.00% |

@bluestreak01 bluestreak01 merged commit da8947d into master Feb 11, 2026
44 checks passed
@bluestreak01 bluestreak01 deleted the fix/parquet-export-large-symbol-tables branch February 11, 2026 15:52
maciulis pushed a commit to maciulis/questdb that referenced this pull request Feb 19, 2026
3 participants