fix(sql): inefficient commit batching during parquet exports #6596
bluestreak01 merged 3 commits into master
Conversation
Important: Review skipped. Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI. You can disable this status message in the settings.

Walkthrough: The PR modifies batch size condition checking in copy operations.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 0
🧹 Nitpick comments (2)
core/src/main/java/io/questdb/griffin/engine/ops/InsertAsSelectOperationImpl.java (1)
144-165: Insert-as-select batching fix looks good; consider aligning `o3MaxLag` usage.

The new logic:

- Keeps unordered copy for non-timestamp inserts.
- For timestamped inserts, uses ordered batched copy only when `batchSize > 0`, respecting the new precondition in `SqlCompilerImpl.copyOrderedBatched`.
- For `batchSize <= 0`, calls `copyOrderedBatched(..., Long.MAX_VALUE, 0)`, which in practice disables intermediate `ic()` calls and prevents per-row commits.

Functionally this achieves the desired fix. One small nit: in `SqlCompilerImpl.copyTableData` the non-batched path uses `Long.MAX_VALUE` but still passes through the configured `o3MaxLag`, whereas here you hard-code `0`. With current `copyOrderedBatched` semantics that makes no difference (no `ic()` calls are triggered), but for consistency and future-proofing you may want to pass `o3MaxLag` here as well.

core/src/test/java/io/questdb/test/griffin/CopyExportTest.java (1)
1876-1920: Well-structured regression test; minor documentation improvements suggested.

The test effectively validates that parquet exports use batched commits rather than per-row commits. The approach of comparing commit counts before and after is sound, and the threshold of 100 commits provides adequate margin to detect the regression while allowing for legitimate commit operations (temp table creation, status log updates, etc.).
A few optional improvements:
Clarify the commit threshold rationale (line 1912-1913): Consider adding a brief comment explaining that the threshold accounts for temp table operations and status updates, making it clearer why 100 is appropriate.
Metric isolation consideration: `TableWriterMetrics.getCommitCount()` returns global commit counts across all tables. While `assertMemoryLeak()` should provide isolation, the test assumes no other concurrent operations are contributing commits during the export. This is likely fine in the test environment but worth noting if debugging future flakiness.

Row count verification: The test correctly generates 10,000 rows (a time span of 9,999 seconds, inclusive of start and end), matching the `rowCount` variable. Nice attention to detail.
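The before/after metric comparison described in this comment can be sketched in isolation like this. This is a hypothetical, self-contained model: the `AtomicLong` stands in for `TableWriterMetrics.getCommitCount()` and `export` stands in for the COPY-to-parquet path; neither is QuestDB source.

```java
// Self-contained sketch of the commit-count regression check.
// commitCount mimics a global writer metric; export() mimics the
// export loop, committing once per full batch plus a final tail commit.
import java.util.concurrent.atomic.AtomicLong;

public class CommitBatchingCheck {
    static final AtomicLong commitCount = new AtomicLong();

    static void export(long rowCount, long batchSize) {
        long pending = 0;
        for (long i = 0; i < rowCount; i++) {
            if (++pending >= batchSize) {
                commitCount.incrementAndGet(); // batch boundary reached
                pending = 0;
            }
        }
        if (pending > 0) {
            commitCount.incrementAndGet(); // final commit for the tail
        }
    }

    // Mirrors the test's shape: snapshot the metric, run the export,
    // and return the delta, which should stay far below the row count.
    static long commitsDuringExport(long rows, long batchSize) {
        long before = commitCount.get();
        export(rows, batchSize);
        return commitCount.get() - before;
    }

    public static void main(String[] args) {
        long delta = commitsDuringExport(10_000, Long.MAX_VALUE);
        if (delta >= 100) {
            throw new AssertionError("per-row commit regression: " + delta);
        }
        System.out.println("commits during export: " + delta);
    }
}
```

With the per-row-commit bug, the delta would equal the row count (10,000); with batching disabled via `Long.MAX_VALUE`, only the single tail commit registers, which is why a threshold of 100 gives ample margin.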
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- core/src/main/java/io/questdb/griffin/SqlCompilerImpl.java
- core/src/main/java/io/questdb/griffin/engine/ops/InsertAsSelectOperationImpl.java
- core/src/test/java/io/questdb/test/griffin/CopyExportTest.java
🧰 Additional context used
🧬 Code graph analysis (1)
core/src/test/java/io/questdb/test/griffin/CopyExportTest.java (1)
core/src/main/java/io/questdb/cairo/TableWriterMetrics.java (1)
TableWriterMetrics (32-109)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (34)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
- GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
- GitHub Check: New pull request (Hosted Running tests on windows-other-2)
- GitHub Check: New pull request (Hosted Running tests on windows-other-1)
- GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
- GitHub Check: New pull request (Hosted Running tests on mac-other)
- GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
- GitHub Check: New pull request (Hosted Running tests on mac-griffin)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
- GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
- GitHub Check: New pull request (Check Changes Check changes)
🔇 Additional comments (3)
core/src/main/java/io/questdb/griffin/SqlCompilerImpl.java (2)
294-317: Assert on `batchSize > 0` is well-protected by call-site guards.

All direct callers of `copyOrderedBatched(...)` already enforce the `batchSize > 0` precondition before invoking the method:

- `InsertAsSelectOperationImpl` guards with `if (batchSize > 0)` or passes `Long.MAX_VALUE`.
- `SqlCompilerImpl` guards with `else if (batchSize > 0)` or passes `Long.MAX_VALUE`.

The assert provides valuable defensive documentation and will catch any future violations introduced by new call sites, though the current code respects the precondition throughout.
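The guard/assert pairing described in this comment can be illustrated with a minimal sketch. The names are hypothetical stand-ins, not the actual QuestDB methods.

```java
// Minimal sketch of the precondition pattern: call sites either check
// batchSize > 0 or substitute Long.MAX_VALUE, and the callee asserts
// the contract defensively. Method names are illustrative only.
public class PreconditionSketch {
    // Stand-in for copyOrderedBatched: asserts the documented precondition
    // and returns the effective commit interval it would use.
    static long copyOrderedBatched(long batchSize) {
        assert batchSize > 0 : "batchSize must be positive";
        return batchSize;
    }

    // Mirrors the call-site guard: non-positive sizes fall back to
    // Long.MAX_VALUE, which disables intermediate commits.
    static long copyTableData(long batchSize) {
        return batchSize > 0
                ? copyOrderedBatched(batchSize)
                : copyOrderedBatched(Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(copyTableData(500));
        System.out.println(copyTableData(-1));
    }
}
```

The assert documents the contract at the callee while each caller stays responsible for mapping "no batching requested" onto a value that satisfies it.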
3473-3495: Logic correctly handles non-positive `batchSize` by disabling intermediate commits.

The updated logic:

- Uses unordered copying when there is no designated timestamp (`timestampIndex == -1`), ignoring `batchSize`.
- For timestamped tables with `batchSize > 0`, calls `copyOrderedBatched` with the actual batch size as the commit interval.
- For `batchSize <= 0`, calls `copyOrderedBatched` with `Long.MAX_VALUE`, which ensures the condition `rowCount >= commitTarget` never triggers during data ingestion, effectively disabling intermediate `ic()` calls so that only the outer `writer.commit()` runs.

This correctly avoids single-row commits for non-positive batch sizes during data copy operations while preserving ordered ingestion semantics.
core/src/test/java/io/questdb/test/griffin/CopyExportTest.java (1)
30-30: LGTM: Import added for commit metrics validation.

The `TableWriterMetrics` import is necessary for the regression test and correctly placed.
[PR Coverage check] 😍 pass: 3 / 3 (100.00%) | file detail
When copying data to tables during the parquet export process, a logic bug caused every inserted row to be committed individually as a single-row commit.