
fix(sql): support copies of very wide tables and result sets#6525

Merged
bluestreak01 merged 52 commits into master from nw_wide_copies on Dec 21, 2025

Conversation

@nwoolmer
Contributor

@nwoolmer nwoolmer commented Dec 11, 2025

Fixes #3312
Fixes #3326

This PR fixes a bug preventing copying of data between tables with thousands (6000+) of columns, through upgrades to RecordToRowCopier. The same bug existed for RecordSink, used for copying large result sets within the query engine.

Background

An optimised copy routine is generated when moving data between tables or result sets. This routine is produced by a custom JIT compiler that inlines the copy instructions into Java bytecode.

Java bytecode has a method size limitation of 64K. For very wide tables (thousands of columns), this limit could be exceeded.

This PR initially introduces a new 'looping' version of this copy routine, which is essentially a non-inlined version of the bytecode routine. It loops over each row, and over each column within that row, with a switch dispatch based on the from/to types.
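As a rough illustration of the looping approach (not QuestDB's actual classes — `TypeTag`, `copyRow`, and the `Object[]`-based rows are all simplifications for the sketch), the per-column generated code is replaced by one generic loop with a type switch:

```java
// Sketch only: a loop-based copier dispatches on a per-column type tag
// instead of emitting one inlined instruction sequence per column.
import java.util.Arrays;

public class LoopingCopierSketch {
    enum TypeTag { INT, LONG, DOUBLE, BOOLEAN }

    // Copies one "row" from src to dst, switching on each column's type.
    static void copyRow(Object[] src, Object[] dst, TypeTag[] types) {
        for (int col = 0; col < types.length; col++) {
            switch (types[col]) {
                case INT:     dst[col] = (Integer) src[col]; break;
                case LONG:    dst[col] = (Long) src[col]; break;
                case DOUBLE:  dst[col] = (Double) src[col]; break;
                case BOOLEAN: dst[col] = (Boolean) src[col]; break;
            }
        }
    }

    public static void main(String[] args) {
        TypeTag[] types = { TypeTag.INT, TypeTag.LONG, TypeTag.DOUBLE, TypeTag.BOOLEAN };
        Object[] src = { 1, 2L, 3.5, true };
        Object[] dst = new Object[4];
        copyRow(src, dst, types);
        System.out.println(Arrays.equals(src, dst)); // prints: true
    }
}
```

The loop body is never inlined per column, so its bytecode size is constant regardless of column count — at the cost of per-column dispatch overhead on every row.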

When benchmarking these changes, a cliff was identified, where the bytecode routine performance dropped off significantly.

| Size | Mode | Score (ops/s) | Error |
|------|----------|---------------|---------|
| 200 | BYTECODE | 127.737 | ± 3.829 |
| 200 | LOOP | 95.593 | ± 0.606 |
| 500 | BYTECODE | 5.477 | ± 0.045 |
| 500 | LOOP | 37.878 | ± 0.386 |

This was narrowed down further:

| columnCount | Mode | Score (ops/s) | Error |
|-------------|----------|---------------|-----------|
| 459 | BYTECODE | 37.721 | ± 203.739 |
| 459 | LOOP | 40.679 | ± 29.255 |
| 460 | BYTECODE | 5.771 | ± 2.689 |
| 460 | LOOP | 38.517 | ± 170.291 |

In this instance, with 459 columns, the method's bytecode size was 7995 bytes, whereas with 460 columns it was 8013 bytes. This crosses HotSpot's huge-method limit of 8000 bytecodes, beyond which the C2 compiler declines the method and performance falls back to C1 levels.
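On a stock HotSpot JVM this kind of bail-out can be confirmed with compilation logging; the flags below are standard HotSpot options (`MyApp` is a placeholder for the application under test):

```shell
# Log JIT compilation events; methods over the huge-method limit are
# reported as "COMPILE SKIPPED: hot method too big" instead of compiled.
java -XX:+PrintCompilation MyApp 2>&1 | grep "COMPILE SKIPPED"

# Lift the huge-method limit entirely (for diagnosis, not production):
java -XX:-DontCompileHugeMethods MyApp
```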

To resolve this, an additional 'chunked' copier is added. It is identical to the original 'bytecode' copier, except that when the generated method is estimated to exceed 6K, the copy logic is split into several sub-methods, each covering a subset of columns. An outer method then invokes the sub-methods, allowing each one to be C2-compiled individually.
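The boundary calculation can be sketched as a greedy accumulation over per-column size estimates. The names (`calculateChunkBoundaries`, `methodSizeLimit`) mirror the PR's description, but the code below is an illustrative simplification, not the actual implementation:

```java
// Sketch: start a new sub-method whenever the running bytecode-size
// estimate would exceed the per-method limit, so every generated method
// stays small enough for C2 to compile.
import java.util.ArrayList;
import java.util.List;

public class ChunkBoundariesSketch {
    // Returns exclusive end indices of each chunk, e.g. [3, 5] means
    // columns [0,3) go in sub-method 1 and [3,5) in sub-method 2.
    static List<Integer> calculateChunkBoundaries(int[] estimatedSizes, int methodSizeLimit) {
        List<Integer> boundaries = new ArrayList<>();
        int current = 0;
        for (int i = 0; i < estimatedSizes.length; i++) {
            if (current + estimatedSizes[i] > methodSizeLimit && current > 0) {
                boundaries.add(i); // close the current chunk before column i
                current = 0;
            }
            current += estimatedSizes[i];
        }
        boundaries.add(estimatedSizes.length); // final chunk
        return boundaries;
    }

    public static void main(String[] args) {
        int[] sizes = { 10, 10, 10, 10, 10 };
        // With a 25-byte limit, chunks are [0,2), [2,4), [4,5)
        System.out.println(calculateChunkBoundaries(sizes, 25)); // prints: [2, 4, 5]
    }
}
```

Note that a single column whose estimate alone exceeds the limit still forms its own chunk here; the real implementation checks the actual emitted size afterwards and falls back if a chunk is too large.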

For larger copies, the looping solution already outperformed the bytecode solution, but the chunked variant generally outperforms both.

| columnCount | Mode | Score (ops/s) | Error |
|-------------|----------|---------------|-----------|
| 10 | BYTECODE | 5039.909 | ± 149.821 |
| 10 | CHUNKED | 5035.962 | ± 227.926 |
| 10 | LOOP | 2217.538 | ± 225.230 |
| 50 | BYTECODE | 935.253 | ± 37.877 |
| 50 | CHUNKED | 934.717 | ± 22.769 |
| 50 | LOOP | 428.769 | ± 48.631 |
| 100 | BYTECODE | 405.010 | ± 14.386 |
| 100 | CHUNKED | 404.665 | ± 8.594 |
| 100 | LOOP | 216.923 | ± 24.707 |
| 200 | BYTECODE | 126.786 | ± 5.069 |
| 200 | CHUNKED | 118.335 | ± 142.697 |
| 200 | LOOP | 106.945 | ± 67.420 |
| 500 | BYTECODE | 33.003 | ± 73.343 |
| 500 | CHUNKED | 36.649 | ± 32.249 |
| 500 | LOOP | 44.049 | ± 3.047 |
| 1000 | BYTECODE | 2.627 | ± 0.233 |
| 1000 | CHUNKED | 18.592 | ± 9.296 |
| 1000 | LOOP | 20.433 | ± 49.360 |
| 2000 | BYTECODE | 1.299 | ± 0.139 |
| 2000 | CHUNKED | 10.076 | ± 0.040 |
| 2000 | LOOP | 10.857 | ± 1.065 |
| 3000 | BYTECODE | 0.835 | ± 1.221 |
| 3000 | CHUNKED | 6.626 | ± 1.634 |
| 3000 | LOOP | 6.181 | ± 1.031 |
| 4000 | BYTECODE | 0.653 | ± 0.064 |
| 4000 | CHUNKED | 4.961 | ± 2.186 |
| 4000 | LOOP | 3.406 | ± 10.489 |
| 5000 | CHUNKED | 3.524 | ± 13.199 |
| 5000 | LOOP | 2.442 | ± 0.599 |
| 6000 | BYTECODE | DNF | DNF |
| 6000 | CHUNKED | 3.202 | ± 4.605 |
| 6000 | LOOP | 1.907 | ± 1.217 |

The benchmarks measure the performance of the routines themselves, without any I/O; where I/O is the bottleneck, these differences are less pronounced.

Changes

  • RecordToRowCopier
    • Introduce new looping variant, to handle wider copies
    • Introduce new chunked variant, for a performance boost
    • Add config to disable chunking: cairo.sql.copier.chunked=true
    • Add benchmark to test performance
  • RecordSink
    • Introduce new looping variant, to handle wider copies
    • Introduce new chunked variant, for a performance boost
    • Add config to disable chunking (same property as above)
    • Add benchmark to test performance
  • Tests
    • large set function
    • INSERT AS SELECT
    • CREATE AS SELECT
    • parameterised across variants
    • fuzz?
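For reference, the toggle listed above is a regular Cairo configuration property; a minimal server.conf fragment to disable chunked generation (with chunking off, oversized copies presumably take the looping path — my reading of this PR, not verified):

```
# server.conf: disable chunked copier generation (default: true)
cairo.sql.copier.chunked=false
```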

@coderabbitai

coderabbitai bot commented Dec 11, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR resolves the "Too much input" bytecode exception for operations on wide tables (6000+ columns) by introducing chunked bytecode generation and loop-based fallback implementations. It adds a configuration flag cairo.sql.copier.chunked (default enabled) and refactors RecordSinkFactory and RecordToRowCopierUtils to split large bytecode methods across multiple smaller methods, with looping alternatives as final fallback. Integration points are updated to pass the configuration flag through the factory call chain.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Configuration Properties**<br>PropertyKey.java, PropServerConfiguration.java, CairoConfiguration.java, DynamicPropServerConfiguration.java | Added new CAIRO_SQL_COPIER_CHUNKED enum constant and property key; added copierChunkedEnabled field and isCopierChunkedEnabled() method to configuration; added property to dynamic reload set. |
| **Looping Sink Implementations**<br>LoopingRecordSink.java, LoopingRecordToRowCopier.java | New public classes providing loop-based record copying as a fallback for wide tables; extensive switch-based type handling for all column types including decimals, geospatial, arrays, and conversions. |
| **Factory Chunking Logic**<br>RecordSinkFactory.java, RecordToRowCopierUtils.java | Refactored to support chunked bytecode generation via calculateChunkBoundaries(), estimateBytecodeSize(), and generateChunkedSink()/generateChunkedCopier() methods; added DEFAULT_METHOD_SIZE_LIMIT constant; extended public getInstance() and generateCopier() overloads with chunkedEnabled and methodSizeLimit parameters. |
| **Bytecode Assembly Helpers**<br>BytecodeAssembler.java | Added getMethodCodeSize() and startPrivateMethod() public methods to support splitting bytecode across multiple private methods. |
| **Integration Points**<br>SqlCodeGenerator.java, SqlCompilerImpl.java, MatViewRefreshJob.java, RankFunctionFactory.java, AbstractSampleByFillRecordCursorFactory.java, DistinctRecordCursorFactory.java, DistinctTimeSeriesRecordCursorFactory.java, GroupByRecordCursorFactory.java, SampleByFillNoneRecordCursorFactory.java, SampleByInterpolateRecordCursorFactory.java | Updated factory calls to pass configuration.isCopierChunkedEnabled() as a new parameter; changes propagate the configuration flag through code generation and sink/copier creation paths. |
| **JMH Benchmarks**<br>RecordSinkBenchmark.java, RecordToRowCopierBenchmark.java | New benchmark classes comparing three implementations (BYTECODE, CHUNKED, LOOP) with parameterized column counts; isolated from I/O via mock SPI/metadata; warmup, measurement, and forking configured via JMH annotations. |
| **Test Coverage**<br>ServerMainTest.java, RecordArrayTest.java, RecordChainTest.java, RecordSinkFactoryTest.java, OrderedMapTest.java, AbstractO3Test.java, InsertAsSelectTest.java, LoopingRecordToRowCopierTest.java, RecordToRowCopierIntegrationTest.java | Updated existing tests to pass the new boolean parameter; added new test suites for looping implementations and parameterized cross-implementation tests; added a helper for creating wide tables (6000 columns) and comprehensive test cases for type conversions, nulls, and wide-table operations. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Areas requiring extra attention:

  • LoopingRecordSink.copyColumn() and LoopingRecordToRowCopier type dispatch: Extensive switch-based logic covering 30+ column types with decimal, geospatial, and array handling; verify completeness of type coverage and correctness of conversions.
  • Chunking boundary calculation in RecordSinkFactory and RecordToRowCopierUtils: Logic for splitting bytecode across methods based on size estimates (calculateChunkBoundaries(), estimateColumnBytecodeSize()); ensure chunk splits correctly without duplicating or losing column copy logic.
  • Parameter propagation through integration points: Verify that configuration.isCopierChunkedEnabled() is correctly passed in all 10+ call sites (SqlCodeGenerator, MatViewRefreshJob, various cursor factories); check that default configurations are sensible.
  • Fallback behavior and error handling: Confirm that chunked generation gracefully falls back to looping when chunking fails (getChunkedInstanceClass() returns null), and that looping implementations handle all type scenarios.
  • BytecodeAssembler startPrivateMethod() integration: Verify that private method generation integrates correctly with the chunking workflow and that method code size limits are respected.
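The fallback chain described in the last two points can be summarised as a small decision function. This is a reading of the review notes (illustrative names, not QuestDB's actual code):

```java
// Sketch of the copier selection chain: single-method bytecode when the
// estimate fits, then chunked generation, then the looping fallback when
// chunking is disabled or fails (i.e. chunked generation returns null).
public class CopierSelectionSketch {
    static String choose(int estimatedSize, int limit, boolean chunkedEnabled, boolean chunkingSucceeds) {
        if (estimatedSize <= limit) {
            return "BYTECODE";        // fits in one C2-compilable method
        }
        if (chunkedEnabled && chunkingSucceeds) {
            return "CHUNKED";         // split across private sub-methods
        }
        return "LOOP";                // generic looping fallback
    }

    public static void main(String[] args) {
        System.out.println(choose(4000, 6144, true, true));   // prints: BYTECODE
        System.out.println(choose(90000, 6144, true, false)); // prints: LOOP
    }
}
```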

Possibly related PRs

  • questdb/questdb#6443: Modifies compiler/generator code paths (SqlCodeGenerator/SqlCompilerImpl) and type-conversion checks; overlaps with integration points updated in this PR.

Suggested reviewers

  • bluestreak01
  • puzpuzpuz
  • mtopolnik

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation / Resolution |
|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.53%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title 'fix(sql): support copies of very wide tables and result sets' accurately and concisely summarizes the main change, addressing the core issue of enabling data copying for very wide tables and large result sets. |
| Linked Issues check | ✅ Passed | The PR successfully addresses both linked issues: #3312 (INSERT...SELECT failing on 6000-column tables due to BytecodeException) and #3326 (SELECT with many AVG columns failing on bytecode generation), by introducing looping and chunked copier implementations that circumvent JVM bytecode size limits and optimize performance for wide schemas. |
| Out of Scope Changes check | ✅ Passed | All code changes are directly related to the PR objectives: implementing looping and chunked variants of RecordToRowCopier and RecordSink, adding configuration support, updating call sites, and adding comprehensive benchmarks and tests for very wide table operations. No unrelated changes identified. |
| Description check | ✅ Passed | The PR description is directly related to the changeset, describing the bug fix for very wide table copies and detailing the looping and chunked implementations. |


@nwoolmer nwoolmer added the Core (Related to storage, data type, etc.), Bug (Incorrect or unexpected behavior) and SQL (Issues or changes relating to SQL execution) labels Dec 11, 2025
@nwoolmer nwoolmer marked this pull request as ready for review December 11, 2025 15:30
@nwoolmer
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Dec 11, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (13)
benchmarks/src/main/java/org/questdb/RecordToRowCopierBenchmark.java (3)

164-190: Consider removing or documenting commented-out benchmark methods.

While keeping alternative benchmarks commented out is common in JMH files, consider either:

  1. Removing them if not needed (can be retrieved from git history)
  2. Adding a brief comment explaining when/why to enable them

This is a minor readability suggestion.


194-197: Consider using Files.createTempDirectory() for safer temp directory creation.

The current pattern has two issues:

  1. Race condition between delete() and mkdir()
  2. mkdir() return value is unchecked—if it fails, the benchmark proceeds with an invalid path
-        File tempDir = File.createTempFile("questdb-bench-", "");
-        tempDir.delete();
-        tempDir.mkdir();
-        tempDbRoot = tempDir.getAbsolutePath();
+        Path tempPath = java.nio.file.Files.createTempDirectory("questdb-bench-");
+        tempDbRoot = tempPath.toAbsolutePath().toString();

360-382: Null returns for string/symbol methods are safe given current SIMPLE_TYPES.

The mock returns null for getStrA, getStrB, getSymA, getSymB. This works because SIMPLE_TYPES only includes numeric and boolean types. If STRING or SYMBOL types are added later, the copier would attempt to copy nulls, potentially causing issues.

Consider adding a comment noting this limitation, or returning empty strings for safety.

core/src/test/java/io/questdb/test/griffin/InsertAsSelectTest.java (1)

190-216: Wide-table INSERT AS SELECT tests effectively cover the original failure

The new tests and createTableManyCols helper construct ~6000‑column tables and execute INSERT … SELECT (with and without EXCEPT), which is sufficient to trigger the wide‑schema copier generation that used to throw BytecodeException. This is a good regression guard.

If you want slightly stronger coverage, you could insert a few rows into temp before the INSERT to also exercise the runtime copy loop, but the key compile‑time failure is already well covered.

core/src/test/java/io/questdb/test/cairo/RecordArrayTest.java (1)

73-73: LGTM! Consider test parameterization.

The test updates correctly pass the new copierChunkedEnabled flag to RecordSinkFactory.getInstance(). However, all tests hardcode true, exercising only the chunked-enabled path. Based on the PR objectives mentioning "parameterise tests across variants," consider adding parameterized tests that exercise both true and false values to verify correct behavior in both chunked and non-chunked modes.

Also applies to: 128-128, 206-206, 377-377

core/src/test/java/io/questdb/test/cairo/RecordSinkFactoryTest.java (1)

86-86: LGTM! Consider test parameterization.

The test updates correctly accommodate the expanded RecordSinkFactory.getInstance() API signatures. All three test methods now pass the copierChunkedEnabled flag (hardcoded as true). Line 324 also adds intermediate null parameters for optional flags, which is correct for that overload. Similar to RecordArrayTest, consider parameterizing these tests to exercise both chunked and non-chunked modes for better coverage.

Also applies to: 120-120, 324-324

benchmarks/src/main/java/org/questdb/RecordSinkBenchmark.java (1)

64-83: Benchmark harness is sound; consider minor cleanup and potential coverage extension

The benchmark setup (params, mock types/record/SPI, and the three implementation modes) looks correct and should give meaningful throughput comparisons across column counts. Two optional improvements:

  • Given benchmarkBatchCopy already overrides @BenchmarkMode and @OutputTimeUnit, the class-level @BenchmarkMode(Mode.AverageTime) / @OutputTimeUnit(TimeUnit.NANOSECONDS) are redundant; removing them would simplify the annotations.
  • If you later want to exercise more of RecordSink’s surface (e.g., STRING/SYMBOL/DECIMAL paths), you could extend SIMPLE_TYPES and add corresponding getters in MockRecord.

Also applies to: 100-105, 119-182

core/src/test/java/io/questdb/test/griffin/LoopingRecordToRowCopierTest.java (1)

33-88: Tests correctly validate copy semantics, but may not reliably hit the looping copier path

The scenarios here (many columns, wide tables, type conversions, nulls) are good functional coverage and the assertions look correct. One nuance:

  • Comments like “trigger loop-based copier” / “will use loop-based for 9 columns” and the class name suggest these tests specifically exercise the LoopingRecordToRowCopier, but with the current heuristics (bytecode size estimation + chunked mode) and default method-size limits, many of these cases will still be served by the bytecode or chunked copiers.

If the intent is to guarantee coverage of the loop-based implementation, consider, for example:

  • Overriding the configuration in this test to disable chunking and/or reduce the method-size limit so the planner is forced into the looping path for these tables; or
  • Adding a dedicated test that invokes RecordToRowCopierUtils.generateCopier(..., /*chunkedEnabled=*/false, smallLimit) directly and uses that copier in a controlled copy loop.

Alternatively, if you only care about the semantics regardless of the concrete copier, updating the comments to avoid stating that the loop-based implementation is definitely used would prevent confusion later.

Also applies to: 157-190

core/src/test/java/io/questdb/test/griffin/RecordToRowCopierIntegrationTest.java (1)

352-380: Test name doesn't match actual column count.

The test testExtremelyWideTableInsert uses getColumnCount(100), which means BYTECODE mode gets 50 columns and CHUNKED/LOOPING get 100 columns. This isn't "extremely wide" compared to testCreateTableAsSelectVeryWideTable which uses 500. Consider renaming or increasing the column count.

     @Test
-    public void testExtremelyWideTableInsert() throws Exception {
-        int columnCount = getColumnCount(100);
+    public void testWideTableInsert() throws Exception {
+        int columnCount = getColumnCount(300);
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (3)

379-390: Consider including all decimal types in the estimation.

The estimation only adds overhead for DECIMAL128 and DECIMAL256, but DECIMAL8-64 types also generate bytecode. While the current estimates work as a heuristic, more accurate estimates could help optimize chunk boundaries.


1279-1284: Chunk size validation is post-hoc; consider pre-validation.

The chunk size is validated after generating bytecode (line 1280-1281). If a single column generates more bytecode than methodSizeLimit, the method returns null after doing substantial work. Consider validating during calculateChunkBoundaries to fail earlier.

However, this is acceptable as a fallback since getMethodCodeSize() gives the actual size rather than an estimate.


1216-1228: Function keys are not chunked and could exceed method size limit.

If keyFunctions has many entries, the main copy method could still exceed the bytecode size limit since function key copying happens entirely in the main method. Consider chunking function keys as well for extreme cases.

For now, this is acceptable since typically the number of key functions is small compared to column count.

core/src/main/java/io/questdb/griffin/RecordToRowCopierUtils.java (1)

386-427: Consider extracting constant pool setup to reduce code duplication.

The constant pool setup (lines 410-545) is nearly identical to the setup in generateSingleMethodCopier (lines 1563-1700). This creates maintenance burden and risk of divergence.

Consider extracting the constant pool setup into a shared helper method or a builder pattern to avoid ~135 lines of duplicated code. This would ensure both code paths remain synchronized and reduce maintenance overhead.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f63af4 and e45aeed.

📒 Files selected for processing (30)
  • benchmarks/src/main/java/org/questdb/RecordSinkBenchmark.java (1 hunks)
  • benchmarks/src/main/java/org/questdb/RecordToRowCopierBenchmark.java (1 hunks)
  • core/src/main/java/io/questdb/DynamicPropServerConfiguration.java (1 hunks)
  • core/src/main/java/io/questdb/PropServerConfiguration.java (3 hunks)
  • core/src/main/java/io/questdb/PropertyKey.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/CairoConfiguration.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/LoopingRecordSink.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (7 hunks)
  • core/src/main/java/io/questdb/cairo/mv/MatViewRefreshJob.java (1 hunks)
  • core/src/main/java/io/questdb/griffin/LoopingRecordToRowCopier.java (1 hunks)
  • core/src/main/java/io/questdb/griffin/RecordToRowCopierUtils.java (3 hunks)
  • core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (16 hunks)
  • core/src/main/java/io/questdb/griffin/SqlCompilerImpl.java (5 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/window/RankFunctionFactory.java (1 hunks)
  • core/src/main/java/io/questdb/griffin/engine/groupby/AbstractSampleByFillRecordCursorFactory.java (1 hunks)
  • core/src/main/java/io/questdb/griffin/engine/groupby/DistinctRecordCursorFactory.java (1 hunks)
  • core/src/main/java/io/questdb/griffin/engine/groupby/DistinctTimeSeriesRecordCursorFactory.java (1 hunks)
  • core/src/main/java/io/questdb/griffin/engine/groupby/GroupByRecordCursorFactory.java (1 hunks)
  • core/src/main/java/io/questdb/griffin/engine/groupby/SampleByFillNoneRecordCursorFactory.java (1 hunks)
  • core/src/main/java/io/questdb/griffin/engine/groupby/SampleByInterpolateRecordCursorFactory.java (1 hunks)
  • core/src/main/java/io/questdb/std/BytecodeAssembler.java (4 hunks)
  • core/src/test/java/io/questdb/test/ServerMainTest.java (1 hunks)
  • core/src/test/java/io/questdb/test/cairo/RecordArrayTest.java (4 hunks)
  • core/src/test/java/io/questdb/test/cairo/RecordChainTest.java (5 hunks)
  • core/src/test/java/io/questdb/test/cairo/RecordSinkFactoryTest.java (3 hunks)
  • core/src/test/java/io/questdb/test/cairo/map/OrderedMapTest.java (4 hunks)
  • core/src/test/java/io/questdb/test/cairo/o3/AbstractO3Test.java (1 hunks)
  • core/src/test/java/io/questdb/test/griffin/InsertAsSelectTest.java (2 hunks)
  • core/src/test/java/io/questdb/test/griffin/LoopingRecordToRowCopierTest.java (1 hunks)
  • core/src/test/java/io/questdb/test/griffin/RecordToRowCopierIntegrationTest.java (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-10T14:28:48.329Z
Learnt from: mtopolnik
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2025-11-10T14:28:48.329Z
Learning: In AsOfJoinDenseRecordCursorFactoryBase.java, the `backwardScanExhausted` flag is intentionally NOT reset in `toTop()` because backward scan results are reusable across cursor rewinds. The backward scan caches historical matches that remain valid when the cursor is rewound.

Applied to files:

  • core/src/test/java/io/questdb/test/cairo/RecordArrayTest.java
  • core/src/test/java/io/questdb/test/cairo/RecordChainTest.java
📚 Learning: 2025-08-04T09:16:27.366Z
Learnt from: jerrinot
Repo: questdb/questdb PR: 5996
File: core/src/test/java/io/questdb/test/cutlass/http/line/LineHttpSenderTest.java:350-356
Timestamp: 2025-08-04T09:16:27.366Z
Learning: In QuestDB ILP tests, maximum dimension values like `(1 << 28) - 1` are intentionally hard-coded instead of using constants like `ArrayView.DIM_MAX_LEN` to serve as "canaries" that will cause test failures if the API limits change, ensuring developers are aware of breaking changes to the ILP API.

Applied to files:

  • core/src/main/java/io/questdb/griffin/RecordToRowCopierUtils.java
🧬 Code graph analysis (13)
benchmarks/src/main/java/org/questdb/RecordSinkBenchmark.java (3)
core/src/main/java/io/questdb/cairo/LoopingRecordSink.java (1)
  • LoopingRecordSink (46-361)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
  • RecordSinkFactory (43-2158)
core/src/main/java/io/questdb/std/Rnd.java (1)
  • Rnd (39-491)
core/src/main/java/io/questdb/griffin/engine/functions/window/RankFunctionFactory.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
  • RecordSinkFactory (43-2158)
core/src/main/java/io/questdb/griffin/engine/groupby/DistinctTimeSeriesRecordCursorFactory.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
  • RecordSinkFactory (43-2158)
core/src/test/java/io/questdb/test/griffin/LoopingRecordToRowCopierTest.java (1)
core/src/test/java/io/questdb/test/tools/TestUtils.java (1)
  • TestUtils (154-2667)
core/src/test/java/io/questdb/test/cairo/map/OrderedMapTest.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
  • RecordSinkFactory (43-2158)
core/src/main/java/io/questdb/griffin/engine/groupby/AbstractSampleByFillRecordCursorFactory.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
  • RecordSinkFactory (43-2158)
core/src/main/java/io/questdb/griffin/SqlCompilerImpl.java (1)
core/src/main/java/io/questdb/griffin/RecordToRowCopierUtils.java (1)
  • RecordToRowCopierUtils (50-2851)
core/src/test/java/io/questdb/test/griffin/InsertAsSelectTest.java (2)
core/src/main/java/io/questdb/cairo/PartitionBy.java (1)
  • PartitionBy (44-192)
core/src/test/java/io/questdb/test/cairo/TableModel.java (1)
  • TableModel (35-238)
core/src/main/java/io/questdb/griffin/engine/groupby/SampleByFillNoneRecordCursorFactory.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
  • RecordSinkFactory (43-2158)
core/src/main/java/io/questdb/griffin/engine/groupby/GroupByRecordCursorFactory.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
  • RecordSinkFactory (43-2158)
core/src/main/java/io/questdb/griffin/engine/groupby/DistinctRecordCursorFactory.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
  • RecordSinkFactory (43-2158)
core/src/test/java/io/questdb/test/cairo/RecordChainTest.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
  • RecordSinkFactory (43-2158)
core/src/test/java/io/questdb/test/cairo/RecordSinkFactoryTest.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
  • RecordSinkFactory (43-2158)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (36)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
  • GitHub Check: New pull request (Check Changes Check changes)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (48)
benchmarks/src/main/java/org/questdb/RecordToRowCopierBenchmark.java (7)

98-139: LGTM! Well-structured benchmark configuration.

The JMH annotations, parameter ranges, and field declarations are appropriate for comparing the three copier implementations across various column counts. The commented-out @Param alternatives (lines 122-123) are useful for focused investigations around bytecode thresholds.


154-162: LGTM! Standard JMH benchmark pattern.

The batch copy loop with final Blackhole.consume() correctly prevents dead code elimination while measuring throughput.


226-247: LGTM! Clear implementation selection with good documentation.

The comments (lines 233-240) explaining the chunking threshold behavior are helpful for understanding why certain magic numbers are used.


254-276: LGTM! Proper resource cleanup.

The engine null check and recursive directory cleanup using Files.walkFileTree is the standard approach.


281-297: LGTM! Minimal mock implementation.


393-459: LGTM! Appropriate mock metadata implementation.

The mock correctly implements required interface methods with sensible defaults for benchmark isolation.


465-633: LGTM! No-op row implementation isolates copier logic from I/O overhead.

The empty method implementations are intentional and clearly documented (lines 462-463), enabling accurate measurement of copier performance.

core/src/test/java/io/questdb/test/ServerMainTest.java (1)

799-800: New cairo.sql.copier.chunked row matches config and reload semantics

The added expected row uses the correct property path/env var, default true value, and marks it as reloadable, consistent with PropertyKey.CAIRO_SQL_COPIER_CHUNKED and DynamicPropServerConfiguration.dynamicProps. Looks good.

core/src/main/java/io/questdb/DynamicPropServerConfiguration.java (1)

97-99: Making copier settings dynamically reloadable is consistent

Adding CAIRO_SQL_COPY_EXPORT_ROOT and CAIRO_SQL_COPIER_CHUNKED to dynamicProps correctly exposes these as runtime‑reloadable options and matches the updated show parameters expectations. No issues seen.

core/src/main/java/io/questdb/PropertyKey.java (1)

127-127: PropertyKey.CAIRO_SQL_COPIER_CHUNKED definition is correct

Enum entry is named and placed consistently with other Cairo SQL keys, with the expected property path used elsewhere in the PR. All good.

core/src/main/java/io/questdb/cairo/CairoConfiguration.java (1)

745-754: Default isCopierChunkedEnabled() API is sensible

Providing a default true implementation with clear Javadoc cleanly exposes the chunked‑copier toggle while keeping existing CairoConfiguration implementors source‑compatible. Implementation matches intended behavior.

core/src/main/java/io/questdb/griffin/engine/groupby/SampleByFillNoneRecordCursorFactory.java (1)

76-77: Group-by sample-by map sink now honors copier chunking config

Passing configuration.isCopierChunkedEnabled() into RecordSinkFactory.getInstance correctly integrates the new chunked/looping copier selection into this path. No further changes needed here.

core/src/main/java/io/questdb/griffin/engine/groupby/AbstractSampleByFillRecordCursorFactory.java (1)

69-72: Sample-by fill factories now share chunked copier behavior

Using configuration.isCopierChunkedEnabled() when constructing mapSink keeps this abstract base in line with the concrete SampleBy variants and the new RecordSinkFactory API. Looks consistent.

core/src/main/java/io/questdb/griffin/engine/groupby/GroupByRecordCursorFactory.java (1)

84-86: Group-by map sink correctly wired to chunked copier flag

The updated RecordSinkFactory.getInstance call passes configuration.isCopierChunkedEnabled() through the key‑aware overload, so group‑by map building will follow the new single‑method/chunked/looping selection logic. This is consistent with other call sites.

core/src/test/java/io/questdb/test/griffin/InsertAsSelectTest.java (1)

27-36: Additional imports are appropriate

Bringing in PartitionBy and TableModel is exactly what the new wide-table tests require; import layout remains consistent with the rest of the file.

core/src/main/java/io/questdb/griffin/engine/functions/window/RankFunctionFactory.java (1)

188-188: LGTM!

The addition of configuration.isCopierChunkedEnabled() correctly enables chunked copier support for rank window functions, allowing them to handle wide tables without bytecode size failures.

core/src/main/java/io/questdb/griffin/engine/groupby/DistinctTimeSeriesRecordCursorFactory.java (1)

58-58: LGTM!

The addition of configuration.isCopierChunkedEnabled() correctly propagates the chunked copier configuration to the record sink factory, enabling wide-table support for distinct time series operations.

core/src/test/java/io/questdb/test/cairo/o3/AbstractO3Test.java (1)

511-512: LGTM!

The addition of sqlExecutionContext.getCairoEngine().getConfiguration().isCopierChunkedEnabled() correctly enables chunked copier support for O3 uncommitted inserts. The configuration is appropriately sourced from the engine, and the copier caching logic remains unchanged.

core/src/main/java/io/questdb/griffin/engine/groupby/DistinctRecordCursorFactory.java (1)

87-87: LGTM!

The addition of configuration.isCopierChunkedEnabled() correctly propagates the chunked copier configuration to the record sink factory, enabling wide-table support for distinct operations. The change is consistent with similar updates across other factory classes.

core/src/main/java/io/questdb/PropServerConfiguration.java (3)

199-199: New copierChunkedEnabled field fits existing configuration style

Final, immutable boolean alongside other Cairo config flags; naming and placement are consistent, no issues spotted.


1863-1863: Property wiring for CAIRO_SQL_COPIER_CHUNKED looks correct

Using getBoolean(..., true) aligns with the intended default of enabling chunked copiers; this will also populate allPairs for introspection via the usual path.


4212-4215: Accessor isCopierChunkedEnabled correctly exposes the flag

Override matches the CairoConfiguration API and simply delegates to the stored value, so downstream components can toggle behavior via configuration as intended.

core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (8)

1392-1394: Hash join slave sink now respects copier chunking flag

Using RecordSinkFactory.getInstance(asm, slaveMetadata, entityColumnFilter, configuration.isCopierChunkedEnabled()) with entityColumnFilter.of(slaveMetadata.getColumnCount()) keeps semantics the same while enabling chunked codegen where configured. Looks correct for the non‑light hash join path.


1489-1511: Master/slave join key copiers correctly pass timestamp‑nanos bitsets and chunk flag

Both createRecordCopierMaster and createRecordCopierSlave now pass writeTimestampAsNanosB / writeTimestampAsNanosA and configuration.isCopierChunkedEnabled() into RecordSinkFactory.getInstance(...). This lines up with how processJoinContext(...) populates those bitsets and ensures join key copiers both normalize mixed‑precision timestamps and participate in chunked codegen.


2861-2868: Set-operation sinks (UNION / INTERSECT / EXCEPT) correctly propagate chunking and symbol‑to‑string behavior

In both generateIntersectOrExceptAllFactory and generateUnionFactory, the RecordSink used for set keys is now created via RecordSinkFactory.getInstance(asm, unionMetadata, entityColumnFilter, writeSymbolAsString, configuration.isCopierChunkedEnabled()). Argument ordering matches expectations (metadata → filter → writeSymbolAsString → chunk flag), and reuses the existing writeSymbolAsString semantics while enabling chunked copiers for very wide set operations.

Also applies to: 6869-6876


3395-3411: Markout horizon CROSS JOIN slave sink wired to chunked copier

The MarkoutHorizonRecordCursorFactory path now uses RecordSinkFactory.getInstance(asm, slaveMetadata, entityColumnFilter, configuration.isCopierChunkedEnabled()) with entityColumnFilter spanning all slave columns, which is appropriate for materialising slave rows and lets the copier chunk for wide slaves. No functional regression visible.


3631-3660: Latest-by sinks correctly use key column filter and chunk flag

Both LatestByRecordCursorFactory and LatestByLightRecordCursorFactory now get their sink via RecordSinkFactory.getInstance(asm, metadata, listColumnFilterA, configuration.isCopierChunkedEnabled()). Using listColumnFilterA (prepared in prepareLatestByColumnIndexes) ensures only latest-by key columns are copied, and the chunk flag aligns this path with other wide‑schema sink users.


3729-3747: Table-based latest-by variants consistently use chunked sinks

In generateLatestByTableQuery and the table-query latest by path in generateTableQuery0, all LatestBy* factories now receive sinks from RecordSinkFactory.getInstance(asm, metadata/queryMeta, listColumnFilterA, configuration.isCopierChunkedEnabled()). This reuses the same key‑column filter (listColumnFilterA) and keeps behavior unchanged while enabling chunked copiers for large key sets.

Also applies to: 6797-6819


4184-4192: ORDER BY materialisation sink updated for chunked copier

The SortedRecordCursorFactory now builds its sink via RecordSinkFactory.getInstance(asm, orderedMetadata, entityColumnFilter, configuration.isCopierChunkedEnabled()). With entityColumnFilter.of(orderedMetadata.getColumnCount()) this still copies all columns into the sort buffer, now through chunk-aware codegen. This is appropriate, since ORDER BY may need to materialise very wide rows.


5573-5583: Window-function partition and cache sinks correctly wired to chunked copiers

All window paths now construct their internal sinks with the chunk flag:

  • Partition-by sink: RecordSinkFactory.getInstance(asm, keyTypes, entityColumnFilter, configuration.isCopierChunkedEnabled()).
  • Non-fast-path partition-by sink: same pattern.
  • Cached-window chain sink: RecordSinkFactory.getInstance(asm, chainTypes, listColumnFilterA, null, listColumnFilterB, null, null, configuration.isCopierChunkedEnabled()).

Argument ordering is consistent with existing usage (types → filters → optional bitsets → chunk flag), and chunking here is safe since partition and chain column counts are typically modest; semantics remain unchanged.

Also applies to: 5811-5819, 5950-5958

core/src/test/java/io/questdb/test/cairo/map/OrderedMapTest.java (1)

1129-1129: LGTM - Consistent API migration to enable chunked copier path.

All four RecordSinkFactory.getInstance calls are correctly updated to pass true for the new chunkedEnabled parameter, ensuring these tests exercise the chunked generation path. The changes are consistent across all test methods.

Also applies to: 1817-1817, 1993-1993, 2065-2065

core/src/test/java/io/questdb/test/cairo/RecordChainTest.java (1)

69-69: LGTM - Consistent update to enable chunked copier in RecordChain tests.

All five RecordSinkFactory.getInstance calls are correctly updated with the true parameter for chunkedEnabled, maintaining consistency with the API changes across the codebase.

Also applies to: 93-93, 152-152, 220-220, 379-379

core/src/main/java/io/questdb/std/BytecodeAssembler.java (3)

258-265: LGTM - Useful addition for bytecode size tracking.

The getMethodCodeSize() method correctly returns the bytecode size by computing position() - codeStart. The Javadoc appropriately references the JVM's HugeMethodLimit (8000 bytes default), which is the primary motivation for the chunked generation feature in this PR.


755-780: LGTM - Clean implementation mirroring startMethod().

The startPrivateMethod() implementation correctly mirrors startMethod() with only the access flags differing (ACC_PRIVATE vs ACC_PUBLIC). This is needed for generating private helper methods in the chunked bytecode generation approach.


127-130: LGTM - Minor cleanup changes.

The @SuppressWarnings("unused") annotation on dcmpg() and the refactoring of defineDefaultConstructor to use the invokespecial() helper method are clean improvements.

Also applies to: 154-154

core/src/main/java/io/questdb/cairo/mv/MatViewRefreshJob.java (1)

580-589: LGTM - Correct integration of chunked copier configuration.

The getRecordToRowCopier method now properly passes configuration.isCopierChunkedEnabled() to RecordToRowCopierUtils.generateCopier, enabling runtime control of the chunked copier feature for materialized view refresh operations. This is a key production code path that will benefit from the bytecode size limits fix for wide tables.

core/src/main/java/io/questdb/griffin/SqlCompilerImpl.java (1)

2775-2783: Copier call-site updates correctly propagate the chunking flag

All updated RecordToRowCopierUtils.generateCopier(...) invocations now pass configuration.isCopierChunkedEnabled(), so INSERT, INSERT AS SELECT, CREATE TABLE AS SELECT, and backup paths consistently honor the global copier-chunking configuration. The wiring matches the new overload signature and looks coherent.

Also applies to: 2810-2854, 2901-2909, 3533-3539, 4673-4679

core/src/main/java/io/questdb/griffin/engine/groupby/SampleByInterpolateRecordCursorFactory.java (1)

174-176: LGTM!

The changes correctly propagate the copierChunkedEnabled configuration flag to both RecordSinkFactory.getInstance calls, enabling chunked or looping implementations for wide tables in the sample-by interpolation logic.

core/src/main/java/io/questdb/griffin/LoopingRecordToRowCopier.java (2)

41-49: LGTM - Well-documented loop-based copier implementation.

The class design is clean with good separation of concerns. The approach of using switch statements over generated bytecode is a solid fallback strategy for very wide tables.
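The switch-dispatch idea can be illustrated with a minimal, self-contained sketch. The `ColType` enum, `copyRow` method, and array-based rows below are hypothetical stand-ins, not QuestDB's actual `ColumnType` or `Record` interfaces:

```java
import java.util.Arrays;

// Hypothetical column-type tags standing in for QuestDB's ColumnType constants.
enum ColType { INT, LONG, DOUBLE }

final class LoopingCopierSketch {
    // One pass per row, one switch dispatch per column: this is the shape of
    // the loop-based fallback, in contrast to fully inlined generated bytecode.
    static void copyRow(ColType[] types, Object[] src, Object[] dst) {
        for (int i = 0; i < types.length; i++) {
            switch (types[i]) {
                case INT:
                    dst[i] = (Integer) src[i];
                    break;
                case LONG:
                    dst[i] = (Long) src[i];
                    break;
                case DOUBLE:
                    dst[i] = (Double) src[i];
                    break;
            }
        }
    }

    public static void main(String[] args) {
        ColType[] types = {ColType.INT, ColType.LONG, ColType.DOUBLE};
        Object[] src = {42, 7L, 3.5};
        Object[] dst = new Object[3];
        copyRow(types, src, dst);
        System.out.println(Arrays.equals(src, dst)); // prints "true"
    }
}
```

The trade-off is the per-column dispatch cost, which the benchmarks in the PR description quantify against the inlined bytecode path.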



core/src/test/java/io/questdb/test/griffin/RecordToRowCopierIntegrationTest.java (2)

66-90: LGTM - Clever parameterized test setup.

The test design effectively exercises all three copier implementations:

  • BYTECODE: small column count + chunked=false → single-method bytecode
  • LOOPING: large column count + chunked=false → loop-based fallback
  • CHUNKED: large column count + chunked=true → chunked bytecode

38-64: LGTM - Well-structured parameterized test class.

The test infrastructure is solid with clear mode definitions, appropriate test data providers, and comprehensive scenario coverage across all three copier implementations.

core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)

44-52: LGTM - Well-designed chunked generation strategy.

The implementation introduces a robust three-tier approach (single-method → chunked → looping) with appropriate fallbacks. The size estimation and chunk boundary calculation provide a reasonable heuristic for splitting large schemas.

core/src/main/java/io/questdb/griffin/RecordToRowCopierUtils.java (5)

52-116: LGTM! Well-structured fallback strategy for bytecode generation.

The three-path approach (single-method → chunked → loop) provides graceful degradation for very wide tables. The constants are well-documented and the DEFAULT_HUGE_METHOD_LIMIT = 8000 aligns with JVM's HugeMethodLimit default.
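The selection between the three paths can be sketched as follows. Class, method, and threshold names here are illustrative assumptions, not QuestDB's actual code:

```java
// Hypothetical strategy selection for the three copier implementations:
// single-method bytecode, chunked bytecode, and loop-based fallback.
final class CopierSelectionSketch {
    // Target well under HotSpot's default HugeMethodLimit of 8000 bytes,
    // beyond which C2 declines to compile a method.
    static final int SINGLE_METHOD_LIMIT = 6000; // assumed threshold

    enum Strategy { SINGLE_METHOD, CHUNKED, LOOPING }

    static Strategy choose(int estimatedBytecodeSize, boolean chunkedEnabled) {
        if (estimatedBytecodeSize <= SINGLE_METHOD_LIMIT) {
            return Strategy.SINGLE_METHOD; // fully inlined, fastest path
        }
        if (chunkedEnabled) {
            return Strategy.CHUNKED;       // split into C2-compilable sub-methods
        }
        return Strategy.LOOPING;           // switch-dispatch fallback
    }

    public static void main(String[] args) {
        System.out.println(choose(4000, true));   // prints "SINGLE_METHOD"
        System.out.println(choose(20000, true));  // prints "CHUNKED"
        System.out.println(choose(20000, false)); // prints "LOOPING"
    }
}
```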


299-336: LGTM! Chunk boundary calculation logic is correct.

The algorithm correctly handles edge cases:

  • Single columns that exceed CHUNK_TARGET_SIZE are isolated in their own chunk
  • Empty chunks are prevented via the currentChunkSize > 0 check
  • Boundaries always include start (0) and end (n) markers
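A greedy boundary computation with these properties might look like the sketch below; `CHUNK_TARGET_SIZE`, the method name, and the int-array input are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative greedy chunking over per-column bytecode size estimates;
// names and the constant value are assumptions, not the code under review.
final class ChunkBoundarySketch {
    static final int CHUNK_TARGET_SIZE = 6000; // assumed per-chunk budget

    // Returns boundary indices: chunk k copies columns [b[k], b[k+1]).
    static List<Integer> boundaries(int[] columnSizes) {
        List<Integer> b = new ArrayList<>();
        b.add(0); // boundaries always start at 0
        int currentChunkSize = 0;
        for (int i = 0; i < columnSizes.length; i++) {
            // Close the current chunk before column i would overflow it.
            // The currentChunkSize > 0 guard prevents emitting an empty chunk
            // when a single column alone exceeds the target: that column is
            // simply isolated in a chunk of its own.
            if (currentChunkSize > 0 && currentChunkSize + columnSizes[i] > CHUNK_TARGET_SIZE) {
                b.add(i);
                currentChunkSize = 0;
            }
            currentChunkSize += columnSizes[i];
        }
        b.add(columnSizes.length); // boundaries always end at n
        return b;
    }

    public static void main(String[] args) {
        // An oversized first column is isolated: [0, 1, 2].
        System.out.println(boundaries(new int[]{7000, 100}));
    }
}
```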

338-384: Estimation heuristics are reasonable with safety fallback.

The bytecode size estimation uses simple heuristics that may underestimate complex conversions (e.g., STRING→TIMESTAMP with driver lookup). However, the post-generation size check at line 713-716 provides a safety net by falling back to the loop-based implementation if actual size exceeds the limit.


2774-2779: LGTM! Post-generation size check provides essential safety net.

The actual bytecode size check after generation ensures correctness even when the estimation heuristics underestimate. While this may result in wasted bytecode generation when fallback occurs, this happens only during query compilation (not execution), so the performance impact is acceptable.


118-225: The code is correct as written. transferIntToDecimal and transferLongToDecimal check for null sentinels because Numbers.INT_NULL and Numbers.LONG_NULL are defined constants representing nullable int and long primitives in QuestDB. In contrast, no null sentinel constants (BYTE_NULL or SHORT_NULL) exist in the Numbers class, indicating that byte and short primitive columns do not support null values in QuestDB's type system. The null checks in transferByteToDecimal and transferShortToDecimal are therefore not applicable or necessary—the asymmetry reflects the intentional design distinction between which primitive types support nullability, not an oversight.

Likely an incorrect or invalid review comment.
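The sentinel-based null handling described above can be sketched in isolation. The sentinel values below are assumptions (QuestDB's `Numbers.INT_NULL` and `Numbers.LONG_NULL` are taken to be the respective `MIN_VALUE` constants), and the transfer methods are hypothetical, not the actual `transfer*ToDecimal` code:

```java
// Sketch of sentinel-based null checks for nullable primitives: int and long
// reserve a sentinel value for null, while byte and short have none, so every
// byte/short value transfers as-is.
final class NullSentinelSketch {
    static final int INT_NULL = Integer.MIN_VALUE;  // assumed sentinel
    static final long LONG_NULL = Long.MIN_VALUE;   // assumed sentinel

    // Nullable transfer: map the sentinel to an explicit null in the target.
    static Long transferInt(int value) {
        return value == INT_NULL ? null : (long) value;
    }

    // No sentinel exists for short, so no null check is applicable.
    static long transferShort(short value) {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(transferInt(Integer.MIN_VALUE)); // prints "null"
        System.out.println(transferInt(42));                // prints "42"
        System.out.println(transferShort((short) -7));      // prints "-7"
    }
}
```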

@glasstiger
Contributor

[PR Coverage check]

😞 fail : 648 / 1297 (49.96%)

file detail

| path | covered lines | new lines | coverage |
| --- | --- | --- | --- |
| 🔵 io/questdb/cairo/DefaultCairoConfiguration.java | 0 | 1 | 0.00% |
| 🔵 io/questdb/griffin/engine/groupby/DistinctTimeSeriesRecordCursorFactory.java | 0 | 1 | 0.00% |
| 🔵 io/questdb/cairo/pool/SequencerMetadataPool.java | 0 | 1 | 0.00% |
| 🔵 io/questdb/cairo/wal/WalWriter.java | 6 | 20 | 30.00% |
| 🔵 io/questdb/cairo/RecordSinkFactory.java | 261 | 699 | 37.34% |
| 🔵 io/questdb/cairo/TableWriter.java | 12 | 22 | 54.55% |
| 🔵 io/questdb/griffin/LoopingRecordToRowCopier.java | 192 | 314 | 61.15% |
| 🔵 io/questdb/cairo/LoopingRecordSink.java | 130 | 192 | 67.71% |
| 🔵 io/questdb/griffin/engine/groupby/SampleByFillNoneRecordCursorFactory.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/PropertyKey.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/std/BytecodeAssembler.java | 14 | 14 | 100.00% |
| 🔵 io/questdb/griffin/SqlCompilerImpl.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/griffin/SqlCodeGenerator.java | 11 | 11 | 100.00% |
| 🔵 io/questdb/cairo/CairoConfigurationWrapper.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/griffin/engine/groupby/AbstractSampleByFillRecordCursorFactory.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/PropServerConfiguration.java | 4 | 4 | 100.00% |
| 🔵 io/questdb/griffin/engine/functions/window/RankFunctionFactory.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/griffin/engine/groupby/SampleByInterpolateRecordCursorFactory.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/griffin/engine/groupby/DistinctRecordCursorFactory.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/griffin/engine/table/AsyncGroupByAtom.java | 4 | 4 | 100.00% |
| 🔵 io/questdb/griffin/engine/groupby/GroupByRecordCursorFactory.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/cairo/CairoConfiguration.java | 1 | 1 | 100.00% |

@bluestreak01
Member

Bytecode generators are fuzzed; they don't all run at the same time.

@bluestreak01 bluestreak01 merged commit dc8e45d into master Dec 21, 2025
42 of 43 checks passed
@bluestreak01 bluestreak01 deleted the nw_wide_copies branch December 21, 2025 18:50
@bluestreak01
Member

we just under...

image


Labels

Bug: Incorrect or unexpected behavior
Core: Related to storage, data type, etc.
SQL: Issues or changes relating to SQL execution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Too Much Input Exception When SELECTing Many AVG Columns
Insert select query to existing table fails with 6000 columns

3 participants