fix(sql): support copies of very wide tables and result sets #6525
bluestreak01 merged 52 commits into master from
Conversation
Important: Review skipped. Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI. You can disable this status message by setting the
Walkthrough: This PR resolves the "Too much input" bytecode exception for operations on wide tables (6000+ columns) by introducing chunked bytecode generation and loop-based fallback implementations. It adds a configuration flag.
Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~55 minutes. Areas requiring extra attention:
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 6
🧹 Nitpick comments (13)
benchmarks/src/main/java/org/questdb/RecordToRowCopierBenchmark.java (3)
164-190: Consider removing or documenting commented-out benchmark methods. While keeping alternative benchmarks commented out is common in JMH files, consider either:
- Removing them if not needed (can be retrieved from git history)
- Adding a brief comment explaining when/why to enable them
This is a minor readability suggestion.
194-197: Consider using `Files.createTempDirectory()` for safer temp directory creation. The current pattern has two issues:

- Race condition between `delete()` and `mkdir()`
- `mkdir()` return value is unchecked; if it fails, the benchmark proceeds with an invalid path

```diff
- File tempDir = File.createTempFile("questdb-bench-", "");
- tempDir.delete();
- tempDir.mkdir();
- tempDbRoot = tempDir.getAbsolutePath();
+ Path tempPath = java.nio.file.Files.createTempDirectory("questdb-bench-");
+ tempDbRoot = tempPath.toAbsolutePath().toString();
```
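Outside the benchmark harness, the suggested replacement can be sketched in isolation. The class and method names below are illustrative, not QuestDB's code; the point is that `Files.createTempDirectory` creates the directory atomically and reports failure via `IOException` instead of an ignored boolean.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempDirExample {
    // Atomic creation: no delete()/mkdir() race window, and a failure
    // surfaces as an IOException instead of being silently dropped.
    static String createBenchRoot() throws IOException {
        Path tempPath = Files.createTempDirectory("questdb-bench-");
        return tempPath.toAbsolutePath().toString();
    }

    public static void main(String[] args) throws IOException {
        String root = createBenchRoot();
        // The returned path is an existing, writable directory.
        System.out.println(Files.isDirectory(Path.of(root)));
    }
}
```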
360-382: Null returns for string/symbol methods are safe given the current `SIMPLE_TYPES`. The mock returns `null` for `getStrA`, `getStrB`, `getSymA`, `getSymB`. This works because `SIMPLE_TYPES` only includes numeric and boolean types. If `STRING` or `SYMBOL` types are added later, the copier would attempt to copy nulls, potentially causing issues. Consider adding a comment noting this limitation, or returning empty strings for safety.
core/src/test/java/io/questdb/test/griffin/InsertAsSelectTest.java (1)
190-216: Wide-table INSERT AS SELECT tests effectively cover the original failure. The new tests and the `createTableManyCols` helper construct ~6000-column tables and execute `INSERT … SELECT` (with and without `EXCEPT`), which is sufficient to trigger the wide-schema copier generation that used to throw `BytecodeException`. This is a good regression guard. If you want slightly stronger coverage, you could insert a few rows into `temp` before the `INSERT` to also exercise the runtime copy loop, but the key compile-time failure is already well covered.

core/src/test/java/io/questdb/test/cairo/RecordArrayTest.java (1)
73-73: LGTM! Consider test parameterization. The test updates correctly pass the new `copierChunkedEnabled` flag to `RecordSinkFactory.getInstance()`. However, all tests hardcode `true`, exercising only the chunked-enabled path. Based on the PR objectives mentioning "parameterise tests across variants," consider adding parameterized tests that exercise both `true` and `false` values to verify correct behavior in both chunked and non-chunked modes.

Also applies to: 128-128, 206-206, 377-377
core/src/test/java/io/questdb/test/cairo/RecordSinkFactoryTest.java (1)
86-86: LGTM! Consider test parameterization. The test updates correctly accommodate the expanded `RecordSinkFactory.getInstance()` API signatures. All three test methods now pass the `copierChunkedEnabled` flag (hardcoded as `true`). Line 324 also adds intermediate null parameters for optional flags, which is correct for that overload. Similar to `RecordArrayTest`, consider parameterizing these tests to exercise both chunked and non-chunked modes for better coverage.

Also applies to: 120-120, 324-324
benchmarks/src/main/java/org/questdb/RecordSinkBenchmark.java (1)
64-83: Benchmark harness is sound; consider minor cleanup and a potential coverage extension. The benchmark setup (params, mock types/record/SPI, and the three implementation modes) looks correct and should give meaningful throughput comparisons across column counts. Two optional improvements:

- Given `benchmarkBatchCopy` already overrides `@BenchmarkMode` and `@OutputTimeUnit`, the class-level `@BenchmarkMode(Mode.AverageTime)` / `@OutputTimeUnit(TimeUnit.NANOSECONDS)` are redundant; removing them would simplify the annotations.
- If you later want to exercise more of `RecordSink`'s surface (e.g., STRING/SYMBOL/DECIMAL paths), you could extend `SIMPLE_TYPES` and add corresponding getters in `MockRecord`.

Also applies to: 100-105, 119-182
core/src/test/java/io/questdb/test/griffin/LoopingRecordToRowCopierTest.java (1)
33-88: Tests correctly validate copy semantics, but may not reliably hit the looping copier path. The scenarios here (many columns, wide tables, type conversions, nulls) are good functional coverage and the assertions look correct. One nuance:

- Comments like "trigger loop-based copier" / "will use loop-based for 9 columns" and the class name suggest these tests specifically exercise the `LoopingRecordToRowCopier`, but with the current heuristics (bytecode size estimation + chunked mode) and default method-size limits, many of these cases will still be served by the bytecode or chunked copiers.

If the intent is to guarantee coverage of the loop-based implementation, consider, for example:

- Overriding the configuration in this test to disable chunking and/or reduce the method-size limit so the planner is forced into the looping path for these tables; or
- Adding a dedicated test that invokes `RecordToRowCopierUtils.generateCopier(..., /*chunkedEnabled=*/false, smallLimit)` directly and uses that copier in a controlled copy loop.

Alternatively, if you only care about the semantics regardless of the concrete copier, updating the comments to avoid stating that the loop-based implementation is definitely used would prevent confusion later.
Also applies to: 157-190
core/src/test/java/io/questdb/test/griffin/RecordToRowCopierIntegrationTest.java (1)
352-380: Test name doesn't match the actual column count. The test `testExtremelyWideTableInsert` uses `getColumnCount(100)`, which means BYTECODE mode gets 50 columns and CHUNKED/LOOPING get 100 columns. This isn't "extremely wide" compared to `testCreateTableAsSelectVeryWideTable`, which uses 500. Consider renaming or increasing the column count.

```diff
 @Test
-    public void testExtremelyWideTableInsert() throws Exception {
-        int columnCount = getColumnCount(100);
+    public void testWideTableInsert() throws Exception {
+        int columnCount = getColumnCount(300);
```

core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (3)
379-390: Consider including all decimal types in the estimation. The estimation only adds overhead for DECIMAL128 and DECIMAL256, but DECIMAL8-64 types also generate bytecode. While the current estimates work as a heuristic, more accurate estimates could help optimize chunk boundaries.
1279-1284: Chunk size validation is post-hoc; consider pre-validation. The chunk size is validated after generating bytecode (lines 1280-1281). If a single column generates more bytecode than `methodSizeLimit`, the method returns null after doing substantial work. Consider validating during `calculateChunkBoundaries` to fail earlier. However, this is acceptable as a fallback since `getMethodCodeSize()` gives the actual size rather than an estimate.
1216-1228: Function keys are not chunked and could exceed the method size limit. If `keyFunctions` has many entries, the main `copy` method could still exceed the bytecode size limit since function key copying happens entirely in the main method. Consider chunking function keys as well for extreme cases. For now, this is acceptable since the number of key functions is typically small compared to the column count.
core/src/main/java/io/questdb/griffin/RecordToRowCopierUtils.java (1)
386-427: Consider extracting constant pool setup to reduce code duplication. The constant pool setup (lines 410-545) is nearly identical to the setup in `generateSingleMethodCopier` (lines 1563-1700). This creates maintenance burden and a risk of divergence. Consider extracting the constant pool setup into a shared helper method or a builder pattern to avoid ~135 lines of duplicated code. This would ensure both code paths remain synchronized and reduce maintenance overhead.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (30)
- benchmarks/src/main/java/org/questdb/RecordSinkBenchmark.java (1 hunks)
- benchmarks/src/main/java/org/questdb/RecordToRowCopierBenchmark.java (1 hunks)
- core/src/main/java/io/questdb/DynamicPropServerConfiguration.java (1 hunks)
- core/src/main/java/io/questdb/PropServerConfiguration.java (3 hunks)
- core/src/main/java/io/questdb/PropertyKey.java (1 hunks)
- core/src/main/java/io/questdb/cairo/CairoConfiguration.java (1 hunks)
- core/src/main/java/io/questdb/cairo/LoopingRecordSink.java (1 hunks)
- core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (7 hunks)
- core/src/main/java/io/questdb/cairo/mv/MatViewRefreshJob.java (1 hunks)
- core/src/main/java/io/questdb/griffin/LoopingRecordToRowCopier.java (1 hunks)
- core/src/main/java/io/questdb/griffin/RecordToRowCopierUtils.java (3 hunks)
- core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (16 hunks)
- core/src/main/java/io/questdb/griffin/SqlCompilerImpl.java (5 hunks)
- core/src/main/java/io/questdb/griffin/engine/functions/window/RankFunctionFactory.java (1 hunks)
- core/src/main/java/io/questdb/griffin/engine/groupby/AbstractSampleByFillRecordCursorFactory.java (1 hunks)
- core/src/main/java/io/questdb/griffin/engine/groupby/DistinctRecordCursorFactory.java (1 hunks)
- core/src/main/java/io/questdb/griffin/engine/groupby/DistinctTimeSeriesRecordCursorFactory.java (1 hunks)
- core/src/main/java/io/questdb/griffin/engine/groupby/GroupByRecordCursorFactory.java (1 hunks)
- core/src/main/java/io/questdb/griffin/engine/groupby/SampleByFillNoneRecordCursorFactory.java (1 hunks)
- core/src/main/java/io/questdb/griffin/engine/groupby/SampleByInterpolateRecordCursorFactory.java (1 hunks)
- core/src/main/java/io/questdb/std/BytecodeAssembler.java (4 hunks)
- core/src/test/java/io/questdb/test/ServerMainTest.java (1 hunks)
- core/src/test/java/io/questdb/test/cairo/RecordArrayTest.java (4 hunks)
- core/src/test/java/io/questdb/test/cairo/RecordChainTest.java (5 hunks)
- core/src/test/java/io/questdb/test/cairo/RecordSinkFactoryTest.java (3 hunks)
- core/src/test/java/io/questdb/test/cairo/map/OrderedMapTest.java (4 hunks)
- core/src/test/java/io/questdb/test/cairo/o3/AbstractO3Test.java (1 hunks)
- core/src/test/java/io/questdb/test/griffin/InsertAsSelectTest.java (2 hunks)
- core/src/test/java/io/questdb/test/griffin/LoopingRecordToRowCopierTest.java (1 hunks)
- core/src/test/java/io/questdb/test/griffin/RecordToRowCopierIntegrationTest.java (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-10T14:28:48.329Z
Learnt from: mtopolnik
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2025-11-10T14:28:48.329Z
Learning: In AsOfJoinDenseRecordCursorFactoryBase.java, the `backwardScanExhausted` flag is intentionally NOT reset in `toTop()` because backward scan results are reusable across cursor rewinds. The backward scan caches historical matches that remain valid when the cursor is rewound.
Applied to files:
core/src/test/java/io/questdb/test/cairo/RecordArrayTest.java
core/src/test/java/io/questdb/test/cairo/RecordChainTest.java
📚 Learning: 2025-08-04T09:16:27.366Z
Learnt from: jerrinot
Repo: questdb/questdb PR: 5996
File: core/src/test/java/io/questdb/test/cutlass/http/line/LineHttpSenderTest.java:350-356
Timestamp: 2025-08-04T09:16:27.366Z
Learning: In QuestDB ILP tests, maximum dimension values like `(1 << 28) - 1` are intentionally hard-coded instead of using constants like `ArrayView.DIM_MAX_LEN` to serve as "canaries" that will cause test failures if the API limits change, ensuring developers are aware of breaking changes to the ILP API.
Applied to files:
core/src/main/java/io/questdb/griffin/RecordToRowCopierUtils.java
🧬 Code graph analysis (13)
benchmarks/src/main/java/org/questdb/RecordSinkBenchmark.java (3)
core/src/main/java/io/questdb/cairo/LoopingRecordSink.java (1)
LoopingRecordSink(46-361)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
RecordSinkFactory(43-2158)
core/src/main/java/io/questdb/std/Rnd.java (1)
Rnd(39-491)
core/src/main/java/io/questdb/griffin/engine/functions/window/RankFunctionFactory.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
RecordSinkFactory(43-2158)
core/src/main/java/io/questdb/griffin/engine/groupby/DistinctTimeSeriesRecordCursorFactory.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
RecordSinkFactory(43-2158)
core/src/test/java/io/questdb/test/griffin/LoopingRecordToRowCopierTest.java (1)
core/src/test/java/io/questdb/test/tools/TestUtils.java (1)
TestUtils(154-2667)
core/src/test/java/io/questdb/test/cairo/map/OrderedMapTest.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
RecordSinkFactory(43-2158)
core/src/main/java/io/questdb/griffin/engine/groupby/AbstractSampleByFillRecordCursorFactory.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
RecordSinkFactory(43-2158)
core/src/main/java/io/questdb/griffin/SqlCompilerImpl.java (1)
core/src/main/java/io/questdb/griffin/RecordToRowCopierUtils.java (1)
RecordToRowCopierUtils(50-2851)
core/src/test/java/io/questdb/test/griffin/InsertAsSelectTest.java (2)
core/src/main/java/io/questdb/cairo/PartitionBy.java (1)
PartitionBy(44-192)
core/src/test/java/io/questdb/test/cairo/TableModel.java (1)
TableModel(35-238)
core/src/main/java/io/questdb/griffin/engine/groupby/SampleByFillNoneRecordCursorFactory.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
RecordSinkFactory(43-2158)
core/src/main/java/io/questdb/griffin/engine/groupby/GroupByRecordCursorFactory.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
RecordSinkFactory(43-2158)
core/src/main/java/io/questdb/griffin/engine/groupby/DistinctRecordCursorFactory.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
RecordSinkFactory(43-2158)
core/src/test/java/io/questdb/test/cairo/RecordChainTest.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
RecordSinkFactory(43-2158)
core/src/test/java/io/questdb/test/cairo/RecordSinkFactoryTest.java (1)
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
RecordSinkFactory(43-2158)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (36)
- GitHub Check: New pull request (Hosted Running tests on windows-other-2)
- GitHub Check: New pull request (Hosted Running tests on windows-other-1)
- GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
- GitHub Check: New pull request (Hosted Running tests on mac-other)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
- GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
- GitHub Check: New pull request (Hosted Running tests on mac-griffin)
- GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
- GitHub Check: New pull request (Check Changes Check changes)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (48)
benchmarks/src/main/java/org/questdb/RecordToRowCopierBenchmark.java (7)
98-139: LGTM! Well-structured benchmark configuration. The JMH annotations, parameter ranges, and field declarations are appropriate for comparing the three copier implementations across various column counts. The commented-out `@Param` alternatives (lines 122-123) are useful for focused investigations around bytecode thresholds.
154-162: LGTM! Standard JMH benchmark pattern. The batch copy loop with a final `Blackhole.consume()` correctly prevents dead code elimination while measuring throughput.
226-247: LGTM! Clear implementation selection with good documentation. The comments (lines 233-240) explaining the chunking threshold behavior are helpful for understanding why certain magic numbers are used.
254-276: LGTM! Proper resource cleanup. The engine null check and recursive directory cleanup using `Files.walkFileTree` is the standard approach.
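As a standalone illustration of the cleanup pattern praised above (class and method names here are illustrative, not the benchmark's actual code), `Files.walkFileTree` with a `SimpleFileVisitor` deletes files first and each directory only after its contents are gone:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class RecursiveCleanup {
    // Depth-first delete: visitFile removes regular files, and
    // postVisitDirectory removes each directory after its children,
    // so every directory is empty at the moment it is deleted.
    static void deleteRecursively(Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("cleanup-demo-");
        Files.createDirectories(root.resolve("a/b"));
        Files.writeString(root.resolve("a/b/file.txt"), "x");
        deleteRecursively(root);
        System.out.println(Files.exists(root)); // false after cleanup
    }
}
```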
281-297: LGTM! Minimal mock implementation.
393-459: LGTM! Appropriate mock metadata implementation. The mock correctly implements required interface methods with sensible defaults for benchmark isolation.
465-633: LGTM! No-op row implementation isolates copier logic from I/O overhead. The empty method implementations are intentional and clearly documented (lines 462-463), enabling accurate measurement of copier performance.
core/src/test/java/io/questdb/test/ServerMainTest.java (1)
799-800: New `cairo.sql.copier.chunked` row matches config and reload semantics. The added expected row uses the correct property path/env var and default `true` value, and marks it as reloadable, consistent with `PropertyKey.CAIRO_SQL_COPIER_CHUNKED` and `DynamicPropServerConfiguration.dynamicProps`. Looks good.

core/src/main/java/io/questdb/DynamicPropServerConfiguration.java (1)

97-99: Making copier settings dynamically reloadable is consistent. Adding `CAIRO_SQL_COPY_EXPORT_ROOT` and `CAIRO_SQL_COPIER_CHUNKED` to `dynamicProps` correctly exposes these as runtime-reloadable options and matches the updated `show parameters` expectations. No issues seen.

core/src/main/java/io/questdb/PropertyKey.java (1)
127-127: The `PropertyKey.CAIRO_SQL_COPIER_CHUNKED` definition is correct. The enum entry is named and placed consistently with other Cairo SQL keys, with the expected property path used elsewhere in the PR. All good.
core/src/main/java/io/questdb/cairo/CairoConfiguration.java (1)
745-754: Default `isCopierChunkedEnabled()` API is sensible. Providing a default `true` implementation with clear Javadoc cleanly exposes the chunked-copier toggle while keeping existing `CairoConfiguration` implementors source-compatible. Implementation matches the intended behavior.

core/src/main/java/io/questdb/griffin/engine/groupby/SampleByFillNoneRecordCursorFactory.java (1)
76-77: Group-by sample-by map sink now honors the copier chunking config. Passing `configuration.isCopierChunkedEnabled()` into `RecordSinkFactory.getInstance` correctly integrates the new chunked/looping copier selection into this path. No further changes needed here.

core/src/main/java/io/questdb/griffin/engine/groupby/AbstractSampleByFillRecordCursorFactory.java (1)

69-72: Sample-by fill factories now share chunked copier behavior. Using `configuration.isCopierChunkedEnabled()` when constructing `mapSink` keeps this abstract base in line with the concrete SampleBy variants and the new `RecordSinkFactory` API. Looks consistent.

core/src/main/java/io/questdb/griffin/engine/groupby/GroupByRecordCursorFactory.java (1)

84-86: Group-by map sink correctly wired to the chunked copier flag. The updated `RecordSinkFactory.getInstance` call passes `configuration.isCopierChunkedEnabled()` through the key-aware overload, so group-by map building will follow the new single-method/chunked/looping selection logic. This is consistent with other call sites.

core/src/test/java/io/questdb/test/griffin/InsertAsSelectTest.java (1)
27-36: Additional imports are appropriate. Bringing in `PartitionBy` and `TableModel` is exactly what the new wide-table tests require; import layout remains consistent with the rest of the file.

core/src/main/java/io/questdb/griffin/engine/functions/window/RankFunctionFactory.java (1)
188-188: LGTM! The addition of `configuration.isCopierChunkedEnabled()` correctly enables chunked copier support for rank window functions, allowing them to handle wide tables without bytecode size failures.

core/src/main/java/io/questdb/griffin/engine/groupby/DistinctTimeSeriesRecordCursorFactory.java (1)
58-58: LGTM! The addition of `configuration.isCopierChunkedEnabled()` correctly propagates the chunked copier configuration to the record sink factory, enabling wide-table support for distinct time series operations.

core/src/test/java/io/questdb/test/cairo/o3/AbstractO3Test.java (1)
511-512: LGTM! The addition of `sqlExecutionContext.getCairoEngine().getConfiguration().isCopierChunkedEnabled()` correctly enables chunked copier support for O3 uncommitted inserts. The configuration is appropriately sourced from the engine, and the copier caching logic remains unchanged.

core/src/main/java/io/questdb/griffin/engine/groupby/DistinctRecordCursorFactory.java (1)
87-87: LGTM! The addition of `configuration.isCopierChunkedEnabled()` correctly propagates the chunked copier configuration to the record sink factory, enabling wide-table support for distinct operations. The change is consistent with similar updates across other factory classes.

core/src/main/java/io/questdb/PropServerConfiguration.java (3)
199-199: The new `copierChunkedEnabled` field fits the existing configuration style. It is final and immutable, alongside other Cairo config flags; naming and placement are consistent, no issues spotted.
1863-1863: Property wiring for CAIRO_SQL_COPIER_CHUNKED looks correct. Using `getBoolean(..., true)` aligns with the intended default of enabling chunked copiers; this will also populate `allPairs` for introspection via the usual path.
4212-4215: The `isCopierChunkedEnabled` accessor correctly exposes the flag. The override matches the `CairoConfiguration` API and simply delegates to the stored value, so downstream components can toggle behavior via configuration as intended.
core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (8)
1392-1394: Hash join slave sink now respects the copier chunking flag. Using `RecordSinkFactory.getInstance(asm, slaveMetadata, entityColumnFilter, configuration.isCopierChunkedEnabled())` with `entityColumnFilter.of(slaveMetadata.getColumnCount())` keeps semantics the same while enabling chunked codegen where configured. Looks correct for the non-light hash join path.
1489-1511: Master/slave join key copiers correctly pass timestamp-nanos bitsets and the chunk flag. Both `createRecordCopierMaster` and `createRecordCopierSlave` now pass `writeTimestampAsNanosB` / `writeTimestampAsNanosA` and `configuration.isCopierChunkedEnabled()` into `RecordSinkFactory.getInstance(...)`. This lines up with how `processJoinContext(...)` populates those bitsets and ensures join key copiers both normalize mixed-precision timestamps and participate in chunked codegen.
2861-2868: Set-operation sinks (UNION / INTERSECT / EXCEPT) correctly propagate chunking and symbol-to-string behavior. In both `generateIntersectOrExceptAllFactory` and `generateUnionFactory`, the `RecordSink` used for set keys is now created via `RecordSinkFactory.getInstance(asm, unionMetadata, entityColumnFilter, writeSymbolAsString, configuration.isCopierChunkedEnabled())`. Argument ordering matches expectations (metadata → filter → `writeSymbolAsString` → chunk flag), and this reuses the existing `writeSymbolAsString` semantics while enabling chunked copiers for very wide set operations.

Also applies to: 6869-6876
3395-3411: Markout horizon CROSS JOIN slave sink wired to the chunked copier. The `MarkoutHorizonRecordCursorFactory` path now uses `RecordSinkFactory.getInstance(asm, slaveMetadata, entityColumnFilter, configuration.isCopierChunkedEnabled())` with `entityColumnFilter` spanning all slave columns, which is appropriate for materialising slave rows and lets the copier chunk for wide slaves. No functional regression visible.
3631-3660: Latest-by sinks correctly use the key column filter and chunk flag. Both `LatestByRecordCursorFactory` and `LatestByLightRecordCursorFactory` now get their sink via `RecordSinkFactory.getInstance(asm, metadata, listColumnFilterA, configuration.isCopierChunkedEnabled())`. Using `listColumnFilterA` (prepared in `prepareLatestByColumnIndexes`) ensures only latest-by key columns are copied, and the chunk flag aligns this path with other wide-schema sink users.
3729-3747: Table-based latest-by variants consistently use chunked sinks. In `generateLatestByTableQuery` and the table-query `latest by` path in `generateTableQuery0`, all `LatestBy*` factories now receive sinks from `RecordSinkFactory.getInstance(asm, metadata/queryMeta, listColumnFilterA, configuration.isCopierChunkedEnabled())`. This reuses the same key-column filter (`listColumnFilterA`) and keeps behavior unchanged while enabling chunked copiers for large key sets.

Also applies to: 6797-6819
4184-4192: ORDER BY materialisation sink updated for the chunked copier. The `SortedRecordCursorFactory` now builds its sink via `RecordSinkFactory.getInstance(asm, orderedMetadata, entityColumnFilter, configuration.isCopierChunkedEnabled())`. With `entityColumnFilter.of(orderedMetadata.getColumnCount())` this still copies all columns into the sort buffer, just now through chunk-aware codegen. This is appropriate given ORDER BY on very wide rows.
5573-5583: Window-function partition and cache sinks correctly wired to chunked copiers. All window paths now construct their internal sinks with the chunk flag:

- Partition-by sink: `RecordSinkFactory.getInstance(asm, keyTypes, entityColumnFilter, configuration.isCopierChunkedEnabled())`.
- Non-fast-path partition-by sink: same pattern.
- Cached-window chain sink: `RecordSinkFactory.getInstance(asm, chainTypes, listColumnFilterA, null, listColumnFilterB, null, null, configuration.isCopierChunkedEnabled())`.

Argument ordering is consistent with existing usage (types → filters → optional bitsets → chunk flag), and chunking here is safe since partition and chain column counts are typically modest; semantics remain unchanged.

Also applies to: 5811-5819, 5950-5958
core/src/test/java/io/questdb/test/cairo/map/OrderedMapTest.java (1)
1129-1129: LGTM - Consistent API migration to enable the chunked copier path. All four `RecordSinkFactory.getInstance` calls are correctly updated to pass `true` for the new `chunkedEnabled` parameter, ensuring these tests exercise the chunked generation path. The changes are consistent across all test methods.

Also applies to: 1817-1817, 1993-1993, 2065-2065
core/src/test/java/io/questdb/test/cairo/RecordChainTest.java (1)
69-69: LGTM - Consistent update to enable the chunked copier in RecordChain tests. All five `RecordSinkFactory.getInstance` calls are correctly updated with the `true` parameter for `chunkedEnabled`, maintaining consistency with the API changes across the codebase.

Also applies to: 93-93, 152-152, 220-220, 379-379
core/src/main/java/io/questdb/std/BytecodeAssembler.java (3)
258-265: LGTM - Useful addition for bytecode size tracking. The `getMethodCodeSize()` method correctly returns the bytecode size by computing `position() - codeStart`. The Javadoc appropriately references the JVM's HugeMethodLimit (8000 bytes default), which is the primary motivation for the chunked generation feature in this PR.
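The size-accounting idea can be shown with a minimal stand-in (this is not QuestDB's `BytecodeAssembler`; the buffer, field names, and the hardcoded threshold are illustrative). Record where a method's code starts, report bytes emitted since, and compare against HotSpot's default `-XX:HugeMethodLimit=8000` JIT-compilation threshold:

```java
import java.nio.ByteBuffer;

public class MethodSizeSketch {
    private final ByteBuffer buf = ByteBuffer.allocate(16 * 1024);
    private int codeStart;

    // Remember where this method's code attribute begins.
    void startMethod() {
        codeStart = buf.position();
    }

    // Append one bytecode instruction byte.
    void emit(byte opcode) {
        buf.put(opcode);
    }

    // Bytes emitted since startMethod(): position() - codeStart.
    int getMethodCodeSize() {
        return buf.position() - codeStart;
    }

    // HotSpot's default HugeMethodLimit is 8000 bytes; methods larger
    // than this are not JIT-compiled by default (illustrative check).
    boolean exceedsHugeMethodLimit() {
        return getMethodCodeSize() > 8000;
    }

    public static void main(String[] args) {
        MethodSizeSketch asm = new MethodSizeSketch();
        asm.startMethod();
        for (int i = 0; i < 9000; i++) {
            asm.emit((byte) 0x00); // nop
        }
        System.out.println(asm.getMethodCodeSize());      // 9000
        System.out.println(asm.exceedsHugeMethodLimit()); // true
    }
}
```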
755-780: LGTM - Clean implementation mirroring `startMethod()`. The `startPrivateMethod()` implementation correctly mirrors `startMethod()` with only the access flags differing (`ACC_PRIVATE` vs `ACC_PUBLIC`). This is needed for generating private helper methods in the chunked bytecode generation approach.
127-130: LGTM - Minor cleanup changes. The `@SuppressWarnings("unused")` annotation on `dcmpg()` and the refactoring of `defineDefaultConstructor` to use the `invokespecial()` helper method are clean improvements.

Also applies to: 154-154
core/src/main/java/io/questdb/cairo/mv/MatViewRefreshJob.java (1)
580-589: LGTM - Correct integration of the chunked copier configuration. The `getRecordToRowCopier` method now properly passes `configuration.isCopierChunkedEnabled()` to `RecordToRowCopierUtils.generateCopier`, enabling runtime control of the chunked copier feature for materialized view refresh operations. This is a key production code path that will benefit from the bytecode size limits fix for wide tables.
2775-2783: Copier call-site updates correctly propagate the chunking flag. All updated `RecordToRowCopierUtils.generateCopier(...)` invocations now pass `configuration.isCopierChunkedEnabled()`, so INSERT, INSERT AS SELECT, CREATE TABLE AS SELECT, and backup paths consistently honor the global copier-chunking configuration. The wiring matches the new overload signature and looks coherent.

Also applies to: 2810-2854, 2901-2909, 3533-3539, 4673-4679
core/src/main/java/io/questdb/griffin/engine/groupby/SampleByInterpolateRecordCursorFactory.java (1)
174-176: LGTM! The changes correctly propagate the `copierChunkedEnabled` configuration flag to both `RecordSinkFactory.getInstance` calls, enabling chunked or looping implementations for wide tables in the sample-by interpolation logic.

core/src/main/java/io/questdb/griffin/LoopingRecordToRowCopier.java (2)
41-49: LGTM - Well-documented loop-based copier implementation. The class design is clean with good separation of concerns. The approach of using switch statements over generated bytecode is a solid fallback strategy for very wide tables.
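The trade-off behind that design can be sketched in miniature (all names below are hypothetical, not QuestDB's API): a loop-based copier dispatches on the column type at runtime with a switch, paying a per-row dispatch cost but keeping the generated-code footprint constant regardless of column count, so it works for arbitrarily wide tables.

```java
import java.util.Arrays;
import java.util.List;

public class LoopingCopierSketch {
    enum ColumnType { INT, LONG, DOUBLE }

    interface Row { Object get(int col); }
    interface Sink { void put(int col, Object value); }

    // One fixed loop + switch handles any number of columns; contrast
    // with bytecode generation, which emits code per column and can
    // blow past JVM method-size limits on very wide schemas.
    static void copy(List<ColumnType> types, Row src, Sink dst) {
        for (int i = 0; i < types.size(); i++) {
            switch (types.get(i)) {
                case INT:    dst.put(i, (Integer) src.get(i)); break;
                case LONG:   dst.put(i, (Long) src.get(i));    break;
                case DOUBLE: dst.put(i, (Double) src.get(i));  break;
            }
        }
    }

    public static void main(String[] args) {
        Object[] out = new Object[3];
        copy(List.of(ColumnType.INT, ColumnType.LONG, ColumnType.DOUBLE),
                col -> new Object[]{7, 8L, 9.5}[col],
                (col, v) -> out[col] = v);
        System.out.println(Arrays.toString(out)); // [7, 8, 9.5]
    }
}
```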
core/src/test/java/io/questdb/test/griffin/RecordToRowCopierIntegrationTest.java (2)
66-90: LGTM - Clever parameterized test setup. The test design effectively exercises all three copier implementations:
- BYTECODE: small column count + chunked=false → single-method bytecode
- LOOPING: large column count + chunked=false → loop-based fallback
- CHUNKED: large column count + chunked=true → chunked bytecode
38-64: LGTM - Well-structured parameterized test class. The test infrastructure is solid with clear mode definitions, appropriate test data providers, and comprehensive scenario coverage across all three copier implementations.
core/src/main/java/io/questdb/cairo/RecordSinkFactory.java (1)
44-52: LGTM - Well-designed chunked generation strategy. The implementation introduces a robust three-tier approach (single-method → chunked → looping) with appropriate fallbacks. The size estimation and chunk boundary calculation provide a reasonable heuristic for splitting large schemas.
core/src/main/java/io/questdb/griffin/RecordToRowCopierUtils.java (5)
52-116: LGTM! Well-structured fallback strategy for bytecode generation. The three-path approach (single-method → chunked → loop) provides graceful degradation for very wide tables. The constants are well-documented, and `DEFAULT_HUGE_METHOD_LIMIT = 8000` aligns with the JVM's `HugeMethodLimit` default.
299-336: LGTM! Chunk boundary calculation logic is correct. The algorithm correctly handles edge cases:

- Single columns that exceed `CHUNK_TARGET_SIZE` are isolated in their own chunk
- Empty chunks are prevented via the `currentChunkSize > 0` check
- Boundaries always include the start (0) and end (n) markers
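The boundary rules above can be sketched in plain Java. This is a hypothetical reconstruction, not the actual QuestDB implementation; the method name and parameters are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of chunk boundary computation: split columns into
// chunks whose estimated generated-code size stays near a target size.
final class ChunkBoundaries {
    // estimatedSizes[i]: estimated bytecode size for column i's copy code.
    // targetSize: soft per-sub-method size limit.
    // Returns boundary indices b where chunk k covers columns [b[k], b[k+1]).
    static int[] compute(int[] estimatedSizes, int targetSize) {
        List<Integer> boundaries = new ArrayList<>();
        boundaries.add(0); // boundaries always include the start marker
        int currentChunkSize = 0;
        for (int i = 0; i < estimatedSizes.length; i++) {
            // Start a new chunk if this column would overflow the target,
            // but never emit an empty chunk (the currentChunkSize > 0 guard).
            // An oversized single column therefore lands in its own chunk.
            if (currentChunkSize > 0 && currentChunkSize + estimatedSizes[i] > targetSize) {
                boundaries.add(i);
                currentChunkSize = 0;
            }
            currentChunkSize += estimatedSizes[i];
        }
        boundaries.add(estimatedSizes.length); // and the end marker
        int[] out = new int[boundaries.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = boundaries.get(i);
        }
        return out;
    }
}
```

For example, sizes `{10, 10, 10}` with target `15` yield boundaries `{0, 1, 2, 3}` (three single-column chunks), while `{100, 5, 5}` yields `{0, 1, 3}`, isolating the oversized first column.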
338-384: Estimation heuristics are reasonable with safety fallback. The bytecode size estimation uses simple heuristics that may underestimate complex conversions (e.g., STRING→TIMESTAMP with driver lookup). However, the post-generation size check at lines 713-716 provides a safety net by falling back to the loop-based implementation if the actual size exceeds the limit.
2774-2779: LGTM! Post-generation size check provides an essential safety net. The actual bytecode size check after generation ensures correctness even when the estimation heuristics underestimate. While this may result in wasted bytecode generation when fallback occurs, this happens only during query compilation (not execution), so the performance impact is acceptable.
118-225: The code is correct as written. `transferIntToDecimal` and `transferLongToDecimal` check for null sentinels because `Numbers.INT_NULL` and `Numbers.LONG_NULL` are defined constants representing nullable int and long primitives in QuestDB. In contrast, no null sentinel constants (`BYTE_NULL` or `SHORT_NULL`) exist in the Numbers class, indicating that byte and short primitive columns do not support null values in QuestDB's type system. Null checks in `transferByteToDecimal` and `transferShortToDecimal` are therefore not applicable or necessary; the asymmetry reflects the intentional design distinction between which primitive types support nullability, not an oversight. Likely an incorrect or invalid review comment.
Resolved review threads (outdated):

- core/src/main/java/io/questdb/griffin/LoopingRecordToRowCopier.java (2)
- core/src/main/java/io/questdb/griffin/RecordToRowCopierUtils.java (2)
[PR Coverage check] 😞 fail: 648 / 1297 (49.96%)
Bytecode generators are fuzzed; they don't all run at the same time.

Fixes #3312
Fixes #3326
This PR fixes a bug preventing copying of data between tables with thousands (6000+) of columns, through upgrades to RecordToRowCopier. The same bug existed for RecordSink, used for copying large result sets within the query engine.
Background
An optimised copy routine is generated when moving data between tables or result sets. This routine is produced by a custom JIT compiler that inlines the copy instructions into Java bytecode.
Java bytecode has a method size limitation of 64K. For very wide tables (thousands of columns), this limit could be exceeded.
This PR initially introduces a new 'looping' version of this copy routine, which is essentially the non-inlined version of the bytecode routine. It loops over each row, and over each column within that row, with a switch dispatch based on the from/to types.
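A stripped-down illustration of this looping strategy, using hypothetical type tags and plain arrays in place of QuestDB's real Record/Row types:

```java
// Hypothetical sketch (not QuestDB's actual RecordToRowCopier API): instead of
// inlining one copy instruction per column into generated bytecode, loop over
// the columns of each row and dispatch on a per-column type tag with a switch.
final class LoopCopierSketch {
    // illustrative type tags; QuestDB's real ColumnType constants differ
    static final int TYPE_INT = 0;
    static final int TYPE_LONG = 1;
    static final int TYPE_DOUBLE = 2;

    // parallel Object[] arrays stand in for the record reader / row writer pair
    static void copyRow(int[] columnTypes, Object[] src, Object[] dst) {
        for (int col = 0; col < columnTypes.length; col++) {
            switch (columnTypes[col]) { // switch dispatch replaces inlined bytecode
                case TYPE_INT:
                    dst[col] = (Integer) src[col];
                    break;
                case TYPE_LONG:
                    dst[col] = (Long) src[col];
                    break;
                case TYPE_DOUBLE:
                    dst[col] = (Double) src[col];
                    break;
                default:
                    throw new IllegalArgumentException("unsupported type tag: " + columnTypes[col]);
            }
        }
    }
}
```

Because nothing is inlined, the routine's size is independent of the column count, so it can never hit the 64K method limit; the trade-off is per-column dispatch overhead on every row.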
When benchmarking these changes, a performance cliff was identified where the bytecode routine's performance dropped off significantly.
This was narrowed down further:
In this instance, with 459 columns, the method's bytecode size was `7995`, whereas it was `8013` with 460 columns. This corresponds to the C2 compiler falling back to C1 compilation for methods larger than 8K.

To resolve this, an additional 'chunked' copier is added. This is identical to the original 'bytecode' copier, except that when the output is estimated to exceed 6K, it is split into several sub-methods, each covering a subset of columns. An outer method then invokes the sub-methods, allowing each one to be individually C2 compiled.
For larger copies, the looping solution was already superior to the bytecode solution, but the chunked copier generally outperforms them both.
If I/O is the bottleneck, then these differences are more minor. The benchmarks test the performance of the routines themselves, sans-IO.
Changes
- `RecordToRowCopier`: chunked and looping copier implementations
- `cairo.sql.copier.chunked=true`: new config to disable chunking (when disabled, behaviour is the same as prior)
- Applies to `INSERT AS SELECT` and `CREATE AS SELECT`
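Based on the flag name above, opting out of chunking in `server.conf` would presumably look like this (the comment describes the expected fallback, per the PR description):

```properties
# Revert to pre-chunking behaviour: single-method bytecode generation,
# with the looping copier as the fallback for very wide tables.
cairo.sql.copier.chunked=false
```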