
perf(sql): avoid temp table materialization for Parquet export with computed columns#6795

Merged
bluestreak01 merged 65 commits into master from mt_parquet-direct on Feb 24, 2026
Conversation

mtopolnik (Contributor) commented Feb 20, 2026

Fixes #6766

Summary

  • Parquet export (COPY ... TO and HTTP /exp) no longer falls back to a temporary table when the projection contains computed expressions (e.g., CAST, arithmetic, string functions). Instead, a new hybrid export mode passes through raw page-frame-backed columns zero-copy and materializes only the computed columns into native buffers, row by row per frame.
  • Queries with no page-frame backing at all (e.g., cross joins) now use a cursor-based export that materializes all columns row by row, also avoiding the temp table detour. Only queries that require re-partitioning (PARTITION BY override) or that contain a computed BINARY column still use the temp table path.
  • Fixes several resource lifecycle issues on error paths: page-frame cursor leaks, temp directory leaks on moveFile() failure, use-after-free on error path cleanup, Rust streaming writer close ordering relative to pinned native buffers, and SYMBOL column dispatch in cursor-based mode (was using the adjusted output type instead of the source type for record accessor dispatch).
  • Refactors the exporter hierarchy: BaseParquetExporter holds shared state and methods (dropTempTable(), drainHybridFrames()), while HTTPSerialParquetExporter and SQLSerialParquetExporter extend it independently. This removes the prior dependency where the SQL exporter extended the HTTP exporter despite not using any HTTP-specific fields.
  • Adds zero-allocation optimizations for the HTTP connection-reuse path: the exporter pre-allocates materializer buffers, per-column arrays, metadata objects, and file-write callbacks once and reuses them across requests. A buffer pool recycles computed-column native buffers rather than freeing them after each row group flush.

Design

ParquetExportMode.determineExportMode() inspects the compiled RecordCursorFactory before constructing any temp table:

| Mode | Condition | Behavior |
| --- | --- | --- |
| DIRECT_PAGE_FRAME | Factory supports PageFrameCursor directly | Zero-copy, existing fast path |
| PAGE_FRAME_BACKED | VirtualRecordCursorFactory whose base supports page frames; no computed BINARY columns | Copies pass-through columns zero-copy from page frames; materializes computed columns per frame |
| CURSOR_BASED | No page-frame backing, or descending ORDER BY with computed columns; no BINARY columns | Materializes all columns row by row from RecordCursor |
| TEMP_TABLE | PARTITION BY override or computed BINARY columns | Existing slow path (unchanged) |

The ParquetExportMode enum owns the mode selection logic (determineExportMode()), including the descending-order fallback from PAGE_FRAME_BACKED to CURSOR_BASED. This avoids duplicated post-hoc fixups and re-applies the fallback correctly after TableReferenceOutOfDateException recompilation.
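The selection rules above can be sketched as a static method on the enum. This is an illustrative simplification, assuming boolean inputs; the real determineExportMode() inspects RecordCursorFactory instances rather than flags:

```java
// Hypothetical sketch of the mode-selection rules; parameter names are
// illustrative, not the real method signature.
enum ParquetExportMode {
    DIRECT_PAGE_FRAME, PAGE_FRAME_BACKED, CURSOR_BASED, TEMP_TABLE;

    static ParquetExportMode determineExportMode(
            boolean supportsPageFrames,   // factory yields a PageFrameCursor
            boolean hasComputedColumns,   // projection contains expressions
            boolean hasComputedBinary,    // any computed column is BINARY
            boolean partitionByOverride,  // COPY specifies a different PARTITION BY
            boolean descendingOrder       // descending ORDER BY present
    ) {
        if (partitionByOverride || (hasComputedColumns && hasComputedBinary)) {
            return TEMP_TABLE;            // only remaining slow path
        }
        if (supportsPageFrames && !hasComputedColumns) {
            return DIRECT_PAGE_FRAME;     // zero-copy fast path
        }
        if (supportsPageFrames && !descendingOrder) {
            return PAGE_FRAME_BACKED;     // hybrid: zero-copy + per-frame materialization
        }
        return CURSOR_BASED;              // materialize every column row by row
    }
}
```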

HybridColumnMaterializer handles both PAGE_FRAME_BACKED and CURSOR_BASED modes. For each output column it either copies page-frame pointers directly into the 7-longs-per-column layout that the Rust Parquet writer expects, or evaluates the column's Function against a record and writes results into MemoryCARWImpl data/aux buffers. The materializer converts computed SYMBOL columns to STRING in the adjusted metadata but dispatches them via source type for correct record accessor usage.

Because Rust's Parquet writer holds references to native memory across multiple frame writes (pending row-group data), the materializer pins compute buffers that Rust may still reference and allocates fresh ones, releasing pinned buffers only after the Rust side confirms a row-group flush. A buffer pool avoids reallocation overhead. setUp() pre-computes per-column metadata (types, functions, symbol-to-string flags) and computed-column indices once into flat arrays to eliminate repeated indirection in hot loops.
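The pin-and-replace lifecycle plus the buffer pool can be sketched as follows. This is a minimal sketch with a placeholder Buffer type standing in for the MemoryCARWImpl-backed native buffers; the method names mirror the description above but are not the exact production signatures:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Sketch of the pin-and-replace buffer lifecycle described above.
final class BufferLifecycleSketch {
    static final class Buffer { /* stands in for a native buffer */ }

    private final List<Buffer> active = new ArrayList<>();      // buffers the next frame writes into
    private final List<Buffer> pinned = new ArrayList<>();      // buffers Rust may still reference
    private final ArrayDeque<Buffer> pool = new ArrayDeque<>(); // recycled buffers

    // Called between frames instead of truncating in place: move filled
    // buffers to the pinned list so their addresses stay valid for Rust.
    void pinAndReplace() {
        pinned.addAll(active);
        active.clear();
    }

    // Fresh buffers come from the pool when possible; allocate only
    // when the pool is empty.
    Buffer obtainBuffer() {
        Buffer b = pool.poll();
        if (b == null) {
            b = new Buffer();
        }
        active.add(b);
        return b;
    }

    // Called once a row-group flush confirms the pinned data has been
    // drained: recycle the pinned buffers instead of freeing them.
    void releasePinned() {
        pool.addAll(pinned);
        pinned.clear();
    }

    int pinnedCount() { return pinned.size(); }
    int pooledCount() { return pool.size(); }
}
```

The key invariant is that a buffer never returns to the pool until the writer has confirmed it no longer holds a pointer into it.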

Mode selection happens before constructing CreateTableOperation, which eliminates an extra query compilation pass for all non-temp-table cases. The ring queue passes the compiled RecordCursorFactory to the exporter, avoiding redundant SqlCompiler.compile() calls.

Exporter hierarchy

BaseParquetExporter (shared fields: task, context, circuit breaker; shared methods: dropTempTable, drainHybridFrames)
├── HTTPSerialParquetExporter (streaming HTTP state, process with PeerIsSlowToReadException resume)
└── SQLSerialParquetExporter (one-shot file export, processHybridExport, processTempTableExport)

A single StreamPartitionParquetExporter.setUp() implementation serves all export modes. The baseColumnMap nullity distinguishes pass-through vs. all-real-column paths, and meta.getWriterIndex(i) returns the correct value for both table metadata and GenericRecordMetadata.

Test plan

  • Existing CopyExportTest and ExpParquetExportTest suites pass
  • New tests cover PAGE_FRAME_BACKED export with column tops, descending fallback after recompile, and various computed-column combinations
  • New tests cover CURSOR_BASED export path including SYMBOL column dispatch
  • Resource leak tests verify cleanup on error paths (page frame cursor leak on SqlException, temp dir leak on streaming failure and moveFile failure)
  • Mmap error test now forces TEMP_TABLE mode via BINARY column, ensuring it exercises the temp table path
  • Flaky timeout and circuit-breaker tests accept both "peer disconnect"/"malformed chunk" and "timeout"/"cancelled by user" race outcomes

mtopolnik and others added 30 commits February 18, 2026 13:12
Replace fragile null-check and materializer-based inference of
the Parquet export mode with explicit enum dispatch in both HTTP
and SQL export paths.

- Add TABLE_READER enum value for direct table reference exports
  (COPY tablename TO), eliminating null as a sentinel
- Store ParquetExportMode on HTTPSerialParquetExporter, set via
  setExportMode() before process(), switch on it explicitly
- Replace if/else chain in SQLSerialParquetExporter.process()
  with exhaustive switch
- Remove isPageFrameBacked field/method from RecordToColumnBuffers
  (was only used for the fragile inference)
- Remove @Nullable from CopyExportRequestTask.exportMode since
  all paths now set a concrete enum value

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Move ownership transfer of pageFrameCursor, materializer, and
materializerColumnData from state to exporter to after setUp()
and setupXxxExport() succeed. Previously, state fields were
nulled before these calls, so if setUp() threw (e.g., from the
native createStreamingParquetWriter call), the resources would
be orphaned — state couldn't free them (fields already null)
and the exporter hadn't received them yet.

Also add a comment in CopyExportRequestTask.clear() documenting
why pageFrameCursor is nulled without Misc.free(): ownership
belongs to HTTPSerialParquetExporter or ExportQueryProcessorState.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
For streaming export modes (DIRECT_PAGE_FRAME, PAGE_FRAME_BACKED,
CURSOR_BASED), the SELECT query was compiled twice: once in
CopyExportFactory.getCursor() to call determineExportMode(), and
again in SQLSerialParquetExporter.processStreamingExport(). Pass
the RecordCursorFactory from the first compilation through the
ring queue task to the second location, removing the redundant
SqlCompiler.compile() call. The factory is transferred through
the ring queue using a null-transfer pattern to prevent
double-free on task.clear().
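The null-transfer pattern mentioned above can be sketched like this. The class and field names are illustrative stand-ins for CopyExportRequestTask and RecordCursorFactory:

```java
// Illustrative sketch of the null-transfer ownership pattern: the consumer
// takes the reference and nulls the task's field in one step, so a later
// task.clear() cannot free the same resource a second time.
final class RingQueueTaskSketch {
    private AutoCloseable selectFactory; // stands in for RecordCursorFactory

    void set(AutoCloseable factory) {
        this.selectFactory = factory;
    }

    // Consumer side: take ownership, leaving null behind.
    AutoCloseable takeSelectFactory() {
        AutoCloseable f = selectFactory;
        selectFactory = null;
        return f;
    }

    // clear() frees only what the task still owns.
    void clear() throws Exception {
        if (selectFactory != null) {
            selectFactory.close();
            selectFactory = null;
        }
    }

    boolean owns() {
        return selectFactory != null;
    }
}
```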

Also fix a latent bug in the partition-mismatch path: when
COPY specifies a different PARTITION BY than the table,
determineExportMode() was skipped, leaving exportMode at its
initial TABLE_READER value. The else branch now explicitly sets
TEMP_TABLE for re-partitioning.

Rename the two RecordCursorFactory fields in CopyExportRequestTask
from "factory"/"compiledFactory" to "tempTableFactory"/"selectFactory"
to clarify which export path each serves.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The two setUp() overloads duplicated column name/metadata
serialization and the createStreamingParquetWriter() call. The
1-arg overload now delegates to the 3-arg version with null
pfc and baseColumnMap, and the 3-arg version guards the symbol
table handling with a null check on baseColumnMap.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
RecordToColumnBuffers.resetBuffers() used to truncate native memory
buffers in place between page frames. Rust's pending_partitions still
held raw pointers to the old data, so when the next frame overwrote
the same addresses, Rust read corrupted data and panicked with
"range end index N out of range for slice of length M".

The fix replaces truncate-in-place with pin-and-replace: when buffers
contain data, they are moved to pinned lists and fresh buffers are
allocated. After a row group flush, an inFlightRows check determines
when all pinned partitions have been drained by Rust, and the pinned
buffers are freed. This applies to all four export loops (HTTP and SQL,
PAGE_FRAME_BACKED and CURSOR_BASED).

Tests are extended to cover var-size computed columns (STRING, VARCHAR,
ARRAY) in multi-row-group scenarios across all export paths, and
descending ORDER BY is forced to CURSOR_BASED mode since page frame
cursors cannot reverse row order within partitions.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Fix createOp use-after-free in CopyExportRequestJob: transfer
  ownership out of queue task before clear(), mirroring the existing
  selectFactory pattern.
- Fix createOp leak in CopyExportFactory when the export queue is
  full: hoist variable above try block and free in outer finally.
- Fix RecordToColumnBuffers leak in ExportQueryProcessor CURSOR_BASED
  setup: wrap in try-catch matching the PAGE_FRAME_BACKED path.
- Fix createParquetOp not freed in ExportQueryProcessorState.close().
- Fix createOp, tempTableFactory, pageFrameCursor not freed in
  CopyExportRequestTask.close().
- Fix factory not freed on PeerIsSlowToReadException in
  HTTPSerialParquetExporter.process().

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add a dense int[] of computed column indices to avoid scanning
all outputColumnCount columns in hot loops that only need to
visit computed columns. For wide tables with few computed
columns (e.g., 100 columns, 2 computed), the inner loop in
materializeComputedColumns() was wasting ~100M branch iterations
per frame doing no work.

The computedColumnIndices array is built once during setUp() and
setUpPageFrameBacked(), then used in materializeComputedColumns()
and both loops in resetBuffers() (buffer pinning/reallocation
and STRING offset initialization).
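The dense-index idea can be sketched as below. The helper names are hypothetical; the real code builds the array in setUp() from the export metadata:

```java
// Sketch of the dense computed-column index: instead of scanning all
// output columns and branching on "is this computed?" per row, iterate
// only the few computed ones.
final class ComputedIndexSketch {
    static int[] buildComputedColumnIndices(boolean[] isComputed) {
        int n = 0;
        for (boolean b : isComputed) {
            if (b) n++;
        }
        int[] indices = new int[n];
        int j = 0;
        for (int i = 0; i < isComputed.length; i++) {
            if (isComputed[i]) {
                indices[j++] = i;
            }
        }
        return indices;
    }

    // The hot loop now visits 2 columns instead of 100 for a wide table
    // with two computed columns.
    static long materializeComputedColumns(int[] computedColumnIndices, long[] work) {
        long touched = 0;
        for (int columnIndex : computedColumnIndices) {
            work[columnIndex]++; // stands in for Function.evaluate + buffer write
            touched++;
        }
        return touched;
    }
}
```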

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Replace per-row metadata and function lookups in
materializeComputedColumns() and buildColumnDataFromCursor()
with flat arrays populated once at setup time:

- computedOutputTypes[]: column types from adjustedMetadata
- computedFunctions[]: Function instances (page-frame path)
- computedIsSymbolToString[]: symbol-to-string conversion flag

This eliminates three repeated indirections per row per column:
adjustedMetadata.getColumnType() (3 dereferences through ObjList
and TableColumnMetadata), functions.getQuick() (2 dereferences),
and func.getType() (virtual dispatch). Also applies the same
optimization to resetBuffers() where it re-allocates after
pinning.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The three HTTP streaming export methods were logging at
INFO level on every frame/batch. For tables with many
partitions this produces hundreds of log lines per export.

Demote the six per-frame and resume progress messages to
DEBUG, keeping INFO only for start/completion messages.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Instead of freeing MemoryCARWImpl instances when pinned buffers are
released after a row group flush, return them to a reuse pool.
Subsequent frames draw from the pool via obtainBuffer(), which
truncates and reuses an existing buffer or allocates a new one only
when the pool is empty.

This eliminates short-lived Java object allocation in steady state.
For a 100-partition export with 5 computed columns, it avoids ~1000
MemoryCARWImpl allocations. The pool is naturally bounded by peak
usage (framesPerRowGroup * buffersPerFrame) and properly freed in
clear()/close().

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Pre-allocate RecordToColumnBuffers and DirectLongList as final
fields instead of creating them fresh on every export request.

HTTP path (ExportQueryProcessorState): materializer and
materializerColumnData are now final fields initialized once.
clear() calls .clear() on them; close() calls Misc.free().
ExportQueryProcessor no longer wraps setUp() in try-catch with
fresh allocation. copyQueryToParquetFile() no longer nulls out
the materializer fields since ownership stays with the state.
HTTPSerialParquetExporter now borrows these references instead
of owning them, so clearExportResources() nulls rather than
freeing them.

COPY path (SQLSerialParquetExporter): streamBuffers and
streamColumnData are new final fields reused across exports.
processStreamingExport() calls clear() instead of using
try-with-resources, and processCursorExport() uses the fields
directly instead of receiving them as parameters.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Replace the per-export lambda and long[1] array with a reusable
static nested FileWriteCallback class. A single instance is held
in a final field and re-initialized via of(ff, fd) for each
export, eliminating the heap allocation and lambda capture per
row group.
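The reusable-callback shape can be sketched as follows. FilesFacade here is a hypothetical single-method stand-in for the real io facade, and of() mirrors the re-init idiom described above:

```java
// Sketch of the reusable callback: one instance, re-initialized via of()
// per export, instead of a fresh lambda capturing (ff, fd) per request,
// with a plain field replacing the long[1] offset holder.
final class FileWriteCallbackSketch {
    interface FilesFacade {
        long write(long fd, long addr, long len, long offset);
    }

    static final class FileWriteCallback {
        private long fd;
        private FilesFacade ff;
        private long offset;

        FileWriteCallback of(FilesFacade ff, long fd) {
            this.ff = ff;
            this.fd = fd;
            this.offset = 0;
            return this; // fluent re-init, no allocation
        }

        long onWrite(long addr, long len) {
            long written = ff.write(fd, addr, len, offset);
            offset += written; // advance for the next row group
            return written;
        }
    }
}
```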

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Replace six per-request array allocations in
RecordToColumnBuffers with pre-allocated final
collection fields (IntList, BoolList, ObjList).
The setUp() methods now use setPos()/setQuick()
to resize and populate, and clear() resets size
without freeing backing arrays.

Also convert the local int[] identityMap in
SQLSerialParquetExporter to a reusable IntList
field, and update CopyExportRequestTask.setUp()
to accept IntList instead of int[].

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The clear() method nulled pageFrameCursor unconditionally before
the tempTableFactory check, so the temp-table path (which owns
the cursor) would try to free an already-null reference, leaking
the native resource. Move the ownership check first so the cursor
is properly freed when tempTableFactory is non-null, and only
null-without-freeing in the non-owning paths.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add comments at both PAGE_FRAME_BACKED cast sites in
ExportQueryProcessor explaining why the unchecked cast to
VirtualRecordCursorFactory is safe: determineExportMode() only
returns PAGE_FRAME_BACKED when the unwrapped factory is already
verified to be an instanceof VirtualRecordCursorFactory.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add a class-level javadoc and per-value comments explaining when
each export mode is used and what it implies for how data flows
into the Parquet writer.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
processCursorStreamExport() and processPageFrameStreamExport()
in HTTPSerialParquetExporter shared nearly identical loop bodies
(writeHybridFrame, in-flight tracking, buffer pinning). Merge
them into processHybridStreamExport() with a boolean branch for
the data-source difference and partition release.

Apply the same merge in SQLSerialParquetExporter: the inline
PAGE_FRAME_BACKED block and processCursorExport() become a
single processHybridExport() method.

Net -43 lines with no behavioral change.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
                                                                                                                                                                                                        
The old name described the mechanism (records → column buffers)
rather than the purpose. The new name reflects the class's key
design: hybrid export where pass-through columns use zero-copy
page frame pointers while computed columns are materialized into
buffers.

Updated all 7 referencing files. Pure rename, no logic changes.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Extract the inner frame-drain loop that was duplicated between
HTTPSerialParquetExporter.processHybridStreamExport() and
SQLSerialParquetExporter.processHybridExport() into a shared
protected method drainHybridFrames() on the base class.

The two loops were identical in structure — fetch frame, materialize
columns, write hybrid frame, track in-flight rows, release pinned
buffers after row group flush — but differed only in which fields
they operated on. The new method takes the varying pieces (exporter,
materializer, column data list, pfc/cursor) as parameters. The
pfc != null check replaces the former isPageFrameBacked boolean
with no change in hot-path cost.
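The shared drain loop can be sketched as below. Frame source and writer are illustrative functional stand-ins for the real page-frame/cursor and Rust writer types:

```java
// Sketch of the shared drainHybridFrames() structure described above:
// fetch frame -> write -> track in-flight rows -> release pinned
// buffers once a row-group flush drains them.
final class DrainSketch {
    interface FrameSource {
        int nextFrameRowCount(); // returns -1 when exhausted
    }

    interface Writer {
        boolean writeFrame(int rows); // true => row group was flushed
    }

    static long drainHybridFrames(FrameSource src, Writer writer, Runnable releasePinned) {
        long inFlightRows = 0;
        long total = 0;
        int rows;
        while ((rows = src.nextFrameRowCount()) >= 0) {
            inFlightRows += rows;
            total += rows;
            if (writer.writeFrame(rows)) {
                inFlightRows = 0;   // flush consumed all pending frames
                releasePinned.run(); // pinned buffers are now safe to recycle
            }
        }
        return total;
    }
}
```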

Co-Authored-By: Claude Opus 4.6 <[email protected]>
When setUpPageFrameBacked() throws SqlException (via
Function.init()), the exception escapes the inner try block
and is caught by the outer SqlException handler. That handler
sends the error response but does not free the page frame
cursor already assigned to state.pageFrameCursor, leaking it
until the HTTP connection resets.

Add state.clear() after syntaxError() in the outer
SqlException catch block to immediately release the cursor,
materializer, and factory resources.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@mtopolnik mtopolnik added the Performance Performance improvements label Feb 20, 2026
coderabbitai bot commented Feb 20, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


mtopolnik and others added 7 commits February 20, 2026 17:36
Move createOp assignment before method calls on the impl object
in CopyExportFactory so the catch blocks can free it if
setTableKind(), setBatchSize(), or validateAndUpdateMetadataFromSelect()
throw. Remove redundant selectFactory assignment-back pattern in
the same catch blocks since we throw immediately after.

Remove no-op Misc.free(factory) from the PeerIsSlowToReadException
catch in HTTPSerialParquetExporter — factory is provably null at
that point because ownership was already transferred to the task.

Add @Nullable to the setWriteCallback() parameter in
CopyExportRequestTask to match the @Nullable field it sets.

Remove redundant throws SqlException from processStreamingExport()
in SQLSerialParquetExporter — all SqlExceptions are caught inside
the method and wrapped in CopyExportException.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The test expected only "timeout, query aborted" but there are
two racing code paths when the circuit breaker trips during
PAGE_FRAME_BACKED parquet export: the page-frame factory throws
"timeout, query aborted" while the HTTP exporter throws
"cancelled by user". Accept either message as a valid outcome.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Eliminate the intermediate `impl` variable for
CreateTableOperationImpl by assigning directly to `createOp`
(narrowed from the interface type). This resolves the
AutoCloseableResource warning about a resource not being in a
try-with-resources block.

Remove redundant Misc.free() calls from the inner catch blocks,
since the outer finally block already handles freeing both
`selectFactory` and `createOp` on all exception paths. This also
resolves the ConstantValue warning where `selectFactory` was
always null at the point of the inner catch.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The test expected only "peer disconnect" when the server aborts
mid-stream due to timeout, but the client can also receive a
"malformed chunk" error when chunked transfer encoding is
interrupted. Accept both error messages as valid outcomes.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Apply the same fix from aa0850a to the three remaining
test methods that assert on HttpClientException messages
during server-side abort. When chunked transfer encoding
is interrupted, the client may receive either "peer
disconnect" or "malformed chunk", so both must be accepted.

Affected methods:
- testParquetExportCancel
- testParquetExportTimeout
- testParquetExportTimeoutNodelay

Co-Authored-By: Claude Opus 4.6 <[email protected]>
coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
core/src/test/java/io/questdb/test/griffin/CopyExportTest.java (2)

462-503: ⚠️ Potential issue | 🟡 Minor

Test name no longer matches behavior. The body now asserts a successful streaming export and “no temp table,” so the “FailsWithMmapError” name is misleading. Either reintroduce the mmap-failure simulation or rename the test. Also, switch query assertions to assertQueryNoLeakCheck.

✏️ Possible rename
-    public void testCopyParquetFailsWithMmapErrorCleansTempTable() throws Exception {
+    public void testCopyParquetStreamingExportDoesNotCreateTempTable() throws Exception {

As per coding guidelines: Use assertQueryNoLeakCheck() to assert the results of queries in tests.


2441-2483: ⚠️ Potential issue | 🟡 Minor

Use underscore separators in SQL numeric literals and assertQueryNoLeakCheck. The SQL text uses 10000; please format 5+ digit numbers with underscores.

📝 Suggested fix
-            String insertQuery = "INSERT BATCH 10000 INTO symbol_test SELECT " +
+            String insertQuery = "INSERT BATCH 10_000 INTO symbol_test SELECT " +

As per coding guidelines: Use assertQueryNoLeakCheck() to assert the results of queries in tests; Always use underscores as thousands separators in numbers of 5 digits or more in QuestDB SQL and implementation code.

🧹 Nitpick comments (7)
core/src/main/java/io/questdb/cutlass/parquet/ReusablePageFrameMemory.java (1)

41-48: Alphabetize private instance fields. The new class should keep private instance fields in strict alphabetical order.

♻️ Suggested reorder
-    private final DirectLongList pageAddresses = new DirectLongList(32, MemoryTag.NATIVE_PARQUET_EXPORTER);
-    private final DirectLongList pageSizes = new DirectLongList(32, MemoryTag.NATIVE_PARQUET_EXPORTER);
-    private int columnCount;
-    private boolean hasColumnTops;
-    private long rowIdOffset;
+    private int columnCount;
+    private boolean hasColumnTops;
+    private final DirectLongList pageAddresses = new DirectLongList(32, MemoryTag.NATIVE_PARQUET_EXPORTER);
+    private final DirectLongList pageSizes = new DirectLongList(32, MemoryTag.NATIVE_PARQUET_EXPORTER);
+    private long rowIdOffset;

As per coding guidelines: Java class members must be grouped by kind (static vs. instance) and visibility, and sorted alphabetically.

core/src/main/java/io/questdb/griffin/PriorityMetadata.java (1)

68-70: Keep public methods alphabetically ordered. Please place getVirtualColumnReservedSlots() after the other get* methods to preserve alphabetical ordering.

As per coding guidelines: Java class members must be grouped by kind (static vs. instance) and visibility, and sorted alphabetically.

core/src/main/java/io/questdb/cutlass/parquet/HTTPSerialParquetExporter.java (1)

240-254: Keep new public methods sorted alphabetically.

The added public APIs (setExportMode/setupCursorBasedExport/setupPageFrameBackedExport) should be ordered alphabetically within the public-method block.
As per coding guidelines: **/*.java: Java class members must be grouped by kind (static vs. instance) and visibility, and sorted alphabetically.

core/src/main/java/io/questdb/cutlass/parquet/HybridColumnMaterializer.java (1)

76-109: Reorder fields to satisfy member ordering rules.

This new class doesn’t follow the required alphabetical ordering of instance fields. Please reorder the field block accordingly.
As per coding guidelines: **/*.java: Java class members must be grouped by kind (static vs. instance) and visibility, and sorted alphabetically.

core/src/main/java/io/questdb/cutlass/parquet/SQLSerialParquetExporter.java (1)

69-80: Keep new instance fields alphabetically ordered.

The added fields (identityColumnMap/streamBuffers/streamColumnData/writeCallback) are not placed in alphabetical order within the instance-field block. Please reorder to match the project standard.
As per coding guidelines: **/*.java: Java class members must be grouped by kind (static vs. instance) and visibility, and sorted alphabetically.

core/src/main/java/io/questdb/cutlass/http/processors/ExportQueryProcessorState.java (1)

56-95: Reorder newly added members to preserve alphabetical grouping.

The new fields and helper methods are not placed in alphabetical order within their visibility groups. Please reorder for consistency.
As per coding guidelines: **/*.java: Java class members must be grouped by kind (static vs. instance) and visibility, and sorted alphabetically.

Also applies to: 212-226

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/io/questdb/cutlass/http/processors/ExportQueryProcessorState.java`
around lines 56 - 95, The class member declarations are out of alphabetical
order within their visibility groups; reorder the instance fields and helper
members so they are grouped by visibility (private vs package-private) and
sorted alphabetically. Specifically, move and reorder symbols such as
arrayState, columnIndex, columnValueFullySent, copyExportContext, copyID, count,
countRows, createParquetOp, cursor, delimiter, descending, errorMessage,
errorPosition, firstParquetWriteCall, hasNext, metadata, noMeta,
pageFrameCursor, parquetExportMode, parquetExportTableName, parquetFileOffset,
pausedQuery, queryCacheable, queryState, record, recordCursorFactory,
recordCursor, rnd, serialExporterInit, serialParquetExporter, setup fields like
sqlText, skip, stop, task, timeout, writeCallback, materializer,
materializerColumnData, fileName, exportModel so each visibility block is
alphabetized; apply the same reordering to the other block referenced (lines
~212-226) to satisfy the coding guideline.
core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java (1)

56-77: Sort new fields/getters/setters alphabetically within visibility groups.

The additions (exportMode/selectFactory/selectText/tempTableFactory and accessors) should be ordered alphabetically within their member groups.
As per coding guidelines: **/*.java: Java class members must be grouped by kind (static vs. instance) and visibility, and sorted alphabetically.

Also applies to: 150-193, 260-289

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/io/questdb/cutlass/parquet/CopyExportRequestTask.java`
around lines 56 - 77, Reorder the newly added instance fields and their
accessors in CopyExportRequestTask so they follow alphabetical order within the
same visibility group: move exportMode, fileName, metadata, now,
nowTimestampType, pageFrameCursor, parquetVersion, rawArrayEncoding,
rowGroupSize, selectFactory, selectText, statisticsEnabled,
tableName, tempTableFactory, writeCallback into the correct alphabetical
positions among the other private instance fields; likewise, sort the
corresponding public/private getters and setters (the methods introduced around
the 150-193 and 260-289 ranges) alphabetically within their visibility block so
method names like getExportMode/setExportMode,
getSelectFactory/setSelectFactory, getSelectText/setSelectText,
getTempTableFactory/setTempTableFactory appear in proper alphabetical order
relative to the other accessors in CopyExportRequestTask.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/src/test/java/io/questdb/test/cutlass/http/ExpParquetExportTest.java`:
- Around line 792-821: In testParquetExportCursorBasedMultipleRowGroups replace
direct engine.execute(...) DDL calls with the test helper execute(...) and
convert long SQL strings to Java text blocks; specifically, change the two
CREATE TABLE calls that currently use engine.execute(...) to use execute(sql,
sqlExecutionContext) and rewrite their SQL arguments as multiline text blocks
("""...""") instead of concatenated strings, and ensure remaining long query
strings in the queries array are also expressed as text blocks; keep the call to
assertParquetExportDataCorrectness(...) unchanged.

In `@core/src/test/java/io/questdb/test/griffin/CopyExportTest.java`:
- Around line 1374-1410: The testCopyQueryWithCursorBasedMultipleRowGroups
contains a long inline COPY SQL and uses assertSql/assertEventually for result
checks; refactor the COPY statement passed to runAndFetchCopyExportID into a
Java text block (multiline string) and replace the assertSql/assertEventually
result assertions with assertQueryNoLeakCheck to follow the test conventions.
Specifically, update the CopyExportRunnable lambda that calls
runAndFetchCopyExportID to use a """...""" text block for the COPY (including
the WITH FORMAT parquet row_group_size 100 clause) and change the two
assertSql(...) calls inside the test lambda (which currently validate
sys.copy_export_log and read_parquet(...)) to equivalent
assertQueryNoLeakCheck(...) calls while keeping the same SQL and expected
output; leave identifiers like testCopyExport, runAndFetchCopyExportID,
CopyExportRunnable, and assertEventually in place where needed.

---

Outside diff comments:
In `@core/src/test/java/io/questdb/test/griffin/CopyExportTest.java`:
- Around line 462-503: The test method in CopyExportTest now verifies a
successful streaming export and absence of temp tables, so rename the test (the
method currently named with "FailsWithMmapError") to reflect success (e.g.,
testStreamingExportSucceedsWithoutTempTable) and update its internal assertions
to use assertQueryNoLeakCheck(...) instead of assertSql(...) for the parquet
read verification; leave the CopyExportRunnable, runAndFetchCopyExportID,
exportRoot and engine.getTableTokens(...) checks unchanged. Ensure the new test
name is used wherever referenced.
- Around line 2441-2483: Replace any plain five-plus digit numeric literals with
underscore separators (e.g., change 10000 to 10_000) in the INSERT BATCH size
and long_sequence call and any other occurrences; and replace assertions that
use assertSql(...) for result verification with assertQueryNoLeakCheck(...) so
the test uses the leak-check-free assertion method. Specifically, update the
INSERT statement construction where "INSERT BATCH 10000" and
"long_sequence(10000)" are built, and change the three assertSql(...) calls that
validate sys.copy_export_log, count_distinct(sym) from read_parquet(...), and
adjusted_value from read_parquet(...) to use assertQueryNoLeakCheck(...) while
keeping the same expected/result SQL and references to runAndFetchCopyExportID,
testCopyExport, CopyExportRunnable, and sqlExecutionContext.

---

Duplicate comments:
In `@core/src/test/java/io/questdb/test/griffin/CopyExportTest.java`:
- Around line 1479-1511: The test uses assertSql to verify query results inside
testCopyQueryWithPassthroughSymbol; replace those assertSql calls with
assertQueryNoLeakCheck(...) to follow the test guideline and avoid duplicate
leak checks — specifically update the two assertions that check "SELECT
export_path, num_exported_files, status FROM \"sys.copy_export_log\" LIMIT -1"
and "SELECT * FROM read_parquet(...)" within the CopyExportRunnable/test lambda
to call assertQueryNoLeakCheck with the same expected strings and queries so the
assertions use the no-leak-check variant.
- Around line 1150-1181: Replace the duplicate assertSql query assertions inside
testCopyQueryWithArithmeticExpression with assertQueryNoLeakCheck to follow the
test guideline: update the two calls to assertSql that verify
"sys.copy_export_log" and the read_parquet output (inside the assertEventually
lambda) to use assertQueryNoLeakCheck with the same expected strings and SQL
statements so the test uses the no-leak-check variant; leave the surrounding
CopyExportRunnable, assertEventually, and testCopyExport usage unchanged.
- Around line 1306-1337: In testCopyQueryWithComputedNullValues, replace the
duplicate assertSql calls used for validating query results with
assertQueryNoLeakCheck to follow test guidelines: update the two assertSql
invocations inside the assertEventually lambda (the one asserting "SELECT
export_path, num_exported_files, status FROM \"sys.copy_export_log\" ..." and
the one asserting "SELECT * FROM read_parquet(...)" ) to use
assertQueryNoLeakCheck instead so the test uses assertQueryNoLeakCheck for query
assertions while keeping the surrounding structure (CopyExportRunnable,
runAndFetchCopyExportID, assertEventually, testCopyExport) intact.
- Around line 1412-1441: The test testCopyQueryWithEmptyComputedResult uses
assertSql for query result checks but per guidelines should use
assertQueryNoLeakCheck; replace the two assertSql(...) calls (the one asserting
"export_path\tnum_exported_files\tstatus\n..." against SELECT export_path...
FROM \"sys.copy_export_log\" and the one asserting the count from
read_parquet(...)) with assertQueryNoLeakCheck calls using the same expected
string and query; ensure you call the overload of assertQueryNoLeakCheck used
elsewhere in tests and import/adjust parameters to match the test harness.
- Around line 1259-1304: In testCopyQueryWithComputedColumnsMultipleRowGroups,
replace direct assertSql(...) calls used to validate query results inside
assertEventually with assertQueryNoLeakCheck(...) to follow the test guideline
and avoid duplicate leak checks; locate the three assertSql invocations inside
the lambda passed to assertEventually (the SELECT from "sys.copy_export_log",
the count(*) read_parquet, and the two read_parquet result checks) and call
assertQueryNoLeakCheck with the same expected string and SQL string for each
instead, leaving CopyExportRunnable, runAndFetchCopyExportID, testCopyExport,
and assertEventually usage unchanged.
- Around line 1542-1597: The test method
testCopyStreamingExportFailureCleansUpTempDir uses assertSql to assert query
results inside the assertEventually block; replace that call with
assertQueryNoLeakCheck to follow the test guideline. Locate the assertSql(...)
invocation inside the assertEventually lambda in
CopyExportTest.testCopyStreamingExportFailureCleansUpTempDir and call
assertQueryNoLeakCheck with the same expected-result string and SQL query
("SELECT status FROM \"sys.copy_export_log\" LIMIT -1"), leaving the rest of the
cleanup assertions unchanged.
- Around line 1183-1219: The test uses assertSql to validate query results
inside the assertion closure; replace those assertSql calls with
assertQueryNoLeakCheck to follow the test guideline and avoid duplicate leak
checks — specifically update the two calls to assertSql within the
assertEventually block (the "SELECT export_path, num_exported_files, status FROM
\"sys.copy_export_log\" LIMIT -1" assertion and the "SELECT * FROM
read_parquet(...)" assertion) to use assertQueryNoLeakCheck while keeping the
same expected result strings and SQL queries; no other logic changes to
testCopyExport, CopyExportRunnable, assertEventually, or read_parquet invocation
are needed.
- Around line 1221-1257: The test method
testCopyQueryWithComputedAndPassthroughColumns uses assertSql inside the
CopyExportRunnable test runnable which performs query assertions; replace those
assertSql calls with assertQueryNoLeakCheck to follow the guideline and avoid
duplicate leak checks — specifically update the two assertSql invocations (one
checking sys.copy_export_log and one reading the parquet file) inside the lambda
assigned to CopyExportRunnable test so they call assertQueryNoLeakCheck with the
same expected string and SQL query arguments.
- Around line 1443-1477: In testCopyQueryWithFullMaterialization, the query
result assertions should use assertQueryNoLeakCheck instead of assertSql
(duplicate of previous guideline); update the assertions inside the
assertEventually lambda (the two calls to assertSql that check
"sys.copy_export_log" and the read_parquet output) to call
assertQueryNoLeakCheck with the same expected string and SQL query, leaving the
surrounding test structure (CopyExportRunnable stmt/test and testCopyExport
invocation) unchanged.
- Around line 1339-1372: The test testCopyQueryWithComputedSymbolColumn uses
assertSql for asserting query results inside the assertEventually block; replace
those assertSql(...) calls with assertQueryNoLeakCheck(...) to follow the test
guideline (update both the export log assertion and the read_parquet assertion),
keeping the same SQL strings and expected results and leaving the surrounding
assertEventually, CopyExportRunnable stmt, and testCopyExport(...) usage
unchanged.
- Around line 1599-1644: In testCopyStreamingExportMoveFailureCleansUpTempDir
replace the use of assertSql(...) inside the assertEventually block with
assertQueryNoLeakCheck(...) to follow the test guideline about using
assertQueryNoLeakCheck for query assertions; locate the assertion call in the
CopyExportTest method (the assertSql that checks "SELECT status FROM
\"sys.copy_export_log\" LIMIT -1") and swap it to assertQueryNoLeakCheck with
the same expected result string so the query assertion does not trigger the
standard memory-leak check twice.

---

Nitpick comments:


In
`@core/src/main/java/io/questdb/cutlass/parquet/HTTPSerialParquetExporter.java`:
- Around line 240-254: Public instance methods are out of alphabetical order:
reorder the new public APIs so they are alphabetically sorted within the
public-method block. Specifically, place setExportMode, setupCursorBasedExport,
and setupPageFrameBackedExport in alphabetical order (setExportMode,
setupCursorBasedExport, setupPageFrameBackedExport is already alphabetical by
name—ensure they are placed in the correct public-method section and, if not,
move them so the block's public instance methods follow alphabetical sorting),
update any nearby public methods to maintain the overall alphabetical ordering
required by the class member grouping rules.


In `@core/src/main/java/io/questdb/cutlass/parquet/ReusablePageFrameMemory.java`:
- Around line 41-48: The private instance fields in ReusablePageFrameMemory are
not in alphabetical order; reorder them alphabetically to comply with the coding
guideline. Specifically, within class ReusablePageFrameMemory place the private
fields in this alphabetical order: auxPageAddresses, auxPageSizes, columnCount,
hasColumnTops, pageAddresses, pageSizes, rowIdOffset (keep their types and
MemoryTag/NATIVE_PARQUET_EXPORTER initializers unchanged). Ensure grouping by
visibility remains the same (private instance fields together) and update any
constructor or initialization references if reordering affects readability.


In `@core/src/main/java/io/questdb/griffin/PriorityMetadata.java`:
- Around line 68-70: The method getVirtualColumnReservedSlots() is out of
alphabetical order among the public getters; move its declaration so it sits
alphabetically after the other public get* methods in the class (i.e., relocate
getVirtualColumnReservedSlots() to the block containing the other public
getters), ensuring member grouping and visibility remain unchanged and the class
follows the alphabetical ordering guideline.

Comment on lines +792 to +821
@Test
public void testParquetExportCursorBasedMultipleRowGroups() throws Exception {
getExportTester()
.run((engine, sqlExecutionContext) -> {
// CROSS JOIN produces a CURSOR_BASED factory (no page frame support).
// 10 x 500 = 5000 rows with computed columns and small row groups
// exercises the buffer pinning fix in the cursor-based path.
engine.execute("""
CREATE TABLE cb_t1 AS (
SELECT x AS a FROM long_sequence(10)
)""", sqlExecutionContext);
engine.execute("""
CREATE TABLE cb_t2 AS (
SELECT x AS b,
rnd_str(5, 10, 0) AS s,
rnd_varchar(5, 10, 0) AS vc,
rnd_double_array(1, 5) AS arr
FROM long_sequence(500)
)""", sqlExecutionContext);

String[] queries = {
"SELECT cb_t1.a + cb_t2.b AS sum_ab, cb_t2.s FROM cb_t1 CROSS JOIN cb_t2",
"SELECT cb_t1.a * cb_t2.b AS product, cb_t2.s FROM cb_t1 CROSS JOIN cb_t2",
// VARCHAR and ARRAY through cursor-based buffers
"SELECT cb_t1.a + cb_t2.b AS sum_ab, cb_t2.vc FROM cb_t1 CROSS JOIN cb_t2",
"SELECT cb_t1.a + cb_t2.b AS sum_ab, cb_t2.arr FROM cb_t1 CROSS JOIN cb_t2",
};

assertParquetExportDataCorrectness(engine, sqlExecutionContext, queries, queries.length * 3, 200);
});
⚠️ Potential issue | 🟡 Minor

Use test helpers for DDL and prefer text blocks for long SQL.

New tests still use engine.execute(...) for DDL and build long SQL via string concatenation (e.g., Line 1489 and Line 1597). Please switch to execute() for non-queries and use multiline string literals for longer SQL statements.
As per coding guidelines: **/src/test/java/**/*.java: Use execute() to run non-queries (DDL) in tests; Use multiline strings for longer SQL statements (multiple INSERT rows, complex queries) and to assert multiline query results in tests.

Also applies to: 1405-1657

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/test/java/io/questdb/test/cutlass/http/ExpParquetExportTest.java`
around lines 792 - 821, In testParquetExportCursorBasedMultipleRowGroups replace
direct engine.execute(...) DDL calls with the test helper execute(...) and
convert long SQL strings to Java text blocks; specifically, change the two
CREATE TABLE calls that currently use engine.execute(...) to use execute(sql,
sqlExecutionContext) and rewrite their SQL arguments as multiline text blocks
("""...""") instead of concatenated strings, and ensure remaining long query
strings in the queries array are also expressed as text blocks; keep the call to
assertParquetExportDataCorrectness(...) unchanged.

Comment on lines +1374 to +1410
@Test
public void testCopyQueryWithCursorBasedMultipleRowGroups() throws Exception {
// CURSOR_BASED: CROSS JOIN produces a non-page-frame factory.
// Small row_group_size forces multiple batches and exercises buffer pinning.
assertMemoryLeak(() -> {
execute("CREATE TABLE cb_t1 AS (SELECT x AS a FROM long_sequence(10))");
execute("""
CREATE TABLE cb_t2 AS (
SELECT x AS b,
rnd_str(5, 10, 0) AS s,
rnd_varchar(5, 10, 0) AS vc,
rnd_double_array(1, 5) AS arr
FROM long_sequence(500)
)""");

// All columns materialized through buffers in cursor mode:
// STRING (s), VARCHAR (vc), and ARRAY (arr) exercise var-size buffer pinning
CopyExportRunnable stmt = () ->
runAndFetchCopyExportID(
"COPY (SELECT cb_t1.a + cb_t2.b AS sum_ab, cb_t2.s, cb_t2.vc, cb_t2.arr FROM cb_t1 CROSS JOIN cb_t2) TO 'cb_rg_output' WITH FORMAT parquet row_group_size 100",
sqlExecutionContext
);

CopyExportRunnable test = () ->
assertEventually(() -> {
assertSql(
"export_path\tnum_exported_files\tstatus\n" +
exportRoot + File.separator + "cb_rg_output.parquet\t1\tfinished\n",
"SELECT export_path, num_exported_files, status FROM \"sys.copy_export_log\" LIMIT -1"
);
assertSql("count\n5000\n",
"SELECT count(*) FROM read_parquet('" + exportRoot + File.separator + "cb_rg_output.parquet')");
});

testCopyExport(stmt, test);
});
}
⚠️ Potential issue | 🟡 Minor

Use a text block for the complex COPY query; align assertions with assertQueryNoLeakCheck. The CROSS JOIN COPY statement is long and would be clearer as a multiline string.

♻️ Example refactor
-            CopyExportRunnable stmt = () ->
-                    runAndFetchCopyExportID(
-                            "COPY (SELECT cb_t1.a + cb_t2.b AS sum_ab, cb_t2.s, cb_t2.vc, cb_t2.arr FROM cb_t1 CROSS JOIN cb_t2) TO 'cb_rg_output' WITH FORMAT parquet row_group_size 100",
-                            sqlExecutionContext
-                    );
+            CopyExportRunnable stmt = () ->
+                    runAndFetchCopyExportID(
+                            """
+                                    COPY (
+                                        SELECT cb_t1.a + cb_t2.b AS sum_ab,
+                                               cb_t2.s,
+                                               cb_t2.vc,
+                                               cb_t2.arr
+                                        FROM cb_t1 CROSS JOIN cb_t2
+                                    ) TO 'cb_rg_output' WITH FORMAT PARQUET row_group_size 100
+                                    """,
+                            sqlExecutionContext
+                    );

As per coding guidelines: Use assertQueryNoLeakCheck() to assert the results of queries in tests; use multiline strings for longer statements (multiple INSERT rows, complex queries).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/test/java/io/questdb/test/griffin/CopyExportTest.java` around lines
1374 - 1410, The testCopyQueryWithCursorBasedMultipleRowGroups contains a long
inline COPY SQL and uses assertSql/assertEventually for result checks; refactor
the COPY statement passed to runAndFetchCopyExportID into a Java text block
(multiline string) and replace the assertSql/assertEventually result assertions
with assertQueryNoLeakCheck to follow the test conventions. Specifically, update
the CopyExportRunnable lambda that calls runAndFetchCopyExportID to use a
"""...""" text block for the COPY (including the WITH FORMAT parquet
row_group_size 100 clause) and change the two assertSql(...) calls inside the
test lambda (which currently validate sys.copy_export_log and read_parquet(...))
to equivalent assertQueryNoLeakCheck(...) calls while keeping the same SQL and
expected output; leave identifiers like testCopyExport, runAndFetchCopyExportID,
CopyExportRunnable, and assertEventually in place where needed.

mtopolnik and others added 10 commits February 21, 2026 14:29
In CURSOR_BASED mode, buildColumnDataFromCursor() dispatched
writeColumnValue() using the adjusted output type (STRING for SYMBOL
columns). This caused record.getStrA(col) to be called instead of
record.getSymA(col), producing incorrect data or errors.

Track the original (pre-adjustment) column types in a separate
computedSourceTypes list and use them for writeColumnValue() dispatch,
so SYMBOL columns correctly go through the getSymA() path. The adjusted
output types are still used for buffer management and addColumnData().

Co-Authored-By: Claude Opus 4.6 <[email protected]>
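The fix above can be illustrated with a minimal sketch. All names here are stand-ins (`ColumnType`, `Record`, `readValue`) and the sketch only models the dispatch idea, not QuestDB's actual `HybridColumnMaterializer`: the accessor is chosen by the pre-adjustment source type, while the adjusted output type is kept separately for buffer management.

```java
import java.util.ArrayList;
import java.util.List;

class SymbolDispatchSketch {
    enum ColumnType { SYMBOL, STRING, LONG }

    interface Record {
        String getStrA(int col);
        String getSymA(int col);
    }

    private final List<ColumnType> computedOutputTypes = new ArrayList<>();
    private final List<ColumnType> computedSourceTypes = new ArrayList<>();

    void addComputedColumn(ColumnType sourceType) {
        computedSourceTypes.add(sourceType);
        // the Parquet output stores SYMBOL data as STRING
        computedOutputTypes.add(sourceType == ColumnType.SYMBOL ? ColumnType.STRING : sourceType);
    }

    ColumnType outputType(int i) {
        return computedOutputTypes.get(i);
    }

    // dispatch on the SOURCE type: a SYMBOL column must read via getSymA(),
    // even though its output type was adjusted to STRING
    String readValue(Record rec, int i, int col) {
        return switch (computedSourceTypes.get(i)) {
            case SYMBOL -> rec.getSymA(col);
            case STRING -> rec.getStrA(col);
            default -> throw new UnsupportedOperationException("type not handled in this sketch");
        };
    }
}
```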
The test was not exercising the temp table export path because
the streaming export path (introduced on this branch) handled
the query without creating a temp table. Restore the original
FilesFacade mmap-failure injection from master, and change the
table schema to include a BINARY column with a WHERE clause to
force the TEMP_TABLE export mode. This ensures the test actually
triggers temp table creation, mmap failure, and cleanup.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Move determineExportMode(), hasComputedBinaryColumn(), and
unwrapFactory() from HybridColumnMaterializer to ParquetExportMode.
These are general-purpose dispatch methods that determine which
export strategy to use — they don't belong on one specific
strategy implementation.

Also fold the descending-order fallback (PAGE_FRAME_BACKED to
CURSOR_BASED) into determineExportMode() via a new isDescending
parameter, eliminating duplicated post-hoc fixups in
ExportQueryProcessor.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The three setUp() overloads in StreamPartitionParquetExporter
shared ~80% of their structure (buffer init, column iteration,
SYMBOL handling, createStreamingParquetWriter call). The
parameterless setUp() was a full copy of the three-arg version
with slightly different SYMBOL resolution and writer index logic.

Collapse into a single implementation that the other two delegate
to. The key insight is that meta.getWriterIndex(i) already returns
the correct value for all paths: the physical writer column index
for table metadata, and i for GenericRecordMetadata (hybrid/cursor
paths). The baseColumnMap nullity distinguishes "all columns are
real table columns" from "only mapped columns are pass-through".

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Both HTTPSerialParquetExporter.process() and
SQLSerialParquetExporter.process() had nearly identical finally
blocks for dropping the temporary table used in TEMP_TABLE export
mode: set phase, update status, resolve table token, drop table,
and handle CairoException/Throwable with identical log-and-update
logic.

Extract this into a protected dropTempTable() method on the base
class. The SQL subclass previously used its fromParquet Path field
for the drop call; the shared method uses Path.getThreadLocal()
instead, which is equivalent since dropTableOrViewOrMatView
consumes the path immediately. The HTTP-specific
freeOwnedPageFrameCursor() call remains at the call site since it
is not part of the drop logic.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Extract shared state and methods from HTTPSerialParquetExporter
into a new abstract BaseParquetExporter class. Both
HTTPSerialParquetExporter and SQLSerialParquetExporter now extend
this base directly.

The base class holds fields (copyExportContext,
insertSelectReporter, sqlExecutionContext, circuitBreaker, task)
and methods (of(), dropTempTable(), drainHybridFrames()) that both
export paths need. HTTP-specific streaming state (exportMode,
fullCursor, materializer, materializerColumnData, streamingPfc)
stays in HTTPSerialParquetExporter where it belongs.

This removes the awkward dependency where SQLSerialParquetExporter
extended HTTPSerialParquetExporter despite not using any of its
HTTP-specific fields or methods.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The identity column map [0, 1, 2, ..., n-1] was rebuilt on every
DIRECT_PAGE_FRAME export invocation. Since the field is an instance
variable and its contents are deterministic for a given size, guard
the rebuild with a size check so it is only constructed once per
distinct column count.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
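A minimal sketch of the guard described in this commit (field and method names are illustrative, not the actual `SQLSerialParquetExporter` code):

```java
class IdentityMapSketch {
    private int[] identityColumnMap;

    int[] identityColumnMap(int columnCount) {
        // contents are deterministic for a given size: [0, 1, ..., n-1],
        // so rebuild only when the column count actually changes
        if (identityColumnMap == null || identityColumnMap.length != columnCount) {
            identityColumnMap = new int[columnCount];
            for (int i = 0; i < columnCount; i++) {
                identityColumnMap[i] = i;
            }
        }
        return identityColumnMap;
    }
}
```

Repeated calls with the same column count return the same array instance without re-filling it.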
Replace three separate catch blocks (SqlException, CairoException,
Throwable) in HTTPSerialParquetExporter.process() with a single
catch (Throwable) that uses pattern variables to extract the error
message and errno per type. All three blocks performed identical
cleanup: log, free factory, clear export resources, update status,
and rethrow — differing only in how they obtained the message and
errno.
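The shape of the collapsed handler can be sketched as below. The exception types here are local stand-ins for QuestDB's `SqlException`/`CairoException`, and `describe()` is a hypothetical helper standing in for the shared log-and-update cleanup:

```java
class CatchCollapseSketch {
    static class SqlException extends Exception {
        SqlException(String m) { super(m); }
    }

    static class CairoException extends RuntimeException {
        final int errno;
        CairoException(String m, int errno) { super(m); this.errno = errno; }
    }

    static String describe(Throwable th) {
        // pattern variables extract the message and errno per type;
        // the shared cleanup then runs once for all three cases
        final String message;
        final int errno;
        if (th instanceof CairoException ce) {
            message = ce.getMessage();
            errno = ce.errno;
        } else if (th instanceof SqlException se) {
            message = se.getMessage();
            errno = -1;
        } else {
            message = th.getMessage();
            errno = -1;
        }
        return message + " [errno=" + errno + "]";
    }
}
```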

Co-Authored-By: Claude Opus 4.6 <[email protected]>
assert exportMode != null;
switch (exportMode) {
case PAGE_FRAME_BACKED, CURSOR_BASED -> processHybridStreamExport();
case DIRECT_PAGE_FRAME, TABLE_READER, TEMP_TABLE -> processStreamExport();
we should handle default case as a failsafe
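One way to add the requested failsafe, sketched with stand-in types (the enum constants mirror the snippet above; the method bodies are placeholders): keep the explicit cases and make an unexpected constant fail loudly instead of silently doing nothing.

```java
class ExportModeDispatchSketch {
    enum ParquetExportMode { PAGE_FRAME_BACKED, CURSOR_BASED, DIRECT_PAGE_FRAME, TABLE_READER, TEMP_TABLE }

    static String dispatch(ParquetExportMode exportMode) {
        assert exportMode != null;
        return switch (exportMode) {
            case PAGE_FRAME_BACKED, CURSOR_BASED -> "hybrid";
            case DIRECT_PAGE_FRAME, TABLE_READER, TEMP_TABLE -> "stream";
            // failsafe: unreachable today, but protects against a new
            // enum constant being added without a handler
            default -> throw new UnsupportedOperationException("export mode: " + exportMode);
        };
    }
}
```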

@bluestreak01 bluestreak01 left a comment


Code Review

Design & Architecture

The mode detection in ParquetExportMode.determineExportMode() is clean and well-reasoned:

  • DIRECT_PAGE_FRAME (zero-copy) → PAGE_FRAME_BACKED (hybrid) → CURSOR_BASED (full materialization) → TEMP_TABLE (fallback)

The BINARY column exclusion from the hybrid/cursor paths is the right call given the unbounded data size. The BaseParquetExporter extraction cleanly separates HTTP vs SQL concerns while sharing drainHybridFrames() and dropTempTable().
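The cascade can be sketched as a plain decision function. This is illustrative only: the real `ParquetExportMode.determineExportMode()` takes the factory and query shape (including the descending-order case) as input, and these boolean parameters are assumptions made for the sketch.

```java
class ModeSelectionSketch {
    enum Mode { DIRECT_PAGE_FRAME, PAGE_FRAME_BACKED, CURSOR_BASED, TEMP_TABLE }

    static Mode determine(
            boolean supportsPageFrames,
            boolean hasComputedColumns,
            boolean hasComputedBinary,
            boolean repartitionRequested
    ) {
        // re-partitioning and computed BINARY columns still need the temp table
        if (repartitionRequested || hasComputedBinary) {
            return Mode.TEMP_TABLE;
        }
        if (supportsPageFrames) {
            // pure pass-through is zero-copy; computed columns go hybrid
            return hasComputedColumns ? Mode.PAGE_FRAME_BACKED : Mode.DIRECT_PAGE_FRAME;
        }
        // no page-frame backing (e.g. cross joins): materialize every column
        return Mode.CURSOR_BASED;
    }
}
```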

The zero-copy claim is valid for pass-through columns in buildColumnDataFromPageFrame() — page frame addresses are read directly with no data copying. Buffer pinning/pooling prevents use-after-free while minimizing allocation. Resource cleanup ordering is correct (Rust writer closed before native buffers freed). The ownership model (state owns, exporter borrows) is clear.


Issues

1. Field ordering violation in HybridColumnMaterializer

HybridColumnMaterializer.java:103-104: adjustedMetadata and hybridSymbolTableSource are declared after pinnedDataBuffers/pinnedAuxBuffers, but alphabetically they should come first among the instance fields (before auxBuffers).

2. Pinned buffer release heuristic lacks invariant documentation

BaseParquetExporter.java:89-93:

if (inFlightRows <= rowCount) {
    mat.releasePinnedBuffers();
}

This relies on a non-obvious invariant: that when the in-flight row count drops to at most one frame's worth, the Rust writer has consumed all data from the previously pinned buffers. A comment documenting why this is safe (e.g., "Rust processes frames FIFO and rowsWrittenToRowGroups covers all frames up to the written row group boundary") would be valuable.
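A toy model of that invariant, with the suggested reasoning written out as a comment. Field and method names follow the review excerpt; the row bookkeeping is a simplified stand-in, not the QuestDB implementation:

```java
// Simplified model of the pinned-buffer release heuristic with the FIFO
// invariant documented inline.
public class PinnedBufferTracker {
    private long rowsSentToWriter;
    private long rowsWrittenToRowGroups;
    private boolean pinned;

    public void onFrameSent(long frameRows) {
        rowsSentToWriter += frameRows;
        pinned = true;
    }

    public void onRowGroupFlushed(long flushedRows) {
        rowsWrittenToRowGroups += flushedRows;
    }

    // Safe because the writer consumes frames FIFO: once the in-flight row
    // count (sent minus flushed) is at most one frame's worth (rowCount),
    // every buffer pinned for earlier frames has been fully consumed.
    public boolean maybeReleasePinnedBuffers(long rowCount) {
        long inFlightRows = rowsSentToWriter - rowsWrittenToRowGroups;
        if (pinned && inFlightRows <= rowCount) {
            pinned = false; // releasePinnedBuffers() would run here
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        PinnedBufferTracker t = new PinnedBufferTracker();
        t.onFrameSent(100);
        t.onFrameSent(100);
        System.out.println(t.maybeReleasePinnedBuffers(100)); // 200 in flight: keep pinned
        t.onRowGroupFlushed(150);
        System.out.println(t.maybeReleasePinnedBuffers(100)); // 50 in flight: safe to release
    }
}
```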

3. writeColumnValue() and writeComputedValue() are near-duplicates

HybridColumnMaterializer.java:522-575 and 577-625 are two large switch statements covering every column type — one operates on Record + column index, the other on Function + Record. Any new column type must be added to both. A comment noting they must stay in sync would help prevent future drift.
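One possible alternative to a "keep in sync" comment: adapt both value sources (Record plus column index, Function plus Record) to a common accessor and dispatch once. The interface and type codes below are illustrative, not QuestDB's actual API:

```java
// Sketch: a single type switch shared by both value sources.
public class UnifiedWriter {
    public interface ValueAccessor {
        long getLong();
        CharSequence getStr();
    }

    public static final int TYPE_LONG = 0;
    public static final int TYPE_STRING = 1;

    // One switch instead of two: a new column type is added here once, and
    // both the record-backed and function-backed paths pick it up.
    public static String write(int columnType, ValueAccessor v) {
        switch (columnType) {
            case TYPE_LONG:
                return Long.toString(v.getLong());
            case TYPE_STRING: {
                CharSequence s = v.getStr();
                return s == null ? "null" : s.toString();
            }
            default:
                throw new IllegalStateException("unsupported column type: " + columnType);
        }
    }

    public static void main(String[] args) {
        // Adapter standing in for the Record + column-index source.
        ValueAccessor fromRecord = new ValueAccessor() {
            public long getLong() { return 42L; }        // record.getLong(col)
            public CharSequence getStr() { return "x"; } // record string accessor
        };
        System.out.println(write(TYPE_LONG, fromRecord)); // 42
    }
}
```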

4. SYMBOL handling asymmetry between the two writeXxx methods

In writeColumnValue() (line 536-540), SYMBOL is explicitly handled — calling record.getSymA(col) and converting to STRING. But in writeComputedValue(), there is no SYMBOL case. The materializeComputedColumns() method (line 447) handles the symbol-to-string case separately via computedIsSymbolToString, calling func.getSymbol() outside of writeComputedValue(). This works, but the asymmetry makes it easy to overlook that writeComputedValue() is intentionally missing the SYMBOL branch. If someone were to call writeComputedValue() with a SYMBOL type directly, it would hit the default branch and throw. Worth a comment.
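A minimal model of that asymmetry, with the intentionally missing SYMBOL branch called out in a comment. Type codes and names here are illustrative; the real code dispatches on QuestDB's ColumnType constants:

```java
// Illustrative model: SYMBOL is converted to STRING before the per-type
// switch, so the switch intentionally has no SYMBOL case.
public class ComputedColumnModel {
    public static final int TYPE_STRING = 0;
    public static final int TYPE_SYMBOL = 1;

    // Mirrors materializeComputedColumns(): SYMBOL is converted to STRING up
    // front (cf. computedIsSymbolToString), so writeValue() never sees it.
    public static String materialize(int columnType, CharSequence value) {
        if (columnType == TYPE_SYMBOL) {
            return writeValue(TYPE_STRING, value == null ? null : value.toString());
        }
        return writeValue(columnType, value);
    }

    // Mirrors writeComputedValue(): intentionally no SYMBOL case; reaching
    // the default branch with SYMBOL means a caller bypassed materialize().
    public static String writeValue(int columnType, CharSequence value) {
        switch (columnType) {
            case TYPE_STRING:
                return value == null ? "null" : value.toString();
            default:
                throw new IllegalStateException("unexpected column type: " + columnType);
        }
    }

    public static void main(String[] args) {
        System.out.println(materialize(TYPE_SYMBOL, "sym_value")); // sym_value
    }
}
```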

5. PR description — active voice

The CLAUDE.md says to use active voice. A few passive constructions in the PR description:

  • "Per-column metadata are pre-computed into flat arrays at setup time" → "setUp() pre-computes per-column metadata into flat arrays"
  • "Computed SYMBOL columns are converted to STRING" → "The materializer converts computed SYMBOL columns to STRING"

6. computedCount reset in resetBuffers()

HybridColumnMaterializer.java:494-500: computedCount is temporarily zeroed and then re-incremented through allocateBuffer() during the pin-and-reallocate path. This is correct but subtle. A brief comment explaining why computedCount is temporarily zeroed would help.
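A toy model of that subtlety, with the suggested comment in place. Names (computedCount, allocateBuffer, resetBuffers) follow the review; native buffers are modeled as plain longs:

```java
// Simplified model of the pin-and-reallocate counter reset.
public class BufferPoolModel {
    private int computedCount;
    private final long[] buffers;

    public BufferPoolModel(int computedColumns) {
        buffers = new long[computedColumns];
        for (int i = 0; i < computedColumns; i++) {
            allocateBuffer(i);
        }
    }

    private void allocateBuffer(int i) {
        buffers[i] = 0xBEEF0000L + i; // stand-in for a fresh native allocation
        computedCount++;              // allocateBuffer() owns the increment
    }

    // Pin-and-reallocate path: computedCount is temporarily zeroed because
    // allocateBuffer() increments it once per column; after the loop it is
    // back to the number of computed columns, now counting fresh buffers.
    public void resetBuffers() {
        computedCount = 0;
        for (int i = 0; i < buffers.length; i++) {
            allocateBuffer(i);
        }
    }

    public int getComputedCount() {
        return computedCount;
    }

    public static void main(String[] args) {
        BufferPoolModel m = new BufferPoolModel(3);
        m.resetBuffers();
        System.out.println(m.getComputedCount()); // back to 3 after reallocation
    }
}
```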


Resource Management

Resource cleanup ordering is correct: ExportQueryProcessorState.clear() closes the Rust streaming writer (via task.clear()) before freeing materializer buffers (via materializer.clear()), preventing use-after-free.


Test Coverage

Well covered:

  • Computed columns (CAST, arithmetic, string functions)
  • NULL values, SYMBOL-to-STRING conversion
  • Multiple row groups, cursor-based mode, column tops

Missing or weak:

  • PeerIsSlowToReadException resume in hybrid mode — the HTTP resume codepath for processHybridStreamExport() / processCursorBasedExport() is untested.
  • NULL values spanning row group boundaries — existing NULL tests use small data fitting one row group.
  • Cursor-based mode with NULLs across row groups — not tested.

I've pushed tests covering the latter two gaps on this branch (5 new tests, all green):

  • CopyExportTest#testCopyQueryWithComputedNullValuesMultipleRowGroups
  • CopyExportTest#testCopyQueryWithCursorBasedNullsMultipleRowGroups
  • CopyExportTest#testCopyQueryWithComputedColumnsDescending
  • ExpParquetExportTest#testParquetExportPageFrameComputedColumnsNulls
  • ExpParquetExportTest#testParquetExportPageFrameCursorBasedNulls

Add 5 new tests covering gaps identified during code review:

- CopyExportTest#testCopyQueryWithComputedNullValuesMultipleRowGroups:
  PAGE_FRAME_BACKED with periodic NULLs in computed columns spanning
  multiple row group flushes (row_group_size 50, 1000 rows).

- CopyExportTest#testCopyQueryWithCursorBasedNullsMultipleRowGroups:
  CURSOR_BASED (CROSS JOIN) with NULL sentinels for both fixed-size
  and var-size columns across row group boundaries.

- CopyExportTest#testCopyQueryWithComputedColumnsDescending:
  PAGE_FRAME_BACKED with computed columns and small row_group_size
  forcing multiple row group flushes, verifying data correctness.

- ExpParquetExportTest#testParquetExportPageFrameComputedColumnsNulls:
  HTTP export with periodic NULLs in computed columns, randomized
  row_group_size via assertParquetExportDataCorrectness().

- ExpParquetExportTest#testParquetExportPageFrameCursorBasedNulls:
  HTTP cursor-based export with NULLs in CROSS JOIN results.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@bluestreak01 bluestreak01 merged commit 8205f87 into master Feb 24, 2026
7 of 41 checks passed
@bluestreak01 bluestreak01 deleted the mt_parquet-direct branch February 24, 2026 16:57
@glasstiger
Contributor

[PR Coverage check]

😍 pass : 10 / 11 (90.91%)

file detail

| path | covered lines | new lines | coverage |
| --- | --- | --- | --- |
| 🔵 io/questdb/griffin/SqlCompilerImpl.java | 7 | 8 | 87.50% |
| 🔵 io/questdb/griffin/SqlExecutionContext.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/cairo/security/AllowAllSecurityContext.java | 1 | 1 | 100.00% |

bluestreak01 pushed a commit that referenced this pull request Feb 25, 2026
maciulis pushed a commit to maciulis/questdb that referenced this pull request Mar 16, 2026

Labels

Performance Performance improvements SQL Issues or changes relating to SQL execution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf(sql): Parquet export with functions in projection falls back to slow temp table path

3 participants