Skip to content

fix(pgwire): allow large varchar column regardless of send buffer size#6603

Merged
bluestreak01 merged 6 commits intomasterfrom
jh_pgwire_varchar_fragmented_col
Jan 7, 2026
Merged

fix(pgwire): allow large varchar column regardless of send buffer size#6603
bluestreak01 merged 6 commits intomasterfrom
jh_pgwire_varchar_fragmented_col

Conversation

@jerrinot
Copy link
Copy Markdown
Contributor

@jerrinot jerrinot commented Jan 6, 2026

Previously, VARCHAR columns larger than the PGWire send buffer would fail. This was inconsistent with array columns, which already supported fragmented sending.

Additionally, there was a latent bug: when a row couldn't fit in the buffer and had to be abandoned via resetIncompleteRecord(), the partial-send array offset wasn't being reset. On retry, the code would skip the column's length header and resume from a stale byte offset, creating a malformed packet.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 6, 2026

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR introduces partial writing support for UTF-8 sequences and VARCHAR data in PostgreSQL wire protocol handling. It adds new putPartial() methods to the sink interfaces, refactors state tracking in PGPipelineEntry to use a generic resume offset, updates size calculations to support incremental sends, and adds test coverage for buffer overflow scenarios with VARCHAR and array values.

Changes

Cohort / File(s) Summary
Sink Interface Extensions
core/src/main/java/io/questdb/cutlass/pgwire/PGResponseSink.java, core/src/main/java/io/questdb/cutlass/pgwire/PGConnectionContext.java
Added putPartial(Utf8Sequence, int, int) method to both PGResponseSink and ResponseUtf8Sink inner class to support writing partial UTF-8 sequences into the response buffer, returning the actual number of bytes written.
Pipeline State Refactoring
core/src/main/java/io/questdb/cutlass/pgwire/PGPipelineEntry.java
Replaced outResendArrayFlatIndex field with outResendResumePoint to generically track byte/element offsets for partial sends of arrays and VARCHAR data. Updated all related send loops, boundary checks, and state reset mechanisms to use the new resume point tracking for chunked serialization.
Sizing Calculation Updates
core/src/main/java/io/questdb/cutlass/pgwire/PGUtils.java
Renamed calculateColumnBinSize() parameter from arrayResumePoint to resumePoint and updated VARCHAR and ARRAY handling logic to compute remaining serializable bytes based on the resume offset, enabling partial serialization across multiple sends.
Test Coverage
core/src/test/java/io/questdb/test/cutlass/pgwire/PGArraysTest.java, core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java
Added testSendBufferOverflowVarcharWithArray() and three new VARCHAR overflow tests (testSendBufferOverflowVarchar, testSendBufferOverflowVarcharExactBufferSize, testSendBufferOverflowVarcharMultipleColumns) to validate partial send behavior under buffer constraints; adjusted existing test expectations for new fragmentation logic.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

  • questdb/questdb#6455: Modifies pgwire array resend logic in PGPipelineEntry and array/varchar resume-size handling in PGUtils, directly related to the state tracking and calculation changes in this PR.

Suggested labels

Bug

Suggested reviewers

  • bluestreak01
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.86% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the primary change: enabling VARCHAR columns to be sent in fragments regardless of send buffer size, which is the main objective addressed in the PR.
Description check ✅ Passed The description clearly relates to the changeset by explaining the problem (VARCHAR columns failing when larger than send buffer) and the fix (supporting fragmented sending), plus the latent bug about incomplete record reset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jerrinot jerrinot added the Postgres Wire Issues or changes relating to Postgres wire protocol label Jan 6, 2026
1. resetIncompleteRecord() wouldn't reset column resumePoint
2. estimateRecordSize() would be 0 as a resume point instead of -1 -> array header was not included
@jerrinot jerrinot marked this pull request as ready for review January 7, 2026 11:22
@jerrinot
Copy link
Copy Markdown
Contributor Author

jerrinot commented Jan 7, 2026

@coderabbitai please review this PR

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 7, 2026

@jerrinot: I'll review this PR for you.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java:
- Line 10099: Replace uses of TestUtils.assertContains("VARCHAR content mismatch
at row " + i, expectedValue, actualValue) with an exact equality assertion: call
Assert.assertEquals("VARCHAR content mismatch at row " + i, expectedValue,
actualValue) (or org.junit.Assert.assertEquals if static import not used) so the
test checks full string equality rather than containment; apply the same
replacement for the other occurrences of TestUtils.assertContains at the
indicated locations and add/import Assert if necessary.
🧹 Nitpick comments (1)
core/src/main/java/io/questdb/cutlass/pgwire/PGConnectionContext.java (1)

1822-1837: Implementation is correct but consider performance optimization.

The partial write logic correctly:

  • Calculates available space accounting for checkCapacity's strict inequality (< not <=)
  • Clamps the write size to available buffer space
  • Updates the buffer pointer and returns bytes written

However, the byte-by-byte copy loop (lines 1831-1833) will be slow for large VARCHAR values. The existing put(Utf8Sequence) method (line 1696) uses Utf8s.strCpy() for bulk copying. Consider using a bulk copy mechanism if the Utf8s utility supports offset-based copying, or using Unsafe.copyMemory() if the Utf8Sequence provides direct memory access.

⚡ Suggested performance optimization

If Utf8Sequence provides a way to get the underlying memory address or if there's a bulk copy utility that supports offsets, replace the loop with:

// Pseudo-code - actual implementation depends on Utf8Sequence API
if (us.hasDirectMemory()) {
    Unsafe.getUnsafe().copyMemory(
        us.address() + offset, 
        sendBufferPtr, 
        toWrite
    );
    sendBufferPtr += toWrite;
} else {
    // fallback to current byte-by-byte copy
}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8bbe902 and 2c1340f.

📒 Files selected for processing (6)
  • core/src/main/java/io/questdb/cutlass/pgwire/PGConnectionContext.java
  • core/src/main/java/io/questdb/cutlass/pgwire/PGPipelineEntry.java
  • core/src/main/java/io/questdb/cutlass/pgwire/PGResponseSink.java
  • core/src/main/java/io/questdb/cutlass/pgwire/PGUtils.java
  • core/src/test/java/io/questdb/test/cutlass/pgwire/PGArraysTest.java
  • core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java
🧰 Additional context used
🧬 Code graph analysis (3)
core/src/main/java/io/questdb/cutlass/pgwire/PGPipelineEntry.java (2)
core/src/main/java/io/questdb/cairo/ColumnType.java (1)
  • ColumnType (58-1168)
core/src/main/java/io/questdb/network/NoSpaceLeftInResponseBufferException.java (1)
  • NoSpaceLeftInResponseBufferException (30-57)
core/src/main/java/io/questdb/cutlass/pgwire/PGConnectionContext.java (1)
core/src/main/java/io/questdb/std/Unsafe.java (1)
  • Unsafe (40-542)
core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java (1)
core/src/test/java/io/questdb/test/tools/TestUtils.java (1)
  • TestUtils (154-2667)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (35)
  • GitHub Check: New pull request (Coverage Report Coverage Report)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
  • GitHub Check: New pull request (Check Changes Check changes)
🔇 Additional comments (18)
core/src/main/java/io/questdb/cutlass/pgwire/PGResponseSink.java (1)

139-147: LGTM: Well-documented interface addition for partial write support.

The new putPartial method signature is clear and well-documented. The return value (actual bytes written) enables callers to track progress when writing Utf8 data in fragments, which aligns with the PR's goal of supporting large VARCHAR columns regardless of buffer size.

core/src/test/java/io/questdb/test/cutlass/pgwire/PGArraysTest.java (1)

744-780: Good test coverage for combined large VARCHAR and array fragmentation.

The test validates the new fragmented sending capability by combining large VARCHAR (500-999 chars) and array (100-999 elements) in the same row across 9 rows. The buffer sizing and fragmentation settings should exercise partial write scenarios effectively.

Minor observation: The receive buffer is sized at 4 * (elemCount + varcharSize) while forceRecvFragmentationChunkSize = Integer.MAX_VALUE disables receive fragmentation. This focuses the test on send-side fragmentation, which aligns with the PR objectives.

core/src/main/java/io/questdb/cutlass/pgwire/PGUtils.java (3)

87-87: LGTM: Parameter rename reflects broader usage.

The parameter rename from arrayResumePoint to resumePoint appropriately reflects its expanded role in tracking partial send progress for both VARCHAR and ARRAY types, not just arrays.


139-146: VARCHAR partial send logic looks correct.

The implementation properly handles the resume point semantics:

  • resumePoint == -1: First send, include header (4 bytes) + all data
  • resumePoint >= 0: Continuation, return only remaining data bytes from the offset (header already sent)

The logic correctly handles null values (header only) and calculates remaining bytes based on the resume point.


178-186: ARRAY partial send logic is consistent and correct.

The implementation follows the same pattern as VARCHAR:

  • Uses actualResumePoint (clamped to 0) for element-based calculations
  • Includes header when resumePoint == -1, skips it otherwise
  • Calculates remaining elements and their serialized size from the resume point

Note: For arrays, resumePoint represents an element index (not byte offset), which is appropriate given arrays have variable-size elements.

core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java (5)

10080-10109: Well-structured test for VARCHAR buffer overflow.

The test appropriately exercises fragmented VARCHAR sending with values 3x larger than the send buffer, verifies multiple rows, and uses clear assertions with descriptive error messages. The buffer size configuration isolates the send-side fragmentation behavior.


10111-10139: Good coverage for exact buffer size edge case.

Testing VARCHAR values close to the buffer boundary (accounting for message headers) is important for validating edge-case behavior in the fragmentation logic. The test appropriately verifies multiple rows to ensure consistency.


10141-10173: Comprehensive test for multiple large VARCHAR columns.

Testing multiple large VARCHAR columns in the same row validates that the fragmentation logic correctly handles state across columns. This is a valuable scenario for ensuring the resume-point fix mentioned in the PR objectives works correctly.


10728-10730: Correct adjustment to exclude VARCHAR from oversized columns.

The updated logic now excludes VARCHAR (index 0) from being made oversized, aligning with the PR's feature that VARCHAR now supports fragmented sending. The comment clearly documents this distinction.


10767-10767: Appropriate test adjustments for VARCHAR fragmentation behavior.

The coordinated changes reflect the new VARCHAR fragmentation capability:

  • Reduced strSize and updated error message assertions align with new size calculations (VARCHAR no longer contributes to buffer overflow since it fragments).
  • Added WAL to table definitions and drainWalQueue() calls ensure test consistency with WAL-enabled tables.
  • Updated requiredSize values in error messages (1063 and 1024) correctly reflect the recalculated buffer requirements.

Also applies to: 10791-10794, 10805-10805, 10812-10812, 10826-10826, 10833-10833

core/src/main/java/io/questdb/cutlass/pgwire/PGPipelineEntry.java (8)

178-178: LGTM!

Clear field declaration with helpful comment explaining the sentinel value semantics: -1 means header not yet sent, 0+ is the resume offset.


323-323: LGTM!

Proper reset to -1 in close() ensures clean state for pooled entry reuse.


1077-1097: LGTM!

The logic correctly:

  1. Applies effectiveResumeOffset only to the column currently being resumed (line 1079)
  2. Passes -1 for other columns to calculate full size including header
  3. Excludes ARRAY and VARCHAR from the buffer-size check (line 1095) since they support fragmented sending

This addresses the core PR objective of allowing large VARCHAR columns regardless of send buffer size.


1297-1297: LGTM!

Passing -1 as resumePoint correctly calculates full column size including header, which is appropriate for size estimation purposes.


1591-1641: LGTM!

The fragmented array sending logic correctly:

  1. Sends header only once when outResendResumePoint == -1 (lines 1591-1608)
  2. Bookmarks after header to preserve partial progress
  3. Resumes from outResendResumePoint on retry (line 1615)
  4. Updates resume point after each element (line 1632)
  5. Resets to -1 on successful completion (line 1637)
  6. Preserves resume point on NoSpaceLeftInResponseBufferException for retry

1655-1681: LGTM!

The recursive array handling correctly uses outResendResumePoint to:

  1. Skip already-sent elements by checking outFlatIndex == outResendResumePoint (line 1656)
  2. Increment resume point after writing each element (line 1680)

This maintains consistency with the vanilla array path in outColBinArr.


2318-2344: LGTM!

The new VARCHAR fragmented sending implementation correctly handles partial writes:

  1. Null handling: Early return with null value (lines 2320-2323)
  2. Header-once pattern: Size header sent only when outResendResumePoint == -1, then set to 0 (lines 2326-2330)
  3. Partial write: putPartial writes as many bytes as fit, returns bytes written (line 2333)
  4. Progress tracking: Resume point updated with written bytes, then bookmarked (lines 2334-2335)
  5. Retry mechanism: If incomplete, throws NoSpaceLeftInResponseBufferException to trigger flush (lines 2337-2341)
  6. Clean completion: Reset to -1 when done (line 2343)

The bookmark placement after putPartial correctly preserves partial progress for retry.


2851-2865: Critical bug fix - LGTM!

This addresses the latent bug described in the PR objectives. Previously, resetIncompleteRecord did not reset outResendResumePoint, causing:

  • Stale resume offsets to persist across retry attempts
  • Column headers to be incorrectly omitted (because resumePoint != -1)
  • Malformed packets when data resumed from wrong byte offset

The fix (line 2861) ensures that when a row is abandoned and restarted from scratch, both outResendColumnIndex and outResendResumePoint are reset, so the retry correctly re-sends headers and starts from offset 0.

The comprehensive JavaDoc clearly documents the method's purpose and invariants.

@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 44 / 44 (100.00%)

file detail

path covered line new line coverage
🔵 io/questdb/cutlass/pgwire/PGConnectionContext.java 9 9 100.00%
🔵 io/questdb/cutlass/pgwire/PGPipelineEntry.java 28 28 100.00%
🔵 io/questdb/cutlass/pgwire/PGUtils.java 7 7 100.00%

@bluestreak01 bluestreak01 merged commit 2c7d4b7 into master Jan 7, 2026
43 checks passed
@bluestreak01 bluestreak01 deleted the jh_pgwire_varchar_fragmented_col branch January 7, 2026 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Postgres Wire Issues or changes relating to Postgres wire protocol

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants