fix(pgwire): allow large varchar column regardless of send buffer size #6603
bluestreak01 merged 6 commits into master
Conversation
Important: Review skipped. Auto reviews are disabled on this repository; check the settings in the CodeRabbit UI. You can disable this status message in the configuration.

Walkthrough: This PR introduces partial writing support for UTF-8 sequences and VARCHAR data in PostgreSQL wire protocol handling.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Bugs fixed along the way:
1. resetIncompleteRecord() wouldn't reset the column resumePoint.
2. estimateRecordSize() would use 0 as a resume point instead of -1, so the array header was not included in the estimate.
@coderabbitai please review this PR

@jerrinot: I'll review this PR for you. ✅ Actions performed: Review triggered.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java:
- Line 10099: Replace uses of TestUtils.assertContains("VARCHAR content mismatch
at row " + i, expectedValue, actualValue) with an exact equality assertion: call
Assert.assertEquals("VARCHAR content mismatch at row " + i, expectedValue,
actualValue) (or org.junit.Assert.assertEquals if static import not used) so the
test checks full string equality rather than containment; apply the same
replacement for the other occurrences of TestUtils.assertContains at the
indicated locations and add/import Assert if necessary.
🧹 Nitpick comments (1)
core/src/main/java/io/questdb/cutlass/pgwire/PGConnectionContext.java (1)
1822-1837: Implementation is correct, but consider a performance optimization. The partial write logic correctly:
- Calculates available space accounting for checkCapacity's strict inequality (< not <=)
- Clamps the write size to the available buffer space
- Updates the buffer pointer and returns the number of bytes written

However, the byte-by-byte copy loop (lines 1831-1833) will be slow for large VARCHAR values. The existing put(Utf8Sequence) method (line 1696) uses Utf8s.strCpy() for bulk copying. Consider using a bulk copy mechanism if the Utf8s utility supports offset-based copying, or using Unsafe.copyMemory() if the Utf8Sequence provides direct memory access.

⚡ Suggested performance optimization
If Utf8Sequence provides a way to get the underlying memory address, or if there's a bulk copy utility that supports offsets, replace the loop with:

```java
// Pseudo-code - the actual implementation depends on the Utf8Sequence API
if (us.hasDirectMemory()) {
    Unsafe.getUnsafe().copyMemory(
            us.address() + offset,
            sendBufferPtr,
            toWrite
    );
    sendBufferPtr += toWrite;
} else {
    // fall back to the current byte-by-byte copy
}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- core/src/main/java/io/questdb/cutlass/pgwire/PGConnectionContext.java
- core/src/main/java/io/questdb/cutlass/pgwire/PGPipelineEntry.java
- core/src/main/java/io/questdb/cutlass/pgwire/PGResponseSink.java
- core/src/main/java/io/questdb/cutlass/pgwire/PGUtils.java
- core/src/test/java/io/questdb/test/cutlass/pgwire/PGArraysTest.java
- core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java
🧰 Additional context used
🧬 Code graph analysis (3)
core/src/main/java/io/questdb/cutlass/pgwire/PGPipelineEntry.java (2)
core/src/main/java/io/questdb/cairo/ColumnType.java (1)
- ColumnType (58-1168)
core/src/main/java/io/questdb/network/NoSpaceLeftInResponseBufferException.java (1)
- NoSpaceLeftInResponseBufferException (30-57)
core/src/main/java/io/questdb/cutlass/pgwire/PGConnectionContext.java (1)
core/src/main/java/io/questdb/std/Unsafe.java (1)
- Unsafe (40-542)
core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java (1)
core/src/test/java/io/questdb/test/tools/TestUtils.java (1)
- TestUtils (154-2667)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (35)
- GitHub Check: New pull request (Coverage Report Coverage Report)
- GitHub Check: New pull request (Hosted Running tests on windows-other-2)
- GitHub Check: New pull request (Hosted Running tests on windows-other-1)
- GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
- GitHub Check: New pull request (Hosted Running tests on mac-other)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo)
- GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
- GitHub Check: New pull request (Hosted Running tests on mac-griffin)
- GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
- GitHub Check: New pull request (Check Changes Check changes)
🔇 Additional comments (18)
core/src/main/java/io/questdb/cutlass/pgwire/PGResponseSink.java (1)
139-147: LGTM: Well-documented interface addition for partial write support. The new putPartial method signature is clear and well-documented. The return value (actual bytes written) enables callers to track progress when writing Utf8 data in fragments, which aligns with the PR's goal of supporting large VARCHAR columns regardless of buffer size.

core/src/test/java/io/questdb/test/cutlass/pgwire/PGArraysTest.java (1)
744-780: Good test coverage for combined large VARCHAR and array fragmentation. The test validates the new fragmented sending capability by combining a large VARCHAR (500-999 chars) and an array (100-999 elements) in the same row across 9 rows. The buffer sizing and fragmentation settings should exercise partial write scenarios effectively.
Minor observation: The receive buffer is sized at 4 * (elemCount + varcharSize) while forceRecvFragmentationChunkSize = Integer.MAX_VALUE disables receive fragmentation. This focuses the test on send-side fragmentation, which aligns with the PR objectives.

core/src/main/java/io/questdb/cutlass/pgwire/PGUtils.java (3)
87-87: LGTM: Parameter rename reflects broader usage. The rename from arrayResumePoint to resumePoint appropriately reflects its expanded role in tracking partial send progress for both VARCHAR and ARRAY types, not just arrays.
139-146: VARCHAR partial send logic looks correct. The implementation properly handles the resume point semantics:
- resumePoint == -1: first send; include the header (4 bytes) plus all data
- resumePoint >= 0: continuation; return only the remaining data bytes from the offset (header already sent)

The logic correctly handles null values (header only) and calculates the remaining bytes based on the resume point.
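These resume-point semantics can be sketched as a tiny model. The class, method, and header constant below are illustrative stand-ins that follow the description above; they are not QuestDB's actual code.

```java
// Illustrative model of the VARCHAR size-estimation semantics described in
// the review. Names are hypothetical, not QuestDB's implementation.
final class VarcharSizeSketch {
    static final int HEADER_LEN = 4; // 4-byte length prefix per column value

    // dataLen: total UTF-8 byte length of the VARCHAR value.
    // resumePoint == -1: first send, count the header plus all data bytes.
    // resumePoint >= 0: continuation, header already sent, count the rest.
    static int estimateSize(int dataLen, int resumePoint) {
        if (resumePoint == -1) {
            return HEADER_LEN + dataLen;
        }
        return dataLen - resumePoint;
    }
}
```

For a 10-byte value, the first attempt is sized at 14 bytes (header included); resuming from offset 3 needs only the remaining 7.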
178-186: ARRAY partial send logic is consistent and correct. The implementation follows the same pattern as VARCHAR:
- Uses actualResumePoint (clamped to 0) for element-based calculations
- Includes the header when resumePoint == -1, skips it otherwise
- Calculates the remaining elements and their serialized size from the resume point

Note: for arrays, resumePoint represents an element index (not a byte offset), which is appropriate given arrays have variable-size elements.

core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java (5)
10080-10109: Well-structured test for VARCHAR buffer overflow. The test appropriately exercises fragmented VARCHAR sending with values 3x larger than the send buffer, verifies multiple rows, and uses clear assertions with descriptive error messages. The buffer size configuration isolates the send-side fragmentation behavior.
10111-10139: Good coverage for the exact-buffer-size edge case. Testing VARCHAR values close to the buffer boundary (accounting for message headers) is important for validating edge-case behavior in the fragmentation logic. The test appropriately verifies multiple rows to ensure consistency.
10141-10173: Comprehensive test for multiple large VARCHAR columns. Testing multiple large VARCHAR columns in the same row validates that the fragmentation logic correctly handles state across columns. This is a valuable scenario for ensuring the resume-point fix mentioned in the PR objectives works correctly.
10728-10730: Correct adjustment to exclude VARCHAR from oversized columns. The updated logic now excludes VARCHAR (index 0) from being made oversized, aligning with the PR's feature that VARCHAR now supports fragmented sending. The comment clearly documents this distinction.
10767-10767: Appropriate test adjustments for VARCHAR fragmentation behavior. The coordinated changes reflect the new VARCHAR fragmentation capability:
- Reduced strSize and updated error-message assertions align with the new size calculations (VARCHAR no longer contributes to buffer overflow since it fragments).
- Added WAL to table definitions and drainWalQueue() calls ensure test consistency with WAL-enabled tables.
- Updated requiredSize values in error messages (1063 and 1024) correctly reflect the recalculated buffer requirements.

Also applies to: 10791-10794, 10805-10805, 10812-10812, 10826-10826, 10833-10833
core/src/main/java/io/questdb/cutlass/pgwire/PGPipelineEntry.java (8)
178-178: LGTM! Clear field declaration with a helpful comment explaining the sentinel value semantics: -1 means the header has not yet been sent; 0+ is the resume offset.
323-323: LGTM! Proper reset to -1 in close() ensures clean state for pooled entry reuse.
1077-1097: LGTM! The logic correctly:
- Applies effectiveResumeOffset only to the column currently being resumed (line 1079)
- Passes -1 for other columns to calculate the full size including the header
- Excludes ARRAY and VARCHAR from the buffer-size check (line 1095) since they support fragmented sending

This addresses the core PR objective of allowing large VARCHAR columns regardless of send buffer size.
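The per-column resume handling can be modeled roughly as below. The names estimateRecordSize, resumeColumn, and resumeOffset are hypothetical; the real code operates on record cursors and native memory, not int arrays.

```java
// Rough model of the per-column resume handling described in the comment
// above: only the column being resumed uses the resume offset, every other
// column is sized in full, including its 4-byte header. Hypothetical names.
final class RecordSizeSketch {
    static final int HEADER_LEN = 4;

    static long estimateRecordSize(int[] dataLens, int resumeColumn, int resumeOffset) {
        long total = 0;
        for (int i = 0; i < dataLens.length; i++) {
            int effectiveResumeOffset = (i == resumeColumn) ? resumeOffset : -1;
            if (effectiveResumeOffset == -1) {
                total += HEADER_LEN + dataLens[i]; // full column incl. header
            } else {
                total += dataLens[i] - effectiveResumeOffset; // remainder only
            }
        }
        return total;
    }
}
```

With two columns of 10 and 20 data bytes and column 1 resumed at offset 5, the estimate is (4 + 10) + (20 - 5) = 29 bytes; with no resumed column, both headers are counted and the estimate is 38.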
1297-1297: LGTM! Passing -1 as resumePoint correctly calculates the full column size including the header, which is appropriate for size estimation purposes.
1591-1641: LGTM! The fragmented array sending logic correctly:
- Sends the header only once, when outResendResumePoint == -1 (lines 1591-1608)
- Bookmarks after the header to preserve partial progress
- Resumes from outResendResumePoint on retry (line 1615)
- Updates the resume point after each element (line 1632)
- Resets to -1 on successful completion (line 1637)
- Preserves the resume point on NoSpaceLeftInResponseBufferException for retry
1655-1681: LGTM! The recursive array handling correctly uses outResendResumePoint to:
- Skip already-sent elements by checking outFlatIndex == outResendResumePoint (line 1656)
- Increment the resume point after writing each element (line 1680)

This maintains consistency with the vanilla array path in outColBinArr.
2318-2344: LGTM! The new VARCHAR fragmented sending implementation correctly handles partial writes:
- Null handling: early return with a null value (lines 2320-2323)
- Header-once pattern: the size header is sent only when outResendResumePoint == -1, then the resume point is set to 0 (lines 2326-2330)
- Partial write: putPartial writes as many bytes as fit and returns the number of bytes written (line 2333)
- Progress tracking: the resume point is updated with the bytes written, then bookmarked (lines 2334-2335)
- Retry mechanism: if incomplete, throws NoSpaceLeftInResponseBufferException to trigger a flush (lines 2337-2341)
- Clean completion: reset to -1 when done (line 2343)

The bookmark placement after putPartial correctly preserves partial progress for retry.
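The "send the header once, then write partial chunks until done" pattern can be sketched in isolation. Sink, PartialVarcharWriter, and ByteArraySink below are illustrative stand-ins, not QuestDB's actual API; the real code throws NoSpaceLeftInResponseBufferException where this sketch returns false.

```java
// Hypothetical sketch of the header-once partial-write pattern.
interface Sink {
    void putIntDirect(int v);              // write the 4-byte size header
    int putPartial(byte[] data, int from); // write as many bytes as fit
    void bookmark();                       // mark progress as safe to flush
}

final class PartialVarcharWriter {
    int resumePoint = -1; // -1: header not yet sent; >= 0: resume byte offset

    // Returns true when the value is fully written; false means the caller
    // should flush the send buffer and call write() again (state preserved).
    boolean write(Sink sink, byte[] value) {
        if (resumePoint == -1) {
            sink.putIntDirect(value.length); // size header, sent exactly once
            resumePoint = 0;
        }
        int written = sink.putPartial(value, resumePoint);
        resumePoint += written;
        sink.bookmark(); // preserve partial progress for retry
        if (resumePoint < value.length) {
            return false; // buffer full: flush and retry
        }
        resumePoint = -1; // clean completion: ready for the next value
        return true;
    }
}

// Fixed-capacity in-memory sink, just for demonstration.
final class ByteArraySink implements Sink {
    final byte[] buf;
    int pos;

    ByteArraySink(int capacity) { buf = new byte[capacity]; }

    public void putIntDirect(int v) { pos += 4; } // header bytes elided
    public int putPartial(byte[] data, int from) {
        int toWrite = Math.min(data.length - from, buf.length - pos);
        System.arraycopy(data, from, buf, pos, toWrite);
        pos += toWrite;
        return toWrite;
    }
    public void bookmark() { /* no-op in this sketch */ }
    void flush() { pos = 0; } // pretend the bytes went to the socket
}
```

An 11-byte value written through an 8-byte sink takes two passes: the first fits the 4-byte header plus 4 data bytes, then a flush and retry sends the remaining 7.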
2851-2865: Critical bug fix - LGTM! This addresses the latent bug described in the PR objectives. Previously, resetIncompleteRecord did not reset outResendResumePoint, causing:
- Stale resume offsets to persist across retry attempts
- Column headers to be incorrectly omitted (because resumePoint != -1)
- Malformed packets when data resumed from the wrong byte offset

The fix (line 2861) ensures that when a row is abandoned and restarted from scratch, both outResendColumnIndex and outResendResumePoint are reset, so the retry correctly re-sends headers and starts from offset 0. The comprehensive JavaDoc clearly documents the method's purpose and invariants.
core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java: outdated review comment (resolved)
[PR Coverage check] 😍 pass: 44 / 44 (100.00%)
Previously, VARCHAR columns larger than the PGWire send buffer would fail. This was inconsistent with array columns, which already supported fragmented sending.

Additionally, there was a latent bug: when a row couldn't fit in the buffer and had to be abandoned via resetIncompleteRecord(), the partial-send array offset wasn't being reset. On retry, the code would skip the column's length header and resume from a stale byte offset, creating a malformed packet.
resetIncompleteRecord(), the partial-send array offset wasn't being reset. On retry, the code would skip the column's length header and resume from a stale byte offset, creating a malformed packet.