fix(pgwire): fix protocol corruption when sending arrays from server to client#6455
fix(pgwire): fix protocol corruption when sending arrays from server to client#6455bluestreak01 merged 13 commits intomasterfrom
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis PR introduces binary and text mode testing for psycopg3 connections, refactors PGWire array column binary size calculations with sentinel-based header management, updates test expectations with consistent spacing, and removes legacy WAL skip logic from PGArraysTest while adding queue draining. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
before it was conflated with header sending
skipOnWal() was designed for tests that don't touch physical tables, where running in both WAL and non-WAL modes would be redundant. However, since WAL mode is now randomized, a test marked with skipOnWal() may never execute if the randomizer picks WAL mode.
|
@coderabbitai review please |
|
@jerrinot: I'll review the changes in this PR that fix the protocol corruption issue when sending arrays from server to client. ✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
core/src/main/java/io/questdb/cutlass/pgwire/PGUtils.java (1)
236-247: Remove duplicate comments.Lines 239-240 and 244-246 contain duplicated "take rough upper estimate based on the string length" comments. This appears to be a copy-paste artifact.
case ColumnType.STRING -> { final CharSequence strValue = record.getStrA(columnIndex); - // take rough upper estimate based on the string length yield strValue == null ? Integer.BYTES : Integer.BYTES + 3L * strValue.length(); - // take rough upper estimate based on the string length } case ColumnType.SYMBOL -> { final CharSequence symValue = record.getSymA(columnIndex); - // take rough upper estimate based on the string length yield symValue == null ? Integer.BYTES : Integer.BYTES + 3L * symValue.length(); - // take rough upper estimate based on the string length }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
compat/src/test/python/runner_psycopg3.py(2 hunks)compat/src/test/resources/test_cases.yaml(25 hunks)core/src/main/java/io/questdb/cutlass/pgwire/PGPipelineEntry.java(5 hunks)core/src/main/java/io/questdb/cutlass/pgwire/PGUtils.java(4 hunks)core/src/test/java/io/questdb/test/cutlass/pgwire/PGArraysTest.java(8 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-04T09:16:27.366Z
Learnt from: jerrinot
Repo: questdb/questdb PR: 5996
File: core/src/test/java/io/questdb/test/cutlass/http/line/LineHttpSenderTest.java:350-356
Timestamp: 2025-08-04T09:16:27.366Z
Learning: In QuestDB ILP tests, maximum dimension values like `(1 << 28) - 1` are intentionally hard-coded instead of using constants like `ArrayView.DIM_MAX_LEN` to serve as "canaries" that will cause test failures if the API limits change, ensuring developers are aware of breaking changes to the ILP API.
Applied to files:
core/src/test/java/io/questdb/test/cutlass/pgwire/PGArraysTest.java
🧬 Code graph analysis (2)
compat/src/test/python/runner_psycopg3.py (3)
compat/src/test/python/runner_asyncpg.py (1)
run_test(98-158)compat/src/test/python/runner_psycopg2.py (1)
run_test(98-131)core/rust/qdb-sqllogictest/sqllogictest-engines/src/external.rs (1)
connect(68-82)
core/src/main/java/io/questdb/cutlass/pgwire/PGPipelineEntry.java (1)
core/src/main/java/io/questdb/cutlass/pgwire/PGUtils.java (1)
PGUtils(40-307)
🪛 Ruff (0.14.5)
compat/src/test/python/runner_psycopg3.py
156-156: Possible hardcoded password assigned to argument: "password"
(S106)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (35)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
- GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
- GitHub Check: New pull request (Hosted Running tests on windows-other-2)
- GitHub Check: New pull request (Hosted Running tests on windows-other-1)
- GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
- GitHub Check: New pull request (Hosted Running tests on mac-other)
- GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
- GitHub Check: New pull request (Hosted Running tests on mac-griffin)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
- GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
- GitHub Check: New pull request (Check Changes Check changes)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (13)
compat/src/test/python/runner_psycopg3.py (2)
99-103: LGTM! Clean implementation of binary mode support.The addition of the
binaryparameter torun_testand its propagation to cursor creation correctly enables dual-mode testing.
141-161: Dual-mode testing implementation is well-structured.The nested loop structure correctly runs all tests in both binary and text modes, with proper connection isolation per iteration. Creating a new connection for each test iteration ensures clean test state.
Regarding the hardcoded password at line 156: while static analysis flags this, it's acceptable in test code using QuestDB's default credentials.
compat/src/test/resources/test_cases.yaml (1)
5-625: LGTM! Consistent formatting improves readability.The changes throughout this file are purely cosmetic, adding consistent spacing to array and list elements. No test semantics or expected values have been altered.
core/src/test/java/io/questdb/test/cutlass/pgwire/PGArraysTest.java (1)
98-98: LGTM! Consistent WAL handling across tests.The
drainWalQueue()calls are correctly placed after data modifications (inserts/updates) and before queries. This ensures WAL data is flushed before reading, enabling these tests to run correctly in both WAL and non-WAL modes rather than being skipped.Also applies to: 117-117, 366-366, 388-388, 417-418, 472-473, 520-522, 597-598, 619-620, 636-637, 694-694
core/src/main/java/io/questdb/cutlass/pgwire/PGUtils.java (4)
61-73: LGTM! Clean separation of header and resume size calculations.The new API correctly separates:
calculateArrayColBinSizeIncludingHeaderfor initial array serializationcalculateArrayResumeColBinSizefor resumed partial sendsThe nullCount derivation
cardinality - notNullCountis correct.
170-180: Correct sentinel-based size calculation.The logic properly handles the three states:
-1: Initial send - includes header size, counts all elements0: After header sent - no header, resume from element 0>0: Partial resume - no header, resume from element N
Math.max(0, arrayResumePoint)correctly normalizes -1 to 0 for element counting while preserving the header inclusion check at line 176.
259-265: Header size calculation matches serialization.The formula correctly accounts for:
- 4 fixed
Integer.BYTESfields (size, dim count, has-nulls flag, component type)- 2 ×
Integer.BYTESper dimension (length + lower bound)This matches the serialization in
PGPipelineEntry.outColBinArr.
189-197: Clean refactor to switch expression.The logic is preserved while improving readability. The explicit
AssertionErrorfor unsupported element types is appropriate given the assertion at lines 167-169 that limits to DOUBLE/LONG.core/src/main/java/io/questdb/cutlass/pgwire/PGPipelineEntry.java (5)
173-173: Critical fix: Sentinel -1 distinguishes "header not sent" from "resume at element 0".This is the core fix for the protocol corruption. Previously,
0was overloaded to mean both "header not sent yet" and "resume from element 0", causing the header to be re-sent on buffer overflow recovery. The new-1sentinel unambiguously indicates "header not yet written."
1077-1090: Correct per-column resume point handling.The logic properly applies
outResendArrayFlatIndexonly to the column being resumed (outResendColumnIndex), while other array columns use-1to indicate they haven't been started (need full send with header).
1721-1772: Correct header/element separation with proper state transitions.The flow is now:
-1→ send header, bookmark, transition to0>= 0→ send elements, incrementing index after each with bookmark- On successful completion → reset to
-1- On
NoSpaceLeftInResponseBufferException→ resetToBookmark preserves last good stateThe size calculation at line 1729 correctly subtracts
Integer.BYTESbecause the size field value excludes its own size (standard PG wire protocol).
309-312: State reset in close() is complete.The reset of
outResendArrayFlatIndexto-1at line 309 is part of the complete state cleanup. The corresponding per-record resets (lines 2884-2886) and per-array reset (line 1767) ensure consistent state during normal operation, whileclose()handles pooling cleanup.
1714-1719: LGTM!Null array handling (dimCount == 0) correctly short-circuits before any header/element processing, leaving
outResendArrayFlatIndexunchanged. This is appropriate since null arrays require no resend state.
|
an unrelated failure in a MV test - |
[PR Coverage check]😞 fail : 24 / 50 (48.00%) file detail
|
Ent PR: https://github.com/questdb/questdb-enterprise/pull/790
the problem was reproducible with this python script: