fix(pgwire): fix protocol corruption when sending arrays from server to client by bluestreak01 · Pull Request #6455 · questdb/questdb

bluestreak01 · 2025-11-26T02:18:38Z

Ent PR: https://github.com/questdb/questdb-enterprise/pull/790

the problem was reproducible with this python script:

from questdb.ingress import Sender, TimestampNanos
import numpy as np
import psycopg

conf = "http::addr=localhost:9000;username=admin;password=quest;"
i = 0
with Sender.from_conf(conf) as sender:
    while i < 10000:
        sender.row(
            'ob',
            symbols={'symbol': 'XXX' },
            columns={
                'bids': np.array(
                [
                    [1.0850, 600000],
                    [1.0849, 300000],
                    [1.0848, 150000]
                ],
                dtype=np.float64
            ),
            'asks': np.array(
                [
                    [1.0853, 500000],
                    [1.0854, 250000],
                    [1.0855, 125000]
                ],
                dtype=np.float64
            )},
            at=TimestampNanos.now())
        sender.flush()
        i+=1


conn = psycopg.connect(
    host='localhost',
    port=8812,
    user='admin',
    password='quest',
    dbname='qdb',
    autocommit=True  # Important for QuestDB
)



batchs = []
with conn.cursor(binary=True) as cur:
    # Execute a query that might return many rows
    cur.execute("SELECT * FROM Ob") #works with LIMIT 5

    batch_size = 10
    total_processed = 0

    while True:
        batch = cur.fetchmany(batch_size)
        
        if not batch:
            break
        batchs += batch
        total_processed += len(batch)
        # if total_processed % 10000 == 0:
        print(f"Processed {total_processed} rows so far...")

    print(f"Finished processing {total_processed} total rows")

…to client

coderabbitai · 2025-11-26T02:18:42Z

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR introduces binary and text mode testing for psycopg3 connections, refactors PGWire array column binary size calculations with sentinel-based header management, updates test expectations with consistent spacing, and removes legacy WAL skip logic from PGArraysTest while adding queue draining.

Changes

Cohort / File(s)	Summary
Python Test Runner Mode Support `compat/src/test/python/runner_psycopg3.py`	Added `binary` parameter to `run_test()` function. Main test runner now loops over binary and text modes for psycopg connections, running each test in both modes with mode labeling and autocommit enabled.
YAML Test Case Formatting `compat/src/test/resources/test_cases.yaml`	Reformatted YAML list and record representations to include consistent spacing around elements (e.g., `[1]` → `[ 1 ]`, `["BTC-USD", "buy", ...]` → `[ "BTC-USD", "buy", ... ]`). No semantic changes to test logic or data values.
PGWire Array Resend Logic `core/src/main/java/io/questdb/cutlass/pgwire/PGPipelineEntry.java`	Introduced sentinel value -1 for `outResendArrayFlatIndex` to indicate unsent array header. Updated initialization, reset paths in `close()` and `clear()`, and header-send condition in `outColBinArr` from `== 0` to `== -1`. Adjusted column tail size calculation to compute effective resend indices per column.
PGWire Utility Array Sizing Refactor `core/src/main/java/io/questdb/cutlass/pgwire/PGUtils.java`	Refactored array column binary size calculation. Renamed `calculateArrayColBinSize()` to `calculateArrayColBinSizeIncludingHeader()`. Added new public method `calculateArrayResumeColBinSize()` for resume-only sizing. Adjusted internal `calculateColumnBinSize()` logic to compute actual resume point, remaining elements, and not-null count. Refactored `countNotNull()` and `estimateColumnTxtSize()` to use switch expressions. Added private helper `calculateArrayHeaderSize()`.
PGWire Array Tests WAL Handling `core/src/test/java/io/questdb/test/cutlass/pgwire/PGArraysTest.java`	Removed `skipOnWalRun()` calls throughout test methods and deleted the helper method. Added `drainWalQueue()` invocations following test setup and insert paths to flush WAL state.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

PGUtils.java: Dense refactoring with new public API methods and revised array sizing logic requiring careful verification of edge cases (arrayResumePoint, notNullCount, header inclusion semantics)
PGPipelineEntry.java: Sentinel value semantics (-1 vs. 0) and state transitions for array header resend logic require tracing through header send/reset paths
PGArraysTest.java: Verify that drainWalQueue() placement correctly flushes WAL state without masking unrelated test issues; confirm removal of skipOnWalRun() is intentional
runner_psycopg3.py: Validate that binary and text mode loops execute independently and connection cleanup is correct

Suggested labels

Bug

Suggested reviewers

bluestreak01

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 9.38% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: fixing protocol corruption when sending arrays from server to client, which aligns with the core changes in PGPipelineEntry.java and PGUtils.java.
Description check	✅ Passed	The description provides a reproducible Python script demonstrating the protocol corruption issue when sending arrays via psycopg with binary=True cursor, directly relating to the changeset's purpose.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

before it was conflated with header sending

skipOnWal() was designed for tests that don't touch physical tables, where running in both WAL and non-WAL modes would be redundant. However, since WAL mode is now randomized, a test marked with skipOnWal() may never execute if the randomizer picks WAL mode.

jerrinot · 2025-11-26T12:14:47Z

@coderabbitai review please

coderabbitai · 2025-11-26T12:15:02Z

@jerrinot: I'll review the changes in this PR that fix the protocol corruption issue when sending arrays from server to client.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

core/src/main/java/io/questdb/cutlass/pgwire/PGUtils.java (1)

236-247: Remove duplicate comments.

Lines 239-240 and 244-246 contain duplicated "take rough upper estimate based on the string length" comments. This appears to be a copy-paste artifact.

             case ColumnType.STRING -> {
                 final CharSequence strValue = record.getStrA(columnIndex);
-                // take rough upper estimate based on the string length
                 yield strValue == null ? Integer.BYTES : Integer.BYTES + 3L * strValue.length();
-                // take rough upper estimate based on the string length
             }
             case ColumnType.SYMBOL -> {
                 final CharSequence symValue = record.getSymA(columnIndex);
-                // take rough upper estimate based on the string length
                 yield symValue == null ? Integer.BYTES : Integer.BYTES + 3L * symValue.length();
-                // take rough upper estimate based on the string length
             }

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9968b7c and 51db7ba.

📒 Files selected for processing (5)

compat/src/test/python/runner_psycopg3.py (2 hunks)
compat/src/test/resources/test_cases.yaml (25 hunks)
core/src/main/java/io/questdb/cutlass/pgwire/PGPipelineEntry.java (5 hunks)
core/src/main/java/io/questdb/cutlass/pgwire/PGUtils.java (4 hunks)
core/src/test/java/io/questdb/test/cutlass/pgwire/PGArraysTest.java (8 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-08-04T09:16:27.366Z

Learnt from: jerrinot
Repo: questdb/questdb PR: 5996
File: core/src/test/java/io/questdb/test/cutlass/http/line/LineHttpSenderTest.java:350-356
Timestamp: 2025-08-04T09:16:27.366Z
Learning: In QuestDB ILP tests, maximum dimension values like `(1 << 28) - 1` are intentionally hard-coded instead of using constants like `ArrayView.DIM_MAX_LEN` to serve as "canaries" that will cause test failures if the API limits change, ensuring developers are aware of breaking changes to the ILP API.

Applied to files:

core/src/test/java/io/questdb/test/cutlass/pgwire/PGArraysTest.java

🧬 Code graph analysis (2)

compat/src/test/python/runner_psycopg3.py (3)

compat/src/test/python/runner_asyncpg.py (1)

run_test (98-158)

compat/src/test/python/runner_psycopg2.py (1)

run_test (98-131)

core/rust/qdb-sqllogictest/sqllogictest-engines/src/external.rs (1)

connect (68-82)

core/src/main/java/io/questdb/cutlass/pgwire/PGPipelineEntry.java (1)

core/src/main/java/io/questdb/cutlass/pgwire/PGUtils.java (1)

PGUtils (40-307)

🪛 Ruff (0.14.5)

compat/src/test/python/runner_psycopg3.py

156-156: Possible hardcoded password assigned to argument: "password"

(S106)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (35)

GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
GitHub Check: New pull request (Hosted Running tests on windows-other-2)
GitHub Check: New pull request (Hosted Running tests on windows-other-1)
GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
GitHub Check: New pull request (Hosted Running tests on mac-other)
GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
GitHub Check: New pull request (Hosted Running tests on mac-cairo)
GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
GitHub Check: New pull request (Hosted Running tests on mac-griffin)
GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
GitHub Check: New pull request (Check Changes Check changes)
GitHub Check: build
GitHub Check: build

🔇 Additional comments (13)

compat/src/test/python/runner_psycopg3.py (2)

99-103: LGTM! Clean implementation of binary mode support.

The addition of the binary parameter to run_test and its propagation to cursor creation correctly enables dual-mode testing.

141-161: Dual-mode testing implementation is well-structured.

The nested loop structure correctly runs all tests in both binary and text modes, with proper connection isolation per iteration. Creating a new connection for each test iteration ensures clean test state.

Regarding the hardcoded password at line 156: while static analysis flags this, it's acceptable in test code using QuestDB's default credentials.

compat/src/test/resources/test_cases.yaml (1)

5-625: LGTM! Consistent formatting improves readability.

The changes throughout this file are purely cosmetic, adding consistent spacing to array and list elements. No test semantics or expected values have been altered.

core/src/test/java/io/questdb/test/cutlass/pgwire/PGArraysTest.java (1)

98-98: LGTM! Consistent WAL handling across tests.

The drainWalQueue() calls are correctly placed after data modifications (inserts/updates) and before queries. This ensures WAL data is flushed before reading, enabling these tests to run correctly in both WAL and non-WAL modes rather than being skipped.

Also applies to: 117-117, 366-366, 388-388, 417-418, 472-473, 520-522, 597-598, 619-620, 636-637, 694-694

core/src/main/java/io/questdb/cutlass/pgwire/PGUtils.java (4)

61-73: LGTM! Clean separation of header and resume size calculations.

The new API correctly separates:

calculateArrayColBinSizeIncludingHeader for initial array serialization

calculateArrayResumeColBinSize for resumed partial sends

The nullCount derivation cardinality - notNullCount is correct.

170-180: Correct sentinel-based size calculation.

The logic properly handles the three states:

-1: Initial send - includes header size, counts all elements

0: After header sent - no header, resume from element 0

>0: Partial resume - no header, resume from element N

Math.max(0, arrayResumePoint) correctly normalizes -1 to 0 for element counting while preserving the header inclusion check at line 176.

259-265: Header size calculation matches serialization.

The formula correctly accounts for:

4 fixed Integer.BYTES fields (size, dim count, has-nulls flag, component type)

2 × Integer.BYTES per dimension (length + lower bound)

This matches the serialization in PGPipelineEntry.outColBinArr.

189-197: Clean refactor to switch expression.

The logic is preserved while improving readability. The explicit AssertionError for unsupported element types is appropriate given the assertion at lines 167-169 that limits to DOUBLE/LONG.

core/src/main/java/io/questdb/cutlass/pgwire/PGPipelineEntry.java (5)

173-173: Critical fix: Sentinel -1 distinguishes "header not sent" from "resume at element 0".

This is the core fix for the protocol corruption. Previously, 0 was overloaded to mean both "header not sent yet" and "resume from element 0", causing the header to be re-sent on buffer overflow recovery. The new -1 sentinel unambiguously indicates "header not yet written."

1077-1090: Correct per-column resume point handling.

The logic properly applies outResendArrayFlatIndex only to the column being resumed (outResendColumnIndex), while other array columns use -1 to indicate they haven't been started (need full send with header).

1721-1772: Correct header/element separation with proper state transitions.

The flow is now:

-1 → send header, bookmark, transition to 0

>= 0 → send elements, incrementing index after each with bookmark

On successful completion → reset to -1

On NoSpaceLeftInResponseBufferException → resetToBookmark preserves last good state

The size calculation at line 1729 correctly subtracts Integer.BYTES because the size field value excludes its own size (standard PG wire protocol).

309-312: State reset in close() is complete.

The reset of outResendArrayFlatIndex to -1 at line 309 is part of the complete state cleanup. The corresponding per-record resets (lines 2884-2886) and per-array reset (line 1767) ensure consistent state during normal operation, while close() handles pooling cleanup.

1714-1719: LGTM!

Null array handling (dimCount == 0) correctly short-circuits before any header/element processing, leaving outResendArrayFlatIndex unchanged. This is appropriate since null arrays require no resend state.

jerrinot · 2025-11-26T17:10:24Z

an unrelated failure in a MV test - testSingleTablePeriodView()

glasstiger · 2025-11-27T15:41:34Z

[PR Coverage check]

😞 fail : 24 / 50 (48.00%)

file detail

	path	covered line	new line	coverage
🔵	io/questdb/cutlass/pgwire/PGUtils.java	16	42	38.10%
🔵	io/questdb/cutlass/pgwire/PGPipelineEntry.java	8	8	100.00%

fix(pgwire): fix protocol corruption when sending arrays from server …

af0de73

…to client

jerrinot added 7 commits November 26, 2025 09:26

distinct states for sending the first array element

421331c

before it was conflated with header sending

clear outResendArrayFlatIndex

df09897

fix large selects with multiple array columns

458b40a

psycopg3 to exercise both binary and text modes

0bf3d5b

tiny refactoring + clarity of comments

fa8ae0f

refactoring

635c8e4

coderabbitai bot reviewed Nov 26, 2025

View reviewed changes

jerrinot and others added 2 commits November 26, 2025 13:33

dup comment removed

2ea8876

Merge branch 'master' into vi_pg_fix

c1a763e

jerrinot and others added 3 commits November 27, 2025 08:58

Merge branch 'master' into vi_pg_fix

8ae9620

Merge branch 'master' into vi_pg_fix

ccc5f13

typo

bce02fe

jerrinot approved these changes Nov 27, 2025

View reviewed changes

bluestreak01 merged commit 4cce031 into master Nov 27, 2025
42 of 43 checks passed

bluestreak01 deleted the vi_pg_fix branch November 27, 2025 17:16

coderabbitai bot mentioned this pull request Jan 7, 2026

fix(pgwire): allow large varchar column regardless of send buffer size #6603

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(pgwire): fix protocol corruption when sending arrays from server to client#6455

fix(pgwire): fix protocol corruption when sending arrays from server to client#6455
bluestreak01 merged 13 commits intomasterfrom
vi_pg_fix

bluestreak01 commented Nov 26, 2025 •

edited by jerrinot

Loading

Uh oh!

coderabbitai bot commented Nov 26, 2025 •

edited

Loading

Review skipped

Uh oh!

jerrinot commented Nov 26, 2025

Uh oh!

coderabbitai bot commented Nov 26, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

jerrinot commented Nov 26, 2025

Uh oh!

glasstiger commented Nov 27, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

bluestreak01 commented Nov 26, 2025 • edited by jerrinot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

jerrinot commented Nov 26, 2025

Uh oh!

coderabbitai bot commented Nov 26, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

jerrinot commented Nov 26, 2025

Uh oh!

glasstiger commented Nov 27, 2025

[PR Coverage check]

file detail

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

bluestreak01 commented Nov 26, 2025 •

edited by jerrinot

Loading

coderabbitai bot commented Nov 26, 2025 •

edited

Loading