Skip to content

perf(http): garbage-free HTTP header parser#6397

Merged
bluestreak01 merged 2 commits intomasterfrom
jh_perf_http_header_parser_garbage_free
Nov 15, 2025
Merged

perf(http): garbage-free HTTP header parser#6397
bluestreak01 merged 2 commits intomasterfrom
jh_perf_http_header_parser_garbage_free

Conversation

@jerrinot
Copy link
Copy Markdown
Contributor

This change reduces the allocation rate when parsing HTTP headers. Previously, header parser would carefully pool HTTP headers, only to store them in a map which implicitly created a defensive copy of all keys.

This change extends the map with a method to store keys without creating a defensive copy.

This change reduces the allocation rate when parsing HTTP headers.
Previously, header parser would carefully pool HTTP headers,
only to store them in a map which implicitly created a defensive
copy of all keys.

This change extends the map with a method to store keys
without creating a defensive copy.
@jerrinot jerrinot added the Performance Performance improvements label Nov 14, 2025
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 14, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR introduces immutable key storage mechanisms across UTF-8 collection classes by adding a new putImmutable API, converting key storage types from Utf8String to Utf8Sequence, and updating dependent code in HTTP processing to use immutable-mode storage for headers.

Changes

Cohort / File(s) Summary
Hash Map Infrastructure
core/src/main/java/io/questdb/std/AbstractLowerCaseUtf8SequenceHashSet.java, core/src/main/java/io/questdb/std/LowerCaseUtf8SequenceObjHashMap.java
Converted key storage from Utf8String to Utf8Sequence type. Added new public putImmutable(Utf8Sequence, V) method to store exact key instances without defensive copies. Updated keys() return type to ObjList<Utf8Sequence>. Modified internal putAt implementations to delegate through new immutable storage path.
HTTP Processing Updates
core/src/main/java/io/questdb/cutlass/http/HttpHeaderParser.java, core/src/main/java/io/questdb/cutlass/http/processors/LineHttpTudCache.java
Changed header parsing to use putImmutable() for storing header values. Updated type annotations in LineHttpTudCache from ObjList<Utf8String> to ObjList<Utf8Sequence> for keys collection across multiple methods.
Test Coverage
core/src/test/java/io/questdb/test/std/LowerCaseUtf8SequenceObjHashMapTest.java
Added testPutImmutableRetainsKeyIdentity() test method to verify key identity preservation, key retrieval, and map state after removal operations with putImmutable API.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Key areas requiring attention:
    • Lifecycle guarantees and thread-safety implications of storing exact Utf8Sequence references without copies in putImmutable
    • Verification that all usages of the modified keys() return type properly handle Utf8Sequence instead of Utf8String
    • Consistency of immutable storage semantics across putAt delegation paths in LowerCaseUtf8SequenceObjHashMap
    • Test coverage adequacy for identity preservation and interaction with map operations

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.76% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: improving HTTP header parser performance by reducing garbage allocation through eliminating defensive copies of header keys.
Description check ✅ Passed The description is directly related to the changeset, explaining the performance optimization goal and the specific technical approach taken to achieve it.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jerrinot
Copy link
Copy Markdown
Contributor Author

@coderabbitai pls review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 14, 2025

@jerrinot, I'll review the changes in this PR focusing on the garbage-free HTTP header parser implementation and the optimization to avoid defensive copies when storing keys in the map.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 813dc07 and 2419af7.

📒 Files selected for processing (5)
  • core/src/main/java/io/questdb/cutlass/http/HttpHeaderParser.java (1 hunks)
  • core/src/main/java/io/questdb/cutlass/http/processors/LineHttpTudCache.java (4 hunks)
  • core/src/main/java/io/questdb/std/AbstractLowerCaseUtf8SequenceHashSet.java (1 hunks)
  • core/src/main/java/io/questdb/std/LowerCaseUtf8SequenceObjHashMap.java (4 hunks)
  • core/src/test/java/io/questdb/test/std/LowerCaseUtf8SequenceObjHashMapTest.java (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (35)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
  • GitHub Check: New pull request (Check Changes Check changes)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (6)
core/src/main/java/io/questdb/cutlass/http/HttpHeaderParser.java (1)

332-332: LGTM! Optimization correctly reduces allocations.

The switch to putImmutable eliminates defensive key copies during header storage. The lifecycle is properly managed: headers.clear() (line 116) removes all references before pooled DirectUtf8String instances can be reused, satisfying the immutability guarantee documented in putImmutable.

core/src/main/java/io/questdb/cutlass/http/processors/LineHttpTudCache.java (1)

86-86: LGTM! Type changes align with the immutable key storage refactor.

The widening of the key type from Utf8String to Utf8Sequence is safe and consistent with the broader changes across the map infrastructure. The usage at line 182 properly creates new instances via Utf8String.newInstance (line 169) before storage, satisfying lifecycle requirements.

Also applies to: 104-104, 119-119, 187-187

core/src/main/java/io/questdb/std/AbstractLowerCaseUtf8SequenceHashSet.java (1)

40-40: LGTM! Key storage type widened to support immutable insertions.

Changing the backing storage from Utf8String[] to Utf8Sequence[] enables the map to store any Utf8Sequence implementation directly, which is essential for the garbage-free optimization introduced by putImmutable.

Also applies to: 51-51

core/src/test/java/io/questdb/test/std/LowerCaseUtf8SequenceObjHashMapTest.java (1)

206-231: LGTM! Comprehensive test coverage for identity-preserving behavior.

The test properly verifies that putImmutable retains the exact key instance (via assertSame at lines 219-220), exercises both insertion paths (DirectUtf8Sink and Utf8String), and confirms correct behavior on removal. This provides strong validation for the garbage-free optimization.

core/src/main/java/io/questdb/std/LowerCaseUtf8SequenceObjHashMap.java (2)

88-109: LGTM! Well-designed API for garbage-free key storage.

The new putImmutable method is properly documented with clear lifecycle requirements (lines 94-99), and the implementation in putImmutableAt correctly stores the key instance without defensive copying. This achieves the stated performance goal of reducing allocation rates.

Also applies to: 131-145


39-39: LGTM! Type changes enable polymorphic key storage.

The list type change from ObjList<Utf8String> to ObjList<Utf8Sequence> (line 39) and corresponding keys() return type update (line 67) properly support storing any Utf8Sequence implementation, enabling the garbage-free optimization while maintaining type safety.

Also applies to: 67-67

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 14, 2025

Note

Unit test generation is an Early Access feature. Expect some limitations and changes as we gather feedback and continue to improve it.


Generating unit tests... This may take up to 20 minutes.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 14, 2025

Caution

The CodeRabbit agent failed during execution: Clone operation failed

@bluestreak01
Copy link
Copy Markdown
Member

@CodeRabbit try again, something went wrong? does this change really reduce allocations?

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 15, 2025

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Examine the putImmutable implementation and compare with put/putAt
cat core/src/main/java/io/questdb/std/LowerCaseUtf8SequenceObjHashMap.java

Length of output: 5932


🏁 Script executed:

#!/bin/bash
# Check the HttpHeaderParser change
cat core/src/main/java/io/questdb/cutlass/http/HttpHeaderParser.java | grep -A5 -B5 "putImmutable"

Length of output: 445


🏁 Script executed:

#!/bin/bash
# Look at AbstractLowerCaseUtf8SequenceHashSet to understand the key storage change
cat core/src/main/java/io/questdb/std/AbstractLowerCaseUtf8SequenceHashSet.java

Length of output: 5978


@bluestreak01, yes, this change definitely reduces allocations! Let me explain how:

Allocation Reduction Analysis

Before (using put()):

public boolean putAt(int index, Utf8Sequence key, V value) {
    Utf8String onHeapKey = Utf8String.newInstance(key);  // ❌ ALLOCATION
    return putImmutableAt(index, onHeapKey, value);
}

Every time a header was stored, the map created a defensive copy by allocating a new Utf8String instance.

After (using putImmutable()):

public boolean putImmutable(Utf8Sequence key, V value) {
    return putImmutableAt(keyIndex(key), key, value);
}

private boolean putImmutableAt(int index, Utf8Sequence key, V value) {
    // ...
    keys[index] = key;  // ✅ NO ALLOCATION - stores key directly
    // ...
}

The key is now stored directly without creating a copy, eliminating one allocation per header.

In HttpHeaderParser:

// Changed from:
headers.put(headerName, ...)
// To:
headers.putImmutable(headerName, ...)

Since HTTP requests typically have 5-15 headers, this eliminates 5-15 allocations per request during header parsing. The change is safe because headerName comes from a pool and its lifecycle is properly managed (not reused until the request completes).

This is a clean performance optimization that reduces GC pressure in the hot path of HTTP request processing.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@bluestreak01
Copy link
Copy Markdown
Member

@CodeRabbit is there a regression where this map is used elsewhere and there is a risk of storing mutable copies ?

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 15, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all files that use LowerCaseUtf8SequenceObjHashMap
echo "=== Files using LowerCaseUtf8SequenceObjHashMap ==="
rg -n --type=java "LowerCaseUtf8SequenceObjHashMap" -g '!**/test/**'

echo -e "\n=== All put() method calls on this map type ==="
rg -nP --type=java '\.put\(' -g '!**/test/**' -C2 | rg -A2 -B2 "LowerCaseUtf8SequenceObjHashMap|\.put\("

echo -e "\n=== All putImmutable() method calls ==="
rg -n --type=java "\.putImmutable\(" -g '!**/test/**' -C3

Length of output: 270281


🏁 Script executed:

#!/bin/bash
# Look specifically at LineHttpTudCache to see how it uses the map
echo "=== LineHttpTudCache.java full content (focusing on map usage) ==="
cat core/src/main/java/io/questdb/cutlass/http/processors/LineHttpTudCache.java

echo -e "\n=== Search for .put( calls in HttpHeaderParser ==="
rg -n "headers\.put\(" core/src/main/java/io/questdb/cutlass/http/HttpHeaderParser.java -C2

echo -e "\n=== Search for .put( calls in LineHttpTudCache ==="
rg -n "tableUpdateDetails\.put\(" core/src/main/java/io/questdb/cutlass/http/processors/LineHttpTudCache.java -C2

Length of output: 10655


🏁 Script executed:

#!/bin/bash
# Check HttpHeaderParser to understand the lifecycle of headerName and headers map
echo "=== HttpHeaderParser - checking headerName lifecycle and headers usage ==="
cat core/src/main/java/io/questdb/cutlass/http/HttpHeaderParser.java

Length of output: 35503


@bluestreak01 Short answer: with the current call sites, no regression found.

  • HttpHeaderParser: We only switched this site to putImmutable(headerName, …). headerName is a DirectUtf8String taken from csPool and not reused after insertion; the headers map is cleared before the next request. That satisfies the “no-mutation while stored” contract, so it’s safe for by-reference storage.

  • LineHttpTudCache: It uses tableUpdateDetails.putAt(key, nameUtf8, tud) with nameUtf8 = Utf8String.newInstance(parser.getMeasurementName()). That key object isn’t mutated afterwards. Even if putAt internally stores the key by reference, this site remains safe.

I also scanned the tree and didn’t find any other non-test usages or any other putImmutable(...) calls. So the only place we store a pooled/mutable DirectUtf8String by reference is HttpHeaderParser, and its lifecycle is contained to a single request, with no reuse before headers.clear().

To prevent future footguns, I’d document (in the map javadoc and call sites) that putImmutable must only be used when:

  • the key object will not be mutated (no .of(...) reassignments) while present in the map, and
  • the backing memory for DirectUtf8String keys remains valid for the map’s lifetime.

If you want, I can add a brief comment near headers.putImmutable(...) and in LowerCaseUtf8SequenceObjHashMap.putImmutable(...) reinforcing that contract.

@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 19 / 20 (95.00%)

file detail

path covered line new line coverage
🔵 io/questdb/std/LowerCaseUtf8SequenceObjHashMap.java 13 14 92.86%
🔵 io/questdb/cutlass/http/processors/LineHttpTudCache.java 4 4 100.00%
🔵 io/questdb/std/AbstractLowerCaseUtf8SequenceHashSet.java 1 1 100.00%
🔵 io/questdb/cutlass/http/HttpHeaderParser.java 1 1 100.00%

@bluestreak01 bluestreak01 merged commit dc3d464 into master Nov 15, 2025
41 checks passed
@bluestreak01 bluestreak01 deleted the jh_perf_http_header_parser_garbage_free branch November 15, 2025 02:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Performance Performance improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants