Skip to content

feat(core): add a GC-free CharSequence -> int hash map for WalWriter symbols#6371

Merged
bluestreak01 merged 12 commits intomasterfrom
rd_non_gc_symbol_map
Nov 12, 2025
Merged

feat(core): add a GC-free CharSequence -> int hash map for WalWriter symbols#6371
bluestreak01 merged 12 commits intomasterfrom
rd_non_gc_symbol_map

Conversation

@RaphDal
Copy link
Copy Markdown
Contributor

@RaphDal RaphDal commented Nov 10, 2025

This PR introduces a new hash map implementation that maps CharSequence to int without relying on heap allocation or garbage collection. Its primary goal is to replace the existing CharSequenceIntHashMap used in WalWriter, which currently allocates heap-based String objects, leading to GC-churn during WAL tx commits.

The WalWriter implementation is updated to use this new hash map.

Please feel free to comment about improvements I could make to the PR.

Benchmark

In addition, this PR includes a benchmark to measure and compare the performance and memory allocations of the original CharSequenceIntHashMap versus the new SymbolMap.

Here is the raw output from the benchmark:

Benchmark                                                                 Mode  Cnt        Score        Error   Units
WalWriterSymbolBenchmark.CharSequenceIntHashMap                           avgt    3  1191600.777 ± 218457.103   ns/op
WalWriterSymbolBenchmark.CharSequenceIntHashMap:gc.alloc.rate             avgt    3       57.616 ±     10.496  MB/sec
WalWriterSymbolBenchmark.CharSequenceIntHashMap:gc.alloc.rate.norm        avgt    3    72000.708 ±      0.176    B/op
WalWriterSymbolBenchmark.CharSequenceIntHashMap:gc.count                  avgt    3        6.000               counts
WalWriterSymbolBenchmark.CharSequenceIntHashMap:gc.time                   avgt    3        9.000                   ms
WalWriterSymbolBenchmark.DirectCharSequenceIntHashMap                     avgt    3  1052896.721 ±   2210.717   ns/op
WalWriterSymbolBenchmark.DirectCharSequenceIntHashMap:gc.alloc.rate       avgt    3        0.001 ±      0.001  MB/sec
WalWriterSymbolBenchmark.DirectCharSequenceIntHashMap:gc.alloc.rate.norm  avgt    3        0.622 ±      0.001    B/op
WalWriterSymbolBenchmark.DirectCharSequenceIntHashMap:gc.count            avgt    3          ≈ 0               counts

As we can see, we have almost no allocations in the SymbolMap version in comparison to the CharSequenceIntHashMap implementation, we also have a ~12% performance improvements and less jitter.

Tests

There are a few tests to verify that the NonGC hash map implementation works properly:

  • Basic operations: put/get and clear.
  • Insertion-ordered iteration.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 10, 2025

Walkthrough

Replaces CharSequenceIntHashMap with a new off-heap SymbolMap implementation and an AbstractOffsetCharSequenceHashSet foundation; integrates SymbolMap into WalWriter and WalEventWriter, adds unit tests, and a JMH benchmark comparing both implementations.

Changes

Cohort / File(s) Summary
Off-Heap Hash Set Foundation
core/src/main/java/io/questdb/std/AbstractOffsetCharSequenceHashSet.java
New abstract class implementing offset-based open-addressing hash set for CharSequence keys with keyIndex/contains/excludes/probing and an abstract areKeysEquals() for subclasses.
Symbol Map Implementation
core/src/main/java/io/questdb/cairo/wal/SymbolMap.java
New concrete off-heap CharSequence→int map built on the abstract set; manages native memory (writeKey, capacity growth, rehash), exposes get/put/putAt/valueAt/nextOffset/nextOffset(offset)/close and NO_ENTRY_VALUE.
WAL Integration
core/src/main/java/io/questdb/cairo/wal/WalWriter.java, core/src/main/java/io/questdb/cairo/wal/WalEventWriter.java
Replaced CharSequenceIntHashMap usages with SymbolMap: field types, method signatures, iteration (keys() → nextOffset/get(offset)), keyIndex/hashCode usage, putAt calls, and added SymbolMap close/cleanup paths.
Tests & Benchmarks
core/src/test/java/io/questdb/test/cairo/wal/SymbolMapTest.java, benchmarks/src/main/java/org/questdb/WalWriterSymbolBenchmark.java
Added unit tests verifying basic operations, offsets iteration, and clearing; added JMH benchmark comparing CharSequenceIntHashMap vs SymbolMap with off-heap strings and randomized access.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant WalWriter
    participant SymbolMap
    participant OffHeap as Off-Heap Memory

    Client->>WalWriter: addSymbol(charSeq)
    WalWriter->>SymbolMap: keyIndex(charSeq, hash)
    SymbolMap->>SymbolMap: probe/find slot
    alt not found
        SymbolMap-->>WalWriter: negative index
        WalWriter->>SymbolMap: putAt(index, charSeq, value, hash)
        SymbolMap->>OffHeap: writeKey(bytes)
        OffHeap-->>SymbolMap: offset
        SymbolMap->>SymbolMap: store offset & value
    else found
        SymbolMap-->>WalWriter: index (exists)
        WalWriter->>SymbolMap: update value (if needed)
    end

    Client->>WalWriter: iterateSymbols()
    WalWriter->>SymbolMap: offset = nextOffset()
    loop offsets
        SymbolMap->>OffHeap: read key at offset
        SymbolMap-->>WalWriter: key, value
        WalWriter->>SymbolMap: offset = nextOffset(offset)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Pay special attention to:
    • Off-heap memory allocation/resize/free in SymbolMap (writeKey, rehash, close)
    • Correctness of probe/keyIndex logic and areKeysEquals() integration in AbstractOffsetCharSequenceHashSet
    • All WalWriter/WalEventWriter iteration and lifecycle changes (nextOffset/get vs keys()/getQuick)
    • Tests and benchmark correctness for equivalence and memory safety

Suggested labels

storage

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.84% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: introducing a GC-free CharSequence-to-int hash map for WalWriter symbols, which is the core contribution of this PR.
Description check ✅ Passed The description clearly explains the PR's objective to introduce a GC-free hash map implementation for WalWriter, includes benchmark results demonstrating performance improvements, and outlines test coverage.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch rd_non_gc_symbol_map

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@RaphDal
Copy link
Copy Markdown
Contributor Author

RaphDal commented Nov 10, 2025

@CodeRabbit review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 10, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01b6035 and ff736db.

📒 Files selected for processing (6)
  • benchmarks/src/main/java/org/questdb/WalWriterSymbolBenchmark.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/wal/SymbolMap.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/wal/WalEventWriter.java (4 hunks)
  • core/src/main/java/io/questdb/cairo/wal/WalWriter.java (13 hunks)
  • core/src/main/java/io/questdb/std/AbstractOffsetCharSequenceHashSet.java (1 hunks)
  • core/src/test/java/io/questdb/test/cairo/wal/SymbolMapTest.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
  • GitHub Check: New pull request (Check Changes Check changes)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (5)
core/src/main/java/io/questdb/cairo/wal/WalWriter.java (5)

132-132: LGTM: Field type successfully migrated to SymbolMap.

The field declaration cleanly switches from CharSequenceIntHashMap to the new GC-free SymbolMap implementation.


1101-1102: LGTM: SymbolMap instantiation is correct.

The constructor parameters (initial capacity, load factor, and no-entry value) are appropriate for symbol mapping usage.


1316-1316: LGTM: Proper resource management for SymbolMap instances.

The code correctly closes SymbolMap instances in both cleanup paths:

  • General cleanup via freeSymbolMapReaders() called from doClose()
  • Individual removal via removeSymbolMapReader() with null checks and assignment

Also applies to: 1578-1582


1857-1878: LGTM: Correct usage of new SymbolMap API in symbol insertion paths.

The code correctly implements the new API pattern:

  • Explicitly computes hashCode before calling keyIndex
  • Uses keyIndex to probe for existing entries (returns positive for new, negative for existing)
  • Calls putAt with all required parameters including hashCode
  • Retrieves values using valueAt for negative indices

The pattern is consistently applied in both ConversionSymbolMapWriter.putSym0() and RowImpl.putSym0().

Also applies to: 2630-2656


1923-1929: LGTM: Correct implementation of offset-based iteration pattern.

The new iteration approach using nextOffset() and nextOffset(int) correctly traverses the SymbolMap:

  • Starts with nextOffset() to get first offset
  • Continues with nextOffset(offset) to get subsequent offsets
  • Terminates when offset < 0
  • Uses get(offset) to retrieve keys and get(CharSequence) to retrieve values

This offset-based traversal is consistent with the insertion-ordered iteration requirement mentioned in the PR objectives.

@ideoma
Copy link
Copy Markdown
Collaborator

ideoma commented Nov 11, 2025

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 11, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 177 / 179 (98.88%)

file detail

path covered line new line coverage
🔵 io/questdb/cairo/wal/DirectCharSequenceIntHashMap.java 155 157 98.73%
🔵 io/questdb/cairo/wal/WalEventWriter.java 2 2 100.00%
🔵 io/questdb/cairo/wal/WalWriter.java 20 20 100.00%

@bluestreak01 bluestreak01 merged commit 5d19555 into master Nov 12, 2025
37 checks passed
@bluestreak01 bluestreak01 deleted the rd_non_gc_symbol_map branch November 12, 2025 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants