feat(core): add a GC-free CharSequence -> int hash map for WalWriter symbols#6371
feat(core): add a GC-free CharSequence -> int hash map for WalWriter symbols#6371bluestreak01 merged 12 commits intomasterfrom
CharSequence -> int hash map for WalWriter symbols#6371Conversation
WalkthroughReplaces CharSequenceIntHashMap with a new off-heap SymbolMap implementation and an AbstractOffsetCharSequenceHashSet foundation; integrates SymbolMap into WalWriter and WalEventWriter, adds unit tests, and a JMH benchmark comparing both implementations. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant WalWriter
participant SymbolMap
participant OffHeap as Off-Heap Memory
Client->>WalWriter: addSymbol(charSeq)
WalWriter->>SymbolMap: keyIndex(charSeq, hash)
SymbolMap->>SymbolMap: probe/find slot
alt not found
SymbolMap-->>WalWriter: negative index
WalWriter->>SymbolMap: putAt(index, charSeq, value, hash)
SymbolMap->>OffHeap: writeKey(bytes)
OffHeap-->>SymbolMap: offset
SymbolMap->>SymbolMap: store offset & value
else found
SymbolMap-->>WalWriter: index (exists)
WalWriter->>SymbolMap: update value (if needed)
end
Client->>WalWriter: iterateSymbols()
WalWriter->>SymbolMap: offset = nextOffset()
loop offsets
SymbolMap->>OffHeap: read key at offset
SymbolMap-->>WalWriter: key, value
WalWriter->>SymbolMap: offset = nextOffset(offset)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes
Suggested labels
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
benchmarks/src/main/java/org/questdb/WalWriterSymbolBenchmark.java(1 hunks)core/src/main/java/io/questdb/cairo/wal/SymbolMap.java(1 hunks)core/src/main/java/io/questdb/cairo/wal/WalEventWriter.java(4 hunks)core/src/main/java/io/questdb/cairo/wal/WalWriter.java(13 hunks)core/src/main/java/io/questdb/std/AbstractOffsetCharSequenceHashSet.java(1 hunks)core/src/test/java/io/questdb/test/cairo/wal/SymbolMapTest.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
- GitHub Check: New pull request (Hosted Running tests on windows-other-2)
- GitHub Check: New pull request (Hosted Running tests on windows-other-1)
- GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
- GitHub Check: New pull request (Hosted Running tests on mac-other)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
- GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
- GitHub Check: New pull request (Hosted Running tests on mac-griffin)
- GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
- GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
- GitHub Check: New pull request (Check Changes Check changes)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (5)
core/src/main/java/io/questdb/cairo/wal/WalWriter.java (5)
132-132: LGTM: Field type successfully migrated toSymbolMap.The field declaration cleanly switches from
CharSequenceIntHashMapto the new GC-freeSymbolMapimplementation.
1101-1102: LGTM: SymbolMap instantiation is correct.The constructor parameters (initial capacity, load factor, and no-entry value) are appropriate for symbol mapping usage.
1316-1316: LGTM: Proper resource management forSymbolMapinstances.The code correctly closes
SymbolMapinstances in both cleanup paths:
- General cleanup via
freeSymbolMapReaders()called fromdoClose()- Individual removal via
removeSymbolMapReader()with null checks and assignmentAlso applies to: 1578-1582
1857-1878: LGTM: Correct usage of newSymbolMapAPI in symbol insertion paths.The code correctly implements the new API pattern:
- Explicitly computes
hashCodebefore callingkeyIndex- Uses
keyIndexto probe for existing entries (returns positive for new, negative for existing)- Calls
putAtwith all required parameters includinghashCode- Retrieves values using
valueAtfor negative indicesThe pattern is consistently applied in both
ConversionSymbolMapWriter.putSym0()andRowImpl.putSym0().Also applies to: 2630-2656
1923-1929: LGTM: Correct implementation of offset-based iteration pattern.The new iteration approach using
nextOffset()andnextOffset(int)correctly traverses theSymbolMap:
- Starts with
nextOffset()to get first offset- Continues with
nextOffset(offset)to get subsequent offsets- Terminates when offset < 0
- Uses
get(offset)to retrieve keys andget(CharSequence)to retrieve valuesThis offset-based traversal is consistent with the insertion-ordered iteration requirement mentioned in the PR objectives.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
core/src/test/java/io/questdb/test/cairo/wal/SymbolMapTest.java
Outdated
Show resolved
Hide resolved
core/src/main/java/io/questdb/cairo/wal/DirectCharSequenceIntHashMap.java
Outdated
Show resolved
Hide resolved
[PR Coverage check]😍 pass : 177 / 179 (98.88%) file detail
|
This PR introduces a new hash map implementation that maps
CharSequencetointwithout relying on heap allocation or garbage collection. Its primary goal is to replace the existing CharSequenceIntHashMap used in WalWriter, which currently allocates heap-based String objects, leading to GC-churn during WAL tx commits.The
WalWriterimplementation is updated to use this new hash map.Please feel free to comment about improvements I could make to the PR.
Benchmark
In addition, this PR includes a benchmark to measure and compare the performance and memory allocations of the original
CharSequenceIntHashMapversus the newSymbolMap.Here is the raw output from the benchmark:
As we can see, we have almost no allocations in the
SymbolMapversion in comparison to theCharSequenceIntHashMapimplementation, we also have a ~12% performance improvements and less jitter.Tests
There are a few tests to verify that the NonGC hash map implementation works properly:
put/getandclear.