Skip to content

perf(sql): remove litter generated by ordered map rehashing#6399

Merged
bluestreak01 merged 2 commits intomasterfrom
puzpuzpuz_simplify_ordered_map2
Nov 16, 2025
Merged

perf(sql): remove litter generated by ordered map rehashing#6399
bluestreak01 merged 2 commits intomasterfrom
puzpuzpuz_simplify_ordered_map2

Conversation

@puzpuzpuz
Copy link
Copy Markdown
Contributor

@puzpuzpuz puzpuzpuz commented Nov 15, 2025

OrderedMap was creating a DirectIntList instance each time it was resized. This patch gets rid of the litter and also removes redundant address calculations for offset/hash code low.

Microbencmarks

Before:

Benchmark                             (size)  Mode  Cnt   Score    Error  Units
MapReadLongBenchmark.testOrderedMap     5000  avgt    3  16.467 ±  0.084  ns/op
MapReadLongBenchmark.testOrderedMap    50000  avgt    3  15.075 ±  0.209  ns/op
MapReadLongBenchmark.testOrderedMap   500000  avgt    3  19.724 ±  0.346  ns/op
MapReadLongBenchmark.testOrderedMap  5000000  avgt    3  90.479 ± 13.519  ns/op

Benchmark                              (size)  Mode  Cnt    Score   Error  Units
MapWriteLongBenchmark.testOrderedMap     5000  avgt    3   17.023 ± 0.204  ns/op
MapWriteLongBenchmark.testOrderedMap    50000  avgt    3   16.193 ± 0.134  ns/op
MapWriteLongBenchmark.testOrderedMap   500000  avgt    3   21.427 ± 0.728  ns/op
MapWriteLongBenchmark.testOrderedMap  5000000  avgt    3  101.549 ± 3.921  ns/op

Benchmark                           Mode  Cnt    Score    Error  Units
OrderedMapMergeBenchmark.testMerge  avgt    3  108.692 ± 14.597  ms/op

After:

Benchmark                             (size)  Mode  Cnt   Score   Error  Units
MapReadLongBenchmark.testOrderedMap     5000  avgt    3  14.605 ± 0.324  ns/op
MapReadLongBenchmark.testOrderedMap    50000  avgt    3  13.804 ± 0.304  ns/op
MapReadLongBenchmark.testOrderedMap   500000  avgt    3  17.277 ± 2.727  ns/op
MapReadLongBenchmark.testOrderedMap  5000000  avgt    3  86.909 ± 4.518  ns/op

Benchmark                              (size)  Mode  Cnt   Score   Error  Units
MapWriteLongBenchmark.testOrderedMap     5000  avgt    3  14.935 ± 0.235  ns/op
MapWriteLongBenchmark.testOrderedMap    50000  avgt    3  14.477 ± 0.147  ns/op
MapWriteLongBenchmark.testOrderedMap   500000  avgt    3  19.356 ± 2.241  ns/op
MapWriteLongBenchmark.testOrderedMap  5000000  avgt    3  94.322 ± 7.029  ns/op

Benchmark                           Mode  Cnt   Score   Error  Units
OrderedMapMergeBenchmark.testMerge  avgt    3  99.361 ± 4.071  ms/op

@puzpuzpuz puzpuzpuz self-assigned this Nov 15, 2025
@puzpuzpuz puzpuzpuz added SQL Issues or changes relating to SQL execution Performance Performance improvements labels Nov 15, 2025
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 15, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR refactors OrderedMap's internal memory management from using DirectIntList for offset storage to raw off-heap memory buffers managed via Unsafe. The changes systematically replace pointer addressing schemes, introduce offset compression methods, and update memory allocation and access patterns throughout the class.

Changes

Cohort / File(s) Summary
OrderedMap memory management refactoring
core/src/main/java/io/questdb/cairo/map/OrderedMap.java
Replaces DirectIntList-based offset storage with raw Unsafe-managed buffers (offsetsAddr); introduces compressOffset/decompressOffset methods for offset handling; updates heapStart/heapOffset pointers to heapAddr; modifies memory allocation/resizing/freeing paths to use Unsafe.malloc/realloc/free; adjusts all offset storage, retrieval, and address arithmetic throughout key/value operations and probing logic.
Test formatting
core/src/test/java/io/questdb/test/cairo/map/OrderedMapTest.java
Adds blank line after class opening brace; no logic changes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–75 minutes

  • Memory safety considerations: Extensive Unsafe operations with malloc/realloc/free patterns require careful verification of pointer lifecycle and bounds checking.
  • Systematic pointer refactoring: All offset addressing logic (compressOffset/decompressOffset, offset probing, rehashing) must be validated for correctness across multiple methods.
  • Off-heap layout semantics: Changes to heapAddr, appendAddr, and startAddr usage in reads/writes, value creation, and cloning paths need consistent verification.
  • Boundary condition coverage: Memory resizing and reallocation paths should be traced to ensure no off-by-one or overflow issues.

Suggested reviewers

  • bluestreak01

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 1.11% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: removing unnecessary DirectIntList allocations during OrderedMap rehashing for performance improvement.
Description check ✅ Passed The PR description clearly explains the problem (DirectIntList allocation overhead), the solution (removing it and optimizing address calculations), and provides comprehensive benchmark results demonstrating improvements.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@puzpuzpuz
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 15, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/src/main/java/io/questdb/cairo/map/OrderedMap.java (1)

96-99: Offset compression overflow issue confirmed — entries silently lost at 16+ GiB heap sizes

The analysis in the review comment is correct. The code allows heaps up to ~32 GiB (via MAX_HEAP_SIZE validation in resize, line 594), but compressOffset (line 393–395) overflows into negative values for offsets exceeding ~16 GiB:

  • Integer.MAX_VALUE = 2,147,483,647, so max safe offset is 2,147,483,646 * 8 ≈ 17.2 GiB
  • For larger offsets, (offset >> 3) + 1 wraps to a negative int on cast
  • During rehash (line 568), the condition while (Unsafe.getUnsafe().getInt(newOffsetAddr) > 0) skips negative rawOffset values
  • Entries are never copied to the new hash table → silent data loss
  • Rehash (line 572) propagates corrupted rawOffset values verbatim

Add overflow guard in compressOffset and treat rawOffset as unsigned in decompressOffset:

     private static int compressOffset(long offset) {
+        long compressed = (offset >> 3) + 1L;
+        if (compressed > 0xFFFFFFFFL) {
+            throw LimitOverflowException.instance().put("heap offset overflow");
+        }
         return (int) ((offset >> 3) + 1);
     }
     
     private static long decompressOffset(int rawOffset) {
-        return ((long) rawOffset - 1) << 3;
+        return (Integer.toUnsignedLong(rawOffset) - 1L) << 3;
     }

Without this, maps growing beyond ~16 GiB corrupt their hash table state silently instead of failing predictably.

🧹 Nitpick comments (1)
core/src/main/java/io/questdb/cairo/map/OrderedMap.java (1)

802-879: Key / VarSizeKey: capacity checks and var‑size encoding are coherent

For Key and VarSizeKey:

  • Key.reset() seeds startAddr = kPos and appendAddr = kPos + keyOffset, so var‑size keys reserve space for the 4‑byte length header (keyOffset == VAR_KEY_HEADER_SIZE) and fixed‑size keys start at kPos.
  • checkCapacity(requiredKeySize) always considers the upcoming write plus the full valueSize. For var‑size keys (where it’s called per put*), this ensures that by the time commit()/createValue() run, appendAddr + valueSize <= heapLimit. On overflow it calls resize(requiredSize, appendAddr) and adjusts startAddr/appendAddr by the returned delta.
  • VarSizeKey.commit() writes the key length into the reserved header and uses it for hashing (hash() over [startAddr + keyOffset, len]) and equality checks.
  • All var‑size encodings (putStr*, putVarchar, putBin, putArray, putInterval, etc.):
    • Check capacity based on their byte count.
    • Lay out data starting at appendAddr with length prefixes where appropriate.
    • Advance appendAddr by the exact number of bytes written.
  • VarSizeKey.eq(long offset) correctly:
    • Treats offset as a relative heap offset.
    • Compares the stored length headers at heapAddr + offset vs. startAddr.
    • Uses Vect.memeq on the key body [header + keyOffset, header + keyOffset + len).

Overall, the var‑size key path remains internally consistent with the refactored heap addressing and should behave the same as before, modulo the offset‑compression limit noted earlier.

You might consider adding a brief comment near checkCapacity explaining that it is intended to be called per incremental var‑size write with requiredKeySize == bytesToAppend, as this is non‑obvious from the signature and helps future maintainers avoid misusing it.

Also applies to: 886-907, 914-925, 941-955, 967-979, 981-993, 1000-1005, 1007-1013, 1015-1028, 1030-1048, 1050-1055, 1057-1072, 1074-1084, 1086-1113, 1120-1126, 1128-1138, 1140-1149

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6ee66e and d99ad8e.

📒 Files selected for processing (2)
  • core/src/main/java/io/questdb/cairo/map/OrderedMap.java (29 hunks)
  • core/src/test/java/io/questdb/test/cairo/map/OrderedMapTest.java (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-04T09:16:27.366Z
Learnt from: jerrinot
Repo: questdb/questdb PR: 5996
File: core/src/test/java/io/questdb/test/cutlass/http/line/LineHttpSenderTest.java:350-356
Timestamp: 2025-08-04T09:16:27.366Z
Learning: In QuestDB ILP tests, maximum dimension values like `(1 << 28) - 1` are intentionally hard-coded instead of using constants like `ArrayView.DIM_MAX_LEN` to serve as "canaries" that will cause test failures if the API limits change, ensuring developers are aware of breaking changes to the ILP API.

Applied to files:

  • core/src/main/java/io/questdb/cairo/map/OrderedMap.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (35)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
  • GitHub Check: New pull request (Check Changes Check changes)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (4)
core/src/test/java/io/questdb/test/cairo/map/OrderedMapTest.java (1)

82-84: Formatting-only change in test class is fine

The added blank line after the class declaration is harmless and keeps the file readable; no test behavior is affected.

core/src/main/java/io/questdb/cairo/map/OrderedMap.java (3)

176-193: Heap/offset allocation, clear, close, and reopen logic look consistent

The refactor to track heapAddr/heapLimit/kPos and raw offsetsAddr appears sound:

  • Constructor allocates both heap (heapAddr = kPos = Unsafe.malloc(...)) and offsets (offsetsAddr = Unsafe.malloc(...)) and zero‑fills the offsets block.
  • clear() correctly resets kPos to heapAddr, resets free, size, and nResizes, and memset’s the offset table without reallocating.
  • close() frees offsetsAddr and heap using the recorded sizes, zeros out pointers and capacity fields, and leaves the object in a reusable but "closed" state (heapAddr == 0).
  • reopen(..) / reopen() guard on heapAddr == 0 and route through restoreInitialCapacity(), which reallocates both the heap and offsets blocks (handling the 0 case via realloc) and then calls clear().

This keeps the invariants (heapAddr != 0 iff open, offsets sized to keyCapacity << 3) coherent across lifecycle transitions.

Also applies to: 260-267, 269-281, 330-345, 347-363


283-295: Probe/insert path and heap accounting maintain 8‑byte alignment

The hot path for lookups and inserts—valueAt, probe0, probeReadOnly, asNew, getAppendOffset, getHeapSize, getUsedHeapSize, and resize—is internally consistent after the refactor:

  • All entry addresses stored in offsetsAddr are based on heapAddr and are 8‑byte aligned (kPos = Bytes.align8b(keyWriter.appendAddr + valueSize) and alignedEntrySize = Bytes.align8b(entrySize) in merge code).
  • getHeapSize() and getUsedHeapSize() compute sizes as heapLimit - heapAddr and kPos - heapAddr, which match the base/root pointer semantics.
  • probe0/probeReadOnly interpret decompressed offsets as relative to heapAddr and reconstruct startAddr via heapAddr + offset, matching how asNew stores compressed offsets (compressOffset(keyWriter.startAddr - heapAddr)).
  • resize() grows the heap in terms of bytes (kCapacity), enforces an upper bound, and returns a delta that callers apply to kPos (and Key.startAddr/appendAddr via checkCapacity), so existing in‑progress keys remain valid after reallocation.

Assuming the offset compression issue is addressed, this layout should avoid misalignment and keep heap usage reporting correct.

Also applies to: 307-310, 317-319, 379-386, 401-412, 514-539, 581-608


627-646: FixedSizeKey: direct Unsafe writes and hashing align with new heap layout

Within FixedSizeKey:

  • commit() now asserts that appendAddr has not overshot startAddr + keySize, which guards misuse of the put* methods.
  • copyFrom/copyFromRawKey copy raw key bytes from another key via Vect.memcpy, and the new signature copyFromRawKey(srcFixedKey.startAddr, keySize) is compatible with cross‑map copies (start addresses are absolute).
  • All put* primitives (putBool, putByte, putChar, putInt, putLong, putLong128, putLong256, putInterval, etc.) write via Unsafe at appendAddr and then increment appendAddr by the exact size of the written type, matching the static keySize computed at construction.
  • hash() uses Hash.hashMem64(startAddr, keySize), and eq(long offset) compares against heapAddr + offset, which lines up with the new compressed offset representation.

Given that FixedSizeKey.init() calls checkCapacity(keySize) before any writes, this path looks correct and safe under the new addressing scheme.

Also applies to: 671-679, 693-707, 721-725, 734-745, 748-754, 757-769, 771-779, 786-794, 796-800

@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 207 / 219 (94.52%)

file detail

path covered line new line coverage
🔵 io/questdb/cairo/map/OrderedMap.java 207 219 94.52%

@bluestreak01
Copy link
Copy Markdown
Member

there is valid nitpick, 16GiB RAM is achievable. We should error out in case of overflow. What do you think?

@puzpuzpuz
Copy link
Copy Markdown
Contributor Author

there is valid nitpick, 16GiB RAM is achievable. We should error out in case of overflow. What do you think?

@bluestreak01 that nitpick is not valid since we have this check in the rehash() method:

    private void rehash(long newKeyCapacity) {
        if (newKeyCapacity > MAX_SAFE_INT_POW_2) {
            throw CairoException.nonCritical().put("map capacity overflow");
        }

There is no need to check the offsets each time we compress/decompress them - instead we do the check once when growing the hash table.

@bluestreak01 bluestreak01 merged commit e88abf7 into master Nov 16, 2025
41 checks passed
@bluestreak01 bluestreak01 deleted the puzpuzpuz_simplify_ordered_map2 branch November 16, 2025 11:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Performance Performance improvements SQL Issues or changes relating to SQL execution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants