
fix(core): process memory may not be released on Linux when jemalloc is enabled #6619

Merged
bluestreak01 merged 17 commits into master from puzpuzpuz_stuck_jemalloc_threads
Jan 13, 2026

Conversation

@puzpuzpuz (Contributor) commented Jan 9, 2026

Jemalloc background threads were failing to start reliably (~20% failure rate) when QuestDB was launched via questdb.sh. This prevented memory from being released back to the OS properly.

To fix this race, we're switching to QuestDB's own jemalloc fork. See questdb/jemalloc#1 for more details on the race, as well as the fix.

Important note: we were previously using the latest jemalloc release, which is quite outdated (May 2022). Along with the fix, we're therefore switching to the latest dev branch commit (June 2025). Other databases also seem to track jemalloc's dev branch; ClickHouse, for example, also uses the latest dev commit.

Other than that, this PR includes the following change:

  • Use env LD_PRELOAD=... to load jemalloc only in the Java process since it makes no sense to use jemalloc for the questdb.sh bash script
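
A minimal sketch of the scoping this relies on, assuming a stand-in variable named DEMO_PRELOAD and a made-up library path (both hypothetical, used instead of LD_PRELOAD so that running the demo loads nothing). With the env prefix, the variable is visible only to the child process, while the launching shell's environment stays untouched:

```shell
#!/usr/bin/env bash
# DEMO_PRELOAD and the library path are hypothetical stand-ins for LD_PRELOAD
# and the real jemalloc path, so that running this loads no library.
unset DEMO_PRELOAD

# Prefix form: env sets the variable only in the child's environment.
child_view=$(env DEMO_PRELOAD="/opt/demo/libjemalloc.so" sh -c 'echo "${DEMO_PRELOAD}"')

# The calling shell never had the variable set.
parent_view="${DEMO_PRELOAD:-unset}"

echo "child sees:  ${child_view}"
echo "parent sees: ${parent_view}"
```

By contrast, an exported variable set before the script runs would be inherited by the bash process and every utility it spawns, which is what the change avoids.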

Test Results

  • Before: ~20% failure rate
  • After: 0% failure rate (200/200 successful runs)
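
The PR does not say how these rates were measured. As a rough sketch only, a harness along the following lines could count failing runs; count_failures and has_bg_thread are hypothetical helpers, and the thread name jemalloc_bg_thd is an assumption about how jemalloc labels its background threads on Linux:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a check command repeatedly and print how many runs failed.
# Usage: count_failures <runs> <command> [args...]
count_failures() {
    local runs="$1"; shift
    local failures=0
    local i
    for ((i = 0; i < runs; i++)); do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    done
    echo "${failures}"
}

# Hypothetical check: succeeds if the given PID has a thread whose name starts
# with jemalloc_bg_thd (assumed thread name; Linux /proc only).
has_bg_thread() {
    grep -qs '^jemalloc_bg_thd' /proc/"$1"/task/*/comm
}

# Self-check of the counter with commands that always succeed or always fail.
echo "failures with true:  $(count_failures 5 true)"
echo "failures with false: $(count_failures 5 false)"
```

A real run would start the server, pass its PID to has_bg_thread, and stop the server again on each iteration.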

@puzpuzpuz puzpuzpuz self-assigned this Jan 9, 2026
@puzpuzpuz puzpuzpuz added Bug Incorrect or unexpected behavior Core Related to storage, data type, etc. labels Jan 9, 2026
coderabbitai bot commented Jan 9, 2026

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The PR updates jemalloc integration and refactors the shell startup script. It isolates jemalloc's LD_PRELOAD to the Java process only through a JAVA_CMD wrapper, adds new jemalloc source files to the build, introduces compile-time configuration flags, and updates the submodule pointer.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Shell Startup Script: core/src/main/bin/questdb.sh | Refactored jemalloc library selection to use libjemalloc.so* with head -1 filtering. Introduced JAVA_CMD wrapper that applies LD_PRELOAD exclusively to the Java process via env LD_PRELOAD=${QDB_JEMALLOC_LIB}. Updated all Java invocation points (container mode, background, final start) to use the new wrapper, confining jemalloc preloading to Java only. |
| Jemalloc Build Configuration: core/src/main/c/share/jemalloc-cmake/CMakeLists.txt | Added batcher.c and prof_threshold.c source files to the jemalloc build compilation list. |
| Jemalloc Internal Headers: core/src/main/c/share/jemalloc-cmake/include/jemalloc/internal/jemalloc_preamble.h, core/src/main/c/share/jemalloc-cmake/include_linux_*/jemalloc/internal/jemalloc_internal_defs.h.in | Added two new compile-time boolean flags: have_process_madvise and config_prof_frameptr in preamble header. Introduced commented-out JEMALLOC_PROF_FRAME_POINTER macro placeholder in configuration headers for both x86_64 and aarch64 architectures. |
| Jemalloc Submodule: core/src/main/c/share/jemalloc | Updated submodule pointer from commit 8d8379da... to f6655d68... with no observable functional changes. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • questdb/questdb#6150 — Both PRs modify questdb.sh Java startup flow; one adds JAVA_CMD wrapper for jemalloc isolation while the other injects async-profiler agent, sharing and potentially interacting with the same invocation logic.

Suggested labels

integration, Performance

Suggested reviewers

  • bluestreak01

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and specifically describes the main fix: addressing memory release issues on Linux when jemalloc is enabled, which directly matches the changeset's focus on jemalloc thread startup and preloading isolation. |
| Description check | ✅ Passed | The description comprehensively explains the jemalloc thread startup race condition, the fork/patch solution, the version upgrade, and the LD_PRELOAD isolation change, all of which are directly reflected in the changeset modifications across multiple files. |

@puzpuzpuz puzpuzpuz force-pushed the puzpuzpuz_stuck_jemalloc_threads branch from e49394f to cb804f3 Compare January 9, 2026 13:05
@puzpuzpuz puzpuzpuz marked this pull request as ready for review January 9, 2026 13:08
@puzpuzpuz puzpuzpuz marked this pull request as draft January 9, 2026 13:55
@puzpuzpuz puzpuzpuz added the DO NOT MERGE These changes should not be merged to main branch label Jan 10, 2026
@puzpuzpuz puzpuzpuz marked this pull request as ready for review January 12, 2026 14:55
@puzpuzpuz puzpuzpuz removed the DO NOT MERGE These changes should not be merged to main branch label Jan 12, 2026
# Set LD_PRELOAD only for the Java process, not the shell
JAVA_CMD="${JAVA}"
if [ -n "${QDB_JEMALLOC_LIB}" ]; then
JAVA_CMD="env LD_PRELOAD=${QDB_JEMALLOC_LIB} ${JAVA}"
Contributor Author

@sklarsa do you happen to know whether all the Docker base images we use include the env utility? It's usually provided as part of coreutils, so Debian, Alpine and CentOS/RHEL-based images should have it.

Member

@puzpuzpuz you'd have to test the Docker builds. Speaking of which, should we also have Docker use jemalloc?

Contributor Author

My concern was around custom Docker images that users may have. Though it's probably not a big deal since questdb.sh already relies on a few utilities.

As for enabling jemalloc in our Docker image, I'd wait a few releases after 9.3.1, so that we're more confident that no jemalloc issues remain. I've run tests locally and it seems to work fine, but extra caution won't hurt.

@puzpuzpuz
Contributor Author

@coderabbitai review

coderabbitai bot commented Jan 12, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @core/src/main/c/share/jemalloc:
- Line 1: The .gitmodules entry for the jemalloc submodule still points to
upstream (https://github.com/jemalloc/jemalloc.git) while the PR expects
QuestDB's fork; update the url for the jemalloc submodule (path
core/src/main/c/share/jemalloc) to https://github.com/questdb/jemalloc.git so
commit f6655d686bcd57e3373d4b113fea2fd5c4a4785e can be resolved, then run git
submodule sync and git submodule update --init --recursive (and update
.git/config if necessary) to ensure the local submodule config matches the
.gitmodules change.
🧹 Nitpick comments (1)
core/src/main/bin/questdb.sh (1)

383-398: Isolating LD_PRELOAD to Java process—key fix for the jemalloc race condition.

The JAVA_CMD wrapper correctly confines LD_PRELOAD to the Java process only, preventing jemalloc from being loaded into bash and other subprocesses. This addresses the root cause where jemalloc background threads could race during shell operations.

Minor robustness note: The unquoted ${JAVA_CMD} expansion relies on paths not containing spaces. While uncommon in typical installations, paths with spaces in $BASE or $JAVA would cause incorrect word splitting.

♻️ Optional: More robust alternative using array or function

If robustness to paths with spaces is desired:

-    JAVA_CMD="${JAVA}"
-    if [ -n "${QDB_JEMALLOC_LIB}" ]; then
-        JAVA_CMD="env LD_PRELOAD=${QDB_JEMALLOC_LIB} ${JAVA}"
-    fi
+    run_java() {
+        if [ -n "${QDB_JEMALLOC_LIB}" ]; then
+            env LD_PRELOAD="${QDB_JEMALLOC_LIB}" "${JAVA}" "$@"
+        else
+            "${JAVA}" "$@"
+        fi
+    }

Then replace ${JAVA_CMD} ${JAVA_OPTS} ... with run_java ${JAVA_OPTS} ....
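
The word-splitting concern can be reproduced with a small sketch. Everything here is a stand-in: the fake java script, the temporary directory, and DEMO_VAR (used instead of LD_PRELOAD so nothing is preloaded) are assumptions made for the demo, not part of questdb.sh:

```shell
#!/usr/bin/env bash
# Hypothetical setup: a fake "java" executable under a directory whose name contains a space.
demo_dir=$(mktemp -d)
mkdir -p "${demo_dir}/my jdk/bin"
printf '#!/bin/sh\necho "args: $*"\n' > "${demo_dir}/my jdk/bin/java"
chmod +x "${demo_dir}/my jdk/bin/java"
JAVA="${demo_dir}/my jdk/bin/java"

# String form: the unquoted expansion splits the path at the space,
# so env is asked to run ".../my", which does not exist.
JAVA_CMD="env DEMO_VAR=1 ${JAVA}"
string_out=$(${JAVA_CMD} -version 2>/dev/null) || string_out="failed"

# Function form: the path and each argument stay separate, quoted words.
run_java() {
    env DEMO_VAR=1 "${JAVA}" "$@"
}
function_out=$(run_java -version)

echo "string form:   ${string_out}"
echo "function form: ${function_out}"
rm -rf "${demo_dir}"
```

The string form only breaks when the path contains whitespace, which is why the review rates this as an optional robustness improvement rather than a bug.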

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a5f01d and 321dd09.

⛔ Files ignored due to path filters (3)
  • core/src/main/bin/linux-aarch64/libjemalloc.so is excluded by !**/*.so
  • core/src/main/bin/linux-x86-64/libjemalloc.so is excluded by !**/*.so
  • core/src/main/resources/io/questdb/bin/windows-x86-64/libquestdb.dll is excluded by !**/*.dll
📒 Files selected for processing (6)
  • core/src/main/bin/questdb.sh
  • core/src/main/c/share/jemalloc
  • core/src/main/c/share/jemalloc-cmake/CMakeLists.txt
  • core/src/main/c/share/jemalloc-cmake/include/jemalloc/internal/jemalloc_preamble.h
  • core/src/main/c/share/jemalloc-cmake/include_linux_aarch64/jemalloc/internal/jemalloc_internal_defs.h.in
  • core/src/main/c/share/jemalloc-cmake/include_linux_x86_64/jemalloc/internal/jemalloc_internal_defs.h.in
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (35)
  • GitHub Check: New pull request (Coverage Report Coverage Report)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
  • GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
  • GitHub Check: New pull request (Check Changes Check changes)
🔇 Additional comments (6)
core/src/main/c/share/jemalloc-cmake/include_linux_x86_64/jemalloc/internal/jemalloc_internal_defs.h.in (1)

157-159: LGTM!

The new JEMALLOC_PROF_FRAME_POINTER configuration option is correctly placed among the other profiling backtrace options and follows the established pattern. It's appropriately disabled by default, which is consistent with the other JEMALLOC_PROF_* options in this file. This addition aligns with the jemalloc update mentioned in the PR objectives.

core/src/main/c/share/jemalloc-cmake/CMakeLists.txt (1)

39-39: LGTM!

The new source files batcher.c and prof_threshold.c are correctly added to the build in alphabetical order, consistent with the jemalloc dev branch update mentioned in the PR.

Also applies to: 84-84

core/src/main/c/share/jemalloc-cmake/include_linux_aarch64/jemalloc/internal/jemalloc_internal_defs.h.in (1)

157-159: LGTM!

The JEMALLOC_PROF_FRAME_POINTER placeholder is correctly added alongside other profiling-related macros. The commented-out #undef follows the existing pattern for optional features in this configuration template.

core/src/main/c/share/jemalloc-cmake/include/jemalloc/internal/jemalloc_preamble.h (2)

81-87: LGTM!

The have_process_madvise flag follows the established pattern and is logically placed after have_madvise_huge. The process_madvise() syscall provides batch memory advice capabilities on newer Linux kernels.


124-130: LGTM!

The config_prof_frameptr flag correctly corresponds to JEMALLOC_PROF_FRAME_POINTER and is placed alongside other profiling configuration options (config_prof_libgcc, config_prof_libunwind).

core/src/main/bin/questdb.sh (1)

172-174: Code is correct; remove outdated concern about multiple library versions.

In the actual distributed binaries, each architecture directory (linux-aarch64, linux-x86-64) contains exactly one libjemalloc.so file. The glob pattern libjemalloc.so* will match only this single file, making the use of head -1 straightforward. The code properly handles the case where the library is missing via the 2>/dev/null redirect and the [[ -r "${jemalloc_so}" ]] test. No issues to address.
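
The selection logic described in this comment can be sketched roughly as follows. The function name find_jemalloc_lib and the temporary directories are hypothetical, and the actual code in questdb.sh may be structured differently:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first readable libjemalloc.so* in a directory,
# or nothing if none exists.
find_jemalloc_lib() {
    local dir="$1"
    local jemalloc_so
    # 2>/dev/null hides the ls error when the glob matches nothing.
    jemalloc_so=$(ls "${dir}"/libjemalloc.so* 2>/dev/null | head -1)
    if [[ -r "${jemalloc_so}" ]]; then
        echo "${jemalloc_so}"
    fi
}

# Demo with temporary directories standing in for the platform bin directory.
with_lib=$(mktemp -d)
without_lib=$(mktemp -d)
touch "${with_lib}/libjemalloc.so"

found=$(find_jemalloc_lib "${with_lib}")
missing=$(find_jemalloc_lib "${without_lib}")
echo "found:   ${found:-none}"
echo "missing: ${missing:-none}"
rm -rf "${with_lib}" "${without_lib}"
```

An empty result leaves a variable such as QDB_JEMALLOC_LIB unset, in which case the startup script would fall back to launching Java without a preload.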

@bluestreak01 bluestreak01 merged commit bc62d3c into master Jan 13, 2026
44 checks passed
@bluestreak01 bluestreak01 deleted the puzpuzpuz_stuck_jemalloc_threads branch January 13, 2026 15:49
ideoma pushed a commit that referenced this pull request Jan 14, 2026