fix(core): process memory may not be released on Linux when jemalloc is enabled#6619
fix(core): process memory may not be released on Linux when jemalloc is enabled#6619bluestreak01 merged 17 commits intomasterfrom
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe PR updates jemalloc integration and refactors the shell startup script. It isolates jemalloc's LD_PRELOAD to the Java process only through a JAVA_CMD wrapper, adds new jemalloc source files to the build, introduces compile-time configuration flags, and updates the submodule pointer. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
e49394f to
cb804f3
Compare
| # Set LD_PRELOAD only for the Java process, not the shell | ||
| JAVA_CMD="${JAVA}" | ||
| if [ -n "${QDB_JEMALLOC_LIB}" ]; then | ||
| JAVA_CMD="env LD_PRELOAD=${QDB_JEMALLOC_LIB} ${JAVA}" |
There was a problem hiding this comment.
@sklarsa do you happen to know if all Docker base images that we use include env utility? Usually it's provided as a part of coreutils, so Debian, Alpine and CentOS/RHEL-based images should have it.
There was a problem hiding this comment.
@puzpuzpuz you'd have to test docker builds...., speaking of which, should we also have docker use jemalloc?
There was a problem hiding this comment.
There was a problem hiding this comment.
There was a problem hiding this comment.
My concern was around custom Docker images that users may have. Though it's probably not a big deal since questdb.sh already relies on a few utilities.
As for enabling jemalloc in our Docker image, I'd wait a few releases after 9.3.1, so that we're more confident that there are no remaining issues around jemalloc. I've run tests locally and it seems to work fine, but extra cautiousness won't hurt.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @core/src/main/c/share/jemalloc:
- Line 1: The .gitmodules entry for the jemalloc submodule still points to
upstream (https://github.com/jemalloc/jemalloc.git) while the PR expects
QuestDB's fork; update the url for the jemalloc submodule (path
core/src/main/c/share/jemalloc) to https://github.com/questdb/jemalloc.git so
commit f6655d686bcd57e3373d4b113fea2fd5c4a4785e can be resolved, then run git
submodule sync and git submodule update --init --recursive (and update
.git/config if necessary) to ensure the local submodule config matches the
.gitmodules change.
🧹 Nitpick comments (1)
core/src/main/bin/questdb.sh (1)
383-398: Isolating LD_PRELOAD to Java process—key fix for the jemalloc race condition.The
JAVA_CMDwrapper correctly confinesLD_PRELOADto the Java process only, preventing jemalloc from being loaded into bash and other subprocesses. This addresses the root cause where jemalloc background threads could race during shell operations.Minor robustness note: The unquoted
${JAVA_CMD}expansion relies on paths not containing spaces. While uncommon in typical installations, paths with spaces in$BASEor$JAVAwould cause incorrect word splitting.♻️ Optional: More robust alternative using array or function
If robustness to paths with spaces is desired:
- JAVA_CMD="${JAVA}" - if [ -n "${QDB_JEMALLOC_LIB}" ]; then - JAVA_CMD="env LD_PRELOAD=${QDB_JEMALLOC_LIB} ${JAVA}" - fi + run_java() { + if [ -n "${QDB_JEMALLOC_LIB}" ]; then + env LD_PRELOAD="${QDB_JEMALLOC_LIB}" "${JAVA}" "$@" + else + "${JAVA}" "$@" + fi + }Then replace
${JAVA_CMD} ${JAVA_OPTS} ...withrun_java ${JAVA_OPTS} ....
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
core/src/main/bin/linux-aarch64/libjemalloc.sois excluded by!**/*.socore/src/main/bin/linux-x86-64/libjemalloc.sois excluded by!**/*.socore/src/main/resources/io/questdb/bin/windows-x86-64/libquestdb.dllis excluded by!**/*.dll
📒 Files selected for processing (6)
core/src/main/bin/questdb.shcore/src/main/c/share/jemalloccore/src/main/c/share/jemalloc-cmake/CMakeLists.txtcore/src/main/c/share/jemalloc-cmake/include/jemalloc/internal/jemalloc_preamble.hcore/src/main/c/share/jemalloc-cmake/include_linux_aarch64/jemalloc/internal/jemalloc_internal_defs.h.incore/src/main/c/share/jemalloc-cmake/include_linux_x86_64/jemalloc/internal/jemalloc_internal_defs.h.in
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (35)
- GitHub Check: New pull request (Coverage Report Coverage Report)
- GitHub Check: New pull request (Hosted Running tests on windows-other-2)
- GitHub Check: New pull request (Hosted Running tests on windows-other-1)
- GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
- GitHub Check: New pull request (Hosted Running tests on mac-other)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
- GitHub Check: New pull request (Hosted Running tests on mac-griffin)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
- GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
- GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
- GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
- GitHub Check: New pull request (Check Changes Check changes)
🔇 Additional comments (6)
core/src/main/c/share/jemalloc-cmake/include_linux_x86_64/jemalloc/internal/jemalloc_internal_defs.h.in (1)
157-159: LGTM!The new
JEMALLOC_PROF_FRAME_POINTERconfiguration option is correctly placed among the other profiling backtrace options and follows the established pattern. It's appropriately disabled by default, which is consistent with the otherJEMALLOC_PROF_*options in this file. This addition aligns with the jemalloc update mentioned in the PR objectives.core/src/main/c/share/jemalloc-cmake/CMakeLists.txt (1)
39-39: LGTM!The new source files
batcher.candprof_threshold.care correctly added to the build in alphabetical order, consistent with the jemalloc dev branch update mentioned in the PR.Also applies to: 84-84
core/src/main/c/share/jemalloc-cmake/include_linux_aarch64/jemalloc/internal/jemalloc_internal_defs.h.in (1)
157-159: LGTM!The
JEMALLOC_PROF_FRAME_POINTERplaceholder is correctly added alongside other profiling-related macros. The commented-out#undeffollows the existing pattern for optional features in this configuration template.core/src/main/c/share/jemalloc-cmake/include/jemalloc/internal/jemalloc_preamble.h (2)
81-87: LGTM!The
have_process_madviseflag follows the established pattern and is logically placed afterhave_madvise_huge. Theprocess_madvise()syscall provides batch memory advice capabilities on newer Linux kernels.
124-130: LGTM!The
config_prof_frameptrflag correctly corresponds toJEMALLOC_PROF_FRAME_POINTERand is placed alongside other profiling configuration options (config_prof_libgcc,config_prof_libunwind).core/src/main/bin/questdb.sh (1)
172-174: Code is correct; remove outdated concern about multiple library versions.In the actual distributed binaries, each architecture directory (
linux-aarch64,linux-x86-64) contains exactly onelibjemalloc.sofile. The glob patternlibjemalloc.so*will match only this single file, making the use ofhead -1straightforward. The code properly handles the case where the library is missing via the2>/dev/nullredirect and the[[ -r "${jemalloc_so}" ]]test. No issues to address.
Jemalloc background threads were failing to start reliably (~20% failure rate) when QuestDB was launched via questdb.sh. This prevented memory from being released back to the OS properly.
To fix this race, we're switching to QuestDB's own jemalloc fork. See questdb/jemalloc#1 for more details on the race, as well as the fix.
Important note. Previously we were using the latest jemalloc release which is pretty outdated (May 2022). So, along with the fix, we're switching to the latest
devbranch commit (June 2025). Other databases also seem to use jemalloc'sdevbranch, e.g. ClickHouse is also using the latestdevcommit.Other than that, includes the following changes:
env LD_PRELOAD=...to load jemalloc only in the Java process since it makes no sense to use jemalloc for thequestdb.shbash scriptTest Results