Description
In an application that uses jemalloc statically linked, I am seeing an ever-increasing number of memory mappings for the process, which eventually hits the vm.max_map_count limit. The overcommit_memory setting is 2, so no overcommitting.
It seems that jemalloc reads the overcommit setting at startup, and later takes this setting's value into account when "returning" memory.
When overcommit_memory is set to 2, jemalloc seems to call mmap on the returned range with a protection of PROT_NONE. This punches holes into existing mappings, so the kernel splits them and creates more of them. This would not be a problem if it happened only rarely, but we have several use cases in which it happens so often that even increasing vm.max_map_count to tens of millions does not help much.
I have created a (contrived) standalone test program which shows the behavior. I hope it is somewhat deterministic so others can reproduce it:
```cpp
#include <iostream>
#include <fstream>
#include <deque>
#include <string>
#include <utility>
#include <vector>
#include <thread>
#include <cstdlib>

void fun() {
  std::string line;
  size_t allocated = 0;
  size_t allocations = 0;
  std::deque<std::pair<void*, size_t>> all;
  size_t n = 1024 * 1024 * 1024;
  ::srand(2848583);
  for (size_t i = 0; i < n; ++i) {
    if (allocations % 500000 == 0) {
      std::ifstream ifs("/proc/self/maps", std::ios_base::in);
      if (!ifs.is_open()) {
        std::cerr << "unable to open mappings file" << std::endl;
        std::abort();
      }
      size_t mappings = 0;
      while (std::getline(ifs, line)) {
        ++mappings;
      }
      std::cout << "- i: " << i << ", allocations: " << allocations
                << ", mappings: " << mappings
                << ", allocated: " << allocated << "\n";
    }
    size_t s = ::rand() % (4096 * 128 / 8);
    if (s > 0) {
      auto* p = ::malloc(s);
      if (p == nullptr) {
        std::cerr << "OOM. s: " << s << std::endl;
        std::abort();
      }
      ++allocations;
      allocated += s;
      all.push_back(std::make_pair(p, s));
    }
    while (!all.empty() && allocated > 1024 * 1024 * 128) {
      auto& f = all.front();
      ::free(f.first);
      allocated -= f.second;
      all.pop_front();
    }
  }
}

int main() {
  std::vector<std::thread> threads;
  size_t n = 8;
  for (size_t i = 0; i < n; ++i) {
    threads.emplace_back(fun);
  }
  for (auto& it : threads) {
    it.join();
  }
}
```

The test program can be compiled and run as follows:

```shell
g++ -std=c++11 -O2 -Wall -Wextra test.cc -o test -lpthread
GLIBCXX_FORCE_NEW=1 LD_PRELOAD=~/jemalloc-5.1.0/lib/libjemalloc.so ./test
```

The program allocates memory of pseudo-random sizes and frees some of the memory again, using a few parallel threads. Each thread caps its total allocated memory, so the program should not leak.
Each thread writes some values to std::cout. The only interesting figure is the reported "mappings" value, e.g.

```
- i: 7500131, allocations: 7500000, mappings: 18347, allocated: 134196266
```

The "mappings" value is calculated as the number of lines in /proc/self/maps, which is not 100% accurate but should be a good-enough approximation.
The problem is that when overcommit_memory is set to 2, the number of mappings grows dramatically, with both jemalloc 5.0.1 and jemalloc 5.1.0.
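For anyone trying to reproduce this, the relevant kernel settings live in standard Linux procfs/sysctl locations and can be inspected as follows:

```shell
# Inspect the settings relevant to this issue:
cat /proc/sys/vm/overcommit_memory   # 2 means "no overcommit"
cat /proc/sys/vm/max_map_count       # per-process mapping limit, default 65530
# To reproduce the problematic environment, disable overcommitting (root only):
#   sysctl -w vm.overcommit_memory=2
```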
A "fix" for the problem is to apply the following patch:
diff --git a/3rdParty/jemalloc/v5.1.0/src/pages.c b/3rdParty/jemalloc/v5.1.0/src/pages.c
index 26002692d6..3fbad076ad 100644
--- a/3rdParty/jemalloc/v5.1.0/src/pages.c
+++ b/3rdParty/jemalloc/v5.1.0/src/pages.c
@@ -23,7 +23,7 @@ static size_t os_page;
#ifndef _WIN32
# define PAGES_PROT_COMMIT (PROT_READ | PROT_WRITE)
-# define PAGES_PROT_DECOMMIT (PROT_NONE)
+# define PAGES_PROT_DECOMMIT (PROT_READ | PROT_WRITE)
static int mmap_flags;
#endif
static bool os_overcommits;This makes the test program run with a very low number of memory mappings. It is obviously not a good fix, because it will leave the memory around with read & write access allowed. So please consider it just a demo.
I think it would be good to make jemalloc more usable with an overcommit_memory setting of 2. Right now it is somewhat risky to use, because applications may quickly hit the default vm.max_map_count value of 65530. Even increasing that setting does not help much, because the number of mappings keeps growing over time, which means long-running server processes can still hit the threshold eventually.
I assume the current implementation is as it is for a reason, so you may be reluctant to change it. However, it would be good to suggest how to avoid this behavior on systems that don't use overcommit and where vm settings cannot be adjusted. Could an option be added to jemalloc to adjust the decommit behavior in this case, when explicitly configured as such? I think this would help plenty of users, as I have seen several issues in this repository that may have the same root cause. The last one I checked was #1324.
Thanks!
(btw, although I don't think it makes any difference: the above was tried on Linux kernel 4.15, both on bare metal and on an Azure cloud instance; the compilers in use were g++-7.3.0 and g++-5.4.0)