Description
In an application that uses jemalloc statically linked, I am seeing an ever-increasing number of memory mappings for the process, which eventually hits the vm.max_map_count limit. The overcommit_memory setting is 2, so no overcommitting.
It seems that jemalloc reads the overcommit setting at startup, and later takes this setting's value into account when "returning" memory.
When overcommit_memory is set to 2, jemalloc seems to call mmap on the returned range with a protection of PROT_NONE. This punches holes into existing mappings, so the kernel splits them and creates more of them. This would not be a problem if it happened only rarely, but we have several use cases in which it happens so often that even increasing vm.max_map_count to tens of millions does not help much.
I have created a (contrived) standalone test program which shows the behavior. I hope it is somewhat deterministic so others can reproduce it:
```cpp
#include <iostream>
#include <fstream>
#include <deque>
#include <string>
#include <utility>
#include <vector>
#include <thread>
#include <cstdlib>

void fun() {
  std::string line;
  size_t allocated = 0;
  size_t allocations = 0;
  std::deque<std::pair<void*, size_t>> all;
  size_t n = 1024 * 1024 * 1024;
  ::srand(2848583);
  for (size_t i = 0; i < n; ++i) {
    if (allocations % 500000 == 0) {
      std::ifstream ifs("/proc/self/maps", std::ios_base::in);
      if (!ifs.is_open()) {
        std::cerr << "unable to open mappings file" << std::endl;
        std::abort();
      }
      size_t mappings = 0;
      while (std::getline(ifs, line)) {
        ++mappings;
      }
      std::cout << "- i: " << i << ", allocations: " << allocations
                << ", mappings: " << mappings
                << ", allocated: " << allocated << "\n";
    }
    size_t s = ::rand() % (4096 * 128 / 8);
    if (s > 0) {
      auto* p = ::malloc(s);
      if (p == nullptr) {
        std::cerr << "OOM. s: " << s << std::endl;
        std::abort();
      }
      ++allocations;
      allocated += s;
      all.push_back(std::make_pair(p, s));
    }
    while (!all.empty() && allocated > 1024 * 1024 * 128) {
      auto& f = all.front();
      ::free(f.first);
      allocated -= f.second;
      all.pop_front();
    }
  }
}

int main() {
  std::vector<std::thread> threads;
  size_t n = 8;
  for (size_t i = 0; i < n; ++i) {
    threads.emplace_back(fun);
  }
  for (auto& it : threads) {
    it.join();
  }
}
```

The test program can be compiled and run as follows:

```shell
g++ -std=c++11 -O2 -Wall -Wextra test.cc -o test -lpthread
GLIBCXX_FORCE_NEW=1 LD_PRELOAD=~/jemalloc-5.1.0/lib/libjemalloc.so ./test
```

The program allocates memory of pseudo-random sizes and frees some of the memory again, using a few parallel threads. Each thread caps its total allocated memory, so the program should not leak.
Each thread writes some values to std::cout. The only interesting figure is the reported "mappings" value, e.g.

```
- i: 7500131, allocations: 7500000, mappings: 18347, allocated: 134196266
```

The "mappings" value is calculated as the number of lines in /proc/self/maps, which is not 100% accurate but should be a good-enough approximation.
The problem is that when overcommit_memory is set to 2, the number of mappings grows dramatically, with both jemalloc 5.0.1 and jemalloc 5.1.0.
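For anyone trying to reproduce this, the relevant kernel settings live in standard Linux procfs/sysctl locations and can be inspected as follows:

```shell
# Inspect the settings relevant to this issue:
cat /proc/sys/vm/overcommit_memory   # 2 means "no overcommit"
cat /proc/sys/vm/max_map_count       # per-process mapping limit, default 65530
# To reproduce the problematic environment, disable overcommitting (root only):
#   sysctl -w vm.overcommit_memory=2
```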
A "fix" for the problem is to apply the following patch:
diff --git a/3rdParty/jemalloc/v5.1.0/src/pages.c b/3rdParty/jemalloc/v5.1.0/src/pages.c
index 26002692d6..3fbad076ad 100644
--- a/3rdParty/jemalloc/v5.1.0/src/pages.c
+++ b/3rdParty/jemalloc/v5.1.0/src/pages.c
@@ -23,7 +23,7 @@ static size_t os_page;
#ifndef _WIN32
# define PAGES_PROT_COMMIT (PROT_READ | PROT_WRITE)
-# define PAGES_PROT_DECOMMIT (PROT_NONE)
+# define PAGES_PROT_DECOMMIT (PROT_READ | PROT_WRITE)
static int mmap_flags;
#endif
static bool os_overcommits;This makes the test program run with a very low number of memory mappings. It is obviously not a good fix, because it will leave the memory around with read & write access allowed. So please consider it just a demo.
I think it would be good to make jemalloc more usable with an overcommit_memory setting of 2. Right now it is somewhat risky to use, because applications may quickly hit the default vm.max_map_count value of 65530. Even increasing that setting does not help much, because the number of mappings keeps growing over time, which means long-running server processes can still hit the threshold eventually.
I assume the current implementation is as it is for a reason, so you may be reluctant to change it. However, it would be good to suggest how to avoid this behavior on systems that don't use overcommit and where vm settings cannot be adjusted. Could an option be added to jemalloc to adjust the decommit behavior in this case, when explicitly configured as such? I think this would help plenty of users, as I have seen several issues in this repository that may have the same root cause. The last one I checked was #1324.
Thanks!
(btw, although I don't think it makes any difference: the above was tried on Linux kernel 4.15, both on bare metal and on an Azure cloud instance; the compilers in use were g++-7.3.0 and g++-5.4.0)