Skip to content

[backport] Add runtime detection for MADV_DONTNEED zeroes pages (mostly for qemu)#1

Merged
azat merged 1 commit intoClickHouse:chfrom
azat-archive:ch-backports/madvise-MADV_DONTNEED_zeros-runtime-check
Dec 18, 2020
Merged

[backport] Add runtime detection for MADV_DONTNEED zeroes pages (mostly for qemu)#1
azat merged 1 commit intoClickHouse:chfrom
azat-archive:ch-backports/madvise-MADV_DONTNEED_zeros-runtime-check

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Dec 18, 2020

qemu does not support this, yet 1, and you can get very tricky assert
if you will run program with jemalloc in use under qemu:

<jemalloc>: ../contrib/jemalloc/src/extent.c:1195: Failed assertion: "p[i] == 0"

Here is a simple example that shows the problem 2:

// Gist to check possible issues with MADV_DONTNEED
// For example it does not supported by qemu user
// There is a patch for this [1], but it hasn't been applied.
//   [1]: https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05422.html

#include <sys/mman.h>
#include <stdio.h>
#include <stddef.h>
#include <assert.h>
#include <string.h>

int main(int argc, char **argv)
{
    void *addr = mmap(NULL, 1<<16, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memset(addr, 'A', 1<<16);

    if (!madvise(addr, 1<<16, MADV_DONTNEED)) {
        puts("MADV_DONTNEED does not return error. Check memory.");
        for (int i = 0; i < 1<<16; ++i) {
            assert(((unsigned char *)addr)[i] == 0);
        }
    } else {
        perror("madvise");
    }

    if (munmap(addr, 1<<16)) {
        perror("munmap");
        return 1;
    }

    return 0;
}

unpatched qemu

  $ qemu-x86_64-static /tmp/test-MADV_DONTNEED
  MADV_DONTNEED does not return error. Check memory.
  test-MADV_DONTNEED: /tmp/test-MADV_DONTNEED.c:19: main: Assertion `((unsigned char *)addr)[i] == 0' failed.
  qemu: uncaught target signal 6 (Aborted) - core dumped
  Aborted (core dumped)

patched qemu (by returning ENOSYS error)

  $ qemu-x86_64 /tmp/test-MADV_DONTNEED
  madvise: Success

patch for qemu to return ENOSYS

  diff --git a/linux-user/syscall.c b/linux-user/syscall.c
  index 897d20c076..5540792e0e 100644
  --- a/linux-user/syscall.c
  +++ b/linux-user/syscall.c
  @@ -11775,7 +11775,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
              turns private file-backed mappings into anonymous mappings.
              This will break MADV_DONTNEED.
              This is a hint, so ignoring and returning success is ok.  */
  -        return 0;
  +        return ENOSYS;
   #endif
   #ifdef TARGET_NR_fcntl64
       case TARGET_NR_fcntl64:

Upstream PR: jemalloc#2005

@azat azat force-pushed the ch-backports/madvise-MADV_DONTNEED_zeros-runtime-check branch from 7fcaaf3 to 10eda68 Compare December 18, 2020 21:51
…ly for qemu)

qemu does not support this, yet [1], and you can get very tricky assert
if you will run program with jemalloc in use under qemu:

    <jemalloc>: ../contrib/jemalloc/src/extent.c:1195: Failed assertion: "p[i] == 0"

  [1]: https://patchwork.kernel.org/patch/10576637/

Here is a simple example that shows the problem [2]:

    // Gist to check possible issues with MADV_DONTNEED
    // For example it does not supported by qemu user
    // There is a patch for this [1], but it hasn't been applied.
    //   [1]: https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05422.html

    #include <sys/mman.h>
    #include <stdio.h>
    #include <stddef.h>
    #include <assert.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        void *addr = mmap(NULL, 1<<16, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        if (addr == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        memset(addr, 'A', 1<<16);

        if (!madvise(addr, 1<<16, MADV_DONTNEED)) {
            puts("MADV_DONTNEED does not return error. Check memory.");
            for (int i = 0; i < 1<<16; ++i) {
                assert(((unsigned char *)addr)[i] == 0);
            }
        } else {
            perror("madvise");
        }

        if (munmap(addr, 1<<16)) {
            perror("munmap");
            return 1;
        }

        return 0;
    }

  ### unpatched qemu

      $ qemu-x86_64-static /tmp/test-MADV_DONTNEED
      MADV_DONTNEED does not return error. Check memory.
      test-MADV_DONTNEED: /tmp/test-MADV_DONTNEED.c:19: main: Assertion `((unsigned char *)addr)[i] == 0' failed.
      qemu: uncaught target signal 6 (Aborted) - core dumped
      Aborted (core dumped)

  ### patched qemu (by returning ENOSYS error)

      $ qemu-x86_64 /tmp/test-MADV_DONTNEED
      madvise: Success

  ### patch for qemu to return ENOSYS

      diff --git a/linux-user/syscall.c b/linux-user/syscall.c
      index 897d20c076..5540792e0e 100644
      --- a/linux-user/syscall.c
      +++ b/linux-user/syscall.c
      @@ -11775,7 +11775,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
                  turns private file-backed mappings into anonymous mappings.
                  This will break MADV_DONTNEED.
                  This is a hint, so ignoring and returning success is ok.  */
      -        return 0;
      +        return ENOSYS;
       #endif
       #ifdef TARGET_NR_fcntl64
           case TARGET_NR_fcntl64:

  [2]: https://gist.github.com/azat/12ba2c825b710653ece34dba7f926ece
@azat azat force-pushed the ch-backports/madvise-MADV_DONTNEED_zeros-runtime-check branch from 10eda68 to e6891d9 Compare December 18, 2020 21:58
@azat azat merged commit e6891d9 into ClickHouse:ch Dec 18, 2020
@azat azat deleted the ch-backports/madvise-MADV_DONTNEED_zeros-runtime-check branch December 18, 2020 21:59
azat added a commit to azat/ClickHouse that referenced this pull request Dec 19, 2020
qemu does not support MADV_DONTNEED, and by not support it simply ignore
it (i.e. return 0 -- no error).

This issue has been "fixed" in ClickHouse#15590, however it just
terminates the process, and completely breaks clickhouse under qemu
(see also ClickHouse#15174).

But there is no need in such strong protection, we can stop using
madvise in jemalloc if MADV_DONTNEED does not work properly.
And this is what ClickHouse#18169 was tried to do (by override madvise), however
this will break sanitizers, at least TSAN and UBSAN.
The problem there is that sanitizers initialization code uses madvise
(and there is no way to turn this off with TSAN_OPTIONS) and overwritten
madvise function will have sanitizers traits (__tsan_func_entry), while
TSAN is not ready for this, and eventually it SIGSEGV.
Interesting thing is that in the recent clang-12, madvise was replaced
with direct syscall [1].

  [1]: llvm/llvm-project@9f8c403

But it is better to make clickhouse compatible with clang < 12 too, so
instead of override madvise completely, the runtime check was moved into
the jemalloc code [2].

  [2]: ClickHouse/jemalloc#1
antonio2368 pushed a commit that referenced this pull request Apr 1, 2026
We tried to load `g` from `bitmap[i]` before checking it is actually a
valid load. Tweaked a loop a bit to `break` early, when we are done
scanning for bits.

Before this commit undefined behaviour sanitizer from GCC 14+ was
unhappy at `test/unit/bitmap` test with following error.

```
../include/jemalloc/internal/bitmap.h:293:5: runtime error: load of
address 0x7bb1c2e08008 with insufficient space for an object of type
'const bitmap_t'
<...>
    #0 0x62671a149954 in bitmap_ffu ../include/jemalloc/internal/bitmap.h:293
    #1 0x62671a149954 in test_bitmap_xfu_body ../test/unit/bitmap.c:275
    #2 0x62671a14b767 in test_bitmap_xfu ../test/unit/bitmap.c:323
    #3 0x62671a376ad1 in p_test_impl ../test/src/test.c:149
    #4 0x62671a377135 in p_test ../test/src/test.c:200
    jemalloc#5 0x62671a13da06 in main ../test/unit/bitmap.c:336
<...>
```
antonio2368 pushed a commit that referenced this pull request Apr 1, 2026
This commit allows to enable sanitizers with autoconf options, instead
of modifying `CFLAGS`, `CXXFLAGS` and `LDFLAGS` directly.

* `--enable-tsan` option to enable Thread Sanitizer.
* `--enable-ubsan` option to enable Undefined Behaviour Sanitizer.

End goal is to speedup development by finding problems quickly, early
and easier. Eventually, when all current issues will be fixed, we can
enable sanitizers in CI. Fortunately, there are not a lot of problems we
need to fix.

Address Sanitizer is a bit controversial, because it replaces memory
allocator, so we decided to left it out for a while.

Below are couple of examples of how tests look like under different
sanitizers at the moment.

```
$  ../configure --enable-tsan --enable-debug
<...>
asan               : 0
tsan               : 1
ubsan              : 0
$ make -j`nproc` check
<...>
  Thread T13 (tid=332043, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x61748)
    #1 thd_create ../test/src/thd.c:25 (bin_batching+0x5631ca)
    #2 stress_run ../test/unit/bin_batching.c:148
(bin_batching+0x40364c)
    #3 test_races ../test/unit/bin_batching.c:249
(bin_batching+0x403d79)
    #4 p_test_impl ../test/src/test.c:149 (bin_batching+0x562811)
    jemalloc#5 p_test_no_reentrancy ../test/src/test.c:213
(bin_batching+0x562d35)
    jemalloc#6 main ../test/unit/bin_batching.c:268 (bin_batching+0x40417e)

SUMMARY: ThreadSanitizer: data race
../include/jemalloc/internal/edata.h:498 in edata_nfree_inc
```

```
$ ../configure --enable-ubsan --enable-debug
<...>
asan               : 0
tsan               : 0
ubsan              : 1
$ make -j`nproc` check
<...>
=== test/unit/hash ===
../test/unit/hash.c:119:16: runtime error: left shift of 176 by 24
places cannot be represented in type 'int'
<...>
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant