Skip to content

[RFC] Add runtime detection for MADV_DONTNEED zeroes pages (mostly for qemu)#2005

Merged
davidtgoldblatt merged 1 commit intojemalloc:devfrom
azat-archive:madvise-MADV_DONTNEED_zeros-runtime-check
Jan 21, 2021
Merged

[RFC] Add runtime detection for MADV_DONTNEED zeroes pages (mostly for qemu)#2005
davidtgoldblatt merged 1 commit intojemalloc:devfrom
azat-archive:madvise-MADV_DONTNEED_zeros-runtime-check

Conversation

@azat
Copy link
Copy Markdown
Contributor

@azat azat commented Dec 18, 2020

qemu does not support this, yet 1, and you can get very tricky assert
if you will run program with jemalloc in use under qemu:

<jemalloc>: ../contrib/jemalloc/src/extent.c:1195: Failed assertion: "p[i] == 0"

Here is a simple example that shows the problem 2:

// Gist to check possible issues with MADV_DONTNEED
// For example it does not supported by qemu user
// There is a patch for this [1], but it hasn't been applied.
//   [1]: https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05422.html

#include <sys/mman.h>
#include <stdio.h>
#include <stddef.h>
#include <assert.h>
#include <string.h>

int main(int argc, char **argv)
{
    void *addr = mmap(NULL, 1<<16, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memset(addr, 'A', 1<<16);

    if (!madvise(addr, 1<<16, MADV_DONTNEED)) {
        puts("MADV_DONTNEED does not return error. Check memory.");
        for (int i = 0; i < 1<<16; ++i) {
            assert(((unsigned char *)addr)[i] == 0);
        }
    } else {
        perror("madvise");
    }

    if (munmap(addr, 1<<16)) {
        perror("munmap");
        return 1;
    }

    return 0;
}

unpatched qemu

  $ qemu-x86_64-static /tmp/test-MADV_DONTNEED
  MADV_DONTNEED does not return error. Check memory.
  test-MADV_DONTNEED: /tmp/test-MADV_DONTNEED.c:19: main: Assertion `((unsigned char *)addr)[i] == 0' failed.
  qemu: uncaught target signal 6 (Aborted) - core dumped
  Aborted (core dumped)

patched qemu (by returning ENOSYS error)

  $ qemu-x86_64 /tmp/test-MADV_DONTNEED
  madvise: Success

patch for qemu to return ENOSYS

  diff --git a/linux-user/syscall.c b/linux-user/syscall.c
  index 897d20c076..5540792e0e 100644
  --- a/linux-user/syscall.c
  +++ b/linux-user/syscall.c
  @@ -11775,7 +11775,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
              turns private file-backed mappings into anonymous mappings.
              This will break MADV_DONTNEED.
              This is a hint, so ignoring and returning success is ok.  */
  -        return 0;
  +        return ENOSYS;
   #endif
   #ifdef TARGET_NR_fcntl64
       case TARGET_NR_fcntl64:

Refs: #1844 (comment)

This change is needed for ClickHouse to enable tests of AArch64 builds under QEMU

@azat azat changed the title Add runtime detection for MADV_DONTNEED zeroes pages (mostly for qemu) [RFC] Add runtime detection for MADV_DONTNEED zeroes pages (mostly for qemu) Dec 18, 2020
@azat azat force-pushed the madvise-MADV_DONTNEED_zeros-runtime-check branch 3 times, most recently from 6089af1 to 6a2f26f Compare December 18, 2020 21:39
@azat azat force-pushed the madvise-MADV_DONTNEED_zeros-runtime-check branch from 6a2f26f to cb757fc Compare December 18, 2020 21:52
@alexey-milovidov
Copy link
Copy Markdown

This change is needed for ClickHouse to enable tests of AArch64 builds under QEMU.

AArch64 becomes more popular as a server platform and we see increasing demand of running ClickHouse on AArch64 servers. But there are difficulties in integrating it in CI environment that only has x86_64. That's why we are trying to run it under QEMU.

@davidtgoldblatt
Copy link
Copy Markdown
Contributor

Everyone's on vacation now, so it'll probably be early next year before anyone looks at this closely. That being said, my first thought is that I don't love adding an extra syscall to every malloc-using process; this is something that the FreeBSD people have tried to get down quite a bit recently (out of concern for minimizing fixed-cost overhead of process creation). I wonder if we can get by with something in configuration. Something like one off:

  • Building with ./configure --dont_trust_madvise
  • Trying out a purge during configuration time, and only turning on the runtime check if that fails

@azat
Copy link
Copy Markdown
Contributor Author

azat commented Dec 21, 2020

@davidtgoldblatt thank you for the feedback.

That being said, my first thought is that I don't love adding an extra syscall to every malloc-using process

I'm kind of understand, but this quirk is pretty simple and has almost zero overhead, and to me it looks like better to have the check instead of silently works incorrectly.

Building with ./configure --dont_trust_madvise

But this will require building separate binaries to run in qemu

Trying out a purge during configuration time, and only turning on the runtime check if that fails

Same here

@azat
Copy link
Copy Markdown
Contributor Author

azat commented Jan 17, 2021

@davidtgoldblatt holidays are passed, maybe someone can take a look is suitable for upstream or not? (Feel free to close if not)

@davidtgoldblatt
Copy link
Copy Markdown
Contributor

Sorry, we celebrate the new year by immediately doing our performance reviews for the previous 6 months, which ends up sucking up lots of time.

I think I'm sold on this generally, so long as it's guarded by a config option (which can at least be off by default outside of Linux). I've got a couple other nits (e.g. using atomics for the testing, some constant choices, etc.). Up to you if we iterate on this, or I can just put up a parallel PR if you'd rather not deal with our configure scripts.

@azat
Copy link
Copy Markdown
Contributor Author

azat commented Jan 19, 2021

Sure no problem, I can finish this to make it ready for upstream:

  • config option - ok
  • as for atomic, is it a problem? the flag is set only in pages_boot, like other flags and they don't atomic
  • you can post what constant naming you will prefer, or if you prefer to finish this by yourself, I'm completely fine with this.

Copy link
Copy Markdown
Contributor

@davidtgoldblatt davidtgoldblatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other thing I forgot to mention -- the configure option should be called out in the installation documentation.

@davidtgoldblatt
Copy link
Copy Markdown
Contributor

Oh, forgot to mention -- good point about the atomic, didn't realize that this wasn't an initialize-on-first-use thing.

@azat azat force-pushed the madvise-MADV_DONTNEED_zeros-runtime-check branch from cb757fc to 59eb08f Compare January 19, 2021 21:17
@azat azat requested a review from davidtgoldblatt January 19, 2021 21:18
Copy link
Copy Markdown
Contributor

@davidtgoldblatt davidtgoldblatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good -- one substantive naming change and a few style nits (full rules here: https://github.com/jemalloc/jemalloc/wiki/Coding-Style).

qemu does not support this, yet [1], and you can get very tricky assert
if you will run program with jemalloc in use under qemu:

    <jemalloc>: ../contrib/jemalloc/src/extent.c:1195: Failed assertion: "p[i] == 0"

  [1]: https://patchwork.kernel.org/patch/10576637/

Here is a simple example that shows the problem [2]:

    // Gist to check possible issues with MADV_DONTNEED
    // For example it does not supported by qemu user
    // There is a patch for this [1], but it hasn't been applied.
    //   [1]: https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05422.html

    #include <sys/mman.h>
    #include <stdio.h>
    #include <stddef.h>
    #include <assert.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        void *addr = mmap(NULL, 1<<16, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        if (addr == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        memset(addr, 'A', 1<<16);

        if (!madvise(addr, 1<<16, MADV_DONTNEED)) {
            puts("MADV_DONTNEED does not return error. Check memory.");
            for (int i = 0; i < 1<<16; ++i) {
                assert(((unsigned char *)addr)[i] == 0);
            }
        } else {
            perror("madvise");
        }

        if (munmap(addr, 1<<16)) {
            perror("munmap");
            return 1;
        }

        return 0;
    }

  ### unpatched qemu

      $ qemu-x86_64-static /tmp/test-MADV_DONTNEED
      MADV_DONTNEED does not return error. Check memory.
      test-MADV_DONTNEED: /tmp/test-MADV_DONTNEED.c:19: main: Assertion `((unsigned char *)addr)[i] == 0' failed.
      qemu: uncaught target signal 6 (Aborted) - core dumped
      Aborted (core dumped)

  ### patched qemu (by returning ENOSYS error)

      $ qemu-x86_64 /tmp/test-MADV_DONTNEED
      madvise: Success

  ### patch for qemu to return ENOSYS

      diff --git a/linux-user/syscall.c b/linux-user/syscall.c
      index 897d20c076..5540792e0e 100644
      --- a/linux-user/syscall.c
      +++ b/linux-user/syscall.c
      @@ -11775,7 +11775,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
                  turns private file-backed mappings into anonymous mappings.
                  This will break MADV_DONTNEED.
                  This is a hint, so ignoring and returning success is ok.  */
      -        return 0;
      +        return ENOSYS;
       #endif
       #ifdef TARGET_NR_fcntl64
           case TARGET_NR_fcntl64:

  [2]: https://gist.github.com/azat/12ba2c825b710653ece34dba7f926ece

v2:
- review fixes
- add opt_dont_trust_madvise
v3:
- review fixes
- rename opt_dont_trust_madvise to opt_trust_madvise
@azat azat force-pushed the madvise-MADV_DONTNEED_zeros-runtime-check branch from 59eb08f to e516b65 Compare January 20, 2021 18:51
Copy link
Copy Markdown
Contributor

@davidtgoldblatt davidtgoldblatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants