[RFC] Add runtime detection for MADV_DONTNEED zeroes pages (mostly for qemu)#2005
Conversation
6089af1 to
6a2f26f
Compare
6a2f26f to
cb757fc
Compare
|
This change is needed for ClickHouse to enable tests of AArch64 builds under QEMU. AArch64 becomes more popular as a server platform and we see increasing demand of running ClickHouse on AArch64 servers. But there are difficulties in integrating it in CI environment that only has x86_64. That's why we are trying to run it under QEMU. |
|
Everyone's on vacation now, so it'll probably be early next year before anyone looks at this closely. That being said, my first thought is that I don't love adding an extra syscall to every malloc-using process; this is something that the FreeBSD people have tried to get down quite a bit recently (out of concern for minimizing fixed-cost overhead of process creation). I wonder if we can get by with something in configuration. Something like one off:
|
|
@davidtgoldblatt thank you for the feedback.
I'm kind of understand, but this quirk is pretty simple and has almost zero overhead, and to me it looks like better to have the check instead of silently works incorrectly.
But this will require building separate binaries to run in qemu
Same here |
|
@davidtgoldblatt holidays are passed, maybe someone can take a look is suitable for upstream or not? (Feel free to close if not) |
|
Sorry, we celebrate the new year by immediately doing our performance reviews for the previous 6 months, which ends up sucking up lots of time. I think I'm sold on this generally, so long as it's guarded by a config option (which can at least be off by default outside of Linux). I've got a couple other nits (e.g. using atomics for the testing, some constant choices, etc.). Up to you if we iterate on this, or I can just put up a parallel PR if you'd rather not deal with our configure scripts. |
|
Sure no problem, I can finish this to make it ready for upstream:
|
davidtgoldblatt
left a comment
There was a problem hiding this comment.
One other thing I forgot to mention -- the configure option should be called out in the installation documentation.
|
Oh, forgot to mention -- good point about the atomic, didn't realize that this wasn't an initialize-on-first-use thing. |
cb757fc to
59eb08f
Compare
davidtgoldblatt
left a comment
There was a problem hiding this comment.
Looks good -- one substantive naming change and a few style nits (full rules here: https://github.com/jemalloc/jemalloc/wiki/Coding-Style).
qemu does not support this, yet [1], and you can get very tricky assert
if you will run program with jemalloc in use under qemu:
<jemalloc>: ../contrib/jemalloc/src/extent.c:1195: Failed assertion: "p[i] == 0"
[1]: https://patchwork.kernel.org/patch/10576637/
Here is a simple example that shows the problem [2]:
// Gist to check possible issues with MADV_DONTNEED
// For example it does not supported by qemu user
// There is a patch for this [1], but it hasn't been applied.
// [1]: https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05422.html
#include <sys/mman.h>
#include <stdio.h>
#include <stddef.h>
#include <assert.h>
#include <string.h>
int main(int argc, char **argv)
{
void *addr = mmap(NULL, 1<<16, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
if (addr == MAP_FAILED) {
perror("mmap");
return 1;
}
memset(addr, 'A', 1<<16);
if (!madvise(addr, 1<<16, MADV_DONTNEED)) {
puts("MADV_DONTNEED does not return error. Check memory.");
for (int i = 0; i < 1<<16; ++i) {
assert(((unsigned char *)addr)[i] == 0);
}
} else {
perror("madvise");
}
if (munmap(addr, 1<<16)) {
perror("munmap");
return 1;
}
return 0;
}
### unpatched qemu
$ qemu-x86_64-static /tmp/test-MADV_DONTNEED
MADV_DONTNEED does not return error. Check memory.
test-MADV_DONTNEED: /tmp/test-MADV_DONTNEED.c:19: main: Assertion `((unsigned char *)addr)[i] == 0' failed.
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted (core dumped)
### patched qemu (by returning ENOSYS error)
$ qemu-x86_64 /tmp/test-MADV_DONTNEED
madvise: Success
### patch for qemu to return ENOSYS
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 897d20c076..5540792e0e 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -11775,7 +11775,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
turns private file-backed mappings into anonymous mappings.
This will break MADV_DONTNEED.
This is a hint, so ignoring and returning success is ok. */
- return 0;
+ return ENOSYS;
#endif
#ifdef TARGET_NR_fcntl64
case TARGET_NR_fcntl64:
[2]: https://gist.github.com/azat/12ba2c825b710653ece34dba7f926ece
v2:
- review fixes
- add opt_dont_trust_madvise
v3:
- review fixes
- rename opt_dont_trust_madvise to opt_trust_madvise
59eb08f to
e516b65
Compare
qemu does not support this, yet 1, and you can get very tricky assert
if you will run program with jemalloc in use under qemu:
Here is a simple example that shows the problem 2:
unpatched qemu
patched qemu (by returning ENOSYS error)
patch for qemu to return ENOSYS
Refs: #1844 (comment)
This change is needed for ClickHouse to enable tests of AArch64 builds under QEMU