Skip to content

Move check of MADV_DONTNEED into runtime (fixes clickhouse under qemu)#18169

Closed
azat wants to merge 2 commits intoClickHouse:masterfrom
azat:MADV_DONTNEED-in-runtime
Closed

Move check of MADV_DONTNEED into runtime (fixes clickhouse under qemu)#18169
azat wants to merge 2 commits intoClickHouse:masterfrom
azat:MADV_DONTNEED-in-runtime

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Dec 16, 2020

Move check of MADV_DONTNEED (that is required by jemalloc) into runtime (fixes clickhouse under qemu)

Refs: #15174
Refs: #15590
Сс: @alexey-milovidov

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

P.S. I've checked it manually

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label Dec 16, 2020
@azat azat force-pushed the MADV_DONTNEED-in-runtime branch 3 times, most recently from be41611 to 9774c88 Compare December 16, 2020 22:49
@azat azat force-pushed the MADV_DONTNEED-in-runtime branch from 9774c88 to 8a7027d Compare December 16, 2020 22:51
Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

BTW, extern "C" in c source file looks pointless (as it cannot be included in cpp file).

@alexey-milovidov alexey-milovidov self-assigned this Dec 16, 2020
@azat
Copy link
Copy Markdown
Member Author

azat commented Dec 16, 2020

BTW, extern "C" in c source file looks pointless (as it cannot be included in cpp file).

Indeed, this is a copy-paste from glibc-compatibility.c, let's remove it

There is no need in extern "C" when the is not processed as c++ module.
@azat
Copy link
Copy Markdown
Member Author

azat commented Dec 17, 2020

ClickHouse build check — 12/14 builds are OK

Details

https://clickhouse-builds.s3.yandex.net/18169/f57fa06ececd343c5c88e78b71da4fc95346ca8c/clickhouse_build_check/build_log_848560605_1608166238.txt

2020-12-16 23:57:26 FAILED: contrib/llvm/llvm/include/llvm/IR/Attributes.inc 
2020-12-16 23:57:26 cd /build/obj-x86_64-linux-gnu && /build/obj-x86_64-linux-gnu/contrib/llvm/llvm/bin/llvm-tblgen -gen-attrs -I /build/contrib/llvm/llvm/include/llvm/IR -I /build/contrib/llvm/llvm/include /build/contrib/llvm/llvm/include/llvm/IR/Attributes.td -o contrib/llvm/llvm/include/llvm/IR/Attributes.inc -d contrib/llvm/llvm/include/llvm/IR/Attributes.inc.d
2020-12-16 23:57:26 Segmentation fault (core dumped)

https://clickhouse-builds.s3.yandex.net/18169/f57fa06ececd343c5c88e78b71da4fc95346ca8c/clickhouse_build_check/build_log_848560680_1608166240.txt

2020-12-16 23:58:36 FAILED: contrib/llvm/llvm/include/llvm/IR/Attributes.inc 
2020-12-16 23:58:36 cd /build/obj-x86_64-linux-gnu && /build/obj-x86_64-linux-gnu/contrib/llvm/llvm/bin/llvm-tblgen -gen-attrs -I /build/contrib/llvm/llvm/include/llvm/IR -I /build/contrib/llvm/llvm/include /build/contrib/llvm/llvm/include/llvm/IR/Attributes.td -o contrib/llvm/llvm/include/llvm/IR/Attributes.inc -d contrib/llvm/llvm/include/llvm/IR/Attributes.inc.d
2020-12-16 23:58:36 MemorySanitizer:DEADLYSIGNAL
2020-12-16 23:58:36 ==25562==ERROR: MemorySanitizer: SEGV on unknown address 0x50000174350c (pc 0x00000149a72d bp 0x010000000000 sp 0x7ffeb586aba0 T25562)
2020-12-16 23:58:36 ==25562==The signal is caused by a READ memory access.
2020-12-16 23:58:36     #0 0x149a72d in __madvise_safe /build/obj-x86_64-linux-gnu/../base/glibc-compatibility/madvise.c:76:9
2020-12-16 23:58:36     #1 0x149a72d in madvise /build/obj-x86_64-linux-gnu/../base/glibc-compatibility/madvise.c:91:12
2020-12-16 23:58:36     #2 0x3d89ea in __sanitizer::DontDumpShadowMemory(unsigned long, unsigned long) (/build/obj-x86_64-linux-gnu/contrib/llvm/llvm/bin/llvm-tblgen+0x3d89ea)
2020-12-16 23:58:36     #3 0x3cd15a in __msan::InitShadow(bool) (/build/obj-x86_64-linux-gnu/contrib/llvm/llvm/bin/llvm-tblgen+0x3cd15a)
2020-12-16 23:58:36     #4 0x375db5 in __msan_init (/build/obj-x86_64-linux-gnu/contrib/llvm/llvm/bin/llvm-tblgen+0x375db5)
2020-12-16 23:58:36     #5 0x380378 in __interceptor___getrlimit (/build/obj-x86_64-linux-gnu/contrib/llvm/llvm/bin/llvm-tblgen+0x380378)
2020-12-16 23:58:36     #6 0x7f37538e6f88 in __pthread_initialize_minimal_internal (/lib/x86_64-linux-gnu/libpthread.so.0+0x7f88)
2020-12-16 23:58:36 
2020-12-16 23:58:36 MemorySanitizer can not provide additional info.
2020-12-16 23:58:36 SUMMARY: MemorySanitizer: SEGV /build/obj-x86_64-linux-gnu/../base/glibc-compatibility/madvise.c:76:9 in __madvise_safe

The problem is that sanitizer is not "initialized" yet, and sanitizers traits (__tsan_func_entry) called, and no_sanitize attribute will not work, since the code will contain the traits anyway (since this is required for proper tracking). And also note that none of TSAN_OPTIONS will help here too (there is one, that will remove one madvise call, but there is another madvise call anyway)
So that said that you cannot override madvise in this way.
WIll get back with better patch.

@azat azat marked this pull request as draft December 17, 2020 07:22
@azat
Copy link
Copy Markdown
Member Author

azat commented Dec 18, 2020

Closed in favor of #18238

@azat azat closed this Dec 18, 2020
azat added a commit to azat/ClickHouse that referenced this pull request Dec 19, 2020
qemu does not support MADV_DONTNEED, and by not support it simply ignore
it (i.e. return 0 -- no error).

This issue has been "fixed" in ClickHouse#15590, however it just
terminates the process, and completely breaks clickhouse under qemu
(see also ClickHouse#15174).

But there is no need in such strong protection, we can stop using
madvise in jemalloc if MADV_DONTNEED does not work properly.
And this is what ClickHouse#18169 was tried to do (by override madvise), however
this will break sanitizers, at least TSAN and UBSAN.
The problem there is that sanitizers initialization code uses madvise
(and there is no way to turn this off with TSAN_OPTIONS) and overwritten
madvise function will have sanitizers traits (__tsan_func_entry), while
TSAN is not ready for this, and eventually it SIGSEGV.
Interesting thing is that in the recent clang-12, madvise was replaced
with direct syscall [1].

  [1]: llvm/llvm-project@9f8c403

But it is better to make clickhouse compatible with clang < 12 too, so
instead of override madvise completely, the runtime check was moved into
the jemalloc code [2].

  [2]: ClickHouse/jemalloc#1
@azat azat deleted the MADV_DONTNEED-in-runtime branch December 21, 2020 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants