Skip to content

MADV_DONTNEED check in runtime for qemu (via patching jemalloc)#18238

Merged
alexey-milovidov merged 3 commits intoClickHouse:masterfrom
azat:jemalloc-MADV_DONTNEED-runtime-check
Dec 21, 2020
Merged

MADV_DONTNEED check in runtime for qemu (via patching jemalloc)#18238
alexey-milovidov merged 3 commits intoClickHouse:masterfrom
azat:jemalloc-MADV_DONTNEED-runtime-check

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Dec 18, 2020

qemu does not support MADV_DONTNEED, and by not support it simply ignore
it (i.e. return 0 -- no error).

This issue has been "fixed" in #15590, however it just
terminates the process, and completely breaks clickhouse under qemu
(see also #15174).

But there is no need in such strong protection, we can stop using
madvise in jemalloc if MADV_DONTNEED does not work properly.
And this is what #18169 was tried to do (by override madvise), however
this will break sanitizers, at least TSAN and UBSAN.
The problem there is that sanitizers initialization code uses madvise
(and there is no way to turn this off with TSAN_OPTIONS) and overwritten
madvise function will have sanitizers traits (__tsan_func_entry), while
TSAN is not ready for this, and eventually it SIGSEGV.
Interesting thing is that in the recent clang-12, madvise was replaced
with direct syscall 1.

But it is better to make clickhouse compatible with clang < 12 too, so
instead of override madvise completely, the runtime check was moved into
the jemalloc code 2.

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Cc: @alexey-milovidov

P.S. I will keep this as draft for a couple of days, to wait for feedback from jemalloc maintainers

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label Dec 18, 2020
@azat azat marked this pull request as draft December 18, 2020 22:17
@robot-clickhouse robot-clickhouse added the submodule changed At least one submodule changed in this PR. label Dec 18, 2020
@alexey-milovidov alexey-milovidov self-assigned this Dec 18, 2020
@azat
Copy link
Copy Markdown
Member Author

azat commented Dec 19, 2020

ClickHouse build check Pending — Started

Does the build hung? Or it will be started eventually?

qemu does not support MADV_DONTNEED, and by not support it simply ignore
it (i.e. return 0 -- no error).

This issue has been "fixed" in ClickHouse#15590, however it just
terminates the process, and completely breaks clickhouse under qemu
(see also ClickHouse#15174).

But there is no need in such strong protection, we can stop using
madvise in jemalloc if MADV_DONTNEED does not work properly.
And this is what ClickHouse#18169 was tried to do (by override madvise), however
this will break sanitizers, at least TSAN and UBSAN.
The problem there is that sanitizers initialization code uses madvise
(and there is no way to turn this off with TSAN_OPTIONS) and overwritten
madvise function will have sanitizers traits (__tsan_func_entry), while
TSAN is not ready for this, and eventually it SIGSEGV.
Interesting thing is that in the recent clang-12, madvise was replaced
with direct syscall [1].

  [1]: llvm/llvm-project@9f8c403

But it is better to make clickhouse compatible with clang < 12 too, so
instead of override madvise completely, the runtime check was moved into
the jemalloc code [2].

  [2]: ClickHouse/jemalloc#1
@azat azat force-pushed the jemalloc-MADV_DONTNEED-runtime-check branch from d34f0bf to e379b80 Compare December 19, 2020 12:36
@alexey-milovidov
Copy link
Copy Markdown
Member

Performance: 7340839

@alexey-milovidov
Copy link
Copy Markdown
Member

Functional stateless tests flaky check (address) — fail: 0, passed: 0, skipped: 100

It's marked as failed due to "passed: 0", actually it's Ok.

@alexey-milovidov
Copy link
Copy Markdown
Member

Yandex ... check

Don't care.

@alexey-milovidov alexey-milovidov marked this pull request as ready for review December 20, 2020 04:13
@alexey-milovidov
Copy link
Copy Markdown
Member

@azat I think it is ready to merge.

@azat
Copy link
Copy Markdown
Member Author

azat commented Dec 20, 2020

@azat I think it is ready to merge.

@alexey-milovidov this should be a correct fix, however I want to wait for the feedback from maintainers, so if i.e. there will some wish to implement it in a different way or some cosmetic changes would rebase the backported patch (to avoid further conflicts and similar issues)

So I'm suggesting wait for at least couple of work days (ideally week), and merge

@alexey-milovidov
Copy link
Copy Markdown
Member

A link to upstream PR: jemalloc/jemalloc#2005

@alexey-milovidov
Copy link
Copy Markdown
Member

@azat Let's add a reference to the description of upstream PR, like this:

This change is needed for [ClickHouse](https://github.com/ClickHouse/ClickHouse/) to enable tests of [AArch64 builds under QEMU](https://github.com/ClickHouse/ClickHouse/issues/15174).

@azat
Copy link
Copy Markdown
Member Author

azat commented Dec 21, 2020

@azat Let's add a reference to the description of upstream PR, like this:

Sure, done (although this does not show the whole problem - any program with jemalloc under qemu is broken now)

@alexey-milovidov
Copy link
Copy Markdown
Member

So I'm suggesting wait for at least couple of work days (ideally week), and merge

We can always change our mind later.

@alexey-milovidov alexey-milovidov merged commit d665062 into ClickHouse:master Dec 21, 2020
@azat azat deleted the jemalloc-MADV_DONTNEED-runtime-check branch December 21, 2020 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog submodule changed At least one submodule changed in this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants