handle mprotect error properly#8
handle mprotect error properly#8hanfei1991 merged 2 commits intobuild-libcxx-and-libcxxabi-from-llvm-projectfrom
Conversation
| Check(mprotect(Ptr, Size, PROT_READ | PROT_WRITE) == 0, | ||
| "Failed to allocate in guarded pool allocator memory"); | ||
| MaybeSetMappingName(Ptr, Size, kGwpAsanAliveSlotName); | ||
| // mprotect might fail because of system limitation. we don't expect a die |
There was a problem hiding this comment.
Looking at the documented return values / errors of mprotect, there can be errors
- because the programmer provided mprotect() bad arguments (e.g. EINVAL, EACCES) or
- because of reasons the programmer has no influence over (e.g. ENOMEM).
While before, every error was treated overly pessimistically (I assume that Check kills the process in Debug builds and probably also in Release builds), we now treat errors overly optimistically. For example, errors of the first category are ignored which doesn't sound very appealing to me.
Question: Do you know more specifically, which type of error this fix addresses? If yes, it would be nice to fix the problem more fine-granularly.
There was a problem hiding this comment.
Some stress tests fail at this point but unfortunately, I don't know why. See https://play.clickhouse.com/play?user=play#c2VsZWN0CnRlc3RfY29udGV4dF9yYXcsCnB1bGxfcmVxdWVzdF9udW1iZXIgYXMgcHIsIGNoZWNrX25hbWUsIHRlc3RfbmFtZSwgdGVzdF9zdGF0dXMgYXMgc3RhdHVzLCByZXBvcnRfdXJsCmZyb20gY2hlY2tzIHdoZXJlIGNoZWNrX3N0YXJ0X3RpbWUgPj0gbm93KCkgLSBJTlRFUlZBTCAyNCBIT1VSIGFuZCB0ZXN0X2NvbnRleHRfcmF3IT0nJwphbmQgcHVsbF9yZXF1ZXN0X251bWJlcj0wIGFuZCBjaGVja19uYW1lIGlsaWtlICclc3RyZXNzJScgYW5kICh0ZXN0X25hbWUgaWxpa2UgJyVmYXRhbCUnIG9yIHRlc3RfbmFtZSBpbGlrZSAnJXNpZ25hbCUnKSBhbmQgdGVzdF9jb250ZXh0X3JhdyBsaWtlICclZ3dwJScKb3JkZXIgYnkgY2hlY2tfc3RhcnRfdGltZSBkZXNj
There was a problem hiding this comment.
But this fix is safe because we simply go aground gwp-asan as long as it gets mprotect failure.
tcmalloc just does the same thing without checking errno.https://github.com/google/tcmalloc/blob/5034f8cecdbe559bf24e0ae7f7eb7c10b873ac9e/tcmalloc/guarded_page_allocator.cc#L94
but they think failure when dealloc pointer is unacceptable, so I didn't change other logic
There was a problem hiding this comment.
Okay, I am fine with the fix then and I will approve.
Just a few minor things:
- "we don't expect a die" --> "we don't expect to die".
- in 3rd-party (contrib) code, it is nice to wrap custom patches in comments, e.g.
/// Start Clickhouse-specific code
...
/// End ClickHouse-specific codePerhaps that is something you could add?
Btw., to find out due to which error the stress tests failed, you could have made a test PR (marked "wip"/not for merging) where an exception with the actual error code is thrown. These exceptions should (theoretically) surface in the CI stress tests. This will of course only work if the failures occur not too sporadically.
There was a problem hiding this comment.
Good suggestion. Although the failure occurs very occasionally, it's always good to write a test PR
There was a problem hiding this comment.
But if GWP ASan it exceeds VMA limit (which is pretty high by default - 65K), and indeed mprotect likely splits VMA, then it can hit production setup and any following allocation may fail...
So enabling it by default does not sounds like a good idea, though looking at a few production setups I don't see problems with VMA usage there (with GWP ASan included).
There was a problem hiding this comment.
splitting VMA will increase the map size by at most 2, so it should not be a bug problem. But actually I don't know why CI failed, and right now only fails in debug mode.
There was a problem hiding this comment.
and right now only fails in debug mode
Likely because in debug mode MMAP_THRESHOLD is 16K, while in release - 64MiB - https://github.com/ClickHouse/ClickHouse/blob/23281c73482ab9285bf6f592488b2d683dd8abb3/src/Common/Allocator.cpp#L20
| // unmapped. | ||
| const size_t PageSize = State.PageSize; | ||
| allocateInGuardedPool( | ||
| bool success = allocateInGuardedPool( |
There was a problem hiding this comment.
C++17 if with initializer.
if (bool success = allocateInGuardedPool(...); !success)
{
...
}| Check(mprotect(Ptr, Size, PROT_READ | PROT_WRITE) == 0, | ||
| "Failed to allocate in guarded pool allocator memory"); | ||
| MaybeSetMappingName(Ptr, Size, kGwpAsanAliveSlotName); | ||
| // mprotect might fail because of system limitation. we don't expect a die |
There was a problem hiding this comment.
Okay, I am fine with the fix then and I will approve.
Just a few minor things:
- "we don't expect a die" --> "we don't expect to die".
- in 3rd-party (contrib) code, it is nice to wrap custom patches in comments, e.g.
/// Start Clickhouse-specific code
...
/// End ClickHouse-specific codePerhaps that is something you could add?
Btw., to find out due to which error the stress tests failed, you could have made a test PR (marked "wip"/not for merging) where an exception with the actual error code is thrown. These exceptions should (theoretically) surface in the CI stress tests. This will of course only work if the failures occur not too sporadically.
(motivated by [0]) PRs [0]/[1] switched the git reference in ClickHouse's main repo to a feature branch derived from the correct branch (ClickHouse/releases/15.x). As per this PR, this is now fixed. The actual code state of llvm-project doesn't change. [0] ClickHouse/llvm-project#10 [1] ClickHouse/llvm-project#8 [2] #46600
For the following program,
$ cat t.c
struct t {
int (__attribute__((btf_type_tag("rcu"))) *f)();
int a;
};
int foo(struct t *arg) {
return arg->a;
}
Compiling with 'clang -g -O2 -S t.c' will cause a failure like below:
clang: /home/yhs/work/llvm-project/clang/lib/Sema/SemaType.cpp:6391: void {anonymous}::DeclaratorLocFiller::VisitParenTypeLoc(clang::ParenTypeLoc):
Assertion `Chunk.Kind == DeclaratorChunk::Paren' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
......
#5 0x00007f89e4280ea5 abort (/lib64/libc.so.6+0x21ea5)
#6 0x00007f89e4280d79 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d79)
#7 0x00007f89e42a6456 (/lib64/libc.so.6+0x47456)
#8 0x00000000045c2596 GetTypeSourceInfoForDeclarator((anonymous namespace)::TypeProcessingState&, clang::QualType, clang::TypeSourceInfo*) SemaType.cpp:0:0
#9 0x00000000045ccfa5 GetFullTypeForDeclarator((anonymous namespace)::TypeProcessingState&, clang::QualType, clang::TypeSourceInfo*) SemaType.cpp:0:0
......
The reason of the failure is due to the mismatch of TypeLoc and D.getTypeObject().Kind. For example,
the TypeLoc is
BTFTagAttributedType 0x88614e0 'int btf_type_tag(rcu)()' sugar
|-ParenType 0x8861480 'int ()' sugar
| `-FunctionNoProtoType 0x8861450 'int ()' cdecl
| `-BuiltinType 0x87fd500 'int'
while corresponding D.getTypeObject().Kind points to DeclaratorChunk::Paren, and
this will cause later assertion.
To fix the issue, similar to AttributedTypeLoc, let us skip BTFTagAttributedTypeLoc in
GetTypeSourceInfoForDeclarator().
Differential Revision: https://reviews.llvm.org/D136807
try to resolve llvm#60815