Skip to content

handle mprotect error properly#8

Merged
hanfei1991 merged 2 commits intobuild-libcxx-and-libcxxabi-from-llvm-projectfrom
hanfei/fix-mprotect
Feb 20, 2023
Merged

handle mprotect error properly#8
hanfei1991 merged 2 commits intobuild-libcxx-and-libcxxabi-from-llvm-projectfrom
hanfei/fix-mprotect

Conversation

@hanfei1991
Copy link
Copy Markdown
Member

try to resolve llvm#60815

@hanfei1991 hanfei1991 requested a review from rschu1ze February 17, 2023 12:15
Check(mprotect(Ptr, Size, PROT_READ | PROT_WRITE) == 0,
"Failed to allocate in guarded pool allocator memory");
MaybeSetMappingName(Ptr, Size, kGwpAsanAliveSlotName);
// mprotect might fail because of system limitation. we don't expect a die
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the documented return values / errors of mprotect, there can be errors

  • because the programmer provided mprotect() bad arguments (e.g. EINVAL, EACCES) or
  • because of reasons the programmer has no influence over (e.g. ENOMEM).

While before, every error was treated overly pessimistically (I assume that Check kills the process in Debug builds and probably also in Release builds), we now treat errors overly optimistically. For example, errors of the first category are ignored which doesn't sound very appealing to me.

Question: Do you know more specifically, which type of error this fix addresses? If yes, it would be nice to fix the problem more fine-granularly.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this fix is safe because we simply go aground gwp-asan as long as it gets mprotect failure.
tcmalloc just does the same thing without checking errno.https://github.com/google/tcmalloc/blob/5034f8cecdbe559bf24e0ae7f7eb7c10b873ac9e/tcmalloc/guarded_page_allocator.cc#L94

but they think failure when dealloc pointer is unacceptable, so I didn't change other logic

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I am fine with the fix then and I will approve.

Just a few minor things:

  • "we don't expect a die" --> "we don't expect to die".
  • in 3rd-party (contrib) code, it is nice to wrap custom patches in comments, e.g.
/// Start Clickhouse-specific code
...
/// End ClickHouse-specific code

Perhaps that is something you could add?

Btw., to find out due to which error the stress tests failed, you could have made a test PR (marked "wip"/not for merging) where an exception with the actual error code is thrown. These exceptions should (theoretically) surface in the CI stress tests. This will of course only work if the failures occur not too sporadically.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. Although the failure occurs very occasionally, it's always good to write a test PR

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if GWP ASan it exceeds VMA limit (which is pretty high by default - 65K), and indeed mprotect likely splits VMA, then it can hit production setup and any following allocation may fail...

So enabling it by default does not sounds like a good idea, though looking at a few production setups I don't see problems with VMA usage there (with GWP ASan included).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

splitting VMA will increase the map size by at most 2, so it should not be a bug problem. But actually I don't know why CI failed, and right now only fails in debug mode.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and right now only fails in debug mode

Likely because in debug mode MMAP_THRESHOLD is 16K, while in release - 64MiB - https://github.com/ClickHouse/ClickHouse/blob/23281c73482ab9285bf6f592488b2d683dd8abb3/src/Common/Allocator.cpp#L20

// unmapped.
const size_t PageSize = State.PageSize;
allocateInGuardedPool(
bool success = allocateInGuardedPool(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++17 if with initializer.

if (bool success = allocateInGuardedPool(...); !success)
{
    ...
}

Check(mprotect(Ptr, Size, PROT_READ | PROT_WRITE) == 0,
"Failed to allocate in guarded pool allocator memory");
MaybeSetMappingName(Ptr, Size, kGwpAsanAliveSlotName);
// mprotect might fail because of system limitation. we don't expect a die
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I am fine with the fix then and I will approve.

Just a few minor things:

  • "we don't expect a die" --> "we don't expect to die".
  • in 3rd-party (contrib) code, it is nice to wrap custom patches in comments, e.g.
/// Start Clickhouse-specific code
...
/// End ClickHouse-specific code

Perhaps that is something you could add?

Btw., to find out due to which error the stress tests failed, you could have made a test PR (marked "wip"/not for merging) where an exception with the actual error code is thrown. These exceptions should (theoretically) surface in the CI stress tests. This will of course only work if the failures occur not too sporadically.

@hanfei1991 hanfei1991 merged commit a8bf69e into build-libcxx-and-libcxxabi-from-llvm-project Feb 20, 2023
@azat azat mentioned this pull request Mar 17, 2023
rschu1ze added a commit to ClickHouse/ClickHouse that referenced this pull request Mar 19, 2023
(motivated by [0])

PRs [0]/[1] switched the git reference in ClickHouse's main repo to a
feature branch derived from the correct branch
(ClickHouse/releases/15.x). As per this PR, this is now fixed. The
actual code state of llvm-project doesn't change.

[0] ClickHouse/llvm-project#10
[1] ClickHouse/llvm-project#8
[2] #46600
rschu1ze pushed a commit that referenced this pull request Jul 3, 2024
For the following program,
  $ cat t.c
  struct t {
   int (__attribute__((btf_type_tag("rcu"))) *f)();
   int a;
  };
  int foo(struct t *arg) {
    return arg->a;
  }
Compiling with 'clang -g -O2 -S t.c' will cause a failure like below:
  clang: /home/yhs/work/llvm-project/clang/lib/Sema/SemaType.cpp:6391: void {anonymous}::DeclaratorLocFiller::VisitParenTypeLoc(clang::ParenTypeLoc):
         Assertion `Chunk.Kind == DeclaratorChunk::Paren' failed.
  PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
  Stack dump:
  ......
  #5 0x00007f89e4280ea5 abort (/lib64/libc.so.6+0x21ea5)
  #6 0x00007f89e4280d79 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d79)
  #7 0x00007f89e42a6456 (/lib64/libc.so.6+0x47456)
  #8 0x00000000045c2596 GetTypeSourceInfoForDeclarator((anonymous namespace)::TypeProcessingState&, clang::QualType, clang::TypeSourceInfo*) SemaType.cpp:0:0
  #9 0x00000000045ccfa5 GetFullTypeForDeclarator((anonymous namespace)::TypeProcessingState&, clang::QualType, clang::TypeSourceInfo*) SemaType.cpp:0:0
  ......

The reason of the failure is due to the mismatch of TypeLoc and D.getTypeObject().Kind. For example,
the TypeLoc is
  BTFTagAttributedType 0x88614e0 'int  btf_type_tag(rcu)()' sugar
  |-ParenType 0x8861480 'int ()' sugar
  | `-FunctionNoProtoType 0x8861450 'int ()' cdecl
  |   `-BuiltinType 0x87fd500 'int'
while corresponding D.getTypeObject().Kind points to DeclaratorChunk::Paren, and
this will cause later assertion.

To fix the issue, similar to AttributedTypeLoc, let us skip BTFTagAttributedTypeLoc in
GetTypeSourceInfoForDeclarator().

Differential Revision: https://reviews.llvm.org/D136807
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants