Skip to content

LF allocator#7

Merged
KochetovNicolai merged 10 commits intomasterfrom
lf-allocator
Dec 12, 2020
Merged

LF allocator#7
KochetovNicolai merged 10 commits intomasterfrom
lf-allocator

Conversation

@KochetovNicolai
Copy link
Copy Markdown
Member

No description provided.

@KochetovNicolai
Copy link
Copy Markdown
Member Author

PR with tests: ClickHouse/ClickHouse#17965

@akuzm
Copy link
Copy Markdown

akuzm commented Dec 11, 2020

What's the purpose of this patch?

@alexey-milovidov
Copy link
Copy Markdown
Member

alexey-milovidov commented Dec 11, 2020

@akuzm

libunwind should not use malloc to be signal safe.
It also should not use mmap for every allocation as it is too slow.
It can use pre-allocated buffer on stack, but it cannot be large (for coroutines) and cannot be small (it will be not enough).
It also cannot use thread-local preallocated buffer because of coroutines.

It can use pre-allocated global buffer, but thread synchronization is required.
It cannot use mutex because of signal safety.
And it cannot use standard C++ library headers because of libunwind is used to build C++.


static bool atomic_compare_exchange_weak_wrapper(void * _Atomic & ptr, void *& expected, void * desired)
{
#ifdef __GNUC__
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang++ defines GNUC:

$ clang++ -E -dM -xc++ - <<< '' | grep GNUC
#define __GNUC_GNU_INLINE__ 1
#define __GNUC_MINOR__ 2
#define __GNUC_PATCHLEVEL__ 1
#define __GNUC__ 4


private:
static constexpr size_t entrySize = sizeof(FreeListEntry);
static constexpr size_t buffer_size = LIBUNWIND_MAX_STACK_SIZE * entrySize;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically it can fail on concurrent unwinding of very long stacks from multiple threads.
But it's Ok.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it? 1024 / 32 (our usual stack size limit) = 128 threads getting stack trace simultaneously, and if we're over that, we abort. Combine that with query profiler running on a large server, and it doesn't look good.

In general, the previous code had hard guarantee that the abort() doesn't happen if the program is correct, but this one doesn't. So it will happen.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added mmap for buffer and changed total limit to 1024 * 1024 frames

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make it complete -- either we shoud mmap enough memory so that the allocation never fails (may be inconvenient to push the server settings here?), or we can add another mmap in case there was not enough memory. If we use the latter option, we should set the initial size low enough, so that this logic is tested. Maybe we can lower the initial size in debug even more.

In case we add second mmap, aborting on its errors looks bad -- i.e. we're out of mappings, several queries fail and start to unwind stack, and then mmap for stack also fails, and we abort. Maybe we can return empty stacks in this case?

Copy link
Copy Markdown

@akuzm akuzm Dec 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or a third option, simlified -- allocate a static buffer once (maybe slightly larger than 1024 entries), but don't abort if we're out of memory, just truncate the stack. We might see some truncated stacks in profiler from time to time, but this is probably not very important.

@akuzm
Copy link
Copy Markdown

akuzm commented Dec 11, 2020

@akuzm

libunwind should not use malloc to be signal safe.
It also should not use mmap for every allocation as it is too slow.
It can use pre-allocated buffer on stack, but it cannot be large (for coroutines) and cannot be small (it will be not enough).
It also cannot use thread-local preallocated buffer because of coroutines.

It can use pre-allocated global buffer, but thread synchronization is required.
It cannot use mutex because of signal safety.
And it cannot use standard C++ library headers because of libunwind is used to build C++.

Nice, I guessed almost correctly. Let's put this into comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants