Disable percpu arena in case of non deterministic CPU count#2
Disable percpu arena in case of non deterministic CPU count#2alexey-milovidov merged 3 commits intochfrom
Conversation
|
What will happen if per-cpu arenas are disabled? |
It may take more time to release the memory to the OS, since jemalloc has per-thread pools, and if it is not active nothing will be released, so it depends on the scheduler. But the difference should not be huge though (default narenas is cpu_cores*2).
|
…enas" This reverts commit 93e27e4.
10655f3 to
afce492
Compare
|
Patch merged to upstream. Also this PR reverts custom patches from jemalloc (93e27e4 and 3b877a1) that was a workaround for percpu arenas, so now we are using almost upstream version, except for our fork based on 5.2.1 while patch had been merged into dev branch (cafe9a3) and we also need one more patch that exists only in dev branch (a943172), so this fork is still required. |
This reverts commit 3b877a1.
Determinitic number of CPUs is important for percpu arena to work
correctly, since it uses cpu index - sched_getcpu(), and if it will
greater then number of CPUs bad thing will happen, or assertion will be
failed in debug build:
<jemalloc>: ../contrib/jemalloc/src/jemalloc.c:321: Failed assertion: "ind <= narenas_total_get()"
Aborted (core dumped)
Number of CPUs can be obtained from the following places:
- sched_getaffinity()
- sysconf(_SC_NPROCESSORS_ONLN)
- sysconf(_SC_NPROCESSORS_CONF)
For the sched_getaffinity() you may simply use taskset(1) to run program
on a different cpu, and in case it will be not first, percpu will work
incorrectly, i.e.:
$ taskset --cpu-list $(( $(getconf _NPROCESSORS_ONLN)-1 )) <your_program>
_SC_NPROCESSORS_ONLN uses /sys/devices/system/cpu/online, LXD/LXC
virtualize /sys/devices/system/cpu/online file [1], and so when you run
container with limited limits.cpus it will bind randomly selected CPU to
it
[1]: lxc/lxcfs#301
_SC_NPROCESSORS_CONF uses /sys/devices/system/cpu/cpu*, and AFAIK nobody
playing with dentries there.
So if all three of these are equal, percpu arenas should work correctly.
And a small note regardless _SC_NPROCESSORS_ONLN/_SC_NPROCESSORS_CONF,
musl uses sched_getaffinity() for both. So this will also increase the
entropy.
Also note, that you can check is percpu arena really applied using
abort_conf:true.
Refs: jemalloc#1939
Refs: ClickHouse/ClickHouse#32806
v2: move malloc_cpu_count_is_deterministic() into
malloc_init_hard_recursible() since _SC_NPROCESSORS_CONF does
allocations for readdir()
v3:
- mark cpu_count_is_deterministic static
- check only if percpu arena is enabled
- check narenas
afce492 to
189c070
Compare
We tried to load `g` from `bitmap[i]` before checking it is actually a
valid load. Tweaked a loop a bit to `break` early, when we are done
scanning for bits.
Before this commit undefined behaviour sanitizer from GCC 14+ was
unhappy at `test/unit/bitmap` test with following error.
```
../include/jemalloc/internal/bitmap.h:293:5: runtime error: load of
address 0x7bb1c2e08008 with insufficient space for an object of type
'const bitmap_t'
<...>
#0 0x62671a149954 in bitmap_ffu ../include/jemalloc/internal/bitmap.h:293
#1 0x62671a149954 in test_bitmap_xfu_body ../test/unit/bitmap.c:275
#2 0x62671a14b767 in test_bitmap_xfu ../test/unit/bitmap.c:323
#3 0x62671a376ad1 in p_test_impl ../test/src/test.c:149
#4 0x62671a377135 in p_test ../test/src/test.c:200
jemalloc#5 0x62671a13da06 in main ../test/unit/bitmap.c:336
<...>
```
This commit allows to enable sanitizers with autoconf options, instead
of modifying `CFLAGS`, `CXXFLAGS` and `LDFLAGS` directly.
* `--enable-tsan` option to enable Thread Sanitizer.
* `--enable-ubsan` option to enable Undefined Behaviour Sanitizer.
End goal is to speedup development by finding problems quickly, early
and easier. Eventually, when all current issues will be fixed, we can
enable sanitizers in CI. Fortunately, there are not a lot of problems we
need to fix.
Address Sanitizer is a bit controversial, because it replaces memory
allocator, so we decided to left it out for a while.
Below are couple of examples of how tests look like under different
sanitizers at the moment.
```
$ ../configure --enable-tsan --enable-debug
<...>
asan : 0
tsan : 1
ubsan : 0
$ make -j`nproc` check
<...>
Thread T13 (tid=332043, running) created by main thread at:
#0 pthread_create <null> (libtsan.so.0+0x61748)
#1 thd_create ../test/src/thd.c:25 (bin_batching+0x5631ca)
#2 stress_run ../test/unit/bin_batching.c:148
(bin_batching+0x40364c)
#3 test_races ../test/unit/bin_batching.c:249
(bin_batching+0x403d79)
#4 p_test_impl ../test/src/test.c:149 (bin_batching+0x562811)
jemalloc#5 p_test_no_reentrancy ../test/src/test.c:213
(bin_batching+0x562d35)
jemalloc#6 main ../test/unit/bin_batching.c:268 (bin_batching+0x40417e)
SUMMARY: ThreadSanitizer: data race
../include/jemalloc/internal/edata.h:498 in edata_nfree_inc
```
```
$ ../configure --enable-ubsan --enable-debug
<...>
asan : 0
tsan : 0
ubsan : 1
$ make -j`nproc` check
<...>
=== test/unit/hash ===
../test/unit/hash.c:119:16: runtime error: left shift of 176 by 24
places cannot be represented in type 'int'
<...>
```
Determinitic number of CPUs is important for percpu arena to work
correctly, since it uses cpu index - sched_getcpu(), and if it will
greater then number of CPUs bad thing will happen, or assertion will be
failed in debug build:
Number of CPUs can be obtained from the following places:
For the sched_getaffinity() you may simply use taskset(1) to run program
on a different cpu, and in case it will be not first, percpu will work
incorrectly, i.e.:
_SC_NPROCESSORS_ONLN uses /sys/devices/system/cpu/online, LXD/LXC
virtualize /sys/devices/system/cpu/online file 1, and so when you run
container with limited limits.cpus it will bind randomly selected CPU to
it
_SC_NPROCESSORS_CONF uses /sys/devices/system/cpu/cpu*, and AFAIK nobody
playing with dentries there.
So if all three of these are equal, percpu arenas should work correctly.
And a small note regardless _SC_NPROCESSORS_ONLN/_SC_NPROCESSORS_CONF,
musl uses sched_getaffinity() for both. So this will also increase the
entropy.
Also note, that you can check is percpu arena really applied using
abort_conf:true.
Refs: jemalloc#1939
Refs: ClickHouse/ClickHouse#32806
Refs: jemalloc#2181
v2: move malloc_cpu_count_is_deterministic() into
malloc_init_hard_recursible() since _SC_NPROCESSORS_CONF does
allocations for readdir()