Skip to content

CI: combine ASan+UBSan into single builds, remove UBSan#99657

Merged
fm4v merged 32 commits intomasterfrom
nik/asan-ubsan-combined
Mar 20, 2026
Merged

CI: combine ASan+UBSan into single builds, remove UBSan#99657
fm4v merged 32 commits intomasterfrom
nik/asan-ubsan-combined

Conversation

@fm4v
Copy link
Copy Markdown
Member

@fm4v fm4v commented Mar 16, 2026

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Combines ASan and UBSan sanitizer builds (AMD_ASAN_UBSAN, ARM_ASAN_UBSAN) since they are fully compatible in clang (-fsanitize=address,undefined). Removes the standalone AMD_UBSAN build (now redundant) and disables the ARM_TSAN build job (data races are architecture-independent, AMD_TSAN is sufficient).

…d ARM TSan

- Replace `AMD_ASAN` with `AMD_ASAN_UBSAN` (`-DSANITIZE=address,undefined`) and add `ENABLE_BUZZHOUSE=1`
- Replace `ARM_ASAN` with `ARM_ASAN_UBSAN` (`-DSANITIZE=address,undefined`) and add `ENABLE_BUZZHOUSE=1`
- Remove `AMD_UBSAN` build and all jobs that depended on it (redundant now that UBSan is included in both ASAN builds)
- Disable `ARM_TSAN` build job (commented out; `AMD_TSAN` covers data races which are architecture-independent)
- Add `address,undefined` branch to `cmake/sanitize.cmake` combining ASan and UBSan flags
- Regenerate `.github/workflows/` via `python3 -m ci.praktika yaml`

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
@fm4v fm4v changed the title CI: combine ASan+UBSan into single builds, remove standalone UBSan and ARM TSan CI: combine ASan+UBSan into single builds, remove UBSan Mar 16, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 16, 2026

Workflow [PR], commit [8e04136]

Summary:

job_name test_name status info comment
Integration tests (amd_asan_ubsan, targeted) failure
test_rocksdb_read_only/test.py::test_dirctory_missing_after_stop[10-10] FAIL cidb IGNORED
test_rocksdb_read_only/test.py::test_dirctory_missing_after_stop[2-10] FAIL cidb IGNORED
test_rocksdb_read_only/test.py::test_read_only[9-10] FAIL cidb IGNORED
test_rocksdb_read_only/test.py::test_dirctory_missing_after_stop[7-10] FAIL cidb IGNORED
test_rocksdb_read_only/test.py::test_read_only[1-10] FAIL cidb IGNORED
test_rocksdb_read_only/test.py::test_dirctory_missing_after_stop[9-10] FAIL cidb IGNORED
test_rocksdb_read_only/test.py::test_read_only[3-10] FAIL cidb IGNORED
test_rocksdb_read_only/test.py::test_dirctory_missing_after_stop[4-10] FAIL cidb IGNORED
test_rocksdb_read_only/test.py::test_read_only[6-10] FAIL cidb IGNORED
test_rocksdb_read_only/test.py::test_dirctory_missing_after_stop[8-10] FAIL cidb IGNORED
8 more test cases not shown
Stateless tests (amd_debug, parallel) failure
04004_integral_col_comparison_with_float_key_condition FAIL cidb, issue ISSUE EXISTS
AST fuzzer (amd_tsan) failure
ThreadSanitizer: data race (STID: 1367-305e) FAIL cidb, issue ISSUE EXISTS
BuzzHouse (amd_debug) failure
Logical error: 'Inconsistent AST formatting: the query: (STID: 1941-1bfa) FAIL cidb, issue ISSUE EXISTS

@clickhouse-gh clickhouse-gh bot added the pr-ci label Mar 16, 2026
@alexey-milovidov alexey-milovidov self-assigned this Mar 16, 2026
fm4v and others added 2 commits March 16, 2026 22:00
- Add address,undefined branch to build_clang_builtin.cmake to include
  both ASan (asan_static, asan, asan_cxx) and UBSan (ubsan_standalone,
  ubsan_standalone_cxx) runtimes when linking host tools such as protoc.
  Without this, the linker fails with undefined __asan_* and __ubsan_*
  symbols because -fno-sanitize-link-runtime is applied but no runtime
  is provided.
- Add address,undefined branch to cmake/sanitize.cmake combining ASan
  and UBSan compiler flags.
- Fix collect_build_flags in clickhouse-test to parse all sanitizer
  values from -fsanitize=... instead of using elif chain, so both asan
  and ubsan tags are set for combined builds.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
libclang_rt.asan and libclang_rt.ubsan_standalone both contain
sanitizer_common symbols, causing duplicate symbol errors when linked
together. When combining -fsanitize=address,undefined, clang uses the
ASan runtime to cover UBSan instrumentation as well, so only the ASan
runtime libraries (asan_static, asan, asan_cxx) should be linked.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
*[
Job.ParamSet(
parameter=f"amd_asan, distributed plan, parallel, {batch}/{total_batches}",
parameter=f"amd_asan_ubsan, distributed plan, parallel, {batch}/{total_batches}",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rename breaks the dependency filter in ci/workflows/pull_request.py.

FUNCTIONAL_TESTS_PARALLEL_BLOCKING_JOB_NAMES still matches "_asan, distributed plan, parallel", so the renamed jobs (amd_asan_ubsan, distributed plan, parallel) are no longer included. I validated this locally: both amd_asan_ubsan parallel jobs are now excluded from the blocking list.

Impact: release/special/integration jobs can start without waiting for these heavy parallel sanitizer tests, which can increase CI contention and make scheduling behavior inconsistent.

Please update the filter in ci/workflows/pull_request.py to match "_asan_ubsan, distributed plan, parallel" (or use a sanitizer-agnostic predicate).

contrib/CMakeLists.txt: extend the UBSan suppression for third-party
libraries to also cover the combined address,undefined sanitizer. The
old check (SANITIZE STREQUAL "undefined") didn't match the new value,
causing OpenSSL and other contrib libraries to be compiled with UBSan,
which aborted ClickHouse at startup due to type-mismatched function
pointer calls in OpenSSL (ossl_bsearch).

ci/workflows/pull_request.py: fix FUNCTIONAL_TESTS_PARALLEL_BLOCKING_JOB_NAMES
predicate from "_asan, distributed plan, parallel" to
"_asan_ubsan, distributed plan, parallel" after the rename, so that
release/special/integration jobs correctly wait for the heavy parallel
sanitizer tests before starting.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
@clickhouse-gh clickhouse-gh bot added the submodule changed At least one submodule changed in this PR. label Mar 17, 2026
contrib/CMakeLists.txt: extend the UBSan suppression for third-party
libraries to also cover the combined address,undefined sanitizer. The
old check (SANITIZE STREQUAL "undefined") didn't match the new value,
causing OpenSSL and other contrib libraries to be compiled with UBSan,
which aborted ClickHouse at startup due to type-mismatched function
pointer calls in OpenSSL (ossl_bsearch).

contrib/isa-l-cmake, lz4-cmake, arrow-cmake: extend the same
STREQUAL "undefined" guards that strip -fno-sanitize=undefined from
NASM/target compile options to also cover address,undefined. Without
this, nasm receives -fno-sanitize=undefined and interprets it as an
output format, causing a fatal build error.

ci/workflows/pull_request.py: fix FUNCTIONAL_TESTS_PARALLEL_BLOCKING_JOB_NAMES
predicate from "_asan, distributed plan, parallel" to
"_asan_ubsan, distributed plan, parallel" after the rename, so that
release/special/integration jobs correctly wait for the heavy parallel
sanitizer tests before starting.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
ArtifactNames.CH_ARM_TSAN,
],
runs_on=RunnerLabels.ARM_LARGE,
),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ extra_validation_build_jobs no longer contains BuildTypes.ARM_TSAN, but ArtifactConfigs.clickhouse_binaries still includes CH_ARM_TSAN (from ci/defs/defs.py). This leaves CH_ARM_TSAN without a producer and drops ARM TSAN signal for PR/branch workflows.

Please either restore the ARM_TSAN Job.ParamSet here, or remove/update all CH_ARM_TSAN artifact wiring consistently in the same PR.

fm4v and others added 4 commits March 17, 2026 11:34
The build_llvm_coverage_job had no dependency set, so it could start
immediately without waiting for style check, fast test and tidy builds,
unlike all other build jobs.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
The combined ASan+UBSan binary is larger than pure ASan — each
clickhouse-client process uses ~350MB RSS. Under parallel test mode
with multiple concurrent clients the 5GB cgroup limit is exceeded,
causing OOM kills. Raise it to 10GB for asan_ubsan jobs only.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
The is_coverage/JIT settings block was accidentally included from the
working tree. Only the collect_build_flags refactoring (parsing all
sanitizer values from -fsanitize=... instead of elif chain) belongs
to this branch.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
`clickhouse local` (used for post-test log collection) was aborting at
exit under ASan+LSan with 45 KB of leaks from OpenSSL provider
initialization (`ossl_provider_new`, `OSSL_PROVIDER_try_load_ex`, etc.).
OpenSSL 3.x intentionally does not free all global state at exit, so
these are false positives.

Added `__lsan_default_suppressions` alongside the existing
`__lsan_default_options` in `programs/main.cpp` to suppress the five
known OpenSSL leak patterns. This fixes the "Unknown error / Lost
connection to server" failures in:
  - Stateless tests (amd_asan_ubsan, distributed plan, parallel, 1/2)
  - Stateless tests (amd_asan_ubsan, distributed plan, parallel, 2/2)
  - Stateless tests (arm_asan_ubsan, azure, sequential)

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=99657&sha=24b9445c3f9c8dfc420b55f1a2fb369ae17220be&name_0=PR

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
flag = sanitize_flag_map.get(san)
if flag is not None:
result.append(flag)
SANITIZED = bool(result)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SANITIZED here is assigned without global SANITIZED, so this write only updates a local variable inside collect_build_flags.

kill_process_group reads the module-level SANITIZED flag, which therefore stays False and skips the 60s wait before termination. In sanitizer jobs this can truncate sanitizer reports.

Please add global SANITIZED in collect_build_flags (next to global RELEASE_NON_SANITIZED) before assigning it.

fm4v and others added 3 commits March 18, 2026 00:15
The `check_fatal_messages_in_logs` function uses `sed -n '/.*anitizer/,$p'`
to detect sanitizer output in `stderr*.log`. When ASan runs on a machine
where the stack address space wraps around, it emits:

  ==N==WARNING: ASan is ignoring requested __asan_handle_no_return:
    stack type: default; top: 0x...; bottom 0x...; size: 0x...
  False positive error reports may follow
  For details see google/sanitizers#189

These lines match `/.*anitizer/` but were not excluded by the existing
filter (which only excluded the `makecontext/swapcontext` line). As a
result `parse_failure()` found no actionable error pattern and returned
the fallback "Unknown error / Lost connection to server", causing
stateless test jobs to fail even when all tests passed.

Add the two additional boilerplate lines to the exclusion filter.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
Two issues remained after the previous fix:

1. The ASan stack-size warning block (4 lines) was only partially
   filtered in `clickhouse_proc.py`. The last line:
     "For details see google/sanitizers#189"
   was not excluded, leaving `sanitizer_hits` non-empty and triggering
   a spurious "Unknown error / Lost connection to server" result for
   jobs that had no real sanitizer errors. Add the missing exclusion.

2. OpenSSL's EVP method cache (`evp_md_new`, `construct_evp_method`,
   `CRYPTO_THREAD_lock_new`) is never freed at process exit. These
   objects are allocated when the S3 client initializes HMAC via
   `AWS SDK -> HMAC_Init_ex -> EVP_MD_fetch`. LSan reports them as
   leaks, causing the server to abort with SIGABRT (`halt_on_error=1`),
   which `clickhouse-test` records as "Unknown error / Lost connection
   to server". Add suppressions to `__lsan_default_suppressions`.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
fm4v and others added 7 commits March 18, 2026 11:58
…ation tests

Combining ASan and UBSan into a single build (PR #99657) runs UBSan in
integration tests for the first time, exposing two pre-existing issues:

1. Arrow IPC buffers (`array_binary.h`): `BaseBinaryArray::value_offset`
   reads offset arrays that may not be 4-byte aligned relative to the IPC
   buffer base. This is intentional in the Arrow format and safe on x86 —
   suppress the misaligned-load check for that file.

2. EmbeddedRocksDB full scan (`BufferBase::BufferBase`): when iterating over
   a RocksDB table in read-only restart mode, `rocksdb::Slice::size()` returns
   `0xFFFFFFFFFFFFFFFE` (corrupted), causing pointer arithmetic to overflow in
   `BufferBase`. This is a real pre-existing memory corruption bug specific to
   `test_rocksdb_read_only`; suppress here until it is fixed.
   Issue: #99862

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
…fect

The previous suppressions were ineffective:

1. `src:*/contrib/arrow/cpp/src/arrow/array/array_binary.h*` - the path
   glob did not match the absolute path used during compilation; broadened
   to `src:*/contrib/arrow/*` to cover all Arrow contrib sources.

2. `fun:*DB::BufferBase::BufferBase*` - per the file's own caveats, `fun:`
   does not work for inlined functions, and `BufferBase::BufferBase` is a
   header-only constructor that is always inlined at call sites. Replaced
   with `src:*/src/IO/BufferBase.h`.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
UBSan's compile-time `src:` suppression for inlined functions must target
the TRANSLATION UNIT (.cpp file) where the inlined code is compiled, not
the header file where it is defined. All the inlined constructors in the
call chain compile down to instructions attributed to the outer .cpp file.

- Arrow: `arrow::BaseBinaryArray::value_offset` (misaligned load in
  `array_binary.h`) is inlined into `ArrowColumnToCHColumn.cpp`.
  Suppress that translation unit.

- RocksDB: `BufferBase::BufferBase` and friends (pointer overflow) are
  inlined through `ReadBufferFromString` → `KVStorageUtils.h` into
  `StorageEmbeddedRocksDB.cpp`. Suppress that translation unit.
  Issue: #99862

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
The ubsan_ignorelist.txt approach was ineffective because sccache keys
its object-file cache on compiler arguments (including the ignorelist
PATH) but not on the ignorelist FILE CONTENT. Changing the ignorelist
without changing source files serves stale cached objects, leaving the
UBSan instrumentation in place.

Fix by adding NO_SANITIZE_UNDEFINED directly to the affected functions,
which changes the preprocessed source content and forces sccache to
recompile:

1. ArrowColumnToCHColumn.cpp: `readColumnWithStringData<ArrowArray>` —
   Arrow IPC offset arrays may be unaligned (misaligned-load). This is
   intentional in the Arrow IPC format and safe on x86.

2. StorageEmbeddedRocksDB.cpp: `EmbeddedRocksDBSource::generateFullScan`
   — inlined BufferBase pointer arithmetic overflows when a RocksDB slice
   returns a corrupted size. Pre-existing bug tracked in:
   #99862

Also revert the now-unnecessary src: entries added to ubsan_ignorelist.txt
in the previous attempts.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
…geEmbeddedRocksDB

Source annotations (NO_SANITIZE_UNDEFINED) were ineffective because the
UBSan checks for inlined callees are attributed to the callee's source
location, not the annotated caller. ubsan_ignorelist.txt was also
ineffective due to sccache not including ignorelist file content in its
cache key.

Fix: use set_source_files_properties in CMakeLists.txt to pass
-fno-sanitize=<check> flags for specific translation units. Changing
compiler flags is part of sccache's cache key, guaranteeing a recompile.

- ArrowColumnToCHColumn.cpp: -fno-sanitize=alignment
  Arrow IPC offset arrays are intentionally unaligned (packed int32).
  The misaligned read is safe on x86.

- StorageEmbeddedRocksDB.cpp: -fno-sanitize=pointer-overflow
  Pre-existing bug: corrupted RocksDB slice (size = 0xFFFFFFFFFFFFFFFE)
  causes pointer arithmetic overflow in BufferBase when iterating in
  read-only mode. Tracked in:
  #99862

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
Using relative paths in `set_source_files_properties` was not reliably
targeting the correct source files. Switch to `${CMAKE_CURRENT_SOURCE_DIR}`
absolute paths to ensure the suppression flags are applied.

Also remove `NO_SANITIZE_UNDEFINED` source annotations from
`ArrowColumnToCHColumn.cpp` and `StorageEmbeddedRocksDB.cpp` now that
the CMake approach is fixed.

Remove `amd_release` and `arm_asan_ubsan` targeted integration test
job configs from `job_configs.py` as these artifact variants don't
exist yet.

CI: #99657

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
…ol function

Convert coverage addresses to file offsets by subtracting the binary's load base
so that dumps are portable across processes with different ASLR bases. This matches
how `addressToSymbol` works: the fallback path in `SymbolIndex::findSymbol` treats
values as raw file offsets when they're not in any mapped object.

Add `SymbolIndex::diagnose` to explain why symbol resolution succeeds or fails:
returns `"found:<name>"`, `"no_object"`, or `"no_symbol[object=<path>:offset=0x<hex>]"`.

Add `debugAddressToSymbol` SQL function backed by `diagnose`, useful for
investigating why `coverageCurrent()` produces empty symbols.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
fm4v and others added 6 commits March 19, 2026 09:33
…mands

Add stateless tests to cover previously-untested code paths:
- `04035_json_extract_null_handling`: all `JSONExtract` null-value paths in
  `JSONExtractTree.cpp` (numeric, string, UUID, date/time, decimal, IP, enum,
  array, tuple nodes) and the `jsonElementToString` null case
- `04036_system_commands_coverage`: `SYSTEM` commands that had no test coverage
  (`CLEAR_PRIMARY_INDEX_CACHE`, `CLEAR_PAGE_CACHE`, `START/STOP_THREAD_FUZZER`,
  `RELOAD_FUNCTIONS`, `RELOAD_EMBEDDED_DICTIONARIES`)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
When targeted integration tests run with `--count 10 --random-order`,
the same test functions execute 10 times in a single cluster session.
`test_read_only` and `test_dirctory_missing_after_stop` were not
idempotent: a failed run would leave tables or RocksDB lock files behind,
causing all subsequent runs to fail:

- After a clean run of `test_read_only`, the directory
  `/var/lib/clickhouse/user_files/test_rocksdb_read_only` persists on
  disk (dropping a table does not remove its RocksDB directory). In the
  next run, `CREATE TABLE ... read_only=1` on that path succeeds instead
  of raising, because a valid database now exists in the directory.
- A mid-run failure in either test can leave a table alive, keeping the
  RocksDB LOCK file held by the server process. Subsequent runs then fail
  with "lock hold by current process".

Fix: add cleanup at the start of each test that drops any leftover tables
and removes the RocksDB directories before the test body runs.

CI: #99657

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
requires=[ArtifactNames.UNITTEST_AMD_UBSAN],
),
)
stress_test_jobs = common_stress_job_config.parametrize(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stress_test_jobs matrix was narrowed more than the PR scope describes: this hunk drops amd_release and the non-s3 ARM sanitizer stress variant, leaving only amd_debug, amd_tsan, amd_msan, and arm_asan_ubsan, s3.

That reduces stress coverage for release binaries and plain ARM sanitizer runs across all workflows that consume JobConfigs.stress_test_jobs (PR, master, release). Please either restore those jobs, or document why this coverage reduction is intentional.

fm4v and others added 4 commits March 19, 2026 10:42
When targeted integration tests run with --count 10, pytest-xdist spawns
multiple workers that all reuse the same ClickHouseCluster (same Docker
container, same server process).  RocksDB keeps a per-process in-memory
lock map: once a path is open, any concurrent Open() from another thread
in the same server fails with "lock hold by current process".

Fix: derive a per-worker suffix from PYTEST_XDIST_WORKER so each worker
uses distinct RocksDB paths and table names.  Also add cleanup at the
start of each test function to handle leftover state from a failed
previous run on the same worker (sequential reruns within one worker).

CI: #99657

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
When a Buffer is default-constructed (_ptr = nullptr, _used = 0),
calling setCapacity with newCapacity > 0 and preserveContent = true
computes newSz = 0 and then calls memcpy(dst, nullptr, 0).

Passing a null pointer to memcpy is undefined behaviour even when size
is 0 (the nonnull attribute on argument 2 applies regardless of size).
UBSan correctly flags this and with abort_on_error=1 the ClickHouse
server process aborts during Poco::CompressedLogFile construction
before binding port 9000, causing integration tests to time out.

Fix: skip the memcpy when newSz == 0.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
@fm4v
Copy link
Copy Markdown
Member Author

fm4v commented Mar 19, 2026

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=99657&sha=4d99a63a7fba01902f9f534ec95fef589c0861b3&name_0=PR

     /ClickHouse/base/poco/Foundation/include/Poco/Buffer.h:158:38: runtime error: null pointer passed as argument 2, which is declared to never be null
     /ClickHouse/cmake/linux/../../contrib/sysroot/linux-x86_64/x86_64-linux-gnu/libc/usr/include/string.h:43:28: note: nonnull attribute specified here
         #0 0x558022d70634 in Poco::Buffer<char>::setCapacity(unsigned long, bool) ci/tmp/build/./base/poco/Foundation/include/Poco/Buffer.h:158:21
         #1 0x558022d6f423 in Poco::CompressedLogFile::CompressedLogFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)
     ci/tmp/build/./base/poco/Foundation/src/CompressedLogFile.cpp:32:10
         #2 0x558022d5a974 in Poco::FileChannel::newLogFile() ci/tmp/build/./base/poco/Foundation/src/FileChannel.cpp:449:14
         #3 0x558022d59c37 in Poco::FileChannel::unsafeOpen() ci/tmp/build/./base/poco/Foundation/src/FileChannel.cpp:341:12
         #4 0x558022d59ac1 in Poco::FileChannel::open() ci/tmp/build/./base/poco/Foundation/src/FileChannel.cpp:93:2
         #5 0x558015f9dc8a in Loggers::buildLoggers(Poco::Util::AbstractConfiguration&, Poco::Logger&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)
     ci/tmp/build/./src/Loggers/Loggers.cpp:156:19
         #6 0x5580038d5454 in BaseDaemon::initialize(Poco::Util::Application&) ci/tmp/build/./src/Daemon/BaseDaemon.cpp:395:5
         #7 0x557ff8a0cdfe in DB::Server::initialize(Poco::Util::Application&) ci/tmp/build/./programs/server/Server.cpp:720:17
         #8 0x558022f804de in Poco::Util::Application::run() ci/tmp/build/./base/poco/Util/src/Application.cpp:310:2
         #9 0x557ff8a0ca7c in DB::Server::run() ci/tmp/build/./programs/server/Server.cpp:714:25
         #10 0x557ff8a080aa in mainEntryClickHouseServer(int, char**) ci/tmp/build/./programs/server/Server.cpp:501:20
         #11 0x557fe3f7c5dd in main ci/tmp/build/./programs/main.cpp:415:21
         #12 0x7f07fbe78d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
         #13 0x7f07fbe78e3f in __libc_start_main csu/../csu/libc-start.c:392:3
         #14 0x557fe3e95f2d in _start (/usr/bin/clickhouse+0x19396f2d) (BuildId: feeb10e06928d9c6aa9918b30c0aa3745d5b58e4)

     SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /ClickHouse/base/poco/Foundation/include/Poco/Buffer.h:158:38

● Yes, it's a real (though benign) bug in ClickHouse's vendored Poco. Look at line 158:

std::memcpy(ptr, _ptr, newSz * sizeof(T));

The scenario:

  • Buffer was default-constructed → _ptr = nullptr, _used = 0, _capacity = 0
  • setCapacity is called with newCapacity > 0, preserveContent = true
  • newSz = min(_used=0, newCapacity) = 0
  • → memcpy(ptr, nullptr, 0) — null pointer to memcpy, even with size 0

The C standard says memcpy arguments must be valid pointers regardless of size (the nonnull attribute on argument 2 confirms this). So UBSan is correct to flag it. In practice it's harmless, but with
abort_on_error=1 the server dies before binding port 9000.

The fix is a one-liner in base/poco/Foundation/include/Poco/Buffer.h:

Fix: 9bd11e0

fm4v and others added 2 commits March 19, 2026 14:35
All test jobs previously using amd_asan_ubsan now use arm_asan_ubsan:
- stateless tests flaky check
- functional tests distributed plan parallel/sequential
- integration tests db disk (master + required + flaky + targeted)

Corresponding runner labels and artifact names updated to ARM variants.
backport_branches.py build filter extended to include arm_asan_ubsan.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 19, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.80% 83.80% +0.00%
Functions 23.90% 23.90% +0.00%
Branches 76.30% 76.30% +0.00%

PR changed lines: PR changed-lines coverage: N/A (no coverable changed lines found in .info).
Diff coverage report
Uncovered code

@fm4v
Copy link
Copy Markdown
Member Author

fm4v commented Mar 19, 2026

10 jobs fewer actually

Net result: 32 removed, 21 added = 11 fewer jobs in pull_request.yml.

  The PR collapses separate ASan + UBSan builds/tests into combined ASan+UBSan ones:

  ┌───────────────────┬───────────────────────────────────────────────────┬────────────────────────────────────────────┬───────┐
  │     Category      │                      Removed                      │                   Added                    │ Delta │
  ├───────────────────┼───────────────────────────────────────────────────┼────────────────────────────────────────────┼───────┤
  │ Builds            │ build_amd_asan, build_amd_ubsan, build_arm_asan   │ build_amd_asan_ubsan, build_arm_asan_ubsan │ −1    │
  ├───────────────────┼───────────────────────────────────────────────────┼────────────────────────────────────────────┼───────┤
  │ Unit tests        │ unit_tests_asan, unit_tests_ubsan                 │ unit_tests_asan_ubsan                      │ −1    │
  ├───────────────────┼───────────────────────────────────────────────────┼────────────────────────────────────────────┼───────┤
  │ Stateless tests   │ 7 (amd_asan×4, amd_ubsan×3)                       │ 5 (amd_asan_ubsan×3, arm×2 azure)          │ −2    │
  ├───────────────────┼───────────────────────────────────────────────────┼────────────────────────────────────────────┼───────┤
  │ Integration tests │ 8 (amd_asan 6/6 + flaky + targeted)               │ 8 (amd_asan_ubsan 6/6 + flaky + targeted)  │ 0     │
  ├───────────────────┼───────────────────────────────────────────────────┼────────────────────────────────────────────┼───────┤
  │ Stress tests      │ 4 (amd_ubsan, arm_asan, arm_asan_s3, amd_release) │ 1 (arm_asan_ubsan_s3)                      │ −3    │
  ├───────────────────┼───────────────────────────────────────────────────┼────────────────────────────────────────────┼───────┤
  │ AST fuzzer        │ 2 (amd_ubsan, arm_asan)                           │ 1 (arm_asan_ubsan)                         │ −1    │
  ├───────────────────┼───────────────────────────────────────────────────┼────────────────────────────────────────────┼───────┤
  │ BuzzHouse         │ 2 (amd_ubsan, arm_asan)                           │ 1 (arm_asan_ubsan)                         │ −1    │
  ├───────────────────┼───────────────────────────────────────────────────┼────────────────────────────────────────────┼───────┤
  │ ARM tsan build    │ build_arm_tsan                                    │ —                                          │ −1    │
  ├───────────────────┼───────────────────────────────────────────────────┼────────────────────────────────────────────┼───────┤
  │ Misc              │ stress_test_amd_release, arm_asan_targeted        │ —                                          │ −1    │
  └───────────────────┴───────────────────────────────────────────────────┴────────────────────────────────────────────┴───────┘

  −11 total in PR workflow. Similar numbers apply to master and release branch workflows.

@fm4v fm4v enabled auto-merge March 19, 2026 20:57
@fm4v fm4v added this pull request to the merge queue Mar 20, 2026
Merged via the queue into master with commit 5cefee4 Mar 20, 2026
147 of 152 checks passed
@fm4v fm4v deleted the nik/asan-ubsan-combined branch March 20, 2026 00:28
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 20, 2026
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SAN_FLAGS} ${UBSAN_FLAGS}")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${SAN_FLAGS} ${UBSAN_FLAGS}")

elseif (SANITIZE STREQUAL "address,undefined")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is error prone, can't we just use MATCHES for address and undefined, or they have some incompatible flags?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-ci pr-synced-to-cloud The PR is synced to the cloud repo submodule changed At least one submodule changed in this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants