Stricter checks when attaching/detaching threads to groups#75474
Stricter checks when attaching/detaching threads to groups#75474nikitamikhaylov merged 11 commits intomasterfrom
Conversation
|
This is an automated comment for commit f0eae2c with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
|
(The new checks fail in CI, investigating.) |
azat
left a comment
There was a problem hiding this comment.
LGTM
But, AFAICS it should not affect incorrect tracking of total memory tracker?
The check that |
I mean that even if the |
|
Ah, you're right, I didn't notice the fallback in |
* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250220) * Fix build due to ClickHouse/ClickHouse#75474 * Fix build due to ClickHouse/ClickHouse#75452 * Rework "gluten_cache.local.enabled" config due to ClickHouse/ClickHouse#75452 Now, we still use "gluten_cache.local.enabled" in spark, and we will convert it to "enable.gluten_cache.local" when initlize native backend. * Fix memory access error exposed by ClickHouse/ClickHouse#75452 * Fix "<Error>... Possibly incorrect column size subtraction" due to ClickHouse/ClickHouse#75938 * Unified temporary directory --------- Co-authored-by: kyligence-git <[email protected]> Co-authored-by: Chang Chen <[email protected]>
) * [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250220) * Fix build due to ClickHouse/ClickHouse#75474 * Fix build due to ClickHouse/ClickHouse#75452 * Rework "gluten_cache.local.enabled" config due to ClickHouse/ClickHouse#75452 Now, we still use "gluten_cache.local.enabled" in spark, and we will convert it to "enable.gluten_cache.local" when initlize native backend. * Fix memory access error exposed by ClickHouse/ClickHouse#75452 * Fix "<Error>... Possibly incorrect column size subtraction" due to ClickHouse/ClickHouse#75938 * Unified temporary directory --------- Co-authored-by: kyligence-git <[email protected]> Co-authored-by: Chang Chen <[email protected]>
Changelog category (leave one):
Closes #59114
In https://s3.amazonaws.com/clickhouse-test-reports/74014/dcdccd54f13e74c5018f6d40e5a4caa77858dc22/stress_test__debug_.html
CurrentThread::getGroup()returned nullptr when called fromTCPHandler::processOrdinaryQuery. This means that calls toCurrentThread::attachToGroup/detachFromGroupIfNotDetachedare not properly nested: someone detached a thread that they didn't attach. (The thread is expected to be attached to a group because the caller ofprocessOrdinaryQueryholds a QueryScope for the duration of the call.)Almost all calls to
CurrentThread::attachToGroupIfDetached/detachFromGroupIfNotDetachedfollow this pattern that looks incorrect:If the thread is already attached to a group, it'll end up detached after this function returns. Normally such
thread_funcis immediately scheduled on a thread pool; then the thread shouldn't be already attached to a group (... or so I'm assuming; there were no asserts about this, so CI on this PR may prove this analysis wrong). But some code sites sometimes call such function directly, e.g.Aggregator::prepareBlocksAndFillTwoLevelImpl. AFAICT, this would incorrectly detach the thread from the thread group. I'm guessing this is what caused these test failures (but I didn't try to reproduce this and don't have an explanation for why the crash happens so rarely.)This PR replaces this pattern with a more correct one, with a bunch of asserts, wrapped in a RAII thing.
Most changes are mechanical. Most substantial ones are in
ThreadStatus.h,ThreadStatusExt.cpp, andProfileEventsExt.cpp.