Skip to content

Fix drop partition in rare case#24321

Merged
tavplubix merged 6 commits intoClickHouse:masterfrom
amosbird:fixdroppartition
May 25, 2021
Merged

Fix drop partition in rare case#24321
tavplubix merged 6 commits intoClickHouse:masterfrom
amosbird:fixdroppartition

Conversation

@amosbird
Copy link
Copy Markdown
Collaborator

@amosbird amosbird commented May 20, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Bug Fix

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix drop partition with intersect fake parts. In rare cases there might be parts with mutation version greater than current block number.

Detailed description / Documentation draft:

.

@robot-clickhouse robot-clickhouse added the pr-bugfix Pull request with bugfix, not backported by default label May 20, 2021
@qoega
Copy link
Copy Markdown
Member

qoega commented May 20, 2021

can you check if 01149_zookeeper_mutation_stuck_after_replace_partition failure is related?

Comment on lines +4905 to +4906
// Make sure we cover all parts in partition. In rare cases there might be parts with
// mutation version greater than current block number.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, it should never happen due to

if (queue.virtual_parts.getContainingPart(part->info) != part->name)
return {};

Copy link
Copy Markdown
Collaborator Author

@amosbird amosbird May 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I fail to construct a minimal reproduceable case for this, but it somehow happens.

Part 20210419_0_5263_999999999 intersects previous part 20210419_0_3820_6_5263. It is a bug.

and

Part 20210218_0_12_999999999_10 intersects previous part 20210218_0_0_0_727

Mutation gets out of sync with the max block number.

However, it seems reasonable to have drop/detach partition to cover all parts, so that this ill-behaved case can be fixed by detach partition and attach partition without data loss.

It's not a bug fix. I'll mark it as improvement.

@amosbird
Copy link
Copy Markdown
Collaborator Author

can you check if 01149_zookeeper_mutation_stuck_after_replace_partition failure is related?

Cannot reproduce with local unsan build. It doesn't seem to be related though.

@amosbird amosbird added the pr-improvement Pull request with some product improvements label May 20, 2021
@tavplubix
Copy link
Copy Markdown
Member

@tavplubix tavplubix self-assigned this May 24, 2021
@tavplubix
Copy link
Copy Markdown
Member

ClickHouse special build check - build is ok, but Command 'docker run ... ' returned non-zero exit status 1, looks unrelated

Functional stateless tests (ANTLR debug) - 01155_old_mutation_parts_to_do - Can't parse input: no viable alternative at input 'modifysetting' - looks like ALTER ... SETTINGS is not implemented in ANTLR parser, I will add it to skip list, cc: @abyss7

Functional stateless tests (release, DatabaseReplicated) - Watching task ... is executing longer - known bug in DatabaseReplicatedDDLWorker::enqueueQuery, I will fix it

Functional stateless tests flaky check (address) - 01414_mutations_and_errors - Exception happened during execution of mutation 'mutation_4.txt' with part '20191002_2_2_0' reason: 'Code: 236, e.displayText() = DB::Exception: Cancelled mutating parts (version 21.7.1.6946)'. This error maybe retryable or not. In case of unretryable error, mutation can be killed with KILL MUTATION query. - But test is not modified, so it was flaky earlier

Stress test (thread) - ALTER TABLE sticking_mutations - looks suspicious, but there are many messages like Not enough idle threads to apply mutations at the moment in server log, so it's probably not related

Waiting for integration tests

@qoega
Copy link
Copy Markdown
Member

qoega commented May 26, 2021

@tavplubix "ClickHouse special build check 4/5 builds" is

2021-05-26 08:29:17 /build/obj-x86_64-linux-gnu/../src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp:271:5: error: use range-based for loop instead [modernize-loop-convert,-warnings-as-errors]
2021-05-26 08:29:17     for (auto it = in_partition->second.begin(); it != in_partition->second.end(); ++it)
2021-05-26 08:29:17     ^   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2021-05-26 08:29:17         (auto & it : in_partition->second)
2021-05-26 08:29:17 100814 warnings generated.
2021-05-26 08:29:17 Suppressed 100815 warnings (100812 in non-user code, 2 NOLINT, 1 with check filters).
2021-05-26 08:29:17 Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
2021-05-26 08:29:17 1 warning treated as error

It is now red in master

tavplubix added a commit that referenced this pull request May 27, 2021
Backport #24321 to 21.4: Fix drop partition in rare case
tavplubix added a commit that referenced this pull request May 27, 2021
Backport #24321 to 21.6: Fix drop partition in rare case
tavplubix added a commit that referenced this pull request May 27, 2021
Backport #24321 to 21.5: Fix drop partition in rare case
@den-crane
Copy link
Copy Markdown
Contributor

21.3 LTS ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants