Improve lost replica recovery (ReplicatedMergeTree)#42134

Merged
tavplubix merged 10 commits into master from improve_replica_recovery on Oct 20, 2022

Conversation

@tavplubix
Member

@tavplubix tavplubix commented Oct 6, 2022

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Improved the stale replica recovery process for ReplicatedMergeTree. If a lost replica has some parts that are absent on a healthy replica, but these parts should appear in the future according to the replication queue of the healthy replica, then the lost replica will keep such parts instead of detaching them.

Also fixes #39022
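
To illustrate the rule from the changelog entry, here is a minimal sketch in Python. It is not the ClickHouse implementation (which is written in C++ and is not shown on this page). The function name `classify_local_parts` and the idea of modelling parts as plain name strings in sets are assumptions made for illustration only; real recovery logic would also have to reason about covering parts, mutations, and queue entry types.

```python
from typing import Iterable, List, Set, Tuple


def classify_local_parts(
    local_parts: Iterable[str],
    healthy_parts: Set[str],
    healthy_queue_parts: Set[str],
) -> Tuple[List[str], List[str]]:
    """Split a lost replica's local parts into (keep, detach).

    Hypothetical model: a part is kept if the healthy replica already has it,
    or if the healthy replica's replication queue says it will appear later.
    Every other local part is detached.
    """
    keep: List[str] = []
    detach: List[str] = []
    for part in sorted(local_parts):
        if part in healthy_parts or part in healthy_queue_parts:
            keep.append(part)
        else:
            detach.append(part)
    return keep, detach


if __name__ == "__main__":
    keep, detach = classify_local_parts(
        local_parts={"all_0_0_0", "all_1_1_0", "all_2_2_0"},
        healthy_parts={"all_0_0_0"},
        healthy_queue_parts={"all_1_1_0"},  # expected to appear in the future
    )
    print("keep:", keep)
    print("detach:", detach)
```

In this toy model, `all_1_1_0` is kept even though the healthy replica does not have it yet, because it is listed in the healthy replica's queue. Per the changelog entry, such a part was detached before this change.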

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Oct 6, 2022
@KochetovNicolai KochetovNicolai self-assigned this Oct 20, 2022
Member

@KochetovNicolai KochetovNicolai left a comment

The test is awesome!

@tavplubix
Member Author

ClickHouse build check - some issue with CI infrastructure:

error during connect: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/images/create?fromImage=clickhouse%2Fbinary-builder&tag=latest": EOF
2022-10-19 18:27:33,056 Cannot pull image clickhouse/binary-builder:latest
2022-10-19 18:27:33,056 Will build image with cmd: 'docker build --network=host -t clickhouse/binary-builder:latest -f /home/ubuntu/actions-runner/_work/_temp/build_check/ClickHouse/docker/packager/../../docker/packager/binary/Dockerfile /home/ubuntu/actions-runner/_work/_temp/build_check/ClickHouse/docker/packager/../../docker/packager/binary'
error during connect: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/build?buildargs=%7B%7D&cachefrom=%5B%5D&cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=Dockerfile&labels=%7B%7D&memory=0&memswap=0&networkmode=host&rm=1&shmsize=0&t=clickhouse%2Fbinary-builder%3Alatest&target=&ulimits=null&version=1": EOF
Traceback (most recent call last):
  File "/home/ubuntu/actions-runner/_work/_temp/build_check/ClickHouse/docker/packager/./packager", line 415, in <module>
    build_image(image_with_version, dockerfile)
  File "/home/ubuntu/actions-runner/_work/_temp/build_check/ClickHouse/docker/packager/./packager", line 37, in build_image
    subprocess.check_call(
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'docker build --network=host -t clickhouse/binary-builder:latest -f /home/ubuntu/actions-runner/_work/_temp/build_check/ClickHouse/docker/packager/../../docker/packager/binary/Dockerfile /home/ubuntu/actions-runner/_work/_temp/build_check/ClickHouse/docker/packager/../../docker/packager/binary' returned non-zero exit status 1.

cc: @Felixoid

Stress test (tsan) - #42528

@tavplubix tavplubix merged commit f33ae3c into master Oct 20, 2022
@tavplubix tavplubix deleted the improve_replica_recovery branch October 20, 2022 12:55
Enmk pushed a commit to Altinity/ClickHouse that referenced this pull request Feb 8, 2023
…ecovery

Improve lost replica recovery (ReplicatedMergeTree)
Enmk added a commit to Altinity/ClickHouse that referenced this pull request Feb 9, 2023

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tried to lock part all_0_0_0 for removal second time by

3 participants