Skip to content

Revert "Reduce lock contention for MergeTree tables (by renaming parts without holding lock)"#64899

Merged
alesapin merged 1 commit intomasterfrom
revert-61973-mt/rename-without-lock
Jun 6, 2024
Merged

Revert "Reduce lock contention for MergeTree tables (by renaming parts without holding lock)"#64899
alesapin merged 1 commit intomasterfrom
revert-61973-mt/rename-without-lock

Conversation

@alesapin
Copy link
Copy Markdown
Member

@alesapin alesapin commented Jun 6, 2024

Reverts #61973

Sorry, was merged with broken CI.

@alesapin alesapin enabled auto-merge June 6, 2024 09:38
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-not-for-changelog This PR should not be mentioned in the changelog label Jun 6, 2024
@robot-clickhouse-ci-2
Copy link
Copy Markdown
Contributor

robot-clickhouse-ci-2 commented Jun 6, 2024

This is an automated comment for commit 7900fe5 with description of existing statuses. It's updated for the latest CI running

⏳ Click here to open a full report in a separate page

Check nameDescriptionStatus
A SyncIf it fails, ask a maintainer for help⏳ pending
CI runningA meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR⏳ pending
Successful checks
Check nameDescriptionStatus
Docs checkBuilds and tests the documentation✅ success
PR CheckChecks correctness of the PR's body✅ success
Style checkRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report✅ success

@alesapin
Copy link
Copy Markdown
Member Author

alesapin commented Jun 6, 2024

To apply it back please introduce a test which will show performance gains.

@alesapin alesapin added this pull request to the merge queue Jun 6, 2024
Merged via the queue into master with commit 71f3791 Jun 6, 2024
@alesapin alesapin deleted the revert-61973-mt/rename-without-lock branch June 6, 2024 10:06
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 6, 2024
@azat
Copy link
Copy Markdown
Member

azat commented Jun 6, 2024

@alesapin which one of the failures had been related to this PR? (also, even though it is not related, please take a look at #61973 (comment))

To apply it back please introduce a test which will show performance gains.

fsync is a tricky thing
And likely this test will be came flaky, and eventually will be removed due to flakiness of perf tests.

Also can you please tag me on reverts, since I saw this due to pure luck.

@kssenii
Copy link
Copy Markdown
Member

kssenii commented Jun 6, 2024

which one of the failures had been related to this PR?

It broke our private repo sync, the PR was merged without checking "A sync" check which was not green, the reviewer should fix the sync before the PR is merged.

@azat
Copy link
Copy Markdown
Member

azat commented Jun 6, 2024

It broke our private repo sync, the PR was merged without checking "A sync" check which was not green, the reviewer should fix the sync before the PR is merged.

@kssenii thanks! Any insights on what had been broken? Maybe some functions signatures overlaps?

( Or maybe even this one - 6c3db34 :) )

@kssenii
Copy link
Copy Markdown
Member

kssenii commented Jun 6, 2024

There was a small conflict in MergeTreeData.cpp, near these lines in your PR https://github.com/ClickHouse/ClickHouse/pull/61973/files#diff-b7d33a7abda5f3365d5e4fb58ca98b756f6401372b2e2d5eb449a2bd3444fb5eL6636-L6638, but it was non-obvious that quick conflict resolution will not break anything, so to avoid private sync being stuck - it was better to revert.

@alesapin
Copy link
Copy Markdown
Member Author

alesapin commented Jun 6, 2024

@azat This change:

  1. Makes code worse and more complex
  2. Nobody ever reported similar issue
  3. No proofs that it helps even for the case that you are describing.

@azat
Copy link
Copy Markdown
Member

azat commented Jun 6, 2024

Makes code worse and more complex

I have some doubts about this as well (#61973 (comment)), but I ran out of ideas, the code does not allow to make this clean without rewriting (usually I'm trying to keep patches small, this avoids extra conflicts)

If you have any ideas, please let me know.

Nobody ever reported similar issue

Sure, I doubt that there are lots of users who enables fsync on production

No proofs that it helps even for the case that you are describing.

It is actually pretty straightforward - if you are holding lock during rename you depends on how long will the rename takes, and under load it could take awhile, especially on HDDs (it also depends on the journal settings for ext4, and where this journal is located and so on and so forth). And it will be magnitude worser in case of fsync_part_directory=1.

And things could be even worse by some "tricky" HDDs, for instance I found out that on one of our production we had T10 protection enabled, which slows down disks a lot (up to 2x)

Sure it is nice to have a performance test, but it is unlikely will be deterministic, and eventually will be removed (I already had such experience in the past with complex tests). I will try to come up with some numbers for v2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants