Skip to content

Fix race condition in clickhouse cache dictionary#12566

Merged
alexey-milovidov merged 8 commits intomasterfrom
race_condition_cache_dictionaries
Jul 21, 2020
Merged

Fix race condition in clickhouse cache dictionary#12566
alexey-milovidov merged 8 commits intomasterfrom
race_condition_cache_dictionaries

Conversation

@alesapin
Copy link
Copy Markdown
Member

@alesapin alesapin commented Jul 17, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Bug Fix

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix race condition in external dictionaries with cache layout which can lead server crash.

@alesapin alesapin added the bug Confirmed user-visible misbehaviour in official release label Jul 17, 2020
@alesapin alesapin marked this pull request as draft July 17, 2020 15:27
@robot-clickhouse robot-clickhouse added the pr-bugfix Pull request with bugfix, not backported by default label Jul 17, 2020
@alesapin
Copy link
Copy Markdown
Member Author

Also, setting max_threads_for_updates cannot be tuned both from SQL query and xml config because of https://github.com/ClickHouse/ClickHouse/blob/master/src/Dictionaries/DictionaryFactory.cpp#L37-L41.

@alesapin
Copy link
Copy Markdown
Member Author

alesapin commented Jul 17, 2020

If we remove these two unlockers https://github.com/ClickHouse/ClickHouse/blob/master/src/Dictionaries/CacheDictionary.cpp#L882-L892 than it should work as expected, but the whole idea of non-blocking read will be broken.

When I add more granular locks (both read for update and write for rest) my server hangs without any information from thread sanitizer :(

@alesapin
Copy link
Copy Markdown
Member Author

alesapin commented Jul 17, 2020

Traces from gdb when server hangs https://gist.github.com/alesapin/46a52d33d89f100c67cff3c4cff4bea8
https://gist.github.com/alesapin/df27abfa0db390a35cab43ecd66ee838

I don't understand why the server is not able to drop the view or underlying table. Nobody holds table's drop lock, new select queries also wait for table drop lock (because DROP already in lock's queue) or some locks inside CacheDictionary. The only idea is that the timed part of our fair RWLock has a bug. Two reasons:

  1. When I remove all code about waitings everything works fine https://gist.github.com/alesapin/317b292cb323050febd43f79d392b67a.
  2. Seems like a query from clickhouse dictionary is internal and has no query_id, and in RWLock.cpp we have suspiciously many lines about locks without query_id.

@alesapin alesapin force-pushed the race_condition_cache_dictionaries branch from 3463f04 to 9ed4204 Compare July 20, 2020 11:53
@alesapin alesapin changed the title Bug report: Race condition in clickhouse cache dictionary Fix race condition in clickhouse cache dictionary Jul 20, 2020
@alesapin alesapin marked this pull request as ready for review July 20, 2020 13:46
@alesapin alesapin removed the bug Confirmed user-visible misbehaviour in official release label Jul 20, 2020
@alesapin
Copy link
Copy Markdown
Member Author

All tests passed, I have just renamed method.

@alexey-milovidov alexey-milovidov merged commit 16e8c61 into master Jul 21, 2020
@alexey-milovidov alexey-milovidov deleted the race_condition_cache_dictionaries branch July 21, 2020 08:27
alesapin pushed a commit that referenced this pull request Jul 21, 2020
…naries

Fix race condition in clickhouse cache dictionary

(cherry picked from commit 16e8c61)
alesapin pushed a commit that referenced this pull request Jul 21, 2020
…naries

Fix race condition in clickhouse cache dictionary

(cherry picked from commit 16e8c61)
alexey-milovidov added a commit that referenced this pull request Jul 21, 2020
…naries

Fix race condition in clickhouse cache dictionary

(cherry picked from commit 16e8c61)
alexey-milovidov added a commit that referenced this pull request Jul 21, 2020
…naries

Fix race condition in clickhouse cache dictionary

(cherry picked from commit 16e8c61)
alesapin added a commit that referenced this pull request Jul 23, 2020
…ctionary (#12681)

* Race condition test

* remove comment

* Ugly code without races

* Comments for ugly fix

* Better code

* Better comment

* Remove redundant new line

* Better function name and comment

Co-authored-by: alesapin <[email protected]>
Co-authored-by: alexey-milovidov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants