Skip to content

Fix data-race between LogSink::writeMarks() and LogSource in StorageLog#29946

Merged
alexey-milovidov merged 1 commit intoClickHouse:masterfrom
azat:log-fix-data-race
Oct 14, 2021
Merged

Fix data-race between LogSink::writeMarks() and LogSource in StorageLog#29946
alexey-milovidov merged 1 commit intoClickHouse:masterfrom
azat:log-fix-data-race

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Oct 10, 2021

Changelog category (leave one):

  • Bug Fix (user-visible misbehaviour in official stable or prestable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix data-race between LogSink::writeMarks() and LogSource in StorageLog

Detailed description / Documentation draft:
CI founds [1], TSan report:

Details
WARNING: ThreadSanitizer: data race (pid=497)
  Write of size 8 at 0x7b1c00e4e8d8 by thread T922:
    4 DB::LogSink::writeMarks() obj-x86_64-linux-gnu/../src/Storages/StorageLog.cpp:462:72 (clickhouse+0x1665cdaa)
    5 DB::LogSink::consume() obj-x86_64-linux-gnu/../src/Storages/StorageLog.cpp:324:5 (clickhouse+0x1665c216)

  Previous read of size 8 at 0x7b1c00e4e8d8 by thread T711:
    1 DB::LogSource::readData()::$_0::operator()(bool) const::'lambda'()::operator()() const obj-x86_64-linux-gnu/../src/Storages/StorageLog.cpp:188:26 (clickhouse+0x16661baf)
    9 DB::LogSource::readData() obj-x86_64-linux-gnu/../src/Storages/StorageLog.cpp:204:20 (clickhouse+0x1665bbcb)
  [1]: https://clickhouse-test-reports.s3.yandex.net/29930/4bc90d1dd7dbd4b8a9b6920d00ca24e8b160358e/stress_test_(thread).html#fail1

Fixes: #29470

@robot-clickhouse robot-clickhouse added the pr-bugfix Pull request with bugfix, not backported by default label Oct 10, 2021
@azat azat marked this pull request as draft October 10, 2021 20:09
@azat azat force-pushed the log-fix-data-race branch from 3a47e54 to a5738a3 Compare October 10, 2021 20:39
@azat azat marked this pull request as ready for review October 10, 2021 20:45
@alexey-milovidov alexey-milovidov self-assigned this Oct 10, 2021
@alexey-milovidov
Copy link
Copy Markdown
Member

Thank you!

It's interesting how this issue is found only now. Is it broken only recently?
(Or maybe it's very rare and enabling Thread Fuzzer in stress test helped.)

@azat
Copy link
Copy Markdown
Member Author

azat commented Oct 10, 2021

(Or maybe it's very rare and enabling Thread Fuzzer in stress test helped.)

I think this one, fuzzer is really worthwhile thing!

… in StorageLog

CI founds [1], TSan report:

    WARNING: ThreadSanitizer: data race (pid=497)
      Write of size 8 at 0x7b1c00e4e8d8 by thread T922:
        4 DB::LogSink::writeMarks() obj-x86_64-linux-gnu/../src/Storages/StorageLog.cpp:462:72 (clickhouse+0x1665cdaa)
        5 DB::LogSink::consume() obj-x86_64-linux-gnu/../src/Storages/StorageLog.cpp:324:5 (clickhouse+0x1665c216)

      Previous read of size 8 at 0x7b1c00e4e8d8 by thread T711:
        1 DB::LogSource::readData()::$_0::operator()(bool) const::'lambda'()::operator()() const obj-x86_64-linux-gnu/../src/Storages/StorageLog.cpp:188:26 (clickhouse+0x16661baf)
        9 DB::LogSource::readData() obj-x86_64-linux-gnu/../src/Storages/StorageLog.cpp:204:20 (clickhouse+0x1665bbcb)

      [1]: https://clickhouse-test-reports.s3.yandex.net/29930/4bc90d1dd7dbd4b8a9b6920d00ca24e8b160358e/stress_test_(thread).html#fail1

Fixes: ClickHouse#29470
v2: rework locking, but this produce deadlock for 01499_log_deadlock test
v3: introduce per-file marks mutex
@azat azat force-pushed the log-fix-data-race branch from a5738a3 to eba6410 Compare October 13, 2021 21:18
@azat
Copy link
Copy Markdown
Member Author

azat commented Oct 14, 2021

Functional stateless tests (address) — fail: 1, passed: 3444, skipped: 12
Functional stateless tests (release, DatabaseReplicated) — fail: 1, passed: 3338, skipped: 118
Functional stateless tests (thread) — fail: 1, passed: 3429, skipped: 27

Integration tests (release) — fail: 2, passed: 1597, flaky: 0

Stress test (thread) — Test script failed

Code: 241. DB::Exception: Received from localhost:9000. DB::Exception: Memory limit (total) exceeded: would use 68.51 GiB (attempt to allocate chunk of 2105374 bytes), maximum: 62.94 GiB. (MEMORY_LIMIT_EXCEEDED)

@qoega qoega added the testing Special issue with list of bugs found by CI label Oct 14, 2021
@azat azat requested a review from alexey-milovidov October 14, 2021 21:57
@alexey-milovidov alexey-milovidov merged commit bc1662b into ClickHouse:master Oct 14, 2021
robot-clickhouse pushed a commit that referenced this pull request Oct 15, 2021
robot-clickhouse pushed a commit that referenced this pull request Oct 15, 2021
robot-clickhouse pushed a commit that referenced this pull request Oct 15, 2021
robot-clickhouse pushed a commit that referenced this pull request Oct 15, 2021
robot-clickhouse pushed a commit that referenced this pull request Oct 15, 2021
@azat azat deleted the log-fix-data-race branch October 15, 2021 07:31
alexey-milovidov added a commit that referenced this pull request Oct 16, 2021
Backport #29946 to 21.10: Fix data-race between LogSink::writeMarks() and LogSource in StorageLog
alexey-milovidov added a commit that referenced this pull request Oct 17, 2021
Backport #29946 to 21.9: Fix data-race between LogSink::writeMarks() and LogSource in StorageLog
alexey-milovidov added a commit that referenced this pull request Oct 23, 2021
Backport #29946 to 21.8: Fix data-race between LogSink::writeMarks() and LogSource in StorageLog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default testing Special issue with list of bugs found by CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ThreadSanitizer: data race DB::LogSink::writeMarks and DB::StorageLog::addFiles

4 participants