Skip to content

rocksdb: fix race condition during multiple DB opening#29393

Merged
kitaisreal merged 3 commits intoClickHouse:masterfrom
azat:rocksdb-fix-data-race
Sep 27, 2021
Merged

rocksdb: fix race condition during multiple DB opening#29393
kitaisreal merged 3 commits intoClickHouse:masterfrom
azat:rocksdb-fix-data-race

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Sep 26, 2021

Changelog category (leave one):

  • Bug Fix (user-visible misbehaviour in official stable or prestable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
rocksdb: fix race condition during multiple DB opening (and get back some tests that triggers the problem on CI)

Detailed description / Documentation draft:
This should fix the following SIGSEGV, that was found on CI 1:

<Fatal> BaseDaemon: Address: NULL pointer. Access: read. Unknown si_code.
<Fatal> BaseDaemon: 4.4. inlined from ../contrib/rocksdb/utilities/object_registry.cc:19: rocksdb::ObjectLibrary::FindEntry() const
...
<Fatal> BaseDaemon: 7.3. inlined from ../contrib/rocksdb/options/cf_options.cc:678: rocksdb::$_7::operator()()

And also I checked manually with TSan binary, and here is a data race
reported by TSan:

WARNING: ThreadSanitizer: data race (pid=3356)
  Read of size 8 at 0x7b0c0008cca8 by thread T40:
    2 rocksdb::ObjectLibrary::FindEntry() const obj-x86_64-linux-gnu/../contrib/rocksdb/utilities/object_registry.cc:18:27 (clickhouse-tsan+0x1b839a6c)
    ...
    6 rocksdb::$_7::operator()() const obj-x86_64-linux-gnu/../contrib/rocksdb/options/cf_options.cc:676:32 (clickhouse-tsan+0x1b6bfa63)
    ...
    28 rocksdb::GetColumnFamilyOptionsFromMap() obj-x86_64-linux-gnu/../contrib/rocksdb/options/options_helper.cc:727:10 (clickhouse-tsan+0x1b6fffd2)
    29 DB::StorageEmbeddedRocksDB::initDb() obj-x86_64-linux-gnu/../src/Storages/RocksDB/StorageEmbeddedRocksDB.cpp:359:26 (clickhouse-tsan+0x14195e31)
    ...

  Previous write of size 8 at 0x7b0c0008cca8 by thread T41:
    ...
    9 rocksdb::ObjectLibrary::AddEntry() obj-x86_64-linux-gnu/../contrib/rocksdb/utilities/object_registry.cc:31:19 (clickhouse-tsan+0x1b8392fc)
    ...
    11 rocksdb::RegisterTableFactories()::$_0::operator()() const obj-x86_64-linux-gnu/../contrib/rocksdb/table/table_factory.cc:23:14 (clickhouse-tsan+0x1b7ea94c)
    ...
    43 rocksdb::GetColumnFamilyOptionsFromMap() obj-x86_64-linux-gnu/../contrib/rocksdb/options/options_helper.cc:727:10 (clickhouse-tsan+0x1b6fffd2)
    44 DB::StorageEmbeddedRocksDB::initDb() obj-x86_64-linux-gnu/../src/Storages/RocksDB/StorageEmbeddedRocksDB.cpp:359:26 (clickhouse-tsan+0x14195e31)

Refs: ClickHouse/rocksdb#13
Fixes: #29341

NOTE: I marked it as Bug fix but I'm not sure that it is Bug fix in your terms

This should fix the following SIGSEGV, that was found on CI [1]:

    <Fatal> BaseDaemon: Address: NULL pointer. Access: read. Unknown si_code.
    <Fatal> BaseDaemon: 4.4. inlined from ../contrib/rocksdb/utilities/object_registry.cc:19: rocksdb::ObjectLibrary::FindEntry() const
    ...
    <Fatal> BaseDaemon: 7.3. inlined from ../contrib/rocksdb/options/cf_options.cc:678: rocksdb::$_7::operator()()

  [1]: https://clickhouse-test-reports.s3.yandex.net/29341/2b2bec3679df7965af908ce3f1e8e17e39bd12fe/integration_tests_flaky_check_(asan).html#fail1

And also I checked manually with TSan binary, and here is a data race
reported by TSan:

    WARNING: ThreadSanitizer: data race (pid=3356)
      Read of size 8 at 0x7b0c0008cca8 by thread T40:
        2 rocksdb::ObjectLibrary::FindEntry() const obj-x86_64-linux-gnu/../contrib/rocksdb/utilities/object_registry.cc:18:27 (clickhouse-tsan+0x1b839a6c)
        ...
        6 rocksdb::$_7::operator()() const obj-x86_64-linux-gnu/../contrib/rocksdb/options/cf_options.cc:676:32 (clickhouse-tsan+0x1b6bfa63)
        ...
        28 rocksdb::GetColumnFamilyOptionsFromMap() obj-x86_64-linux-gnu/../contrib/rocksdb/options/options_helper.cc:727:10 (clickhouse-tsan+0x1b6fffd2)
        29 DB::StorageEmbeddedRocksDB::initDb() obj-x86_64-linux-gnu/../src/Storages/RocksDB/StorageEmbeddedRocksDB.cpp:359:26 (clickhouse-tsan+0x14195e31)
        ...

      Previous write of size 8 at 0x7b0c0008cca8 by thread T41:
        ...
        9 rocksdb::ObjectLibrary::AddEntry() obj-x86_64-linux-gnu/../contrib/rocksdb/utilities/object_registry.cc:31:19 (clickhouse-tsan+0x1b8392fc)
        ...
        11 rocksdb::RegisterTableFactories()::$_0::operator()() const obj-x86_64-linux-gnu/../contrib/rocksdb/table/table_factory.cc:23:14 (clickhouse-tsan+0x1b7ea94c)
        ...
        43 rocksdb::GetColumnFamilyOptionsFromMap() obj-x86_64-linux-gnu/../contrib/rocksdb/options/options_helper.cc:727:10 (clickhouse-tsan+0x1b6fffd2)
        44 DB::StorageEmbeddedRocksDB::initDb() obj-x86_64-linux-gnu/../src/Storages/RocksDB/StorageEmbeddedRocksDB.cpp:359:26 (clickhouse-tsan+0x14195e31)

Refs: ClickHouse/rocksdb#13
Fixes: ClickHouse#29341
@robot-clickhouse robot-clickhouse added the pr-bugfix Pull request with bugfix, not backported by default label Sep 26, 2021
@robot-ch-test-poll4 robot-ch-test-poll4 added the submodule changed At least one submodule changed in this PR. label Sep 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default submodule changed At least one submodule changed in this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants