Skip to content

Fix possible timeouts in sh tests with tsan#68791

Closed
vdimir wants to merge 10 commits intomasterfrom
vdimir/client-context-race
Closed

Fix possible timeouts in sh tests with tsan#68791
vdimir wants to merge 10 commits intomasterfrom
vdimir/client-context-race

Conversation

@vdimir
Copy link
Copy Markdown
Member

@vdimir vdimir commented Aug 23, 2024

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Fix data race in clickhouse-client when it's piped to a process that terminated early

Close #68626

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

CI Settings (Only check the boxes if you know what you are doing):

  • Allow: All Required Checks
  • Allow: Stateless tests
  • Allow: Stateful tests
  • Allow: Integration Tests
  • Allow: Performance tests
  • Allow: All Builds
  • Allow: batch 1, 2 for multi-batch jobs
  • Allow: batch 3, 4, 5, 6 for multi-batch jobs

  • Exclude: Style check
  • Exclude: Fast test
  • Exclude: All with ASAN
  • Exclude: All with TSAN, MSAN, UBSAN, Coverage
  • Exclude: All with aarch64, release, debug

  • Run only fuzzers related jobs (libFuzzer fuzzers, AST fuzzers, etc.)
  • Exclude: AST fuzzers

  • Do not test
  • Woolen Wolfdog
  • Upload binaries for special builds
  • Disable merge-commit
  • Disable CI cache

@vdimir vdimir added the 🍃 green ci 🌿 Fixing flaky tests in CI label Aug 23, 2024
@vdimir vdimir changed the title Fix some possible timeouts in sh tests with tsan Fix possible timeouts in sh tests with tsan Aug 23, 2024
@robot-ch-test-poll robot-ch-test-poll added the pr-bugfix Pull request with bugfix, not backported by default label Aug 23, 2024
@robot-ch-test-poll2
Copy link
Copy Markdown
Contributor

robot-ch-test-poll2 commented Aug 23, 2024

This is an automated comment for commit d42ee9c with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Check nameDescriptionStatus
Bugfix validationChecks that either a new test (functional or integration) or there some changed tests that fail with the binary built on master branch❌ failure
Successful checks
Check nameDescriptionStatus
BuildsThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS✅ success
Stateless testsRuns stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Style checkRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report✅ success

@vitlibar
Copy link
Copy Markdown
Member

The first commit 3d0614f is enough. And the issue with failed tests can be solved by changing the lines

auto query_id = config().getString("query_id", "");
if (!query_id.empty())
global_context->setCurrentQueryId(query_id);

to

        auto query_id = config().getString("query_id", "");
        if (!query_id.empty())
            client_context->setCurrentQueryId(query_id);

@vitlibar vitlibar self-assigned this Aug 26, 2024
@vitlibar
Copy link
Copy Markdown
Member

Is it possible to make a test reproducing that data race?

Co-authored-by: Vitaly Baranov <[email protected]>
@vdimir
Copy link
Copy Markdown
Member Author

vdimir commented Aug 26, 2024

Is it possible to make a test reproducing that data race?

Actually, it’s reproducing in the CI, for example, in 00653_verification_monotonic_data_load

The problem is that TSAN reports for the client are not collected when a bug is reproduced. Instead, it is often only detected as a test timeout in most cases (so, there are probably other tests that time out for the same reason). So, we need to address the issue with sanitizer reports for the client in the CI in general.

@vdimir vdimir force-pushed the vdimir/client-context-race branch from 72c1df1 to 59198c1 Compare August 27, 2024 09:54
@vdimir
Copy link
Copy Markdown
Member Author

vdimir commented Sep 3, 2024

In the meantime, let's try to fix race by just properly synchronizing formatting and main threads, #69186

@vdimir vdimir closed this in #69186 Sep 6, 2024
@vdimir
Copy link
Copy Markdown
Member Author

vdimir commented Oct 16, 2024

Seems #69186 haven't fixed race completely and still there are cases #68614 (comment)

@vitlibar

@vdimir vdimir reopened this Oct 16, 2024
@vdimir vdimir closed this Nov 6, 2024
@vdimir vdimir deleted the vdimir/client-context-race branch December 5, 2024 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🍃 green ci 🌿 Fixing flaky tests in CI pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stateless test 00653_verification_monotonic_data_load is flaky

5 participants