Skip to content

TSAN: fix data race when reading a histogram's sample count#16946

Merged
mattklein123 merged 5 commits intoenvoyproxy:mainfrom
rgs1:fix-tsan
Jun 14, 2021
Merged

TSAN: fix data race when reading a histogram's sample count#16946
mattklein123 merged 5 commits intoenvoyproxy:mainfrom
rgs1:fix-tsan

Conversation

@rgs1
Copy link
Copy Markdown
Member

@rgs1 rgs1 commented Jun 11, 2021

Commit Message: Fix TSAN issue caused from tests when reading histograms.

Additional Description:
This was introduced by #15908. We need to read a histogram's from
the main thread to avoid data races between writes (from the main
thread) and reads (from a worker thread).

Risk Level: Low.
Testing: Unit tests + TSAN should pass.
Docs Changes: NA
Release Notes: NA
Platform Specific Features: NA

Signed-off-by: Raul Gutierrez Segales [email protected]

This was introduced by envoyproxy#15908. We need to read a histogram's from
the main thread to avoid data races between writes (from the main
thread) and reads (from a worker thread).

Signed-off-by: Raul Gutierrez Segales <[email protected]>
@rgs1
Copy link
Copy Markdown
Member Author

rgs1 commented Jun 11, 2021

This should fix the TSAN issue. Happy to move the helper code that does the read from the main thread into
a helper function in test/test_common/utility.h if that's better.

cc: @antoniovicente @jmarantz

@rgs1
Copy link
Copy Markdown
Member Author

rgs1 commented Jun 11, 2021

/assign @antoniovicente

Signed-off-by: Raul Gutierrez Segales <[email protected]>
Signed-off-by: Raul Gutierrez Segales <[email protected]>
Copy link
Copy Markdown
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Waiting for CI before merging.

@rgs1
Copy link
Copy Markdown
Member Author

rgs1 commented Jun 12, 2021

Ok the ASAN failure seems unrelated:

//test/integration:health_check_integration_test                         FAILED in 2 out of 2 in 61.7s

I'll merge main...

@rgs1
Copy link
Copy Markdown
Member Author

rgs1 commented Jun 12, 2021

Ok the ASAN failure seems unrelated:

//test/integration:health_check_integration_test                         FAILED in 2 out of 2 in 61.7s

I'll merge main...

Hmm nothing new in main plus it seems possibly transient:

test/integration/health_check_integration_test.cc:276: Failure
Value of: clusters_[cluster_idx].host_fake_connection_->waitForNewStream( *dispatcher_, clusters_[cluster_idx].host_stream_)
  Actual: false (Timed out waiting for new stream.)
Expected: true
Stack trace:
  0x160b9b2e: Envoy::(anonymous namespace)::HttpHealthCheckIntegrationTest_SingleEndpointImmediateHealthcheckFailHttp_Test::TestBody()
  0x208969e8: testing::internal::HandleSehExceptionsInMethodIfSupported<>()
  0x20866c13: testing::internal::HandleExceptionsInMethodIfSupported<>()
  0x20832340: testing::Test::Run()
  0x20833ad5: testing::TestInfo::Run()
... Google Test internal frames ...

[  FAILED  ] IpHttpVersions/

@rgs1
Copy link
Copy Markdown
Member Author

rgs1 commented Jun 12, 2021

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Check envoy-presubmit isn't fully completed, but will still attempt retrying.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16946 (comment) was created by @rgs1.

see: more, trace.

Raul Gutierrez Segales added 2 commits June 13, 2021 11:36
@rgs1
Copy link
Copy Markdown
Member Author

rgs1 commented Jun 14, 2021

The remaining coverage failure is unrelated, so we can probably just merge:

  /build/tmp/_bazel_envoybuild/b570b5ccd0454dc9af9f65ab1833764d/execroot/envoy/bazel-out/k8-fastbuild/testlogs/test/exe/win32_scm_test/coverage.dat
//test/integration:health_check_integration_test                         FAILED in 38.4s

@rgs1
Copy link
Copy Markdown
Member Author

rgs1 commented Jun 14, 2021

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16946 (comment) was created by @rgs1.

see: more, trace.

@rgs1
Copy link
Copy Markdown
Member Author

rgs1 commented Jun 14, 2021

cc: @phlax for visibility on the CI/coverage failure (not sure if it's a global thing)

@mattklein123 mattklein123 merged commit 00f7ac2 into envoyproxy:main Jun 14, 2021
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
…xy#16946)

This was introduced by envoyproxy#15908. We need to read a histogram's from
the main thread to avoid data races between writes (from the main
thread) and reads (from a worker thread).

Signed-off-by: Raul Gutierrez Segales <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants