Commit fe0e9fb

huydhn authored and pytorchmergebot committed
Fix flaky SIGSEGV crash in test_profile_memory (#136304)
Fixes #132331

We need another barrier here to ensure that the main thread doesn't stop the profiler while other threads are still using it (and crash). I can reliably reproduce the issue with `pytest -v test/profiler/test_cpp_thread.py -k test_profile_memory --flake-finder`.

### Testing

`pytest -v test/profiler/test_cpp_thread.py --flake-finder`: all tests pass.

Pull Request resolved: #136304
Approved by: https://github.com/briancoutinho
1 parent d45b015 commit fe0e9fb

1 file changed (+10, -0 lines)

test/profiler/test_cpp_thread.cpp

Lines changed: 10 additions & 0 deletions

```diff
@@ -47,6 +47,8 @@ void start_threads(int thread_count, int iteration_count, bool attach) {
 
   static std::atomic<int> barrier = 0;
   barrier = 0;
+  static std::atomic<int> another_barrier = 0;
+  another_barrier = 0;
   thread_local bool enabled_in_main_thread = false;
 
   std::vector<std::thread> threads;
@@ -78,6 +80,14 @@ void start_threads(int thread_count, int iteration_count, bool attach) {
         }
 
         ProfilerEventHandler::Handler->emulateTraining(iteration, id);
+
+        // We need another barrier here to ensure that the main thread doesn't
+        // stop the profiler while other threads are still using it. This fixes
+        // https://github.com/pytorch/pytorch/issues/132331
+        ++another_barrier;
+        while (another_barrier % thread_count) {
+          std::this_thread::yield();
+        }
       }
     });
   }
```
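
To illustrate the ordering the extra barrier enforces, here is a minimal, self-contained sketch of the same counter-based spin barrier. It is not the PyTorch test code: `SharedRecorder`, `spin_barrier`, and `run_workers` are hypothetical names, and the recorder only stands in for the profiler. Each worker records events, then waits at the barrier; only after every worker has arrived does a designated thread (thread 0 here, an assumption standing in for whichever thread stops the profiler) call `stop()`, so no worker can still be inside `record()` at that point.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared resource (e.g. a profiler) that must not
// be stopped while worker threads are still using it.
struct SharedRecorder {
  std::atomic<bool> active{true};
  std::atomic<int> events{0};
  // Counts record() calls that saw the recorder already stopped; in a real
  // profiler, such a late use is the kind of access that could crash.
  std::atomic<int> uses_after_stop{0};

  void record() {
    if (!active.load()) {
      ++uses_after_stop;
    }
    ++events;
  }

  void stop() { active.store(false); }
};

// Spin barrier in the style of the patch: each thread bumps a shared counter
// and yields until the counter is a multiple of thread_count. This sketch
// uses it for a single round per fresh counter.
void spin_barrier(std::atomic<int>& counter, int thread_count) {
  ++counter;
  while (counter % thread_count) {
    std::this_thread::yield();
  }
}

// Starts thread_count workers that each record events_per_thread events.
// Thread 0 stops the recorder only after every worker has passed the barrier,
// i.e. after all record() calls have finished. Returns the total event count.
int run_workers(SharedRecorder& recorder, int thread_count,
                int events_per_thread) {
  assert(thread_count > 0);  // the barrier divides by thread_count
  std::atomic<int> done_barrier{0};
  std::vector<std::thread> threads;
  threads.reserve(thread_count);
  for (int id = 0; id < thread_count; ++id) {
    threads.emplace_back([&, id]() {
      for (int i = 0; i < events_per_thread; ++i) {
        recorder.record();
      }
      // Without this barrier, thread 0 could stop the recorder while other
      // threads are still inside record().
      spin_barrier(done_barrier, thread_count);
      if (id == 0) {
        recorder.stop();
      }
    });
  }
  for (auto& t : threads) {
    t.join();
  }
  return recorder.events.load();
}
```

In this sketch every thread passes the barrier exactly once on a freshly zeroed counter (the patch likewise resets `another_barrier = 0;` on each call to `start_threads`), so the `% thread_count` test is unambiguous. If one counter were reused for several rounds without a reset, a fast thread could increment it for the next round before a slow thread observes the multiple of `thread_count`; a generation-counting barrier, or `std::barrier` in C++20, is designed for repeated rounds.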