Skip to content

Conversation

@huluoboge
Copy link
Contributor

Fix a bug where the Stop function's state issue may cause the Pop() function to enter an infinite loop.
template
void JobQueue::Stop() {
stop_ = true; // problem code
push_condition_.notify_all();
pop_condition_.notify_all();
}

typename JobQueue::Job JobQueue::Pop() {
std::unique_lockstd::mutex lock(mutex_);
while (jobs_.empty() && !stop_) {
push_condition_.wait(lock);
....

​​Bug Description:​​
There is a race condition between Stop() and Pop() that can lead to deadlock. The sequence is:

In Pop(), the thread evaluates while (jobs_.empty() && !stop_) and sees stop_ = false
Before entering push_condition_.wait(lock), the Stop() function gets executed:
Sets stop_ = true
Calls push_condition_.notify_all()
The Pop() thread then proceeds to push_condition_.wait(lock)
Because the notification was sent before the wait began, it's missed
Now the thread is stuck waiting indefinitely with stop_ = true, causing a deadlock

Copy link
Contributor

@ahojnnes ahojnnes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix.

@ahojnnes ahojnnes enabled auto-merge (squash) July 14, 2025 07:39
@ahojnnes ahojnnes disabled auto-merge July 14, 2025 07:41
@ahojnnes ahojnnes changed the title Fix a bug of deadlock Fix potential deadlock in job queue Jul 14, 2025
@ahojnnes ahojnnes enabled auto-merge (squash) July 14, 2025 07:41
@ahojnnes ahojnnes merged commit 17ed60c into colmap:main Jul 14, 2025
14 checks passed
ahojnnes added a commit that referenced this pull request Jul 16, 2025
Fix a bug where the Stop function's state issue may cause the Pop()
function to enter an infinite loop.
template <typename T>
void JobQueue<T>::Stop() {
    stop_ = true; // problem code
  push_condition_.notify_all();
  pop_condition_.notify_all();
}

typename JobQueue<T>::Job JobQueue<T>::Pop() {
  std::unique_lock<std::mutex> lock(mutex_);
  while (jobs_.empty() && !stop_) {
    push_condition_.wait(lock);
  ....

​​Bug Description:​​
There is a race condition between Stop() and Pop() that can lead to
deadlock. The sequence is:

In Pop(), the thread evaluates while (jobs_.empty() && !stop_) and sees
stop_ = false
Before entering push_condition_.wait(lock), the Stop() function gets
executed:
Sets stop_ = true
Calls push_condition_.notify_all()
The Pop() thread then proceeds to push_condition_.wait(lock)
Because the notification was sent before the wait began, it's missed
Now the thread is stuck waiting indefinitely with stop_ = true, causing
a deadlock

---------

Co-authored-by: 3000huyang <[email protected]>
Co-authored-by: Johannes Schönberger <[email protected]>
Co-authored-by: Johannes Schönberger <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants