Skip to content

Fix bug in commit_ret_elems which lead to "live leak" in case of requests batch.#278

Merged
greensky00 merged 3 commits intoeBay:masterfrom
alesapin:fix_bug
Jan 26, 2022
Merged

Fix bug in commit_ret_elems which lead to "live leak" in case of requests batch.#278
greensky00 merged 3 commits intoeBay:masterfrom
alesapin:fix_bug

Conversation

@alesapin
Copy link
Copy Markdown
Contributor

I think the code contains explicit comment. Fixes #239.

alesapin and others added 3 commits April 7, 2021 13:25
…atch

Fix bug which lead to memory leak

(cherry picked from commit 1707a75)
/// condition) we have to add it here. Otherwise we don't need to add
/// anything into commit_ret_elems_, because nobody will wait for the
/// responses of the intermediate requests from requests batch.
bool need_to_check_commit_ret = sm_idx == pc_idx;
Copy link
Copy Markdown
Contributor

@greensky00 greensky00 Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix. To make sure if I understand this condition correctly, can you confirm the below?

Let's say the current committed index is 10 (for all pre-commit, quick-commit, and sm-commit), and two threads (T1 and T2) are calling handle_cli_req with exactly one log. What will happen is:

  1. T1 calls handle_cli_req first, precommit_index_ becomes 11.
  2. Before T1 creates the corresponding commit_ret_elem, the commit thread wakes up earlier.
  3. Since sm_idx == pc_idx in this case, the commit thread will create it.
  4. However, T2 cannot interfere before step (3), because of the lock held by handle_cli_req_prelock. T2 can append its log only after the commit_ret_elem for 11 is created.

So, if a thread order inversion happens, only one user thread is involved at a time. When the commit thread wakes up, there is no case that there are multiple handle_cli_req calls whose commit_ret_elems are not created yet. Is that right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I didn't think about such case, but yes handle_cli_req guarded by mutex (or recursive mutex for some reason...) in handle_cli_req_prelock: https://github.com/ClickHouse-Extras/NuRaft/blob/1707a7572aa66ec5d0a2dbe2bf5effa3352e6b2d/src/handle_client_request.cxx#L40-L53.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I cannot catch "thread inversion" even under heavy load. But according to the code it should be possible.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, let me merge it.

@greensky00 greensky00 merged commit 1b08f25 into eBay:master Jan 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

A lot of "cancelled non-blocking client request" with async API

2 participants