[2.10] Fix Fork GC potential double-free on error path - [MOD-12521] #7461

Merged
GuyAv46 merged 7 commits into 2.10 from backport-7423-to-2.10
Nov 24, 2025

GuyAv46 commented Nov 21, 2025

Description

Backport of #7423 to 2.10.


Note

Improves ForkGC pipe I/O error handling and buffer receive logic to avoid double-frees and adds tests that simulate pipe failures during GC and apply.

  • ForkGC (src/fork_gc.c)
    • Replace exit(1) with RedisModule_ExitFromChild(EXIT_FAILURE) on child write failure.
    • Improve FGC_recvFixed polling/error handling: capture poll_rc, break on non-EINTR read errors, and log detailed revents/errno diagnostics.
    • Refactor FGC_recvBuffer to use a temp length, allocate into a local buffer, handle SIZE_MAX/zero-length cases safely, and avoid freeing wrong pointers on failure.
    • Initialize locals defensively (e.g., fieldName = NULL).
  • Tests (tests/cpptests/test_cpp_forkgc.cpp)
    • Add pipe-failure tests: testPipeErrorDuringGC and testPipeErrorDuringApply, including a closer thread to close pipe mid-apply; verify no crashes and no double-free (e.g., totalCollected unchanged).
    • Include <thread> for new concurrency test support.
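The receive-side pattern summarized in the list above can be sketched roughly as follows. This is a minimal illustration and not the code in src/fork_gc.c: the names `recv_fixed`, `recv_buffer`, `RECV_OK`, and `RECV_ERR` are hypothetical, and treating `SIZE_MAX` as a "no buffer" sentinel is an assumption made for the example. The point it tries to show is that the length is read into a temporary, the payload is read into a local buffer, and the caller's pointers are written only after everything succeeded, so a failure path never leaves the caller holding a pointer that has already been freed.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define RECV_OK 0     /* hypothetical status codes for this sketch */
#define RECV_ERR (-1)

/* Read exactly len bytes from fd, waiting up to timeout_ms per poll() call. */
static int recv_fixed(int fd, void *buf, size_t len, int timeout_ms) {
  char *p = buf;
  while (len > 0) {
    struct pollfd pfd = {.fd = fd, .events = POLLIN};
    int poll_rc = poll(&pfd, 1, timeout_ms);
    if (poll_rc < 0) {
      if (errno == EINTR) continue; /* interrupted by a signal: retry */
      return RECV_ERR;
    }
    if (poll_rc == 0) return RECV_ERR; /* timed out */
    ssize_t n = read(fd, p, len);
    if (n > 0) {
      p += n;
      len -= (size_t)n;
    } else if (n == 0) {
      return RECV_ERR; /* writer closed the pipe before sending everything */
    } else if (errno != EINTR) {
      return RECV_ERR; /* real read error, e.g. EBADF after a close */
    }
  }
  return RECV_OK;
}

/* Receive a length-prefixed buffer.
 * On success, *out owns malloc'd memory (or is NULL for the sentinel and
 * zero-length cases) and *out_len holds the received length.
 * On failure, *out and *out_len are left untouched. */
static int recv_buffer(int fd, void **out, size_t *out_len, int timeout_ms) {
  size_t tmp_len;
  if (recv_fixed(fd, &tmp_len, sizeof tmp_len, timeout_ms) != RECV_OK) {
    return RECV_ERR;
  }
  if (tmp_len == SIZE_MAX || tmp_len == 0) { /* assumed sentinel / empty */
    *out = NULL;
    *out_len = tmp_len;
    return RECV_OK;
  }
  char *local = malloc(tmp_len + 1); /* no overflow: tmp_len != SIZE_MAX */
  if (local == NULL) return RECV_ERR;
  if (recv_fixed(fd, local, tmp_len, timeout_ms) != RECV_OK) {
    free(local); /* only the local allocation is freed, exactly once */
    return RECV_ERR;
  }
  local[tmp_len] = '\0';
  *out = local;
  *out_len = tmp_len;
  return RECV_OK;
}
```

With this shape, a caller can unconditionally free whatever it owned before the call when `recv_buffer` fails, because the function has not written to the output pointer.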

Written by Cursor Bugbot for commit c41f9ee.

* make FGC_recvBuffer clean

* add revents to timeout log

* improve polling logs

* Nullify tag field name

* remove unused variable

* add a test

* add a stress unit-test

* improve test

* improve logging

* add include

(cherry picked from commit 442a75e)

codecov bot commented Nov 21, 2025

Codecov Report

❌ Patch coverage is 95.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 89.30%. Comparing base (d056a40) to head (c41f9ee).
⚠️ Report is 5 commits behind head on 2.10.

Files with missing lines Patch % Lines
src/fork_gc.c 95.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             2.10    #7461      +/-   ##
==========================================
+ Coverage   89.29%   89.30%   +0.01%     
==========================================
  Files         207      207              
  Lines       35335    35448     +113     
==========================================
+ Hits        31553    31658     +105     
- Misses       3782     3790       +8     
Flag Coverage Δ
flow 83.87% <60.00%> (-0.14%) ⬇️
unit 42.37% <95.00%> (-0.09%) ⬇️


alonre24 previously approved these changes Nov 22, 2025
  // just exit, do not abort(), which will trigger a watchdog on RLEC, causing adverse effects
  RedisModule_Log(fgc->ctx, "warning", "GC fork: broken pipe, exiting");
- exit(1);
+ RedisModule_ExitFromChild(EXIT_FAILURE);
Collaborator

Did we already have RedisModule_ExitFromChild in 8.x?

Collaborator Author

Yes. Not sure why we didn't backport it.

// The GC should have failed, so no bytes should be collected
// (or at least the operation should complete without crashing)
ASSERT_EQ(0, fgc->stats.totalCollected);
}
Collaborator

Why not add the test simulating killing the child at different times?

Collaborator Author

I can probably add it back. I think it was too slow because of the exit(1), but it should be fine now.
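For readers unfamiliar with this kind of test, the idea of killing the child at different times can be sketched outside Redis as below. This is a standalone illustration, not the test added in this PR: `read_exact` and `child_dies_after` are hypothetical helpers, the message content is arbitrary, and the child uses plain `_exit` only because the sketch has no Redis module context (inside the module, the change in this PR uses RedisModule_ExitFromChild). The child writes only a prefix of the message and exits, and the parent is expected to report a truncated stream instead of crashing or hanging.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read exactly len bytes; return 0 on success, -1 on EOF or read error. */
static int read_exact(int fd, char *buf, size_t len) {
  while (len > 0) {
    ssize_t n = read(fd, buf, len);
    if (n > 0) {
      buf += n;
      len -= (size_t)n;
    } else if (n == 0) {
      return -1; /* writer is gone before the full message arrived */
    } else if (errno != EINTR) {
      return -1;
    }
  }
  return 0;
}

/* Fork a child that writes only the first `sent` bytes of a `total`-byte
 * message and then exits.
 * Returns 0 if the parent received the full message, 1 if the parent
 * detected a truncated stream, and -1 if the setup itself failed. */
static int child_dies_after(size_t sent, size_t total) {
  char msg[64] = {0};
  char got[64];
  if (total > sizeof msg || sent > total) return -1;

  int fds[2];
  if (pipe(fds) != 0) return -1;

  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (pid == 0) { /* child: write a prefix, then die */
    close(fds[0]);
    size_t off = 0;
    while (off < sent) {
      ssize_t n = write(fds[1], msg + off, sent - off);
      if (n <= 0) break;
      off += (size_t)n;
    }
    _exit(sent == total ? 0 : 1);
  }

  /* parent: close our copy of the write end so EOF becomes visible */
  close(fds[1]);
  int rc = read_exact(fds[0], got, total);
  close(fds[0]);

  int status;
  while (waitpid(pid, &status, 0) < 0 && errno == EINTR) {
    /* retry if interrupted while reaping the child */
  }
  return rc == 0 ? 0 : 1;
}
```

Calling `child_dies_after` in a loop over every `sent` value from 0 to `total` covers each point at which the child could disappear; only the `sent == total` case should return 0.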

GuyAv46 added this pull request to the merge queue Nov 24, 2025
github-merge-queue bot pushed a commit that referenced this pull request Nov 24, 2025
…7461)

* Fix Fork GC potential double-free on error path - [MOD-12521] (#7423)

* make FGC_recvBuffer clean

* add revents to timeout log

* improve polling logs

* Nullify tag field name

* remove unused variable

* add a test

* add a stress unit-test

* improve test

* improve logging

* add include

(cherry picked from commit 442a75e)

* reuse one thread

* improvement

* remove harsh stress test

* fix potential double close

* exit from child with RedisModule_ExitFromChild

* add heavy test back

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 24, 2025
GuyAv46 added this pull request to the merge queue Nov 24, 2025
Merged via the queue into 2.10 with commit b2eb4a4 Nov 24, 2025
16 checks passed
GuyAv46 deleted the backport-7423-to-2.10 branch November 24, 2025 16:41