Fix: Trigger self-heal on read when shards missing from rejoined nodes #871
Merged: houseme merged 6 commits into copilot/fix-upload-freeze-issue from copilot/fix-data-recovery-during-disconnection on Nov 17, 2025
- Added proactive heal detection in get_object_with_fileinfo
- When reading an object, now checks if any shards are missing even if read succeeds
- Sends low-priority heal request to reconstruct missing shards on rejoined nodes
- This fixes the issue where data written during node outage is not healed when node rejoins

Co-authored-by: houseme <[email protected]>
Copilot AI changed the title from "[WIP] Fix data recovery issue after node disconnection" to "Fix: Trigger self-heal on read when shards missing from rejoined nodes" on Nov 17, 2025
* Initial plan
* Replace CRC libraries with unified crc-fast implementation
Co-authored-by: houseme <[email protected]>
* fix
* fix: replace low to Normal
---------
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: houseme <[email protected]>
Co-authored-by: houseme <[email protected]>
houseme approved these changes on Nov 17, 2025
houseme added a commit that referenced this pull request on Nov 17, 2025
* Initial plan
* Fix large file upload freeze by increasing StreamReader buffer size
Co-authored-by: houseme <[email protected]>
* Add comprehensive documentation for large file upload freeze fix
Co-authored-by: houseme <[email protected]>
* upgrade s3s version
* Fix compilation error: use BufReader instead of non-existent StreamReader::with_capacity
Co-authored-by: houseme <[email protected]>
* Update documentation with correct BufReader implementation
Co-authored-by: houseme <[email protected]>
* add tokio feature `io-util`
* Implement adaptive buffer sizing based on file size
Co-authored-by: houseme <[email protected]>
* Constants are managed uniformly and fmt code
* fix
* Fix: Trigger self-heal on read when shards missing from rejoined nodes (#871)
* Initial plan
* Fix: Trigger self-heal when missing shards detected during read
- Added proactive heal detection in get_object_with_fileinfo
- When reading an object, now checks if any shards are missing even if read succeeds
- Sends low-priority heal request to reconstruct missing shards on rejoined nodes
- This fixes the issue where data written during node outage is not healed when node rejoins
Co-authored-by: houseme <[email protected]>
* fix
* Unify CRC implementations to crc-fast (#873)
* Initial plan
* Replace CRC libraries with unified crc-fast implementation
Co-authored-by: houseme <[email protected]>
* fix
* fix: replace low to Normal
---------
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: houseme <[email protected]>
Co-authored-by: houseme <[email protected]>
---------
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: houseme <[email protected]>
Co-authored-by: houseme <[email protected]>
---------
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: houseme <[email protected]>
Co-authored-by: houseme <[email protected]>
Type of Change
Related Issues
Summary of Changes
Problem: When a node rejoins after being offline during writes, reads succeed using available shards but never trigger healing. Missing shards on the rejoined node remain unrecovered indefinitely, degrading redundancy protection.
Root Cause: Self-heal was triggered only on decode errors. If enough shards exist to satisfy read quorum (e.g., 3 of 4 data shards), the read succeeds silently despite the missing shards.
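Fix: Added proactive missing-shard detection in get_object_with_fileinfo. When reading an object, it now checks whether any shards are missing even if the read succeeds, and sends a low-priority heal request to reconstruct the missing shards on rejoined nodes. Low priority is used to avoid interfering with critical heal operations. Full redundancy is restored automatically on the first read after node recovery.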
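The detection step described above can be illustrated with a minimal Rust sketch. It is not the code added to crates/ecstore/src/set_disk.rs: the names `HealPriority`, `HealRequest`, `ReadOutcome`, and `check_shards` are hypothetical, and shard presence is modelled as a plain slice of booleans rather than the project's real disk and file-info types. The sketch only shows the decision the description outlines: a read that meets quorum but sees missing shards still produces a low-priority heal request.

```rust
/// Hypothetical priority levels; the project's real heal queue may use different names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HealPriority {
    Low,
    Normal,
    High,
}

/// Hypothetical heal request describing which shard slots need rebuilding.
#[derive(Debug, PartialEq, Eq)]
pub struct HealRequest {
    pub bucket: String,
    pub object: String,
    pub missing_shards: Vec<usize>,
    pub priority: HealPriority,
}

/// Result of inspecting shard availability during a read.
#[derive(Debug, PartialEq, Eq)]
pub enum ReadOutcome {
    /// Every shard is present; nothing to heal.
    Healthy,
    /// The read can be served, but some shards should be rebuilt in the background.
    Degraded(HealRequest),
    /// Fewer shards than the read quorum are available; the read cannot be served.
    Unreadable,
}

/// Inspect per-disk shard presence and decide whether to request a heal.
/// `shard_present[i]` is true when disk `i` returned its shard (assumed model).
pub fn check_shards(
    bucket: &str,
    object: &str,
    shard_present: &[bool],
    read_quorum: usize,
) -> ReadOutcome {
    // Collect the indices of disks that did not return a shard.
    let missing: Vec<usize> = shard_present
        .iter()
        .enumerate()
        .filter_map(|(i, &present)| if present { None } else { Some(i) })
        .collect();

    let available = shard_present.len() - missing.len();
    if available < read_quorum {
        return ReadOutcome::Unreadable;
    }
    if missing.is_empty() {
        return ReadOutcome::Healthy;
    }

    // Quorum is met, so the read succeeds; the heal is requested at low
    // priority so it does not compete with more urgent heal work.
    ReadOutcome::Degraded(HealRequest {
        bucket: bucket.to_string(),
        object: object.to_string(),
        missing_shards: missing,
        priority: HealPriority::Low,
    })
}
```

With four disks, an assumed read quorum of three, and the second disk missing its shard, `check_shards` returns `Degraded` with `missing_shards` equal to `[1]`. Before the fix, this case was served like a fully healthy read and never reached the heal queue.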
Checklist
make pre-commit
Impact
Additional Notes
Changed file: crates/ecstore/src/set_disk.rs (+27 lines)
Scenario this fixes:
- myname.zip written while node2 is offline → shards on node1, 3, 4 only
- myname.zip read after node2 rejoins → now triggers heal, rebuilds node2 shards
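The scenario can be walked through end to end with a small in-memory simulation. This is a sketch under loud assumptions: `Node`, `Cluster`, `put`, `get`, and `run_heal` are hypothetical names, and each "shard" is simply a copy of the payload, whereas the real system rebuilds shards through erasure coding. The sketch only demonstrates the ordering of events: write during an outage, rejoin, read, queued heal, repaired node.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for one storage node.
#[derive(Debug)]
pub struct Node {
    pub online: bool,
    /// Object name -> shard bytes held by this node.
    pub shards: BTreeMap<String, Vec<u8>>,
}

/// Hypothetical erasure set; not the project's real set-disk type.
#[derive(Debug)]
pub struct Cluster {
    pub nodes: Vec<Node>,
    /// Objects queued for background healing, in arrival order.
    pub heal_queue: Vec<String>,
}

impl Cluster {
    pub fn new(node_count: usize) -> Self {
        let nodes = (0..node_count)
            .map(|_| Node { online: true, shards: BTreeMap::new() })
            .collect();
        Cluster { nodes, heal_queue: Vec::new() }
    }

    /// Write one shard per online node; offline nodes are skipped.
    pub fn put(&mut self, object: &str, payload: &[u8]) {
        for node in self.nodes.iter_mut().filter(|n| n.online) {
            node.shards.insert(object.to_string(), payload.to_vec());
        }
    }

    /// Serve the read if at least `read_quorum` online nodes hold a shard,
    /// and queue a heal when any online node lacks one.
    pub fn get(&mut self, object: &str, read_quorum: usize) -> Option<Vec<u8>> {
        let holders = self
            .nodes
            .iter()
            .filter(|n| n.online && n.shards.contains_key(object))
            .count();
        if holders < read_quorum {
            return None;
        }
        let any_missing = self
            .nodes
            .iter()
            .any(|n| n.online && !n.shards.contains_key(object));
        if any_missing && !self.heal_queue.iter().any(|o| o == object) {
            self.heal_queue.push(object.to_string());
        }
        self.nodes
            .iter()
            .find(|n| n.online && n.shards.contains_key(object))
            .and_then(|n| n.shards.get(object).cloned())
    }

    /// Drain the heal queue, copying a shard to every online node that lacks one.
    /// (Simplification: real healing reconstructs shards instead of copying.)
    pub fn run_heal(&mut self) {
        let queued: Vec<String> = std::mem::take(&mut self.heal_queue);
        for object in queued {
            let source = self.nodes.iter().find_map(|n| n.shards.get(&object).cloned());
            if let Some(bytes) = source {
                for node in self.nodes.iter_mut().filter(|n| n.online) {
                    node.shards.entry(object.clone()).or_insert_with(|| bytes.clone());
                }
            }
        }
    }
}
```

To replay the scenario, create a four-node cluster, mark `nodes[1]` (node2) offline, `put` myname.zip, bring the node back online, then `get` the object with a quorum of three. The read succeeds and the object lands in `heal_queue`; after `run_heal`, node2 holds its shard again.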