-
Notifications
You must be signed in to change notification settings - Fork 759
Fix Admin Heal API and Add Pagination Support for Large Buckets #933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fix heal bucket objects with pagination
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes the admin heal API by adding a missing endpoint and introduces pagination support for healing operations to efficiently handle large buckets without loading all objects into memory.
- Added
/v3/heal/{bucket}endpoint to heal entire buckets without requiring a prefix parameter - Implemented
list_objects_for_heal_page()method with continuation token support (page size: 1000 objects) - Refactored
heal_bucket_with_resume()andheal_bucket()to process objects page-by-page instead of loading all at once - Added custom
Serialize/Deserializeimplementations forHealScanModeenum to support both integer and string formats
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| rustfs/src/admin/mod.rs | Added missing /v3/heal/{bucket} endpoint for healing buckets without prefix |
| crates/common/src/heal_channel.rs | Implemented custom serialization for HealScanMode enum to support multiple formats; added serde attributes to HealOpts fields |
| crates/ahm/src/heal/storage.rs | Added list_objects_for_heal_page() method with pagination support; refactored list_objects_for_heal() to use pagination internally |
| crates/ahm/src/heal/erasure_healer.rs | Refactored heal_bucket_with_resume() and heal_bucket() to process objects using pagination instead of loading all objects into memory |
| crates/ahm/src/heal/manager.rs | Enhanced log messages in auto disk scanner for better debugging |
| crates/ahm/src/scanner/node_scanner.rs | Added comment to scanning loop initialization |
| crates/ahm/src/scanner/data_scanner.rs | Commented out immediate data usage collection on startup |
| crates/ahm/src/heal/channel.rs | Improved heal start request log message to include object prefix; updated test mock with new pagination method |
| crates/ahm/tests/heal_bug_fixes_test.rs | Updated test mock to implement new list_objects_for_heal_page() method |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // 2. process objects with pagination to avoid loading all objects into memory | ||
| let mut continuation_token: Option<String> = None; | ||
| let mut global_obj_idx = 0usize; | ||
|
|
||
| loop { | ||
| // Get one page of objects | ||
| let (objects, next_token, is_truncated) = self | ||
| .storage | ||
| .list_objects_for_heal_page(bucket, "", continuation_token.as_deref()) | ||
| .await?; | ||
|
|
||
| // Check if object still exists before attempting heal | ||
| let object_exists = match self.storage.object_exists(bucket, object).await { | ||
| Ok(exists) => exists, | ||
| Err(e) => { | ||
| warn!("Failed to check existence of {}/{}: {}, marking as failed", bucket, object, e); | ||
| *failed_objects += 1; | ||
| checkpoint_manager.add_failed_object(object.clone()).await?; | ||
| *current_object_index = obj_idx + 1; | ||
| // Process objects in this page | ||
| for object in objects { | ||
| // Skip objects before the checkpoint | ||
| if global_obj_idx < *current_object_index { | ||
| global_obj_idx += 1; | ||
| continue; | ||
| } |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The resume logic with pagination is inefficient. When resuming from checkpoint object index N, the code fetches and iterates through all pages from the beginning, skipping N objects one by one (lines 273-276). For large buckets, this means fetching thousands of objects just to skip them.
Consider using the start_after parameter in list_objects_v2 (which is available but currently passed as None on line 552) to resume directly from the last processed object name instead of skipping by index. This would require storing the last object name in the checkpoint rather than just the index.
| } | ||
| // check if already processed | ||
| if checkpoint_manager.get_checkpoint().await.processed_objects.contains(&object) { | ||
| global_obj_idx += 1; |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent checkpoint index update when skipping already-processed objects. When an object is found in processed_objects set (line 279), the code increments global_obj_idx but doesn't update *current_object_index. This is inconsistent with error handling paths (lines 297, 310) and the successful processing path (line 342), which all update *current_object_index = global_obj_idx.
This inconsistency could cause issues if the heal operation is interrupted and resumed - the checkpoint index might not accurately reflect the actual progress. Consider adding *current_object_index = global_obj_idx; at line 280 for consistency.
| global_obj_idx += 1; | |
| global_obj_idx += 1; | |
| *current_object_index = global_obj_idx; |
| // start scanning loop | ||
| let scanner_clone = self.clone_for_background(); | ||
| tokio::spawn(async move { | ||
| // update object count and size for each bucket |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment "update object count and size for each bucket" doesn't accurately describe what the code does. The code is spawning a task that runs scan_loop_with_resume(None), which is a generic scanning loop, not specifically updating object counts and sizes. This comment appears to be a copy-paste error or placeholder. Consider removing it or providing a more accurate description of what scan_loop_with_resume does.
| // update object count and size for each bucket | |
| // spawn background scanning loop |
|
|
||
| // Initialize and start the node scanner | ||
| self.node_scanner.initialize_stats().await?; | ||
| // update object count and size for each bucket |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment "update object count and size for each bucket" doesn't accurately describe what the code does. The code is calling node_scanner.start(), which initializes and starts the node scanner, not specifically updating object counts and sizes. This comment appears to be a copy-paste error or placeholder. Consider removing it or providing a more accurate description.
| // update object count and size for each bucket |
Co-authored-by: Copilot <[email protected]>
fix heal bucket objects with pagination
…s/rustfs into dependabot/cargo/dependencies-1126 * 'dependabot/cargo/dependencies-1126' of github.com:rustfs/rustfs: Fix Admin Heal API and Add Pagination Support for Large Buckets (#933)
Fix Admin Heal API and Add Pagination Support for Large Buckets
Summary
This PR fixes the admin heal API endpoint and introduces pagination support for healing operations to handle large buckets efficiently without loading all objects into memory.
Changes
1. Admin Heal API Fix
/v3/heal/{bucket}endpoint to support healing entire buckets without requiring a prefix/v3/heal/{bucket}/{prefix}was available, which required a prefix parameter2. Pagination Support for Object Listing
New API: Added
list_objects_for_heal_page()method toHealStorageAPItraitRefactored heal operations:
heal_bucket_with_resume(): Now processes objects page-by-page instead of loading all at onceheal_bucket(): Updated to use pagination for better memory efficiency3. HealScanMode Serialization Improvements
SerializeandDeserializeforHealScanModeenumType of Change
Related Issues
Summary of Changes
Checklist
make pre-commitImpact
Additional Notes
Thank you for your contribution! Please ensure your PR follows the community standards (CODE_OF_CONDUCT.md) and sign the CLA if this is your first contribution.