-
Notifications
You must be signed in to change notification settings - Fork 715
fix: improve write flush logic #8208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR fixes a critical race condition bug in the ingester's writer flush logic. The core issue was in the flush_all() function where the original implementation would attempt to close writers first and then remove them from the WRITERS collection. If the close operation failed, writers would remain in the WRITERS collection indefinitely, causing subsequent TTL checks to find stale writers and attempt operations on closed channels, resulting in "channel closed" errors.
The fix reverses the order of operations - writers are now removed from the WRITERS collection first using w.remove(&key), and then the returned writer is flushed using r.flush().await?. This ensures that even if the flush operation fails, the writer is already cleaned up from the collection and won't cause future issues. Additionally, the method has been renamed from close() to flush() to better reflect its actual behavior, which involves WAL synchronization and memtable rotation rather than just resource cleanup.
The change also adds improved logging with log::info! statements to provide better observability into the flush process, helping with debugging and monitoring. This fix integrates well with the existing ingester architecture by maintaining the same external interface while improving the internal cleanup reliability.
Confidence score: 4/5
- This PR addresses a well-defined bug with a logical solution that prevents resource leaks
- Score reflects solid understanding of the race condition and appropriate fix, though the change touches critical ingester logic
- Pay close attention to the flush operation error handling and ensure proper testing of failure scenarios
1 file reviewed, no comments
PR Code Suggestions ✨Explore these optional code suggestions:
|
User description
The old logic will close write first then remove from WRITERS, but if close got problem, it never remove from WRITERS, then check ttl will get the writer and try to flush it, at the end, got error:
PR Type
Bug fix, Enhancement
Description
Replace writer close with safe flush
Remove-before-flush to avoid stale WRITERS
Add informative logging around flush lifecycle
Maintain metrics on file count decrement
Diagram Walkthrough
File Walkthrough
writer.rs
Safer writer flush and map removal orderingsrc/ingester/src/writer.rs