Interrupt subgraph_connectivity if it makes no progress #8691
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/segment/src/index/hnsw_index/graph_layers_builder.rs`:
- Around lines 975-985: The test currently only checks `handle.is_finished()`, which
is also true if the worker thread panicked. After spawning the thread that runs
`builder_clone.subgraph_connectivity(&points_clone, 0.5)` and waiting out the
deadline loop, call `handle.join()` and assert that it returned `Ok(())` (or
`unwrap`/`expect` it with a clear message), so that a panic inside
`subgraph_connectivity` fails the test.
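The deadline-then-join pattern suggested here can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the test code from this PR; `run_with_deadline` is a hypothetical helper name, and the 5 ms polling interval is an arbitrary choice.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `work` on a background thread and waits up to
/// `timeout` for it. Returns `Err` if the deadline passes or the worker panicked.
fn run_with_deadline<T, F>(work: F, timeout: Duration) -> Result<T, String>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let handle = thread::spawn(work);
    let deadline = Instant::now() + timeout;

    // Poll until the worker finishes or the deadline passes.
    while !handle.is_finished() {
        if Instant::now() >= deadline {
            // The worker is left running detached; a test would fail here
            // instead of blocking the test runner.
            return Err("worker did not finish before the deadline".to_string());
        }
        thread::sleep(Duration::from_millis(5));
    }

    // `is_finished()` is also true after a panic, so `join()` is what
    // distinguishes a clean return from a crashed worker.
    handle.join().map_err(|_| "worker panicked".to_string())
}
```

With a helper like this, a test can `expect` the result, so both a hang and a panic inside the worker show up as test failures.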
* Interrupt subgraph_connectivity if it makes no progress
* less synthetic test
* test unwrap res
- Fixes a hang in `GraphLayersBuilder::subgraph_connectivity` when the selected entry point has no outgoing links on any layer. `spent_budget` is only incremented while iterating a point's outgoing links, so an isolated entry point leaves it stuck at `0` forever, and the retry `loop` never observes `spent_budget > SUBGRAPH_CONNECTIVITY_SEARCH_BUDGET`. This was seen in production: the HNSW build thread was spinning inside `subgraph_connectivity` → `HNSWIndex::build` → `SegmentOptimizer::optimize`, which in turn blocked the consensus thread (`apply_entries` waiting on the collection lock held by the optimizer), hanging the whole node.
- The loop is now interrupted when a pass makes no progress: if it enumerated zero edges once, it will every time. The return value in that case is `1.0 / points.len()`, truthfully reflecting an isolated entry point.
- Adds the regression test `test_subgraph_connectivity_isolated_entry_point_does_not_hang`. It runs the call on a background thread with a 2s deadline so the test runner isn't wedged if the bug regresses. Fails in ~2s on the old code, passes in ~0.02s with the fix.
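To make the no-progress condition concrete, here is a toy model of a budgeted retry loop with that guard. It is a sketch under assumptions, not the code in `graph_layers_builder.rs`: the adjacency-list graph, the `connectivity` function, the breadth-first traversal, and the `SEARCH_BUDGET` value of 64 are all hypothetical stand-ins, and the function assumes a non-empty graph with a valid `entry` index.

```rust
use std::collections::VecDeque;

/// Hypothetical budget; stands in for SUBGRAPH_CONNECTIVITY_SEARCH_BUDGET,
/// whose real value is not shown in this PR text.
const SEARCH_BUDGET: usize = 64;

/// Toy model: fraction of points reachable from `entry` over `links`.
/// Passes repeat until the edge budget is spent or a pass makes no progress.
fn connectivity(links: &[Vec<usize>], entry: usize) -> f32 {
    let n = links.len();
    let mut spent_budget = 0usize;
    let mut best_reached = 1usize; // the entry point itself

    loop {
        let budget_before = spent_budget;

        // One breadth-first pass from the entry point.
        let mut visited = vec![false; n];
        visited[entry] = true;
        let mut reached = 1usize;
        let mut queue = VecDeque::from([entry]);
        while let Some(point) = queue.pop_front() {
            for &next in &links[point] {
                // Budget is only spent while iterating outgoing links.
                spent_budget += 1;
                if !visited[next] {
                    visited[next] = true;
                    reached += 1;
                    queue.push_back(next);
                }
            }
        }
        best_reached = best_reached.max(reached);

        if spent_budget > SEARCH_BUDGET {
            break;
        }
        // Guard: a pass that enumerated zero edges will do so every time,
        // so without this check an isolated entry point would loop forever.
        if spent_budget == budget_before {
            break;
        }
    }

    best_reached as f32 / n as f32
}
```

In this model an isolated entry point in a three-point graph yields `1.0 / 3.0`, matching the `1.0 / points.len()` behavior described above, while a fully connected graph still runs until the budget is exhausted.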
Test plan
- `cargo test -p segment --lib test_subgraph_connectivity_isolated_entry_point_does_not_hang`
- `hnsw_index::` test suite passes (38/38), including `test_graph_connectivity`, which exercises the normal, non-isolated path