Skip to content

Interrupt subgraph_connectivity if it makes no progress#8691

Merged
generall merged 3 commits into
devfrom
stop-subgraph-connectivity-no-progress
Apr 16, 2026
Merged

Interrupt subgraph_connectivity if it makes no progress#8691
generall merged 3 commits into
devfrom
stop-subgraph-connectivity-no-progress

Conversation

@agourlay
Copy link
Copy Markdown
Member

  • Fix infinite loop in GraphLayersBuilder::subgraph_connectivity when the selected entry point has no outgoing links on any layer. spent_budget is only incremented while iterating a point's outgoing links,
    so an isolated entry point leaves it stuck at 0 forever — the retry loop never observes spent_budget > SUBGRAPH_CONNECTIVITY_SEARCH_BUDGET. This was seen in production: the HNSW build thread spinning
    inside subgraph_connectivityHNSWIndex::buildSegmentOptimizer::optimize, which in turn blocked the consensus thread (apply_entries waiting on the collection lock held by the optimizer), hanging
    the whole node.
  • Break the retry loop when an iteration consumes zero budget. Sound because the graph is immutable across retries and the coin flip only gates following enumerated links, not enumerating them — if BFS
    enumerated zero edges once, it will every time. Return value in that case is 1.0 / points.len(), truthfully reflecting an isolated entry point.
  • Add regression test test_subgraph_connectivity_isolated_entry_point_does_not_hang. Runs the call on a background thread with a 2s deadline so the test runner isn't wedged if the bug regresses. Fails in ~2s
    on the old code, passes in ~0.02s with the fix.

Test plan

  • New regression test passes (cargo test -p segment --lib test_subgraph_connectivity_isolated_entry_point_does_not_hang)
  • Full hnsw_index:: test suite passes (38/38), including test_graph_connectivity which exercises the normal, non-isolated path
  • Manually verify on the affected deployment that optimizer-driven HNSW builds no longer hang

@agourlay agourlay marked this pull request as ready for review April 16, 2026 12:07
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3e8f3a14-2df0-4113-9830-1422fcdf3050

📥 Commits

Reviewing files that changed from the base of the PR and between 28bacfb and 9027535.

📒 Files selected for processing (1)
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs

📝 Walkthrough

Walkthrough

Added a termination condition to GraphLayersBuilder::subgraph_connectivity so the retry loop also breaks when an iteration makes no progress (i.e., spent_budget == budget_before_iteration / BFS traversed zero edges), in addition to the existing budget-exceeded check. Introduced a regression test test_subgraph_connectivity_isolated_entry_point_does_not_hang that constructs a level-0 graph with no links, runs subgraph_connectivity on a background thread, and asserts it completes within 2 seconds.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and concisely describes the main fix: adding an early termination condition to subgraph_connectivity when it makes no progress.
Description check ✅ Passed The description thoroughly explains the bug, the fix, and the testing approach; it is clearly related to the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch stop-subgraph-connectivity-no-progress

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/segment/src/index/hnsw_index/graph_layers_builder.rs`:
- Around line 975-985: The test currently only checks handle.is_finished() which
will be true even if the worker thread panicked; after the timeout loop, call
handle.join() (or match its Result) to ensure the thread completed successfully
and assert/join.expect that it did not panic; specifically, after spawning the
thread that runs builder_clone.subgraph_connectivity(&points_clone, 0.5) and
after the deadline wait, invoke handle.join() and assert the join returned
Ok(()) (or unwrap/expect with a clear message) so a panic inside
subgraph_connectivity fails the test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e5c67054-9e47-46fd-b7d2-9a348becdad8

📥 Commits

Reviewing files that changed from the base of the PR and between c01d16f and 28bacfb.

📒 Files selected for processing (1)
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs

Comment thread lib/segment/src/index/hnsw_index/graph_layers_builder.rs Outdated
@generall generall self-requested a review April 16, 2026 13:28
@generall generall merged commit d94430a into dev Apr 16, 2026
15 checks passed
@generall generall deleted the stop-subgraph-connectivity-no-progress branch April 16, 2026 13:29
timvisee pushed a commit that referenced this pull request May 8, 2026
* Interrupt subgraph_connectivity if it makes no progress

* less synthetic test

* test unwrap res
@timvisee timvisee mentioned this pull request May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants