
Conversation

@oripress
Contributor

@oripress oripress commented Sep 28, 2025

Adds an early-exit (“break”) guard to the Kruskal loop in the MST routines so the algorithm stops as soon as the spanning tree is complete.
We note that further optimizations are possible: for example, a path-compression / union-by-rank version of the DSU (“Disjoint Set Union”) structure commonly used in Kruskal could give further speed-ups.
However, to keep the scope minimal and avoid duplicating code, we defer the DSU overhaul and other sort-tuning optimizations to a future PR.
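
For illustration, a minimal sketch of what the early-exit guard amounts to (simplified names and structure; not the actual code in networkx/algorithms/tree/mst.py). A spanning forest of a graph with n nodes and c connected components has exactly n - c edges, so the loop can stop once that many edges have been accepted:

```python
import networkx as nx
from networkx.utils import UnionFind


def kruskal_mst_edges_sketch(G, weight="weight"):
    """Simplified Kruskal loop with an early-exit guard (illustrative only)."""
    subtrees = UnionFind()
    edges = sorted(G.edges(data=True), key=lambda e: e[2].get(weight, 1))
    # A spanning forest of n nodes and c components has exactly n - c edges,
    # so the loop can stop once that many edges have been accepted.
    target = G.number_of_nodes() - nx.number_connected_components(G)
    accepted = 0
    for u, v, d in edges:
        if subtrees[u] != subtrees[v]:
            yield u, v, d
            subtrees.union(u, v)
            accepted += 1
            if accepted == target:  # early exit: spanning forest is complete
                break
```

For a connected graph the target is simply n - 1; the component count is only there to keep the sketch correct for disconnected inputs.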

Original PR Description:
This PR adds an optional optimized backend for minimum/maximum spanning edges:
kruskal_mst_edges_opt, available via algorithm="kruskal_opt" in
minimum_spanning_edges / maximum_spanning_edges.

The PR is based on AlgoTuner-generated code using GLM 4.5; see the full trajectory here.
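
For reference, a hedged usage sketch of how the proposed backend was meant to be selected (the "kruskal_opt" value comes from this original description and was later dropped in favor of speeding up "kruskal" itself; the graph and weights below are arbitrary):

```python
import networkx as nx

G = nx.gnp_random_graph(3000, 0.0015, seed=0)
for u, v in G.edges():
    G[u][v]["weight"] = (u + v) % 100

# Standard API, unchanged by the final version of this PR:
mst_edges = list(nx.minimum_spanning_edges(G, algorithm="kruskal", data=True))

# As proposed in the original description (keyword value removed during review):
# opt_edges = list(nx.minimum_spanning_edges(G, algorithm="kruskal_opt", data=True))
```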

| Graph Type | Mode | n | p | kruskal median (s) | kruskal_opt median (s) | Speedup × |
|---|---|---|---|---|---|---|
| Graph | max | 3000 | 0.001500 | 0.017873 | 0.013873 | 1.29 |
| Graph | max | 3000 | 0.002500 | 0.026041 | 0.021647 | 1.20 |
| Graph | max | 5000 | 0.001500 | 0.044824 | 0.038733 | 1.16 |
| Graph | max | 5000 | 0.002500 | 0.070591 | 0.058161 | 1.21 |
| Graph | max | 8000 | 0.001500 | 0.123916 | 0.108668 | 1.14 |
| Graph | max | 8000 | 0.002500 | 0.196784 | 0.153247 | 1.28 |
| Graph | min | 3000 | 0.001500 | 0.018044 | 0.013686 | 1.32 |
| Graph | min | 3000 | 0.002500 | 0.026312 | 0.021060 | 1.25 |
| Graph | min | 5000 | 0.001500 | 0.044379 | 0.037353 | 1.19 |
| Graph | min | 5000 | 0.002500 | 0.068922 | 0.057766 | 1.19 |
| Graph | min | 8000 | 0.001500 | 0.123356 | 0.103145 | 1.20 |
| Graph | min | 8000 | 0.002500 | 0.196174 | 0.149423 | 1.31 |

Overall geometric-mean speedup across groups: 1.23×

Note: AlgoTune shows an even greater speedup when running on larger graphs.

Tests:
Added: networkx/algorithms/tree/tests/test_mst_kruskal_opt.py
Cases cover Graph/MultiGraph, min/max, ignore_nan, partition semantics, and parity (edge count + total weight) vs kruskal.
Full test suite passes locally.
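
A rough sketch of the kind of parity check described above (illustrative, not the actual test module; Prim stands in here for the second implementation). It relies on every minimum spanning forest of a weighted graph having the same edge count and the same total weight:

```python
import networkx as nx


def check_spanning_parity(G, weight="weight"):
    """Compare two MST algorithms on edge count and total weight."""
    def total(edges):
        return sum(d.get(weight, 1) for _, _, d in edges)

    kruskal = list(nx.minimum_spanning_edges(G, algorithm="kruskal", data=True))
    prim = list(nx.minimum_spanning_edges(G, algorithm="prim", data=True))
    assert len(kruskal) == len(prim)
    assert abs(total(kruskal) - total(prim)) < 1e-9


G = nx.gnm_random_graph(500, 2000, seed=1)
for u, v in G.edges():
    G[u][v]["weight"] = ((37 * u + 11 * v) % 1000) / 10.0
check_spanning_parity(G)
```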

@oripress oripress marked this pull request as draft September 28, 2025 15:26
@oripress oripress marked this pull request as ready for review September 28, 2025 15:49
@oripress
Contributor Author

Hi! This PR adds functionality but I don't have permissions to add a label.
Could a maintainer please add the type: Enhancements label? Thanks!

@dschult
Member

dschult commented Sep 28, 2025

The CI doc build test failure is likely to be unrelated. We've had a couple other PRs recently with this same error.
Unfortunately that means we can't look at the docs for how they render the new functionality. Hopefully the doc build error will get cleaned up soon.

Thanks for this!!

@amcandio
Contributor

I think it's better to replace the current Kruskal implementation than to add a new, faster one alongside it. I don't see the value of having both.

@oripress
Contributor Author

Sure thing @amcandio, fixed in the latest commit.

@oripress
Contributor Author

I implemented the requested changes in the latest commit @amcandio. Thanks for the feedback!

@oripress
Contributor Author

oripress commented Oct 9, 2025

Would it be possible to merge this change? If not, what are the fixes needed?

Thanks!

@dschult
Member

dschult commented Oct 9, 2025

> This seems to be the only thing driving the optimization. Can we minimize the code changes to only those impacting speed?

I think this is the main comment that hasn't been addressed yet. Are all these changes needed for speed? Which are, and which are there for other reasons (and what are those reasons)?

@oripress
Contributor Author

I ran some ablations based on your feedback:

  • upstream_main: current NetworkX main.
  • upstream_main+early_break: upstream plus early exit at lines 275,282.
  • upstream_main+early_break+custom_dsu: adds custom DSU at lines 238-262.
  • upstream_main+early_break+custom_dsu+new_sort: also swaps the weight ordering and removes the final sort at lines 213-236.
  • full_pr_no_early_break: PR code with the early-break disabled (keeps the loop guard).

Average speedup vs upstream_main (GNM n=3k/5k/8k/10k/12k, max/min, 5 seeds × min-of-2 repeats)

| Variant | Avg speedup vs upstream_main |
|---|---|
| upstream_main | 1.00x |
| upstream_main+early_break | 0.99x |
| upstream_main+early_break+custom_dsu | 1.19x |
| upstream_main+early_break+custom_dsu+new_sort | 1.28x |
| full_pr_no_early_break | 1.18x |

Based on this, it seems like the speedup comes from all components, and not just the early break.

@amcandio
Contributor

amcandio commented Nov 22, 2025

> I ran some ablations based on your feedback: […] Based on this, it seems like the speedup comes from all components, and not just the early break.

Hey Ori, sorry I missed this reply! Would you mind sharing the different versions for upstream_main+early_break, upstream_main+early_break+custom_dsu+new_sort, etc? We need to make sure we understand what is driving the optimization!

@oripress oripress force-pushed the feat/kruskal-opt-mst branch from 531b15e to a59c708 Compare November 23, 2025 00:49
@oripress
Contributor Author

Hi,
Here is the breakdown of the variants; I hope this clears it up:

  1. upstream_main: current NetworkX main.
  2. upstream_main+early_break: upstream plus the early exit at lines 265,273.
  3. upstream_main+early_break+custom_dsu: adds the custom DSU (lines 228–252).
  4. upstream_main+early_break+custom_dsu+new_sort: swaps the weight ordering
    and removes the final sort (previous commit, lines 205–236,285).
  5. full_pr_no_early_break: same as (4) but without the early-exit guard.

I re-benchmarked every variant both without and with the dedupe step (lines 205-226). I'm using a different machine, so I couldn't recreate the previous benchmarks exactly; the speedups are slightly different:

| Variant | Avg runtime, no dedupe (s) | Speedup, no dedupe | Avg runtime, dedupe (s) | Speedup, dedupe |
|---|---|---|---|---|
| upstream_main | 0.183705 | 1.00× | 0.183705 | 1.00× |
| upstream_main+early_break | 0.171367 | 1.07× | 0.171367 | 1.07× |
| upstream_main+early_break+custom_dsu | 0.154113 | 1.19× | 0.154113 | 1.19× |
| upstream_main+early_break+custom_dsu+new_sort | 0.173472 | 1.06× | 0.171413 | 1.07× |
| full_pr_no_early_break | 0.172091 | 1.07× | 0.174827 | 1.05× |

The current version on this branch is the fastest variant, i.e., upstream_main+early_break+custom_dsu (deduped).

@amcandio
Contributor

amcandio commented Nov 23, 2025

Interesting, so the speedup comes from the early stop plus path halving in the union-find data structure. I missed the path halving before!

I'm +1 on adding the early break in this PR. But IMHO the union-find optimizations should be added to the dedicated module https://networkx.org/documentation/stable/_modules/networkx/utils/union_find.html. That way we avoid code duplication and other algorithms can benefit from it too. It's probably better to do this in a separate PR.
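
For concreteness, a minimal sketch of the path-halving find being discussed (this is not the existing networkx.utils.UnionFind code; it only illustrates the technique that a separate PR could fold into that module):

```python
class UnionFindSketch:
    """Union-find with path halving and union by size (illustrative only)."""

    def __init__(self, elements=()):
        self.parent = {x: x for x in elements}
        self.size = {x: 1 for x in elements}

    def find(self, x):
        # Path halving: point every other node on the walk to its grandparent,
        # flattening the tree in a single pass, without recursion.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        # Union by size keeps trees shallow; returns True if a merge happened.
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
        return True
```

Path halving gives the same near-constant amortized bounds as full path compression but only needs one pass over the search path per find, which is one reason it can be a bit faster in tight loops like Kruskal's.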

@oripress oripress force-pushed the feat/kruskal-opt-mst branch from 93322f2 to cbe7bbb Compare November 23, 2025 13:42
@oripress
Contributor Author

Done!

Contributor

@amcandio amcandio left a comment


Thanks! I approve this PR. Can you also update the description to reflect the current code changes and the path-halving opportunity for union-find?

Member

@dschult dschult left a comment


I'm confused. This looks like a branch with only the early exit code.
Can you check if this is what you want for the current PR?

@oripress
Contributor Author

I updated the title + description now to better reflect the current state of the PR.

@dschult
Member

dschult commented Dec 18, 2025

Did you press "save" after changing the title? It looks like the title is still the same one.

Otherwise this PR looks good. :)

@oripress oripress changed the title Add optimized Kruskal MST backend (kruskal_opt) Kruskal MST early break for efficiency Dec 18, 2025
@oripress
Contributor Author

You're right, I fixed the title now. Thanks again.

Contributor

@rossbar rossbar left a comment


It was hard to digest the original benchmark results, as they were tied to the original proposal and not re-runnable. I went ahead and added minimum_spanning_tree to the benchmark suite (see #8403), which clearly demonstrates the benefits of early termination. Here's what I see on my machine with:

asv continuous --bench UndirectedAlgorithmBenchmarks.time_minimum_spanning_tree_kruskal main feat/kruskal-opt-mst
asv compare main feat/kruskal-opt-mst
Benchmark results
All benchmarks:
| Change | Before [cb824d7] | After [cbe7bbb] <feat/kruskal-opt-mst> | Ratio | Benchmark (Parameter) |
|---|---|---|---|---|
|  | 32.4±0.02ms | 32.5±0.06ms | 1 | benchmark_algorithms.UndirectedAlgorithmBenchmarks.time_minimum_spanning_tree_kruskal('Drug Interaction network') |
|  | 478±3μs | 434±20μs | 0.91 | benchmark_algorithms.UndirectedAlgorithmBenchmarks.time_minimum_spanning_tree_kruskal('Erdos Renyi (100, 0.1)') |
| - | 1.58±0.02ms | 948±8μs | 0.6 | benchmark_algorithms.UndirectedAlgorithmBenchmarks.time_minimum_spanning_tree_kruskal('Erdos Renyi (100, 0.5)') |
| - | 2.70±0.01ms | 1.47±0ms | 0.54 | benchmark_algorithms.UndirectedAlgorithmBenchmarks.time_minimum_spanning_tree_kruskal('Erdos Renyi (100, 0.9)') |

The early termination makes an increasingly significant difference as the density of the graph increases, as expected. This LGTM, thanks @oripress @dschult and @amcandio for the thoughtful review!

@dschult dschult merged commit e55ffa3 into networkx:main Dec 18, 2025
52 checks passed
@dschult dschult added this to the 3.6 milestone Dec 18, 2025
@amcandio
Contributor

Very cool, thanks for the benchmarks @rossbar!
