
perf: Make A* tie break on lower h-values #813

Closed
Dietr1ch wants to merge 2 commits into petgraph:master from Dietr1ch:dev/astar_rank_tie_breaking


Conversation

@Dietr1ch
Contributor

@Dietr1ch Dietr1ch commented Jun 3, 2025

This implements the trick from #806 and adds an empty 2d maze benchmark

Update: This dropped the original breaking change of requiring the cost type to implement core::ops::Sub, so it can be merged earlier.

Contributor Author

I figured that adding Sub<M, Output = M> would be ok, but I guess I can move the extra requirement into A*.

Adding this was needed to recover g as f - h.

Member

Moving it to A* will also be a breaking change

Contributor Author

It's a smaller breaking change though. I moved it to scope down the breakage, but I feel that a Measure should be Sub<Self, Output = Self>.

I'm not sure if these breakages are meaningfully different for the library though. It might not be worth making the distinction and having the intermediate breakage.

Member

@RaoulLuque RaoulLuque Jun 3, 2025

In the comment right below this I already stated how a breaking change might affect the merging of this PR. As also stated there, I would suggest trying to solve the problem without an API change (that is, using addition instead of Sub). Unfortunately, the "size" of the breaking change does not matter; any breaking change counts as breaking and will thus be delayed, since a major release would be necessary.

The current situation is that 0.9 will probably bring a lot of changes to the internal implementation (in particular to a lot of traits), and thus I am not sure whether this change will even be applicable anymore. Hence, I would recommend either trying to implement it now without breaking the API, or pausing the PR until then and seeing how it fits into the possibly new design of 0.9 :)

Contributor Author

And where is this 0.9 version being drafted? I see the next branch is stale, and there's no other branch that stands out.

There's a hack to implement this without Sub or Neg at the expense of larger heap entries, which should work. Should I just go with that and leave a note about the possible data trim?

Member

@RaoulLuque RaoulLuque Jun 4, 2025

Yes, I would second the solution you suggested :) At least if the benchmarks show that it is actually worth it! Otherwise we'll have to pause this PR and wait until 0.9 is around the corner.

Regarding the branches: The next branch is supposed to be the reworked one. As you correctly pointed out, currently it is not being worked on and is thus behind the master branch because the main author/maintainer took a pause and said they'd resume work in the course of this year :)

Contributor Author

It's definitely worth it, but yes, you shouldn't just take my word for it. It'd be nice to have more benchmarks available. I'll try to come up with something different, yet simple, but I'm past my detouring quota already, so it might take me some time.

@RaoulLuque
Member

RaoulLuque commented Jun 3, 2025

First off, thanks for taking the time to create a PR 🦕

Okay, well changing the Measure trait / the trait bounds on A* would be a breaking change and thus move the adoption of this change to 0.9 at the earliest. With 0.9, the trait system might have changed drastically, so one would have to see how this change might fit in.

Would there possibly be a way to express the subtraction with the current definition of Measure (i.e. with addition)? I am thinking about something like keeping track of all three relevant values f, g, h, or only g and h, since we can construct f from both of them?
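A minimal sketch of that addition-only idea (illustrative names and a plain u32 cost type, not petgraph's actual API): the heap key can carry (f, h), with f built as g + h at push time, so no Sub bound is ever needed to recover g later.

```rust
/// Sketch: build the (f, h) ranking key using only addition, so the
/// `Measure` trait needs no `Sub` bound. Instead of later recovering
/// g = f - h, we store h next to f when pushing onto the heap.
/// Plain `u32` costs are an assumption for illustration.
pub fn ranking_key(g: u32, h: u32) -> (u32, u32) {
    // Tuples order lexicographically: lower f first, then lower h.
    (g + h, h)
}
```

Tuple ordering gives the tie-break for free: among entries with equal f, the one with the lower h (and hence higher g) sorts first.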

Also, the PartialOrd trait bound on K seems redundant, since Measure is already a subtrait of PartialOrd.

Additionally, we usually name the benchmark files after the algorithms. I am kind of surprised there is no file for A* yet, but please rename the file accordingly :) Moreover, since there are no other benchmarks yet, I would really like one or two more benchmarks which might not be as biased towards the optimization. Feel free to look at something like the Dijkstra benchmarks / the build_graph function in benches/common/factories.rs to create some sufficiently large graphs (of course, you'd have to check which nodes are connected and sufficiently far away from each other).

Lastly, please rename your PR to fit the conventional commit/PR format.

@starovoid
Collaborator

Since the algorithm has a fairly large number of parameters, it already makes sense to switch to an instance-based implementation. There have already been such proposals, for example #359. It won't help get rid of the breaking changes, but we could collect all the suggestions about A* and prepare them to be combined for the new major release.

@RaoulLuque
Member

> Since the algorithm has a fairly large number of parameters, it already makes sense to switch to instance implementation. There have already been such proposals, for example #359. It won't help to get rid of the breaking changes, but we could collect all the suggestions about A* and prepare them to be combined for the new major release.

Good point, I haven't thought about combining the changes with previous efforts 👍
Are you sure you mean major release? To me that would mean 0.9, and I think we could incorporate some optimizations before that release already.
Also, what do you mean by "instance implementation"?

@starovoid
Collaborator

> Also, what do you mean by "instance implementation"?

I mean the approach to describing an algorithm through a structure. For example, in #359:

pub struct AstarInstance<G, F, H, K, IsGoal>
where
    G: IntoEdges + Visitable,
    IsGoal: FnMut(G::NodeId) -> bool,
    G::NodeId: Eq + Hash,
    F: FnMut(G::EdgeRef) -> K,
    H: FnMut(G::NodeId) -> K,
    K: Measure + Copy,
{
    graph: G,
    is_goal: IsGoal,
    edge_cost: F,
    estimate_cost: H,

    estimate_costs: HashMap<G::NodeId, K>,
}

> I think we could incorporate some optimizations before this release already.

All non-breaking optimizations can indeed be included in the upcoming minor releases, but so far it seems to me that the most "interesting" thing is still breaking. I will be glad to make an A* roadmap after the release of 0.8.2 and think about the sequence of changes in more detail.

@RaoulLuque
Member

RaoulLuque commented Jun 3, 2025

Ah, that makes sense. Thanks for the explanation 🙏

That kind of roadmap sounds great :)

I actually think that this optimization could be implemented without changing the API, no? As I stated above, instead of using subtraction, one could use addition. There would be some overhead, but I'd think that it is negligible in comparison to the speedup :) If I have time, I'll try to work on an example implementing that change, including some benchmarks 🌞

In any case, I agree that we could look at actually incorporating such optimizations after 0.8.2 👍

@Dietr1ch Dietr1ch changed the title A* tie breaking on lower h-value perf: Make A* tie break on lower h-values Jun 3, 2025
@Dietr1ch Dietr1ch changed the title perf: Make A* tie break on lower h-values perf!: Make A* tie break on lower h-values Jun 3, 2025
@Dietr1ch Dietr1ch force-pushed the dev/astar_rank_tie_breaking branch 2 times, most recently from 07a7efa to 0579282 on June 3, 2025 16:05
@Dietr1ch
Contributor Author

Dietr1ch commented Jun 3, 2025

The implementation needs to rank with either (f, h) (using Sub to recover g) or (f, -g) (which requires Neg).

Alternatively, you could rank with (f, h, g) to keep all 3 values in the heap, which seems a bit silly, but would work without adding new requirements to K (nor Measure).
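For a max-heap like std's BinaryHeap, the (f, h) ranking needs a reversed Ord. A hypothetical sketch of such a key (field names and the u64 cost type are illustrative, not the PR's actual code):

```rust
use std::cmp::Ordering;

/// Heap key ranking A* candidates: primarily by lower f = g + h,
/// tie-breaking by lower h, which prefers nodes closer to the goal
/// among equally promising ones. Illustrative sketch only.
#[derive(Copy, Clone, PartialEq, Eq)]
pub struct Rank {
    pub f: u64, // estimated total cost g + h
    pub h: u64, // heuristic estimate to the goal
}

impl Ord for Rank {
    fn cmp(&self, other: &Self) -> Ordering {
        // Reversed so that std's max-heap BinaryHeap pops the
        // smallest (f, h) pair first.
        other.f.cmp(&self.f).then_with(|| other.h.cmp(&self.h))
    }
}

impl PartialOrd for Rank {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
```

Pushing Rank values into a BinaryHeap then pops the lowest f first, and among ties, the lowest h.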


Oh, BTW, having a structure representing a run of A* allows implementing Iterator and offering stats to peer into the algorithm's execution internals.

My own experimental implementation[^0] looks like this:

struct AStarSearch { .. }

impl AStarSearch {
    pub fn new(..) -> Self { .. }
    pub fn find_next_goal(&mut self) -> Option<Path> { .. }

    #[cfg(feature = "stats")]
    pub fn stats(&self) -> Stats { .. }
}

impl Iterator for AStarSearch {
    type Item = Path;
    fn next(&mut self) -> Option<Self::Item> {
        self.find_next_goal()
    }
}

Dietr1ch added a commit to Dietr1ch/petgraph that referenced this pull request Jun 3, 2025
This works around a breaking change. `K` should be `core::ops::Sub` to avoid
storing data that could easily be derived.

See the thread for how breaking compatibility would've delayed this improvement to 0.9:
- petgraph#813 (comment)
@Dietr1ch Dietr1ch force-pushed the dev/astar_rank_tie_breaking branch from 0579282 to 6351c9a on June 3, 2025 18:29
@Dietr1ch
Contributor Author

Dietr1ch commented Jun 3, 2025

I implemented the non-breaking workaround since 0.9 doesn't have a release branch that can take the breaking change.

@Dietr1ch Dietr1ch changed the title perf!: Make A* tie break on lower h-values perf: Make A* tie break on lower h-values Jun 3, 2025
@RaoulLuque
Member

RaoulLuque commented Jun 4, 2025

Nice!

Would you please incorporate the changes I requested in the comments above? That is, add more benchmarks than only the maze one. Also, please provide before/after benchmark result comparisons. In particular, a graph which is just a line would be interesting, since in that case the new version should lose the most performance due to the unnecessary entries in the binary heap.

Once that is done and the benchmark results look good, I'll review the changes once I've got time :)

@starovoid starovoid added this to the 0.8.3 milestone Jun 4, 2025
@Dietr1ch
Contributor Author

Dietr1ch commented Jun 4, 2025

> a graph which is just a line would be interesting, since in that case the new version should lose the most performance due to the unnecessary entries in the binary heap.

Well, given that the current implementation uses two hashmaps, it should still lose to the new implementation, as the overhead around the extra keys is larger than that of having a larger value.

Still interesting though, because that kind of graph stresses some of the internals the most; but the real optimisation comes from being significantly better at exploring the graph, not just faster at picking the next node. From my time doing research around this, measuring the number of nodes expanded was the best measure, as time ends up being linear in it, but noisy and dependent on things that are harder to reproduce and compare (machine (CPU, cache, memory and their timings), load, compiler, linker).

@Dietr1ch
Contributor Author

Dietr1ch commented Jun 6, 2025

Thinking about breaking APIs, is there a plan to break Graph in 0.9?

I'm asking because the other classic problems that are also widely used for benchmarking might require an abstract Graph representation to avoid materialising the entire state space.[0]

The classic 15-puzzle shouldn't require building 16!/2 nodes and their edges. Instead, keeping a [u8; 16] around and operating on it to generate nodes on the fly is more efficient.

[0]: It seems that Graph instantiates the whole graph. I can see how that's convenient for some graphs, but this struct should be behind some abstract trait. (Maybe I missed something when skimming around?)
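As a sketch of that on-the-fly generation (hypothetical helper, not petgraph API): the successor states of a 15-puzzle board can be produced directly from the [u8; 16], with no materialised node or edge set.

```rust
/// Generate successor states of a 15-puzzle position on the fly,
/// rather than materialising all 16!/2 reachable nodes up front.
/// A state is a [u8; 16] board in row-major order; 0 is the blank.
/// Hypothetical sketch, not petgraph API.
pub fn successors(state: &[u8; 16]) -> Vec<[u8; 16]> {
    let blank = state.iter().position(|&t| t == 0).unwrap();
    let (row, col) = (blank / 4, blank % 4);
    let mut next = Vec::with_capacity(4);
    let mut push_swap = |other: usize| {
        let mut s = *state;
        s.swap(blank, other);
        next.push(s);
    };
    if row > 0 { push_swap(blank - 4); } // slide a tile down into the blank
    if row < 3 { push_swap(blank + 4); } // slide a tile up
    if col > 0 { push_swap(blank - 1); } // slide a tile right
    if col < 3 { push_swap(blank + 1); } // slide a tile left
    next
}
```

An "abstract" graph trait would let A* call something like this instead of iterating stored edges.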

@RaoulLuque
Member

RaoulLuque commented Jun 7, 2025

I am not sure this is the right place for such a discussion. Apart from that, I think that I don't quite understand what you mean exactly. If you would like to go into detail about this or make a suggestion please consider creating something like a conversation in the discussion tab 👍

I am glad to review your PR though, once you incorporate the changes I asked for above. Please consider turning this into a draft PR until then :)

@Dietr1ch
Contributor Author

Dietr1ch commented Jun 7, 2025

I moved the discussion about Graph to a proper discussion on #821.

I think the PR is in a state that can be merged, so I'm not making it a draft. If you feel strongly about adding more problems and benchmarks, I think petgraph should make Graph more general before getting those.

@RaoulLuque
Member

RaoulLuque commented Jun 8, 2025

Okay. In that case someone else will have to give their opinion I guess 👍 Maybe you want to chime in @starovoid ?

My point still stands that the benchmarks to compare the performance of the new and old variant should cover different cases and not only the one where we know that it will favor the new implementation 👍 If the benchmarks show improvements in every case I would be glad to review the changes :)

I also don't understand the argument of changing the graph type before adding more benchmarks. If we implement this PR, it will mostly affect people using the graph type as it currently is together with the new A* (in particular, most people probably don't implement their own graph types for virtualized/unmaterialized graphs, but instead use the types provided by petgraph). In that case it would be interesting to see exactly that combination: how the current graph type performs in different scenarios with the modifications to A* 👍

@Dietr1ch
Contributor Author

Dietr1ch commented Jun 8, 2025

> I also don't understand the argument of changing the graph type before adding more benchmarks

I said this thinking about adding more variety to the graphs. I can do the line graph, or a small road network, but to me the interesting graphs are the ones that are large enough to make the decision-making while exploring the graph more important than details like how quick the hash table is.

Representing the classical 15-puzzle I mentioned would take about 50TB-300TB of memory, depending on how "clever" (naughty) you get. It's completely unreasonable to try doing it without "abstract" graph support.

Dietr1ch added a commit to Dietr1ch/petgraph that referenced this pull request Jun 9, 2025
This works around a breaking change. `K` should be `core::ops::Sub` to avoid
storing data that could easily be derived.

See the thread for how breaking compatibility would've delayed this improvement to 0.9:
- petgraph#813 (comment)
@Dietr1ch Dietr1ch force-pushed the dev/astar_rank_tie_breaking branch from 6351c9a to 46bceb8 on June 9, 2025 20:22
@Dietr1ch
Contributor Author

Dietr1ch commented Jun 9, 2025

Got benchmarks on the "line" graph (actually an aisle, since it leads to slightly more ranking work in the heap).

There's a 2x speedup on it, even though my change should've made things slightly slower; this is just due to removing the ~redundant hashmap as part of the ranking adjustment.

 name                     dev__pathfinding_bench ns/iter  dev__astar_rank_tie_breaking ns/iter  diff ns/iter   diff %  speedup
 astar_maze2d_bench       3,873,653                       208,091                                 -3,665,562  -94.63%  x 18.62
 astar_maze2d_line_bench  161,513                         63,586                                     -97,927  -60.63%   x 2.54

BTW, I also have an environment commit that adds a simple flake.nix file and a simple bash script to run a benchmark comparing 2..N branches. I can contribute that too, but I'll fork it into a new PR.

@starovoid starovoid added C-new-algorithm Category: A feature request for a new graph algorithm C-performance Category: A change motivated by improving speed, memory usage or compile times S-waiting-on-review Status: Awaiting review from the assignee but also other interested parties labels Jun 13, 2025
@starovoid starovoid removed this from the 0.8.3 milestone Jun 13, 2025
@starovoid starovoid added this to the 0.8.4 milestone Jun 13, 2025
Member

@RaoulLuque RaoulLuque left a comment

The performance improvement is indeed very impressive!

I had some time to look through the files :) Since we don't have any scripts of that sort in petgraph, please remove the bench script you also added.

let mut visit_next = BinaryHeap::new();
let mut scores = HashMap::new(); // g-values, cost to reach the node
let mut estimate_scores = HashMap::new(); // f-values, cost to reach + estimate cost to goal
// A node -> (f, h, g) mapping. Should be (f, h) since g can be recovered.
Member

@RaoulLuque RaoulLuque Jun 14, 2025

Suggested change
// A node -> (f, h, g) mapping. Should be (f, h) since g can be recovered.
// A node -> (f, h, g) mapping. TODO: With a breaking change this should be (f, h) since g can be recovered if subtraction is available.

Seems clearer

@Dietr1ch Dietr1ch force-pushed the dev/astar_rank_tie_breaking branch from 46bceb8 to f5fe3d1 on June 18, 2025 22:33
@Dietr1ch
Contributor Author

Oh, I screwed up by switching back and forth between my laptop and workstation believing that my GitHub remote was updated, and ended up resolving the comments on an older commit that didn't do the ranking trick that avoids requiring Sub and breaking the API. I'll redo that now.

@Dietr1ch Dietr1ch force-pushed the dev/astar_rank_tie_breaking branch 2 times, most recently from a82a56b to 04d4070 on June 18, 2025 23:21
@Dietr1ch Dietr1ch force-pushed the dev/astar_rank_tie_breaking branch from 04d4070 to 2b866b3 on June 18, 2025 23:26
@Dietr1ch
Contributor Author

I re-ran the bench on my workstation using the script I had around.

Same remarks apply.

  • The ranking speed-up numbers are kind of made up. Empty grids essentially compare dim^N vs dim, so if you want to see >1000x out of it, just crank up the problem size; but to keep things comparable, we should settle on the current instance size.
  • The ~2x speed-up in the 1-D version comes from using a single hashmap instead of two.
  • This is not super scientific, as sampling-based benchmarks are naturally noisy and get worse when execution isn't parallel, but I've seen ~consistent 18x-22x on the 2D grid bench and 1.7x-2.5x on the "1D" grid on both my laptop and workstation.
 name                      dev/pathfinding_bench ns/iter  dev/astar_rank_tie_breaking ns/iter  diff ns/iter   diff %  speedup
 astar_2d_grid_bench       3,872,858                      210,172                                -3,662,686  -94.57%  x18.43
 astar_2d_grid_line_bench  164,219                        67,211                                    -97,008  -59.07%   x2.44

@RaoulLuque
Member

RaoulLuque commented Jun 19, 2025

Alright, well if nothing else changed, LGTM :) 👍

Thanks for taking the suggestions into consideration and all the work you put in so far 🦕

Gonna ping @starovoid , so he can say how to proceed ☺️

@RaoulLuque
Member

RaoulLuque commented Jun 19, 2025

Well, actually the only other thing I think we could do while we are at it is add a quickcheck for astar. It could be similar to the one for dijkstra, checking whether computed paths satisfy the triangle inequality using something like the null heuristic?

It would go right below fn dijkstra_triangle_ineq in tests/quickcheck.rs.
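The property itself can be sketched outside any harness: with the null heuristic, A* must agree with Dijkstra on every graph. Below is a toy, self-contained version over an adjacency list (petgraph's astar/dijkstra have different signatures; a real quickcheck would generate the graphs randomly):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Minimal A* over an adjacency list, parameterised by a heuristic.
/// With the null heuristic (|_| 0) this degenerates to Dijkstra, which
/// is exactly the property a quickcheck would assert on random graphs.
/// Toy sketch for illustration, not petgraph's actual API.
pub fn astar_cost(
    adj: &[Vec<(usize, u64)>], // adj[u] = (neighbour, edge weight)
    start: usize,
    goal: usize,
    h: impl Fn(usize) -> u64,
) -> Option<u64> {
    let mut dist = vec![u64::MAX; adj.len()];
    let mut heap = BinaryHeap::new();
    dist[start] = 0;
    heap.push(Reverse((h(start), start)));
    while let Some(Reverse((_, u))) = heap.pop() {
        if u == goal {
            return Some(dist[u]); // first goal pop is optimal for consistent h
        }
        for &(v, w) in &adj[u] {
            let g = dist[u] + w;
            if g < dist[v] {
                dist[v] = g;
                heap.push(Reverse((g + h(v), v)));
            }
        }
    }
    None
}
```

A quickcheck would then assert `astar_cost(&g, s, t, |_| 0) == dijkstra_cost(&g, s, t)` for arbitrary graphs.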

@Dietr1ch
Contributor Author

Dietr1ch commented Jun 20, 2025

> checking whether computed paths satisfy the triangle inequality using something like the null heuristic?

There are two strong arguments, IMO, against doing this.

  • First, it seems a lot like testing the test, so what's stopping us from unproductively recursing into testing the tests indefinitely? At some point you need to be able to stare at the tests long enough to convince yourself things are fine, and in this case I think we try hard enough by comparing against Dijkstra's 1->N optimal paths.

  • Second, this is something that, when studying A*, you are explicitly warned against trying, since doing so is more expensive than running the search; it'd only make sense if you know your problem in advance but can't afford the memory to just run Floyd-Warshall once.

That said, there's a weaker version of this, which is testing the heuristic along the optimal path retrieved, and it comes almost for free since you already went through it. Along this path, h shouldn't over-estimate the cost (h(a, b) <= c(a, b)). This would expose an inconsistent heuristic, but it is not enough to prove the path isn't optimal in case the heuristic is inconsistent, nor enough to prove the heuristic is indeed consistent.

Proofs around heuristics often come from the ideas that allow you to have a heuristic in the first place, so I think they are rarely needed once you see your heuristic is just an optimal solution to a relaxed version of your problem.

If you really want to add this weaker check, I suggest just adding some debug_assertions, but keeping it away from the tests and from runtime in release builds.
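That weaker, debug-only check could look like this (a hypothetical helper operating on the edge costs and h-values sampled along the returned path, not petgraph code):

```rust
/// Debug-only sanity check along a returned optimal path: at each node
/// on the path, the heuristic must not over-estimate the remaining cost
/// to the goal, i.e. h(node_i) <= cost(node_i .. goal). This catches
/// some inconsistent heuristics almost for free; it cannot prove
/// consistency. Hypothetical helper for illustration.
pub fn heuristic_ok_along_path(
    step_costs: &[u64], // cost of each edge along the path, in order
    h_values: &[u64],   // h at each node along the path (len = edges + 1)
) -> bool {
    let mut remaining: u64 = step_costs.iter().sum();
    for (i, &h) in h_values.iter().enumerate() {
        if h > remaining {
            return false; // the heuristic over-estimates at node i
        }
        if i < step_costs.len() {
            remaining -= step_costs[i];
        }
    }
    true
}
```

Wrapped in `debug_assert!(heuristic_ok_along_path(..))`, this costs nothing in release builds.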

@RaoulLuque
Member

Thanks for the response and the insights. I am not sure, however, whether we are on the same page here. I am not suggesting adding another unit test. Instead, I am suggesting adding a quickcheck like the ones in tests/quickcheck.rs.

This would allow the algorithm (A*) to be tested on a multitude of random graphs instead of just the one from the unit test. I do see the point about possibly missing bugs when comparing with the triangle inequality, since that should be given with A*; however, it could still catch some bugs. Alternatively, we could just compare it with Dijkstra in a quickcheck setting as well. That would rely on Dijkstra working, of course, but that seems like a reasonable assumption.

Maybe I also misunderstand you, but it sounds like you thought I was suggesting adding another unit test. Instead, I suggested adding a quickcheck, which has the great benefit of systematically testing pseudo-random graphs, making it way easier to cover edge-case graphs.

@RaoulLuque
Member

Hey, just to let you know: I will fork and pick up the PR myself in a few days and implement the last suggestions regarding quickcheck to merge your work, since I think it is very valuable :)

@RaoulLuque
Member

Closed in favor of #882

@RaoulLuque RaoulLuque closed this Sep 28, 2025
@RaoulLuque RaoulLuque modified the milestones: 0.8.4, 0.8.3 Sep 28, 2025
github-merge-queue bot pushed a commit that referenced this pull request Sep 28, 2025
This PR continues the work from #813. The code there has already been reviewed, and the only additions are two quickchecks in quickcheck.rs. So the only part of the code that needs review is the changes in quickcheck.rs.

---------

Co-authored-by: Dietrich Daroch <[email protected]>
