
perf: Make A* tie break on lower h-values #813

Closed
Dietr1ch wants to merge 2 commits into petgraph:master from Dietr1ch:dev/astar_rank_tie_breaking


Conversation

@Dietr1ch
Contributor

@Dietr1ch Dietr1ch commented Jun 3, 2025

This implements the trick from #806 and adds an empty 2d maze benchmark

Update: This dropped the original breaking change of requiring the cost type to implement core::ops::Sub, so it can be merged earlier.

Contributor Author

I figured that adding Sub<M, Output = M> would be ok, but I guess I can move the extra requirement into A*.

Adding this was needed to recover g as f - h.

Member

Moving it to A* will also be a breaking change

Contributor Author

It's a smaller breaking change though. I moved it to scope down the breakage, but I feel that a Measure should be Sub<Self, Output = Self>.

I'm not sure if these breakages are meaningfully different for the library though. It might not be worth making the distinction and having the intermediate breakage.

Member

@RaoulLuque RaoulLuque Jun 3, 2025

In the comment right below this I already stated how a breaking change might affect the merging of this PR. As also stated there, I would suggest trying to solve the problem without an API change (that is, using addition instead of Sub). Unfortunately, the "size" of the breaking change does not matter; any breaking change counts as breaking and will thus be delayed, since a major release would be necessary.

The current situation is that 0.9 will probably bring a lot of changes to the internal implementation (in particular to a lot of traits), and thus I am not sure whether this change will even be applicable anymore. Hence, I would recommend either trying to implement it now without breaking the API, or pausing the PR until then and seeing how it fits into the possibly new design of 0.9 :)

Contributor Author

And where is this 0.9 version being drafted? I see the next branch is stale, and there's no other branch that stands out.

There's a hack to implement this without Sub or Neg at the expense of larger heap entries, which should work. Should I just go with that and leave a note about the possible data trim?

Member

@RaoulLuque RaoulLuque Jun 4, 2025

Yes, I would second the solution you suggested :) At least if the benchmarks show that it is actually worth it! Otherwise we'll have to pause this PR and wait until 0.9 is around the corner.

Regarding the branches: The next branch is supposed to be the reworked one. As you correctly pointed out, currently it is not being worked on and is thus behind the master branch because the main author/maintainer took a pause and said they'd resume work in the course of this year :)

Contributor Author

It's definitely worth it, but yes, you shouldn't just take my word for it. It'd be nice to have more benchmarks available. I'll try to come up with something different, yet simple, but I'm past my detouring quota already, so it might take me some time.

@RaoulLuque
Member

RaoulLuque commented Jun 3, 2025

First off, thanks for taking the time to create a PR 🦕

Okay, well changing the Measure trait / the trait bounds on A* would be a breaking change and thus move the adoption of this change to 0.9 at the earliest. With 0.9, the trait system might have changed drastically, so one would have to see how this change might fit in.

Would there possibly be a way to express the subtraction with the current definition of Measure (i.e. with addition)? I am thinking about something like keeping track of all three relevant values f, g, h, or only g and h, since we can construct f from both of them?
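A minimal sketch of that addition-only idea (illustrative names and a plain u32 cost type, not petgraph's actual API): the heap key can carry (f, h), with f built as g + h at push time, so no Sub bound is ever needed to recover g later.

```rust
/// Sketch: build the (f, h) ranking key using only addition, so the
/// `Measure` trait needs no `Sub` bound. Instead of later recovering
/// g = f - h, we store h next to f when pushing onto the heap.
/// Plain `u32` costs are an assumption for illustration.
pub fn ranking_key(g: u32, h: u32) -> (u32, u32) {
    // Tuples order lexicographically: lower f first, then lower h.
    (g + h, h)
}
```

Tuple ordering gives the tie-break for free: among entries with equal f, the one with the lower h (and hence higher g) sorts first.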

Also, the PartialOrd trait bound on K seems redundant, since Measure is already a subtrait of PartialOrd.

Additionally, we usually name the benchmark files after the algorithms. I am kind of surprised there is no file for A* yet, but please rename the file accordingly :) Moreover, since there are no other benchmarks yet, I would really like one or two more benchmarks which might not be as biased towards the optimization. Feel free to look at something like the Dijkstra benchmarks / the build_graph function in benches/common/factories.rs to create some sufficiently large graphs (of course, you'd have to check which nodes are connected and sufficiently far away from each other).

Lastly, please rename your PR to fit the conventional commit/PR format.

@starovoid
Collaborator

Since the algorithm has a fairly large number of parameters, it already makes sense to switch to an instance-based implementation. There have already been such proposals, for example #359. It won't help get rid of the breaking changes, but we could collect all the suggestions about A* and prepare them to be combined for the new major release.

@RaoulLuque
Member

> Since the algorithm has a fairly large number of parameters, it already makes sense to switch to instance implementation. There have already been such proposals, for example #359. It won't help to get rid of the breaking changes, but we could collect all the suggestions about A* and prepare them to be combined for the new major release.

Good point, I haven't thought about combining the changes with previous efforts 👍
Are you sure you mean major release? To me that would mean 0.9, and I think we could incorporate some optimizations before that release already.
Also, what do you mean by "instance implementation"?

@starovoid
Collaborator

> Also, what do you mean by "instance implementation"?

I mean the approach to describing an algorithm through a structure. For example, in #359:

pub struct AstarInstance<G, F, H, K, IsGoal>
where
    G: IntoEdges + Visitable,
    IsGoal: FnMut(G::NodeId) -> bool,
    G::NodeId: Eq + Hash,
    F: FnMut(G::EdgeRef) -> K,
    H: FnMut(G::NodeId) -> K,
    K: Measure + Copy,
{
    graph: G,
    is_goal: IsGoal,
    edge_cost: F,
    estimate_cost: H,

    estimate_costs: HashMap<G::NodeId, K>,
}

> I think we could incorporate some optimizations before this release already.

All non-breaking optimizations can indeed be included in the upcoming minor releases, but so far it seems to me that the most "interesting" thing is still breaking. I will be glad to make an A* roadmap after the release of 0.8.2 and think about the sequence of changes in more detail.

@RaoulLuque
Member

RaoulLuque commented Jun 3, 2025

Ah, that makes sense. Thanks for the explanation 🙏

That kind of roadmap sounds great :)

I actually think that this optimization could be implemented without changing the API, no? As I stated above, instead of using subtraction, one could use addition. There would be some overhead, but I'd think that it is negligible in comparison to the speedup :) If I have time, I'll try to work on an example implementing that change, including some benchmarks 🌞

In any case, I agree that we could look at actually incorporating such optimizations after 0.8.2 👍

@Dietr1ch Dietr1ch changed the title A* tie breaking on lower h-value perf: Make A* tie break on lower h-values Jun 3, 2025
@Dietr1ch Dietr1ch changed the title perf: Make A* tie break on lower h-values perf!: Make A* tie break on lower h-values Jun 3, 2025
@Dietr1ch Dietr1ch force-pushed the dev/astar_rank_tie_breaking branch 2 times, most recently from 07a7efa to 0579282 on June 3, 2025 16:05
@Dietr1ch
Contributor Author

Dietr1ch commented Jun 3, 2025

The implementation needs to rank with either (f, h) (using Sub to recover g) or (f, -g) (which requires Neg).

Alternatively, you could rank with (f, h, g) to keep all 3 values in the heap, which seems a bit silly, but would work without adding new requirements to K (nor Measure).
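For a max-heap like std's BinaryHeap, the (f, h) ranking needs a reversed Ord. A hypothetical sketch of such a key (field names and the u64 cost type are illustrative, not the PR's actual code):

```rust
use std::cmp::Ordering;

/// Heap key ranking A* candidates: primarily by lower f = g + h,
/// tie-breaking by lower h, which prefers nodes closer to the goal
/// among equally promising ones. Illustrative sketch only.
#[derive(Copy, Clone, PartialEq, Eq)]
pub struct Rank {
    pub f: u64, // estimated total cost g + h
    pub h: u64, // heuristic estimate to the goal
}

impl Ord for Rank {
    fn cmp(&self, other: &Self) -> Ordering {
        // Reversed so that std's max-heap BinaryHeap pops the
        // smallest (f, h) pair first.
        other.f.cmp(&self.f).then_with(|| other.h.cmp(&self.h))
    }
}

impl PartialOrd for Rank {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
```

Pushing Rank values into a BinaryHeap then pops the lowest f first, and among ties, the lowest h.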


Oh, BTW, having a structure representing a run of A* allows implementing Iterator and offering stats to peer into the algorithm's execution internals.

My own experimental implementation[^0] looks like this:

struct AStarSearch { .. }

impl AStarSearch {
    pub fn new(..) -> Self { .. }
    pub fn find_next_goal(&mut self) -> Option<Path> { .. }

    #[cfg(feature = "stats")]
    pub fn stats(&self) -> Stats { .. }
}

impl Iterator for AStarSearch {
    type Item = Path;
    fn next(&mut self) -> Option<Self::Item> {
        self.find_next_goal()
    }
}

Dietr1ch added a commit to Dietr1ch/petgraph that referenced this pull request Jun 3, 2025
This works around a breaking change. `K` should be `core::ops::Sub` to avoid
storing data that could easily be derived.

See the thread for how breaking compatibility would've delayed this improvement to 0.9:
- petgraph#813 (comment)
@Dietr1ch Dietr1ch force-pushed the dev/astar_rank_tie_breaking branch from 0579282 to 6351c9a on June 3, 2025 18:29
@Dietr1ch
Contributor Author

Dietr1ch commented Jun 3, 2025

I implemented the non-breaking workaround since 0.9 doesn't have a release branch that can take the breaking change.

@Dietr1ch Dietr1ch changed the title perf!: Make A* tie break on lower h-values perf: Make A* tie break on lower h-values Jun 3, 2025
@RaoulLuque
Member

RaoulLuque commented Jun 4, 2025

Nice!

Would you please incorporate the changes I requested in the comments above? That is, add more benchmarks than only the maze one. Also, please provide before/after benchmark result comparisons. In particular, a graph which is just a line would be interesting, since in that case the new version should lose the most performance due to the unnecessary entries in the binary heap.

Once that is done and the benchmark results look good, I'll review the changes once I've got time :)

@starovoid starovoid added this to the 0.8.3 milestone Jun 4, 2025
@Dietr1ch
Contributor Author

Dietr1ch commented Jun 4, 2025

> a graph which is just a line would be interesting, since in that case the new version should lose the most performance due to the unnecessary entries in the binary heap.

Well, given that the current implementation uses two hashmaps, it should still lose to the new implementation, as the overhead around the extra keys is larger than that of having a larger value.

Still interesting though, because that kind of graph stresses some of the internals the most; but the real optimisation comes from being significantly better at exploring the graph, not just faster at picking the next node. From my time doing research around this, measuring the number of nodes expanded was the best measure, as time ends up being linear in it, but noisy and dependent on things that are harder to reproduce and compare (machine (CPU, cache, memory and their timings), load, compiler, linker).

@Dietr1ch
Contributor Author

Dietr1ch commented Jun 6, 2025

Thinking about breaking APIs, is there a plan to break Graph in 0.9?

I'm asking because the other classic problems that are also widely used for benchmarking might require an abstract Graph representation to avoid materialising the entire state space.[0]

The classic 15-puzzle shouldn't require building 16!/2 nodes and their edges. Instead, keeping a [u8; 16] around and operating on it to generate nodes on the fly is more efficient.

[0]: It seems that Graph instantiates the whole graph. I can see how that's convenient for some graphs, but this struct should be behind some abstract trait. (Maybe I missed something when skimming around?)
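As a sketch of that on-the-fly generation (hypothetical helper, not petgraph API): the successor states of a 15-puzzle board can be produced directly from the [u8; 16], with no materialised node or edge set.

```rust
/// Generate successor states of a 15-puzzle position on the fly,
/// rather than materialising all 16!/2 reachable nodes up front.
/// A state is a [u8; 16] board in row-major order; 0 is the blank.
/// Hypothetical sketch, not petgraph API.
pub fn successors(state: &[u8; 16]) -> Vec<[u8; 16]> {
    let blank = state.iter().position(|&t| t == 0).unwrap();
    let (row, col) = (blank / 4, blank % 4);
    let mut next = Vec::with_capacity(4);
    let mut push_swap = |other: usize| {
        let mut s = *state;
        s.swap(blank, other);
        next.push(s);
    };
    if row > 0 { push_swap(blank - 4); } // slide a tile down into the blank
    if row < 3 { push_swap(blank + 4); } // slide a tile up
    if col > 0 { push_swap(blank - 1); } // slide a tile right
    if col < 3 { push_swap(blank + 1); } // slide a tile left
    next
}
```

An "abstract" graph trait would let A* call something like this instead of iterating stored edges.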

@RaoulLuque
Member

RaoulLuque commented Jun 7, 2025

I am not sure this is the right place for such a discussion. Apart from that, I think that I don't quite understand what you mean exactly. If you would like to go into detail about this or make a suggestion please consider creating something like a conversation in the discussion tab 👍

I am glad to review your PR though, once you incorporate the changes I asked for above. Please consider turning this into a draft PR until then :)

@Dietr1ch
Contributor Author

Dietr1ch commented Jun 7, 2025

I moved the discussion about Graph to a proper discussion on #821.

I think the PR is in a state that can be merged, so I'm not making it a draft. If you feel strongly about adding more problems and benchmarks, I think petgraph should make Graph more general before getting those.

@RaoulLuque
Member

RaoulLuque commented Jun 8, 2025

Okay. In that case someone else will have to give their opinion I guess 👍 Maybe you want to chime in @starovoid ?

My point still stands that the benchmarks to compare the performance of the new and old variant should cover different cases and not only the one where we know that it will favor the new implementation 👍 If the benchmarks show improvements in every case I would be glad to review the changes :)

I also don't understand the argument of changing the graph type before adding more benchmarks. If we implement this PR, it will mostly affect people using the graph type as it currently is together with the new A* (in particular, most people probably don't implement their own graph types for virtualized/unmaterialized graphs, but instead use the types provided by petgraph). In that case it would be interesting to see exactly that combination: how the current graph type performs in different scenarios with the modifications to A* 👍

@Dietr1ch
Contributor Author

Dietr1ch commented Jun 8, 2025

> I also don't understand the argument of changing the graph type before adding more benchmarks

I said this thinking about adding more variety to the graphs. I can do the line graph, or a small road network, but to me the interesting graphs are the ones that are large enough to make the decision-making while exploring the graph more important than details like how quick the hash table is.

Representing the classical 15-puzzle I mentioned would take about 50TB-300TB of memory, depending on how "clever" (naughty) you get. It's completely unreasonable to try doing it without "abstract" graph support.

Dietr1ch added a commit to Dietr1ch/petgraph that referenced this pull request Jun 9, 2025
This works around a breaking change. `K` should be `core::ops::Sub` to avoid
storing data that could easily be derived.

See the thread for how breaking compatibility would've delayed this improvement to 0.9:
- petgraph#813 (comment)
@Dietr1ch Dietr1ch force-pushed the dev/astar_rank_tie_breaking branch from 6351c9a to 46bceb8 on June 9, 2025 20:22
@Dietr1ch
Contributor Author

Dietr1ch commented Jun 9, 2025

Got benchmarks on the "line" graph (actually an aisle, since it leads to slightly more ranking work in the heap).

There's a 2x speedup on it, even though my change should've made things slightly slower; this is just due to removing the ~redundant hashmap as part of the ranking adjustment.

 name                     dev__pathfinding_bench ns/iter  dev__astar_rank_tie_breaking ns/iter  diff ns/iter   diff %  speedup
 astar_maze2d_bench       3,873,653                       208,091                                 -3,665,562  -94.63%  x 18.62
 astar_maze2d_line_bench  161,513                         63,586                                     -97,927  -60.63%   x 2.54

BTW, I also have an environment commit that adds a simple flake.nix file and a simple bash script to run a benchmark comparing 2..N branches. I can contribute that too, but I'll fork it into a new PR.

@starovoid starovoid added C-new-algorithm Category: A feature request for a new graph algorithm C-performance Category: A change motivated by improving speed, memory usage or compile times S-waiting-on-review Status: Awaiting review from the assignee but also other interested parties labels Jun 13, 2025
@starovoid starovoid removed this from the 0.8.3 milestone Jun 13, 2025
@starovoid starovoid added this to the 0.8.4 milestone Jun 13, 2025
Member

@RaoulLuque RaoulLuque left a comment

The performance improvement is indeed very impressive!

I had some time to look through the files :) Since we don't have any scripts of that sort in petgraph, please remove the bench script you also added.

let mut visit_next = BinaryHeap::new();
let mut scores = HashMap::new(); // g-values, cost to reach the node
let mut estimate_scores = HashMap::new(); // f-values, cost to reach + estimate cost to goal
// A node -> (f, h, g) mapping. Should be (f, h) since g can be recovered.
Member

@RaoulLuque RaoulLuque Jun 14, 2025

Suggested change
// A node -> (f, h, g) mapping. Should be (f, h) since g can be recovered.
// A node -> (f, h, g) mapping. TODO: With a breaking change this should be (f, h) since g can be recovered if subtraction is available.

Seems clearer

@Dietr1ch Dietr1ch force-pushed the dev/astar_rank_tie_breaking branch from 46bceb8 to f5fe3d1 on June 18, 2025 22:33
@Dietr1ch
Contributor Author

Oh, I screwed up by switching back and forth between my laptop and workstation believing that my GitHub remote was updated, and ended up resolving the comments on an older commit that didn't do the ranking trick that avoids requiring Sub and breaking the API. I'll redo that now.

@Dietr1ch Dietr1ch force-pushed the dev/astar_rank_tie_breaking branch 2 times, most recently from a82a56b to 04d4070 on June 18, 2025 23:21
@Dietr1ch Dietr1ch force-pushed the dev/astar_rank_tie_breaking branch from 04d4070 to 2b866b3 on June 18, 2025 23:26
@Dietr1ch
Contributor Author

I re-ran the bench on my workstation using the script I had around.

Same remarks apply.

  • The ranking speed-up numbers are kind of made up. Empty grids essentially compare dim^N vs dim, so if you want to see >1000x out of it, just crank up the problem size; but to keep things comparable, we should settle on the current instance size.
  • The ~2x speed-up in the 1-D version comes from using a single hashmap instead of two.
  • This is not super scientific, as sampling-based benchmarks are naturally noisy and get worse when execution isn't parallel, but I've seen ~consistent 18x-22x on the 2D grid bench and 1.7x-2.5x on the "1D" grid on both my laptop and workstation.
 name                      dev/pathfinding_bench ns/iter  dev/astar_rank_tie_breaking ns/iter  diff ns/iter   diff %  speedup
 astar_2d_grid_bench       3,872,858                      210,172                                -3,662,686  -94.57%  x18.43
 astar_2d_grid_line_bench  164,219                        67,211                                    -97,008  -59.07%   x2.44

@RaoulLuque
Member

RaoulLuque commented Jun 19, 2025

Alright, well if nothing else changed, LGTM :) 👍

Thanks for taking the suggestions into consideration and all the work you put in so far 🦕

Gonna ping @starovoid , so he can say how to proceed ☺️

@RaoulLuque
Member

RaoulLuque commented Jun 19, 2025

Well, actually the only other thing I think we could do while we are at it is add a quickcheck for astar. It could be similar to the one for dijkstra, checking whether computed paths satisfy the triangle inequality using something like the null heuristic?

It would go right below fn dijkstra_triangle_ineq in tests/quickcheck.rs.
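The property itself can be sketched outside any harness: with the null heuristic, A* must agree with Dijkstra on every graph. Below is a toy, self-contained version over an adjacency list (petgraph's astar/dijkstra have different signatures; a real quickcheck would generate the graphs randomly):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Minimal A* over an adjacency list, parameterised by a heuristic.
/// With the null heuristic (|_| 0) this degenerates to Dijkstra, which
/// is exactly the property a quickcheck would assert on random graphs.
/// Toy sketch for illustration, not petgraph's actual API.
pub fn astar_cost(
    adj: &[Vec<(usize, u64)>], // adj[u] = (neighbour, edge weight)
    start: usize,
    goal: usize,
    h: impl Fn(usize) -> u64,
) -> Option<u64> {
    let mut dist = vec![u64::MAX; adj.len()];
    let mut heap = BinaryHeap::new();
    dist[start] = 0;
    heap.push(Reverse((h(start), start)));
    while let Some(Reverse((_, u))) = heap.pop() {
        if u == goal {
            return Some(dist[u]); // first goal pop is optimal for consistent h
        }
        for &(v, w) in &adj[u] {
            let g = dist[u] + w;
            if g < dist[v] {
                dist[v] = g;
                heap.push(Reverse((g + h(v), v)));
            }
        }
    }
    None
}
```

A quickcheck would then assert `astar_cost(&g, s, t, |_| 0) == dijkstra_cost(&g, s, t)` for arbitrary graphs.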

@Dietr1ch
Contributor Author

Dietr1ch commented Jun 20, 2025

> checking whether computed paths satisfy the triangle inequality using something like the null heuristic?

There are two strong arguments, IMO, against doing this.

  • First, it seems a lot like testing the test, so what's stopping us from unproductively recursing into testing the tests indefinitely? At some point you need to be able to stare at the tests long enough to convince yourself things are fine, and in this case I think we try hard enough by comparing against Dijkstra's 1->N optimal paths.

  • Second, this is something that, when studying A*, you are explicitly warned against trying, since doing so is more expensive than running the search; it'd only make sense if you know your problem in advance but can't afford the memory to just run Floyd-Warshall once.

That said, there's a weaker version of this, which is testing the heuristic along the optimal path retrieved, and it comes almost for free since you already went through it. Along this path, h shouldn't over-estimate the cost (h(a, b) <= c(a, b)). This would expose an inconsistent heuristic, but it is not enough to prove the path isn't optimal in case the heuristic is inconsistent, nor enough to prove the heuristic is indeed consistent.

Proofs around heuristics often come from the ideas that allow you to have a heuristic in the first place, so I think they are rarely needed once you see your heuristic is just an optimal solution to a relaxed version of your problem.

If you really want to add this weaker check, I suggest just adding some debug_assertions, but keeping it away from the tests and from runtime in release builds.
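That weaker, debug-only check could look like this (a hypothetical helper operating on the edge costs and h-values sampled along the returned path, not petgraph code):

```rust
/// Debug-only sanity check along a returned optimal path: at each node
/// on the path, the heuristic must not over-estimate the remaining cost
/// to the goal, i.e. h(node_i) <= cost(node_i .. goal). This catches
/// some inconsistent heuristics almost for free; it cannot prove
/// consistency. Hypothetical helper for illustration.
pub fn heuristic_ok_along_path(
    step_costs: &[u64], // cost of each edge along the path, in order
    h_values: &[u64],   // h at each node along the path (len = edges + 1)
) -> bool {
    let mut remaining: u64 = step_costs.iter().sum();
    for (i, &h) in h_values.iter().enumerate() {
        if h > remaining {
            return false; // the heuristic over-estimates at node i
        }
        if i < step_costs.len() {
            remaining -= step_costs[i];
        }
    }
    true
}
```

Wrapped in `debug_assert!(heuristic_ok_along_path(..))`, this costs nothing in release builds.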

@RaoulLuque
Member

Thanks for the response and the insights. I am not sure, however, whether we are on the same page here. I am not suggesting adding another unit test. Instead, I am suggesting adding a quickcheck like the ones in tests/quickcheck.rs.

This would allow the algorithm (A*) to be tested on a multitude of random graphs instead of just the one from the unit test. I do see the point about possibly missing bugs when comparing with the triangle inequality, since that should be given with A*; however, it could still catch some bugs. Alternatively, we could just compare it with Dijkstra in a quickcheck setting as well. That would rely on Dijkstra working, of course, but that seems like a reasonable assumption.

Maybe I also misunderstand you, but it sounds like you thought I was suggesting adding another unit test. Instead, I suggested adding a quickcheck, which has the great benefit of systematically testing pseudo-random graphs, making it way easier to cover edge-case graphs.

@RaoulLuque
Member

Hey, just to let you know: I will fork and pick up the PR myself in a few days and implement the last suggestions regarding quickcheck to merge your work, since I think it is very valuable :)

@RaoulLuque
Member

Closed in favor of #882

@RaoulLuque RaoulLuque closed this Sep 28, 2025
@RaoulLuque RaoulLuque modified the milestones: 0.8.4, 0.8.3 Sep 28, 2025
github-merge-queue bot pushed a commit that referenced this pull request Sep 28, 2025
This PR continues the work from #813. The code there has already been reviewed, and the only additions are two quickchecks in quickcheck.rs. So the only part of the code that needs review is the changes in quickcheck.rs.

---------

Co-authored-by: Dietrich Daroch <[email protected]>
