johnson add options to calculate distance
Currently, when users want to calculate distances using johnson, they must first obtain the paths by calling johnson(), and then use path_weight() to extract the actual distances.
However, in cases where users only care about distances and are not interested in the paths, each Dijkstra run within johnson could be optimized by omitting the steps of calculating the paths — i.e., computing only the distances(of course, we need to reconstruct the distances by adding and subtracting the potentials from Bellman-Ford. However, this step is much more efficient than reconstructing the full paths).
Therefore, this PR introduces an optional argument to johnson that allows users to specify what should be returned:
def johnson(G, weight="weight", ret="path")
When ret is set to "path"(default), only paths is returned. In this case, the code works as what before this PR. When ret is set to "dist", only dist is returned. And every dijkstra run will not calculate the paths. When ret is set to "all", both paths and dist are returned.
I assume the motivation is that you'd like to get the "transformed" distances out, correct? Otherwise computing the "distances" of the unmodified paths is straightforward.
I'm -1 on the proposed API changes - this introduces more internal complexity and moves further away from the other shortest-path patterns in the library. I'm not necessarily opposed to adding something that computes the "distances" (this is usually done in a companion function
*_path_length). In this case since that's likely to result in code duplication, so perhaps a single kwarg switch would be better.
Yes, you are right, the API becomes more complex, but I think there's practical value in exposing the distances—especially in cases where users only care about distances, since maintaining paths during every Dijkstra is less efficient.
Would it be acceptable to add an optional ret parameter to specify what to return?
def johnson(G, weight="weight", ret="path") ...
paths = johnson(G)
dist = johnson(G, ret="dist")
paths, dist = johnson(G, ret="all")
What takes a long time for computing the length of a path? It should be fairly quick computationally. And NetworkX has a function to do it: nx.path_weight
What aspect of this is causing trouble?
What takes a long time for computing the length of a path. It should be fairly quick computationally. And NetworkX has a function to do it: nx.path_weight
What aspect of this is causing trouble?
Yes, it is not time-consuming to calculate the distance when we know the paths. But if the users only want to know the distance, the current approach involves calculating the paths in every Dijkstra run, and this process is less efficient than only calculating the distance in Dijkstra.
I have edited the comment of this PR to make it more clear.
Thanks for your rewording of the OP. I now understand that this aims to allow folks to avoid creating the paths in the Djikstra part of the Johnson method.
As for an API for this feature, some/many of the shortest_path functions have optional arguments that indicate whether to compute a trait or not. For example, if we could return a dict of distances, a dict of immediate predecessor or a dict of paths, one way to construct the function signature would be:
def my_all_pairs_shortest_path(G, dist=None, pred=None, paths=None):
Or maybe not include dist as a parameter and always return the distance dict. If the keywords are None then we don't store/compute those values. I haven't thought much about how this might be done for Johnson, so this kind of api change might not work here. Ideally it would pass through to the djikstra function calls (and bellman-ford) inside of johnson and I don't know if that works here either.
But I agree with @rossbar that we don't want api inflation for all the possibilities. Let's make it as compact as possible. I also feel like providing a way to avoid constructing all the paths could be valuable. I vaguely recall that we thought about this for Johnson way-way-back and decided that is wasn't worth it -- maybe someone will be interested enough to work on what should be done for full feature support. So, thanks for this. Let's try to make it simple and natural feeling (whatever that means).
Thanks for your rewording of the OP. I now understand that this aims to allow folks to avoid creating the paths in the Djikstra part of the Johnson method.
As for an API for this feature, some/many of the shortest_path functions have optional arguments that indicate whether to compute a trait or not. For example, if we could return a dict of distances, a dict of immediate predecessor or a dict of paths, one way to construct the function signature would be:
def my_all_pairs_shortest_path(G, dist=None, pred=None, paths=None):Or maybe not include
distas a parameter and always return the distance dict. If the keywords areNonethen we don't store/compute those values. I haven't thought much about how this might be done for Johnson, so this kind of api change might not work here. Ideally it would pass through to the djikstra function calls (and bellman-ford) inside of johnson and I don't know if that works here either.But I agree with @rossbar that we don't want api inflation for all the possibilities. Let's make it as compact as possible. I also feel like providing a way to avoid constructing all the paths could be valuable. I vaguely recall that we thought about this for Johnson way-way-back and decided that is wasn't worth it -- maybe someone will be interested enough to work on what should be done for full feature support. So, thanks for this. Let's try to make it simple and natural feeling (whatever that means).
Thank you for the thoughtful suggestions!
I agree that using three optional arguments like def my_all_pairs_shortest_path(G, dist=None, pred=None, paths=None): would align well with other APIs in NetworkX. However, this approach doesn't preserve backward compatibility with the current behavior of johnson, which returns paths by default.
The ret="dist" | "path"(default) | "all" pattern is intended to maintain compatibility with existing usage, while trying to keep the API simple and compact.
However, this approach doesn't preserve backward compatibility with the current behavior of johnson, which returns paths by default.
Whether or not things are backward compatible will depend on the default values of the new kwargs. For example, I'd imagine johnson(paths=True, dist=False) would result in the current behavior, so having those be the defaults would not be backward incompatible.
For example, I'd imagine johnson(paths=True, dist=False) would result in the current behavior
Thank you for the suggestion!
Using boolean kwargs is indeed a nice idea. If we apply this approach, one thing needs to be clarified: when both paths=False and dist=False , what should the function return? Would it be better to raise a ValueError, or simply return None?
Happy to go with whatever is most consistent with the rest of the library.
I'm thinking we would raise on this case. More generally, I would think always returning dist and then updating the provided dicts for paths and/or pred is clear and useful. But for backwards compatibility, we want to have dist be a keyword too. So we should probably raise if the keyword inputs don't make sense. :)
Thanks!
Thanks! I’ve updated the API following your suggestion: dist, paths, and pred are now dictionaries that get updated during the computation and dist is returned. For backward compatibility, when all three dicts are None, the function simply returns paths instead of raising an error.