Skip to content

Add Floyd-Warshall algorithm#1312

Merged
fabratu merged 18 commits intonetworkit:masterfrom
Schwarf:feature/floyd_warshall
Apr 1, 2025
Merged

Add Floyd-Warshall algorithm#1312
fabratu merged 18 commits intonetworkit:masterfrom
Schwarf:feature/floyd_warshall

Conversation

@Schwarf
Copy link
Copy Markdown
Contributor

@Schwarf Schwarf commented Mar 7, 2025

Implement Floyd-Warshall Algorithm for All-Pairs Shortest Paths

This PR introduces the Floyd-Warshall algorithm for computing all-pairs shortest paths in a weighted graph. It supports both directed and undirected graphs, correctly handles negative edge weights, and detects negative cycles. If multiple shortest paths exist with the same distance, the algorithm selects the one with the fewest nodes.

Features

  • Computes shortest path distances between all node pairs.
  • Detects negative cycles in the graph.
  • Provides path reconstruction via getNodesOnShortestPath(source, target).
  • Prefers the shortest path with the fewest nodes when multiple exist.

The algorithm runs in O(n³) time complexity, with n representing the number of nodes, making it suitable for small to medium-sized graphs.

@Schwarf Schwarf force-pushed the feature/floyd_warshall branch from 029f7dc to 8e70a8e Compare March 8, 2025 09:12
@Schwarf Schwarf requested a review from angriman March 9, 2025 09:58
@Schwarf Schwarf force-pushed the feature/floyd_warshall branch from 24d5dce to 870709c Compare March 15, 2025 16:02
@Schwarf
Copy link
Copy Markdown
Contributor Author

Schwarf commented Mar 16, 2025

Performance Measurements with Valgrind:

@angriman, @fabratu
To assess performance implications, I created the following benchmark program:

#include "networkit/graph/Graph.hpp"
#include "networkit/distance/FloydWarshall.hpp"
#include <valgrind/callgrind.h>

NetworKit::Graph get_graph()
{
    NetworKit::Graph graph({{0, 2, 4}, {0, 3, 4}, {0, 4, 2}, {0, 5, 2}, {0, 6, 1}, {0, 7, 14}, ...
    });
    return graph;
}

int main()
{
    auto graph = get_graph();
    NetworKit::FloydWarshall test(graph);
    test.run();
    int negative{};
    int positive{};
    for(NetworKit::node node{}; node < graph.numberOfNodes(); node++)
    {
        bool inNegCycle = test.isNodeInNegativeCycle(node);
        if (inNegCycle)
            negative++;
        else
            positive++;
    }
    std::cout << negative << std::endl;
    std::cout << positive << std::endl;
    return negative;
}

where get_graph provides a graph with 50 nodes and about 1000 edges.
I build networkit using this config:
cmake -DNETWORKIT_BUILD_TESTS=OFF -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja ..
and my sample program:
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -G Ninja ..
using gcc-11.4.
I executed the program with Valgrind-3.18.1 using Callgrind:
valgrind --tool=callgrind floyd_warshall
which results in:
4,444,940,581 Dynamic Instructions
I repeated the measurement to obtain an approximate estimate of the intrinsic measurement noise
4,507,148,633 Dynamic Instructions
I expect this noise due to the use of OpenMP.

Next I changed the std::vector<bool> nodesInNegativeCycle; in the class FloydWarshall to std::vector<uint8_t> nodesInNegativeCycle;, reran the measurements and observed:
4,470,597,816 Dynamic Instructions
the second measurement resulted in
4,489,718,891 Dynamic Instructions

From these observations, I conclude that the performance difference between std::vector<bool> and std::vector<uint8_t> for this specific use case is negligible, and any minor differences observed fall within the intrinsic measurement noise.

Comparison of std::vector<bool> and std::vector<uint8_t>

To gain insight specifically into the overhead of accessing std::vector<bool>, I used Callgrind instrumentation macros around only the node-checking loop:

CALLGRIND_START_INSTRUMENTATION;
for(NetworKit::node node{}; node < graph.numberOfNodes(); node++)
{
    bool inNegCycle = test.isNodeInNegativeCycle(node);
    if (inNegCycle)
        negative++;
    else
        positive++;
}
CALLGRIND_STOP_INSTRUMENTATION;

and called valgrind using an additional option:
valgrind --tool=callgrind --instr-atstart=no ./cmake-build-debug/floyd_warshall
For the std::vector<bool> case I found:
1,608 Dynamic Instructions
and this result is reproducible (since no OpenMP calls are involved in the loop).
When using the macros with std::vector<uint8_t> and just measure the loop I found:
908 Dynamic Instructions
This targeted measurement reveals an approximate 40% instruction-count reduction when using std::vector<uint8_t>.
For loops executed frequently, such optimization could be significant.

I hope I haven't overlooked anything or made any mistakes. Please feel free to comment.

@fabratu
Copy link
Copy Markdown
Member

fabratu commented Mar 17, 2025

Thanks for doing the benchmarks. This seems to back the assumption, that in the case at hand, there is no significant difference in performance.

@angriman
Copy link
Copy Markdown
Member

Thanks a lot for setting up those benchmarks and sharing the results. For this particular application, it looks that using std::vector<uint8_t> does not bring noticeable performance improvements, so I'd recommend to use std::vector<bool>.

@fabratu fabratu merged commit c366d65 into networkit:master Apr 1, 2025
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants