Conversation
| const auto used_node_id_list_end = used_node_id_list.end(); | ||
|
|
||
| while (edge_iterator != all_edges_list_end && node_iterator != all_nodes_list_end) | ||
| while (edge_iterator != all_edges_list_end && node_iterator != used_node_id_list_end) |
There was a problem hiding this comment.
I'm pretty sure this logic is incorrect, because edge creation hangs for larger graphs. Still needs some work.
|
Note: This is not ready, it doesn't work properly yet. |
|
So, while this PR is still slightly buggy, the main point was to try to speed up edge-based-edge generation, which is coordinate-usage heavy. Based on re-processing California a few times, the results are: Edge-based-graph generation without Hilbert sorting: 30s This doesn't seem like a significant gain. In addition, the loss of locality during edge generation in the I'm open to ideas here, or a review to spot if I've made some glaring error (note: there are 5 or 6 tests failing still, so some graph-shape behaviour has changed, I'm just not exactly sure what without a deeper dive). Given the negligible speedup, I'm inclined to resign this change to the trash. |
| util::FixedLongitude{std::numeric_limits<std::int32_t>::min()}); | ||
|
|
||
| auto id_iter = external_to_internal_node_id_map.find(edge_iterator->result.osm_target_id); | ||
| if (id_iter == external_to_internal_node_id_map.end()) |
There was a problem hiding this comment.
I suspect this lookup is a big source of the slowdown.
| BOOST_ASSERT(edge_iterator->source_coordinate.lon != | ||
| util::FixedLongitude{std::numeric_limits<std::int32_t>::min()}); | ||
| const auto node_coord = util::Coordinate(all_nodes_list[id_iter->second].lon, | ||
| all_nodes_list[id_iter->second].lat); |
There was a problem hiding this comment.
Also this - previously, coordinates were cache-local in the node iterator, no additional lookup was required.
Because the edge IDs aren't hilbert order sorted, we're essentially doing random access to the all_nodes_list.
|
I think it is valuable to figure out why the tests are not passing. If we have behavior in our code that relies on a specific node order, that is not good. Looking at the failure there seems to be something odd going on with the geometry. Looking at the implementation it would make sense to do the following algorithmic changes to skip most of the overhead you are seeing:
The optimization here is to not do the I would also recommend running this on a much bigger dataset before we discard this. The graph for California is small enough that there is a reasonable chance to hit the cache even if you use random order. |
|
@danpat can you check your numbers again now with the Hilbert space-filling curve improvements? |
|
Ping - what's the state of this, especially after the Hilbert changes? Still worth investigating? |
|
@daniel-j-h We should close this but leave the ticket open. If someone wants to pick this up again, they can use this PR as a starting point. |
Issue
Addresses #3311
This change sorts node IDs on a Hilbert curve, on the assumption that this will improve cache locality during later queries of this data (coordinate data is heavily used during edge-based-graph construction when the guidance code does lots of localize graph exploring).
Tasklist