
Sort nodes spatially to speed up later queries.#3342

Closed
danpat wants to merge 2 commits into master from fix/3311

Conversation

@danpat
Member

@danpat danpat commented Nov 18, 2016

Issue

Addresses #3311

This change sorts node IDs on a Hilbert curve, on the assumption that this will improve cache locality during later queries of this data (coordinate data is heavily used during edge-based-graph construction, when the guidance code does lots of localized graph exploration).

Tasklist

  • add regression / cucumber cases (see docs/testing.md)
  • review
  • adjust for comments

const auto used_node_id_list_end = used_node_id_list.end();

while (edge_iterator != all_edges_list_end && node_iterator != all_nodes_list_end)
while (edge_iterator != all_edges_list_end && node_iterator != used_node_id_list_end)
Member Author


I'm pretty sure this logic is incorrect, because edge creation hangs for larger graphs. Still needs some work.

@danpat
Member Author

danpat commented Nov 18, 2016

Note: This is not ready, it doesn't work properly yet.

@danpat
Member Author

danpat commented Nov 19, 2016

So, while this PR is still slightly buggy, the main point was to try to speed up edge-based-edge generation, which is coordinate-usage heavy.

Based on re-processing California a few times, the results are:

Edge-based-graph generation without Hilbert sorting: 30s
Edge-based-graph generation with Hilbert sorting: 27s

This doesn't seem like a significant gain. In addition, the loss of locality during edge generation in extraction_containers.cpp makes that phase take significantly longer: 800 seconds, versus 5 seconds with the current implementation. While my changes are probably less than optimal, it doesn't seem possible to avoid a penalty: the current approach is reasonably well optimized for unused-node removal and edge-coordinate updating.

I'm open to ideas here, or a review to spot whether I've made some glaring error (note: 5 or 6 tests are still failing, so some graph-shape behaviour has changed; I'm just not sure exactly what without a deeper dive).

Given the negligible speedup, I'm inclined to consign this change to the trash.

/cc @daniel-j-h @TheMarex @MoKob

util::FixedLongitude{std::numeric_limits<std::int32_t>::min()});

auto id_iter = external_to_internal_node_id_map.find(edge_iterator->result.osm_target_id);
if (id_iter == external_to_internal_node_id_map.end())
Member Author


I suspect this lookup is a big source of the slowdown.

BOOST_ASSERT(edge_iterator->source_coordinate.lon !=
util::FixedLongitude{std::numeric_limits<std::int32_t>::min()});
const auto node_coord = util::Coordinate(all_nodes_list[id_iter->second].lon,
all_nodes_list[id_iter->second].lat);
Member Author


Also this: previously, coordinates were cache-local in the node iterator, and no additional lookup was required.
Because the edge IDs aren't sorted in Hilbert order, we're essentially doing random access into all_nodes_list.

@TheMarex
Member

I think it is valuable to figure out why the tests are not passing. If we have behavior in our code that relies on a specific node order, that is not good. Looking at the failure, there seems to be something odd going on with the geometry.

Looking at the implementation, it would make sense to do the following algorithmic changes to skip most of the overhead you are seeing:

  1. Compute the Hilbert value in ExtractionCallbacks when creating the ExternalMemoryNode (this will not incur an extra write for every single node in an IO-heavy loop)
  2. Keep the current logic that first filters the used nodes (to reduce the data size first)
  3. Sort the nodes by Hilbert value (the most expensive step, but the data volume will be lower)
  4. After the first OSM -> internal mapping has been computed, use it to compute OSM -> internal -> spatial

The optimization here is to run the stxxl::sort not on all nodes, but only on the relevant ones.

I would also recommend running this on a much bigger dataset before we discard this. The graph for California is small enough that there is a reasonable chance to hit the cache even if you use random order.

@daniel-j-h
Member

@danpat can you check your numbers again now with the Hilbert space-filling curve improvements?

@daniel-j-h
Member

Ping - what's the state of this, especially after the Hilbert changes? Still worth investigating?

@danpat
Member Author

danpat commented Jan 18, 2017

@daniel-j-h We should close this but leave the ticket open. If someone wants to pick this up again, they can use this PR as a starting point.

@danpat danpat closed this Jan 18, 2017
@DennisOSRM DennisOSRM deleted the fix/3311 branch November 6, 2022 14:12
