Implement mmapDataFacade #1947

@danpat

OSRM currently supports reading data from files into heap memory (InternalDataFacade), or pre-loading data into shared memory using IPC shared memory blocks (SharedDataFacade+osrm-datastore).

We can consolidate the behaviour of both of these by using mmap. Instead of reading files into memory explicitly, we should be able to mmap the data files, and immediately begin using them.
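To make "mmap the data files and immediately begin using them" concrete, here is a minimal POSIX-only sketch. `MappedFile` is a hypothetical name for illustration, not an existing OSRM class, and it assumes the file is a flat array of trivially copyable values with no header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper (not an OSRM class): maps a whole file read-only.
class MappedFile
{
  public:
    explicit MappedFile(const std::string &path)
    {
        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd == -1)
            throw std::runtime_error("open failed: " + path);

        struct stat st;
        if (::fstat(fd, &st) == -1 || st.st_size == 0)
        {
            ::close(fd);
            throw std::runtime_error("fstat failed or file is empty: " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);

        // No read() happens here; pages are faulted in on first access.
        void *addr = ::mmap(nullptr, size_, PROT_READ, MAP_SHARED, fd, 0);
        ::close(fd); // the mapping stays valid after the descriptor is closed
        if (addr == MAP_FAILED)
            throw std::runtime_error("mmap failed: " + path);
        data_ = addr;
    }

    ~MappedFile()
    {
        if (data_ != nullptr)
            ::munmap(data_, size_);
    }

    MappedFile(const MappedFile &) = delete;
    MappedFile &operator=(const MappedFile &) = delete;

    std::size_t size() const { return size_; }

    // View the mapping as an array of T (assumes the file is a flat array of T).
    template <typename T> const T *as() const
    {
        assert(size_ % sizeof(T) == 0);
        return static_cast<const T *>(data_);
    }

    template <typename T> std::size_t count() const { return size_ / sizeof(T); }

  private:
    void *data_ = nullptr;
    std::size_t size_ = 0;
};
```

A facade built this way would hold one such mapping per data file and hand out pointer-and-length views into it, much as the shared-memory facade does with shm addresses.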

There are a few changes that need to be made to get us there:

  • Benchmark mmap'd data access vs heap - what, if any, penalty is there? How does this change when the file we mmap is on a ramdisk?
  • Identify data structures that can't be mmap'd and fix them - basically anything in osrm-datastore (src/storage/storage.cpp) that isn't just loaded into memory as one big blob. The main problem here is vector<bool>: it hands out proxy objects instead of real bool references, so we need a contiguous container we can memcpy to.
  • Clone the SharedDataFacade and perform similar .swap operations against mmap'd memory addresses rather than shm addresses.
  • Figure out IPC signalling for swapping out mmap'd files on-the-fly.
  • Investigate using mmap instead of explicitly reading files from disk for leaf nodes in the StaticRTree to boost performance (coordinate lookups represent the largest part of any given routing query because of the I/O in the rtree).
  • Make sure this works on Windows too.
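For the vector<bool> item, one possible shape for a replacement is a packed array of 64-bit words plus a non-owning view. This is only a sketch: `PackBits`, `WordCount` and `BitView` are made-up names, and the bit order (least significant bit first) is an arbitrary choice, not something OSRM defines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers; names and bit order are illustrative, not OSRM's.
constexpr std::size_t BITS_PER_WORD = 64;

// Number of 64-bit words needed to hold `bits` flags.
inline std::size_t WordCount(std::size_t bits)
{
    return (bits + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

// Pack a std::vector<bool> into plain words that can be written out as one blob.
inline std::vector<std::uint64_t> PackBits(const std::vector<bool> &flags)
{
    std::vector<std::uint64_t> words(WordCount(flags.size()), 0);
    for (std::size_t i = 0; i < flags.size(); ++i)
    {
        if (flags[i])
            words[i / BITS_PER_WORD] |= std::uint64_t{1} << (i % BITS_PER_WORD);
    }
    return words;
}

// Read-only, non-owning view over packed words. The pointer can refer to
// heap, shared memory or an mmap'd region alike.
class BitView
{
  public:
    BitView(const std::uint64_t *words, std::size_t bits) : words_(words), bits_(bits) {}

    bool operator[](std::size_t i) const
    {
        assert(i < bits_);
        return ((words_[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1u) != 0;
    }

    std::size_t size() const { return bits_; }

  private:
    const std::uint64_t *words_;
    std::size_t bits_;
};
```

Because the words are ordinary contiguous integers, the blob can be memcpy'd or mapped directly, and the view needs no proxy objects for reads.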

The main goal here is to minimize double-reads of data. In situations where we are constantly cycling out data sets (in the case of traffic updates), we want to minimize I/O and the number of times any single bit of data gets touched. By using mmap and tmpfs, we can emulate the current shared-memory behavior, but avoid an extra pass over the data.

For normal osrm-routed use, we would essentially get lazy-loading of data - osrm-routed would start up faster, but early queries would be slower, since pages are loaded from disk on demand until the data has been touched once and lives in the filesystem cache. This initial slowness could be avoided by pre-seeding the data files into the filesystem cache or via MAP_POPULATE (Linux 2.5.46+), and the pre-seeding could run in parallel while osrm-routed is already starting up and answering queries.
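Where MAP_POPULATE is not available, a portable way to pre-seed is to touch one byte per page of the mapping. The sketch below assumes a POSIX system; `PrefaultPages` is a hypothetical helper name, not an existing OSRM function.

```cpp
#include <cassert>
#include <cstddef>

#include <unistd.h>

// Hypothetical helper: read one byte from every page in [data, data + size)
// so the kernel faults each page in. Returns the number of pages touched.
inline std::size_t PrefaultPages(const void *data, std::size_t size)
{
    const long page_size = ::sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    const std::size_t step = static_cast<std::size_t>(page_size);

    // volatile keeps the compiler from optimizing the reads away.
    const volatile unsigned char *bytes = static_cast<const volatile unsigned char *>(data);

    std::size_t touched = 0;
    for (std::size_t offset = 0; offset < size; offset += step)
    {
        const unsigned char sink = bytes[offset];
        (void)sink;
        ++touched;
    }
    return touched;
}
```

To get the "in parallel" behaviour described above, this could be called from a background thread right after the mapping is created, while the main thread starts serving queries; queries that hit a page before the warm-up reaches it simply take the page fault themselves.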

/cc @daniel-j-h @TheMarex
