Implement mmapDataFacade #1947
Description
OSRM currently supports either reading data from files into heap memory (`InternalDataFacade`) or pre-loading data into IPC shared-memory blocks (`SharedDataFacade` + `osrm-datastore`).
We can consolidate the behaviour of both by using `mmap`. Instead of explicitly reading files into memory, we should be able to `mmap` the data files and begin using them immediately.
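To make "`mmap` the data files and begin using them immediately" concrete, here is a minimal POSIX-only sketch of a read-only RAII mapping. `MappedFile` and `as<T>()` are hypothetical names, not existing OSRM classes, and the sketch assumes a file that is one flat array of trivially copyable records.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical read-only mapping of a data file (POSIX only; Windows would
// need CreateFileMapping/MapViewOfFile instead).
class MappedFile {
  public:
    explicit MappedFile(const std::string &path) {
        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0)
            throw std::runtime_error("cannot open " + path);
        struct stat st {};
        if (::fstat(fd, &st) != 0) {
            ::close(fd);
            throw std::runtime_error("cannot stat " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ > 0) { // mmap rejects zero-length mappings
            void *p = ::mmap(nullptr, size_, PROT_READ, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("cannot mmap " + path);
            }
            data_ = static_cast<const char *>(p);
        }
        ::close(fd); // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_ != nullptr)
            ::munmap(const_cast<char *>(data_), size_);
    }
    MappedFile(const MappedFile &) = delete;
    MappedFile &operator=(const MappedFile &) = delete;

    // View the mapping as an array of T (assumes the file is such an array).
    template <typename T> const T *as() const {
        assert(reinterpret_cast<std::uintptr_t>(data_) % alignof(T) == 0);
        return reinterpret_cast<const T *>(data_);
    }
    std::size_t size() const { return size_; }

  private:
    const char *data_ = nullptr;
    std::size_t size_ = 0;
};
```

Nothing is read at construction time: each page is faulted in from the page cache (or disk) the first time it is touched, which is the lazy-loading behaviour that `osrm-routed` would get.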
There are a few changes that need to be made to get us there:
- Benchmark `mmap`ed data access vs. heap: what penalty, if any, is there? How does this change when the file we `mmap` is on a ramdisk?
- Identify data structures that can't be `mmap`ed and fix them: basically anything in `osrm-datastore` (`src/storage/storage.cpp`) that isn't just loaded into memory in one big blob. The problem here is `vector<bool>` and its proxy behavior; we need a contiguous container we can `memcpy` to.
- Clone the `SharedDataFacade` and perform similar `.swap` operations against `mmap`ed memory addresses rather than `shm` addresses.
- Figure out IPC signalling for swapping out `mmap`ed files on the fly.
- Investigate using `mmap` instead of explicit `read`s of disk files for the leaf nodes of the `StaticRTree` to boost performance (coordinate lookups make up the largest part of any given routing query because of the I/O in the R-tree).
- Make sure this works on Windows too.
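For the `vector<bool>` item, one option is to pack bits into 64-bit words held in a plain contiguous buffer. The sketch below is an assumption rather than existing OSRM code (`PackedBitVector` and `PackedBitView` are hypothetical names): the builder can `memcpy` its words into a shared-memory or `mmap`ed block, and the query side can read bits directly from that block without copying.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical replacement for std::vector<bool>: bits packed into 64-bit
// words in one contiguous buffer, so the storage can be memcpy'd as a blob.
class PackedBitVector {
  public:
    explicit PackedBitVector(std::size_t num_bits = 0)
        : num_bits_(num_bits), words_((num_bits + 63) / 64, 0) {}

    void set(std::size_t i, bool value) {
        assert(i < num_bits_);
        const std::uint64_t mask = std::uint64_t{1} << (i % 64);
        if (value)
            words_[i / 64] |= mask;
        else
            words_[i / 64] &= ~mask;
    }
    bool get(std::size_t i) const {
        assert(i < num_bits_);
        return (words_[i / 64] >> (i % 64)) & 1u;
    }
    std::size_t size() const { return num_bits_; }
    std::size_t byte_size() const { return words_.size() * sizeof(std::uint64_t); }

    // Copy the packed words into a pre-allocated block (e.g. shm or mmap).
    void copy_to(void *dst) const {
        if (!words_.empty())
            std::memcpy(dst, words_.data(), byte_size());
    }

  private:
    std::size_t num_bits_;
    std::vector<std::uint64_t> words_;
};

// Read-only view over packed words that already live in a mapped block.
class PackedBitView {
  public:
    PackedBitView(const std::uint64_t *words, std::size_t num_bits)
        : words_(words), num_bits_(num_bits) {}
    bool get(std::size_t i) const {
        assert(i < num_bits_);
        return (words_[i / 64] >> (i % 64)) & 1u;
    }

  private:
    const std::uint64_t *words_;
    std::size_t num_bits_;
};
```

Unlike `std::vector<bool>`, whose element access returns a proxy object and whose storage layout is unspecified, both the writer and the reader here agree on a fixed word layout, so the block can be placed anywhere in a file or shared region.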
The main goal here is to minimize double reads of data. In situations where we are constantly cycling out data sets (as with traffic updates), we want to minimize I/O and the number of times any single bit of data gets touched. By using `mmap` and `tmpfs`, we can emulate the current shared-memory behavior but avoid an extra pass over the data.
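Cycling data sets requires the in-process half of the "swap" step: queries already in flight must keep using the old mapping while new queries see the new one. A common technique, shown here as an assumption and not as how `SharedDataFacade` works, is a reference-counted pointer swap. `Dataset` and `DatasetHolder` are hypothetical names, and the IPC signal that would trigger the swap is not covered.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for an mmap'd dataset; a real version would own the
// mapping and munmap it in its destructor.
struct Dataset {
    std::string name;
};

// Each query takes a shared_ptr snapshot; a writer publishes a new dataset.
// The old dataset is destroyed (in a real version: unmapped) only when the
// last in-flight query drops its snapshot.
class DatasetHolder {
  public:
    explicit DatasetHolder(std::shared_ptr<const Dataset> initial)
        : current_(std::move(initial)) {}

    std::shared_ptr<const Dataset> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return current_;
    }

    // Installs `next` and returns the previously active dataset.
    std::shared_ptr<const Dataset> swap(std::shared_ptr<const Dataset> next) {
        std::lock_guard<std::mutex> lock(mutex_);
        current_.swap(next);
        return next;
    }

  private:
    mutable std::mutex mutex_;
    std::shared_ptr<const Dataset> current_;
};
```

The mutex only guards copying the pointer, so queries hold it for a few instructions and never block on I/O.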
For normal `osrm-routed` use, we would essentially get lazy loading of data: `osrm-routed` would start up faster, but initial queries would be slower, since pages are loaded from disk on demand until the data has been touched and lives in the filesystem cache. This initial slowness could be avoided by pre-seeding the data files into the filesystem cache or via `MAP_POPULATE` (Linux 2.5.46+), and this could be done in parallel with `osrm-routed` already starting up and answering queries.
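Both warm-up options can be sketched as follows, assuming POSIX with Linux-specific extras. `map_readonly` and `prefault_pages` are hypothetical helpers: the first passes `MAP_POPULATE` where the platform defines it (the `mmap` call then blocks until pages are read), and the second touches one byte per page of an existing mapping, so it can run on a background thread while the server already answers queries.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Map a file read-only; on Linux, MAP_POPULATE asks the kernel to fault all
// pages in up front. Elsewhere this falls back to a lazy mapping.
void *map_readonly(int fd, std::size_t length, bool populate) {
    int flags = MAP_SHARED;
#ifdef MAP_POPULATE
    if (populate)
        flags |= MAP_POPULATE;
#else
    (void)populate;
#endif
    return ::mmap(nullptr, length, PROT_READ, flags, fd, 0);
}

// Touch one byte per page so each page is faulted in; returns pages touched.
// Intended to be run on a background thread (e.g. via std::thread).
std::size_t prefault_pages(const void *base, std::size_t length) {
    const auto page = static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
    // madvise requires a page-aligned address; mmap always returns one.
    assert(reinterpret_cast<std::uintptr_t>(base) % page == 0);
    // Advisory read-ahead hint only; failure is harmless, so it is ignored.
    (void)::madvise(const_cast<void *>(base), length, MADV_WILLNEED);
    const volatile char *bytes = static_cast<const volatile char *>(base);
    std::size_t touched = 0;
    for (std::size_t off = 0; off < length; off += page) {
        (void)bytes[off]; // volatile read, so the compiler cannot drop it
        ++touched;
    }
    return touched;
}
```

`MAP_POPULATE` is simpler but delays startup until the whole file is resident; touching pages from a separate thread keeps startup fast at the cost of some slow first queries.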
/cc @daniel-j-h @TheMarex