New OSRM data format #2242

@TheMarex

Currently our output on a global planet extract looks like this:

  25G latest.osrm
    4 latest.osrm.core
    8 latest.osrm.datasource_indexes
   12 latest.osrm.datasource_names
 7.7G latest.osrm.ebg
 7.7G latest.osrm.edges
 980M latest.osrm.enw
  21G latest.osrm.fileIndex
  11G latest.osrm.geometry
  18G latest.osrm.hsgr
 980M latest.osrm.level
 119M latest.osrm.names
 8.8G latest.osrm.nodes
   12 latest.osrm.properties
 1.3G latest.osrm.ramIndex
  15M latest.osrm.restrictions
   20 latest.osrm.timestamp

Yes, these are 17 files. It's about time that we consolidate them. This issue should capture our requirements for a new data format. Off the top of my head:

  • Only one final file (I expect we will need at least one temporary file for osrm-contract)
  • Platform independent on-disk storage
  • Needs to be extremely fast to read and write
  • Versioned
  • Documented
  • Resides in its own subsystem (no random std::fstream calls all over the place)

Platform dependence

We are fortunate in that we don't need to support complex nested data types. Almost all data we use is just a big array of something: usually 32-bit integers, no floating-point values and no pointers (thankfully).
So we might get away with a much simpler solution (no alignment problems).
What we need to get right is the following:

  • datatype size (only use types that have an explicit size like std::uint64_t)
  • alignment through sticking to primitive types
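
To make the two points above concrete, here is a minimal sketch of how a flat array of 32-bit integers could be encoded with explicitly sized types. Everything in it is an assumption for illustration: the helper names (`append_u32_le`, `read_u32_le`, `encode_array`, `decode_array`) are hypothetical, and the choice of little-endian byte order is not something this issue has settled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append one 32-bit value in little-endian byte order,
// regardless of the byte order of the host.
inline void append_u32_le(std::vector<std::uint8_t> &out, std::uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFFu));
}

// Hypothetical helper: decode one 32-bit little-endian value at byte offset `pos`.
inline std::uint32_t read_u32_le(const std::vector<std::uint8_t> &in, std::size_t pos)
{
    assert(pos + 4 <= in.size());
    return static_cast<std::uint32_t>(in[pos]) |
           (static_cast<std::uint32_t>(in[pos + 1]) << 8) |
           (static_cast<std::uint32_t>(in[pos + 2]) << 16) |
           (static_cast<std::uint32_t>(in[pos + 3]) << 24);
}

// Encode a whole array; every element occupies exactly four bytes on disk.
inline std::vector<std::uint8_t> encode_array(const std::vector<std::uint32_t> &values)
{
    std::vector<std::uint8_t> out;
    out.reserve(values.size() * sizeof(std::uint32_t));
    for (const auto value : values)
        append_u32_le(out, value);
    return out;
}

// Decode an array previously produced by encode_array.
inline std::vector<std::uint32_t> decode_array(const std::vector<std::uint8_t> &bytes)
{
    assert(bytes.size() % 4 == 0);
    std::vector<std::uint32_t> values;
    values.reserve(bytes.size() / 4);
    for (std::size_t pos = 0; pos < bytes.size(); pos += 4)
        values.push_back(read_u32_le(bytes, pos));
    return values;
}
```

Encoding byte by byte is the portable but slow variant. On a little-endian host the same bytes could be produced by writing the array's memory in one call, which is probably what the speed requirement would push us towards.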

Existing solutions:

Having read through Protobuf and Cap'n Proto, neither strikes me as an immediate fit. Both rely on schema-based code generators. I would expect that we could capture the tightly scoped functionality described above in a single header-only library.
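
As a rough idea of how small such a header could be, the sketch below writes one versioned block of 32-bit integers to a stream and reads it back. It is only an illustration under assumptions: the `sketch` namespace, the `MAGIC` tag, the `FORMAT_VERSION` value, the header layout (tag, version, element count) and the function names are all hypothetical and not an existing OSRM API.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

namespace sketch
{
constexpr std::uint32_t MAGIC = 0x4D52534Fu;  // hypothetical tag, bytes "OSRM" on disk
constexpr std::uint32_t FORMAT_VERSION = 1;   // hypothetical version number

// Write the low `bytes` bytes of `value` in little-endian order.
inline void write_le(std::ostream &out, std::uint64_t value, int bytes)
{
    assert(bytes >= 1 && bytes <= 8);
    unsigned char buffer[8];
    for (int i = 0; i < bytes; ++i)
        buffer[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFFu);
    out.write(reinterpret_cast<const char *>(buffer), bytes);
}

// Read a `bytes`-wide little-endian integer, throwing on a short read.
inline std::uint64_t read_le(std::istream &in, int bytes)
{
    assert(bytes >= 1 && bytes <= 8);
    unsigned char buffer[8];
    if (!in.read(reinterpret_cast<char *>(buffer), bytes))
        throw std::runtime_error("truncated input");
    std::uint64_t value = 0;
    for (int i = 0; i < bytes; ++i)
        value |= static_cast<std::uint64_t>(buffer[i]) << (8 * i);
    return value;
}

// Layout: 4-byte tag, 4-byte version, 8-byte element count, then the elements.
inline void write_block(std::ostream &out, const std::vector<std::uint32_t> &values)
{
    write_le(out, MAGIC, 4);
    write_le(out, FORMAT_VERSION, 4);
    write_le(out, values.size(), 8);
    for (const auto value : values)
        write_le(out, value, 4);
}

// Reject data with an unknown tag or version instead of misreading it.
inline std::vector<std::uint32_t> read_block(std::istream &in)
{
    if (read_le(in, 4) != MAGIC)
        throw std::runtime_error("not a data file");
    if (read_le(in, 4) != FORMAT_VERSION)
        throw std::runtime_error("unsupported format version");
    const std::uint64_t count = read_le(in, 8);
    std::vector<std::uint32_t> values;
    for (std::uint64_t i = 0; i < count; ++i)
        values.push_back(static_cast<std::uint32_t>(read_le(in, 4)));
    return values;
}
} // namespace sketch
```

A single final file could then be a sequence of such blocks, for example one per array that currently lives in its own file, with all reads and writes going through these functions rather than ad hoc std::fstream calls.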

References

/cc @daniel-j-h @danpat @miccolis
