New OSRM data format #2242

Currently our output on a global planet extract looks like this:

```
  25G latest.osrm
    4 latest.osrm.core
    8 latest.osrm.datasource_indexes
   12 latest.osrm.datasource_names
 7.7G latest.osrm.ebg
 7.7G latest.osrm.edges
 980M latest.osrm.enw
  21G latest.osrm.fileIndex
  11G latest.osrm.geometry
  18G latest.osrm.hsgr
 980M latest.osrm.level
 119M latest.osrm.names
 8.8G latest.osrm.nodes
   12 latest.osrm.properties
 1.3G latest.osrm.ramIndex
  15M latest.osrm.restrictions
   20 latest.osrm.timestamp
```

Yes, that is 17 files. It's about time that we consolidate this. This issue should capture our requirements around a new data format. Off the top of my head:
- Only one final file (I expect we will need at least one temporary file for `osrm-contract`)
- Platform independent on-disk storage
- Needs to be extremely fast to read and write
- Versioned
- Documented
- Resides in its own subsystem (no random `std::fstream` calls all over the place)
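
To make the list above more concrete, here is a minimal sketch of a single versioned container file that is read and written through one small interface. Everything in it is an assumption for illustration, not an existing OSRM layout or API: the `sketch` namespace, the `OSRM` magic bytes, `FORMAT_VERSION`, the section layout and all function names are hypothetical.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layout (illustration only):
//   "OSRM" | u32 version | u32 section count |
//   per section: u32 name length, name, u64 payload length, payload
// All integers are written least significant byte first.
namespace sketch
{
constexpr std::uint32_t FORMAT_VERSION = 1;
using Section = std::pair<std::string, std::string>; // name, raw payload

inline void write_uint(std::ostream &out, std::uint64_t value, int bytes)
{
    for (int i = 0; i < bytes; ++i)
        out.put(static_cast<char>((value >> (8 * i)) & 0xFFu));
}

inline std::uint64_t read_uint(std::istream &in, int bytes)
{
    std::uint64_t value = 0;
    for (int i = 0; i < bytes; ++i)
    {
        const int byte = in.get();
        if (byte == std::istream::traits_type::eof())
            throw std::runtime_error("unexpected end of file");
        value |= static_cast<std::uint64_t>(byte) << (8 * i);
    }
    return value;
}

inline std::string read_bytes(std::istream &in, std::uint64_t length)
{
    std::string buffer(length, '\0');
    in.read(&buffer[0], static_cast<std::streamsize>(length));
    if (static_cast<std::uint64_t>(in.gcount()) != length)
        throw std::runtime_error("truncated file");
    return buffer;
}

inline void write_file(std::ostream &out, const std::vector<Section> &sections)
{
    out.write("OSRM", 4);
    write_uint(out, FORMAT_VERSION, 4);
    write_uint(out, sections.size(), 4);
    for (const auto &section : sections)
    {
        write_uint(out, section.first.size(), 4);
        out.write(section.first.data(), static_cast<std::streamsize>(section.first.size()));
        write_uint(out, section.second.size(), 8);
        out.write(section.second.data(), static_cast<std::streamsize>(section.second.size()));
    }
}

inline std::vector<Section> read_file(std::istream &in)
{
    if (read_bytes(in, 4) != "OSRM")
        throw std::runtime_error("not an OSRM data file");
    if (read_uint(in, 4) != FORMAT_VERSION)
        throw std::runtime_error("unsupported format version");
    std::vector<Section> sections(read_uint(in, 4));
    for (auto &section : sections)
    {
        section.first = read_bytes(in, read_uint(in, 4));
        section.second = read_bytes(in, read_uint(in, 8));
    }
    return sections;
}

// In-memory helpers, convenient for testing the format without touching disk.
inline std::string to_bytes(const std::vector<Section> &sections)
{
    std::ostringstream out;
    write_file(out, sections);
    return out.str();
}

inline std::vector<Section> from_bytes(const std::string &bytes)
{
    std::istringstream in(bytes);
    return read_file(in);
}
} // namespace sketch
```

A real implementation would probably put a table of contents with offsets at the front so that sections can be memory mapped or read individually; the sequential layout here only keeps the sketch short.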
### Platform dependence

We are kind of fortunate in that we don't need to support complex nested data types. Almost all the data we use is just a big array of something: usually 32-bit integers, no floating point values and no pointers (thankfully).
That means we might get away with a much simpler solution (no alignment problems).
What we need to make sure we get right is the following:
- datatype size (only use types that have an explicit size like `std::uint64_t`)
- alignment (by sticking to primitive types)
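
As a rough illustration of the two points above, the following sketch encodes an array of `std::uint32_t` into bytes using only fixed-width types. It also pins the byte order to little-endian, which is an assumption on my side about what "platform independent" should include. The function names are hypothetical and not part of OSRM.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Append `value` as 4 bytes, least significant byte first. The shifts work
// on the value, not on memory, so the output is the same on any host.
inline void append_u32_le(std::vector<std::uint8_t> &out, std::uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFFu));
}

// Read 4 little-endian bytes starting at `data` back into a host integer.
inline std::uint32_t read_u32_le(const std::uint8_t *data)
{
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(data[i]) << (8 * i);
    return value;
}

// Encode an array as: 64-bit element count, then the elements.
inline std::vector<std::uint8_t> encode_u32_array(const std::vector<std::uint32_t> &values)
{
    std::vector<std::uint8_t> out;
    out.reserve(8 + 4 * values.size());
    const std::uint64_t count = values.size();
    for (int shift = 0; shift < 64; shift += 8)
        out.push_back(static_cast<std::uint8_t>((count >> shift) & 0xFFu));
    for (const auto value : values)
        append_u32_le(out, value);
    return out;
}

// Decode the layout written by encode_u32_array, rejecting short input.
inline std::vector<std::uint32_t> decode_u32_array(const std::vector<std::uint8_t> &bytes)
{
    if (bytes.size() < 8)
        throw std::runtime_error("missing element count");
    std::uint64_t count = 0;
    for (int i = 0; i < 8; ++i)
        count |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    if (count > (bytes.size() - 8) / 4)
        throw std::runtime_error("truncated array");
    std::vector<std::uint32_t> values;
    values.reserve(static_cast<std::size_t>(count));
    for (std::uint64_t i = 0; i < count; ++i)
        values.push_back(read_u32_le(bytes.data() + 8 + 4 * i));
    return values;
}
```

The byte-wise loops are the simplest portable form, not the fastest one. On a little-endian host a bulk copy of the array produces the same bytes, so a real implementation could take that path where speed matters.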

Existing solutions here:
- Protobuf (high adoption, already a dependency, slow, schema)
- Cap'n'Proto (schema, claims to be zero overhead on x64)
- Boost::serialization (slow as hell, no schema)
- cereal http://uscilab.github.io/cereal/ (something like https://github.com/USCiLab/cereal/blob/master/include/cereal/archives/portable_binary.hpp should be what we need)

Reading through Protobuf and Cap'n'Proto, they don't strike me as immediate fits. Both rely on schema-based generators. I would expect we could capture the very tightly scoped functionality from above in a single header-only library.
### References
- https://isocpp.org/wiki/faq/serialization#serialize-binary-format good read in general
- https://capnproto.org/cxx.html

/cc @daniel-j-h @danpat @miccolis 

