Skip to content

Use tar-format to encapsulate data#4960

Merged
TheMarex merged 53 commits intomasterfrom
refactor/tar_files
Mar 27, 2018
Merged

Use tar-format to encapsulate data#4960
TheMarex merged 53 commits intomasterfrom
refactor/tar_files

Conversation

@TheMarex
Copy link
Copy Markdown
Member

@TheMarex TheMarex commented Mar 15, 2018

Issue

This PR is another step towards implementing #4952. The goal is to make any .osrm.* file a tar file internally and address data by (file)name. As example the .mldgr file now looks like this:

/mld/multilevelgraph/node_array.meta
/mld/multilevelgraph/node_array
/mld/multilevelgraph/edge_array.meta
/mld/multilevelgraph/edge_array
/mld/multilevelgraph/node_to_edge_offset.meta
/mld/multilevelgraph/node_to_edge_offset
/mld/multilevelgraph/connectivity_checksum.meta
/mld/multilevelgraph/connectivity_checksum

Where every file contains raw binary data for data structures in OSRM that we can deserialize or read directly to memory. The .meta files contain additional information for setting up the data layout, e.g. the number of values for serialized std::vector, or the fingerprint.

Tasklist

@TheMarex TheMarex self-assigned this Mar 15, 2018
@TheMarex TheMarex force-pushed the refactor/tar_files branch 3 times, most recently from c72d85a to 24dc369 Compare March 22, 2018 16:43
@TheMarex TheMarex requested a review from oxidase March 22, 2018 18:19
@TheMarex TheMarex requested a review from danpat March 22, 2018 18:27
Copy link
Copy Markdown
Contributor

@oxidase oxidase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I have some comments to the last commit

namespace detail
{
template <typename T, typename BlockT = unsigned char>
inline unsigned char packBits(const T &data, std::size_t index, std::size_t count)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return value must be BlockT

{
static_assert(std::is_same<typename T::value_type, bool>::value, "value_type is not bool");
const unsigned char mask = 1 << (count - 1);
const BlockT mask = 1 << (count - 1);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here must be BlockT{1} << (count - 1)


const auto decode = [&](const std::uint64_t block) {
auto read_size = std::min<std::size_t>(count - index, WORD_BITS);
unpackBits(data, index, read_size, block);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here unpackBits<VectorT, std::uint64_t>


// FIXME on old boost version the function_input_iterator does not work with lambdas
// so we need to wrap it in a function here.
const std::function<std::uint64_t()> encode_function = [&]() -> char {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/-> char/-> std::uint64_t/

return packed;
};

std::uint64_t number_of_blocks = std::ceil((double)count / WORD_BITS);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(count + WORD_BITS - 1) / WORD_BITS would be better to avoid floating-point arithmetic

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice trick. I know a few places were this would come in useful.

@danpat danpat added this to the 5.17.0 milestone Mar 23, 2018
@TheMarex TheMarex force-pushed the refactor/tar_files branch from 3aa22d7 to 2707a2b Compare March 26, 2018 10:59
@TheMarex TheMarex force-pushed the refactor/tar_files branch from 8f75d0e to 3ee8a96 Compare March 26, 2018 12:13
@oxidase
Copy link
Copy Markdown
Contributor

oxidase commented Mar 27, 2018

@TheMarex i pushed fixes for msvc to trigger CI builds ("fixes" are reducing c++ syntactic sugar for msvc). Feel free to kick-out or squash the last commit.

@TheMarex
Copy link
Copy Markdown
Member Author

unknown location(0): fatal error: in "route/test_manual_setting_of_annotations_property": class osrm::util::exception: File "C:/projects/osrm/test/data/ch/monaco.osrm.fileIndex" mapping failed: failed opening file: The process cannot access the file because it is being used by another process.

Some more problems with windows. I suspect this is because I had to change the r-tree to allow writing to an mmap files. Maybe I can work around this.

@TheMarex
Copy link
Copy Markdown
Member Author

TheMarex commented Mar 27, 2018

Wuhu finally! It compiles. 🚀 I need to merge this because we add the new dependency tree which doesn't work with rebase + merge.

@TheMarex TheMarex merged commit 2690dd0 into master Mar 27, 2018
@TheMarex TheMarex deleted the refactor/tar_files branch March 27, 2018 12:34
@TheMarex TheMarex mentioned this pull request Mar 28, 2018
@mloskot
Copy link
Copy Markdown
Member

mloskot commented Mar 28, 2018

Follow-up to @TheMarex -s #2242 (comment) here

One of #2242 requirements was "Only one final file". Does this implement it too. It's unclear form the description. I've failed to fish out from the commits if it is about tar-ifying individual files only or also about packaging into single master file.

Will the docs or wiki:Processing Flow wiki updated on how to use the new format?

@TheMarex
Copy link
Copy Markdown
Member Author

@mloskot you are right, this doesn't meet the single-file requirement yet.

However, since we now use named identifiers for data, osrm-datastore does not really need to care anymore in which files the data is stored, we could figure that out automatically. That would unlock just passing a single file to osrm-datastore that is just a bunch of merged tar files. I captured this in #4978 for further discussion.

@mloskot
Copy link
Copy Markdown
Member

mloskot commented Mar 28, 2018

@TheMarex It makes sense, thanks for the clarification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants