[MOD-9230] "Faithful" trie reimplementation#5882
Conversation
520c80a to
06cb424
Compare
dd8f537 to
82612a9
Compare
bf2c37b to
9431452
Compare
82612a9 to
a63cef4
Compare
a63cef4 to
b86f64c
Compare
b86f64c to
d22d775
Compare
e59d527 to
7bf4e92
Compare
374bc8e to
bbc1fa7
Compare
7bf4e92 to
8c99843
Compare
ec89f22 to
5c193a3
Compare
|
This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size. |
2 similar comments
|
This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size. |
|
This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size. |
raz-mon
left a comment
There was a problem hiding this comment.
10/17 files reviewed (to be continued) - looking good 👍
947e0d8 to
cf238be
Compare
|
This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size. |
…ginal C implementation in deps/triemap The latter is included as a baseline.
|
This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size. |
d2beced to
01bab6f
Compare
|
This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size. |
GuyAv46
left a comment
There was a problem hiding this comment.
Will finish going over trie_ops.rs tomorrow.
| match longest_common_prefix(current.label(), key) { | ||
| Some((0, ordering)) => { | ||
| // The node's label and the key don't share a common prefix. | ||
| // This can only happen if the current node is the root node of the trie. |
There was a problem hiding this comment.
This is true only if insert_or_replace_with was called on the root node, right?
Shouldn't we prevent calling the API in this file for non-root nodes?
There was a problem hiding this comment.
You're correct there, this is only true if insert_or_replace_with is called on the root node.
Shouldn't we prevent calling the API in this file for non-root nodes?
The Node type is not exposed through the public API of the trie_rs crate—its methods can only be invoked from inside this crate, which is why the comment reads as it does.
|
This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size. |
raz-mon
left a comment
There was a problem hiding this comment.
LGTM in general.
Hard to say about the nitty gritty Rust details (sorry) but the general functionality looks good.
| // insert the old root as a child of the new root, | ||
| // and add a new child to the empty root. | ||
| current.map(|old_root| { | ||
| let new_child = Node::new_leaf(key, Some(f(None))); |
There was a problem hiding this comment.
What's the usage of Some(f(None))?
There was a problem hiding this comment.
f always returns a value, so we need to wrap it in Some in order to get an Option<Data>.
f is needed because, when invoked from C, we need to give C a way to free the previous value (if one existed), hence the callback. This will become more straightforward (i.e. no closures) once we don't have that requirement anymore.
* [MOD-9230] A Rust trie that's much closer to the structure of the original C implementation in deps/triemap The latter is included as a baseline. * Rename 'offset' to 'target' in 'shift_left'
* [MOD-9230] A Rust trie that's much closer to the structure of the original C implementation in deps/triemap The latter is included as a baseline. * Rename 'offset' to 'target' in 'shift_left'
Describe the changes in the pull request
A Rust trie that's much closer to the structure of the original C implementation in
deps/triemap.Design
Both label and children arrays are inlined in the buffer backing the node, matching the original C design.
This requires us to use
reallocwhenever we add/remove a child or modify the label.At the same time, it minimizes memory overhead (no capacity to keep track of!) and pointer indirection when performing trie operations.
Complexity
Rust doesn't provide (yet) a lot of machinery to define and manipulate custom dynamically-sized types, such as the one we're defining here. This translates into a significant number of
unsafeblocks, which must be reviewed with extreme care.I've tried, whenever possible, to build safe(r) abstractions and thus minimize the cognitive overhead/the complexity of the
// SAFETYcomments.mirihasn't detected any UB.Performance
The Rust implementation appears to be faster than the C implementation, no matter the operation and the dataset we tested.
Its memory footprint is comparable: 0.470 MBs for Rust vs 0.440 MBs for C to store the Gutenberg dataset.
Benchmark results (lower is better):
Mark if applicable