Skip to content

[MOD-9230] "Faithful" trie reimplementation#5882

Merged
LukeMathWalker merged 2 commits intomasterfrom
trie-review
Apr 23, 2025
Merged

[MOD-9230] "Faithful" trie reimplementation#5882
LukeMathWalker merged 2 commits intomasterfrom
trie-review

Conversation

@LukeMathWalker
Copy link
Collaborator

@LukeMathWalker LukeMathWalker commented Apr 6, 2025

Describe the changes in the pull request

A Rust trie that's much closer to the structure of the original C implementation in deps/triemap.

Design

Both label and children arrays are inlined in the buffer backing the node, matching the original C design.
This requires us to use realloc whenever we add/remove a child or modify the label.
At the same time, it minimizes memory overhead (no capacity to keep track of!) and pointer indirection when performing trie operations.

Complexity

Rust doesn't provide (yet) a lot of machinery to define and manipulate custom dynamically-sized types, such as the one we're defining here. This translates into a significant number of unsafe blocks, which must be reviewed with extreme care.

I've tried, whenever possible, to build safe(r) abstractions and thus minimize the cognitive overhead/the complexity of the // SAFETY comments.

miri hasn't detected any UB.

Performance

The Rust implementation appears to be faster than the C implementation, no matter the operation and the dataset we tested.
Its memory footprint is comparable: 0.470 MBs for Rust vs 0.440 MBs for C to store the Gutenberg dataset.

Benchmark results (lower is better):

Gutenberg|Load/Rust                               4299 ns/iter   0.81x
Gutenberg|Load/C                                  5302 ns/iter

Gutenberg|Insert (leaf)/Rust                        56 ns/iter   0.71x
Gutenberg|Insert (leaf)/C                           79 ns/iter

Gutenberg|Insert (split with 2 children)/Rust       44 ns/iter   0.50x
Gutenberg|Insert (split with 2 children)/C          88 ns/iter

Gutenberg|Insert (split with no children)/Rust      39 ns/iter   0.76x
Gutenberg|Insert (split with no children)/C         51 ns/iter

Gutenberg|Find no match/Rust                         7 ns/iter   0.70x
Gutenberg|Find no match/C                           10 ns/iter

Gutenberg|Find match (depth 2)/Rust                  7 ns/iter   0.70x
Gutenberg|Find match (depth 2)/C                    10 ns/iter

Gutenberg|Find match (depth 4)/Rust                 11 ns/iter   0.52x
Gutenberg|Find match (depth 4)/C                    21 ns/iter

Gutenberg|Remove leaf (with merge)/Rust             87 ns/iter   0.64x
Gutenberg|Remove leaf (with merge)/C               135 ns/iter

Gutenberg|Remove leaf (no merge)/Rust               40 ns/iter   0.51x
Gutenberg|Remove leaf (no merge)/C                  78 ns/iter

  Wiki-1K|Load/Rust                              82072 ns/iter   0.72x
  Wiki-1K|Load/C                                114311 ns/iter

  Wiki-1K|Insert (split with 2 children)/Rust       84 ns/iter   0.67x
  Wiki-1K|Insert (split with 2 children)/C         125 ns/iter

  Wiki-1K|Insert (split with 18 children)/Rust      53 ns/iter   0.73x
  Wiki-1K|Insert (split with 18 children)/C         73 ns/iter

  Wiki-1K|Find match (depth 5/Rust                  24 ns/iter   0.65x
  Wiki-1K|Find match (depth 5/C                     37 ns/iter

  Wiki-1K|Find no match (depth 5)/Rust              18 ns/iter   0.60x
  Wiki-1K|Find no match (depth 5)/C                 30 ns/iter

  Wiki-1K|Find no match (depth 1)/Rust               2 ns/iter   0.67x
  Wiki-1K|Find no match (depth 1)/C                  3 ns/iter

  Wiki-1K|Remove internal (with merge)/Rust         89 ns/iter   0.46x
  Wiki-1K|Remove internal (with merge)/C           194 ns/iter

  Wiki-10K|Load/Rust                           1153074 ns/iter   0.88x
  Wiki-10K|Load/C                              1313371 ns/iter

  Wiki-10K|Insert (leaf)/Rust                       70 ns/iter   0.33x
  Wiki-10K|Insert (leaf)/C                         213 ns/iter

  Wiki-10K|Find match (depth 5)/Rust                22 ns/iter   0.46x
  Wiki-10K|Find match (depth 5)/C                   48 ns/iter

  Wiki-10K|Remove leaf (no merge)/Rust             115 ns/iter   0.53x
  Wiki-10K|Remove leaf (no merge)/C                216 ns/iter

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

@CLAassistant
Copy link

CLAassistant commented Apr 6, 2025

CLA assistant check
All committers have signed the CLA.

@LukeMathWalker LukeMathWalker force-pushed the trie-review branch 5 times, most recently from 520c80a to 06cb424 Compare April 7, 2025 07:44
@LukeMathWalker LukeMathWalker changed the title "Faithful" trie reimplementation [MOD-9230] "Faithful" trie reimplementation Apr 7, 2025
@LukeMathWalker LukeMathWalker force-pushed the trie-review branch 3 times, most recently from dd8f537 to 82612a9 Compare April 7, 2025 11:54
@LukeMathWalker LukeMathWalker force-pushed the trie-benchmarks branch 3 times, most recently from e59d527 to 7bf4e92 Compare April 8, 2025 12:07
@LukeMathWalker LukeMathWalker force-pushed the trie-review branch 4 times, most recently from 374bc8e to bbc1fa7 Compare April 8, 2025 12:45
Base automatically changed from trie-benchmarks to master April 8, 2025 16:03
@LukeMathWalker LukeMathWalker force-pushed the trie-review branch 2 times, most recently from ec89f22 to 5c193a3 Compare April 14, 2025 14:06
@RediSearch RediSearch deleted a comment from github-actions bot Apr 14, 2025
@LukeMathWalker LukeMathWalker marked this pull request as ready for review April 14, 2025 14:35
@LukeMathWalker LukeMathWalker requested a review from GuyAv46 April 14, 2025 15:05
@RediSearch RediSearch deleted a comment from github-actions bot Apr 15, 2025
@RediSearch RediSearch deleted a comment from github-actions bot Apr 15, 2025
@RediSearch RediSearch deleted a comment from github-actions bot Apr 15, 2025
@RediSearch RediSearch deleted a comment from github-actions bot Apr 15, 2025
@RediSearch RediSearch deleted a comment from github-actions bot Apr 15, 2025
@RediSearch RediSearch deleted a comment from github-actions bot Apr 15, 2025
@RediSearch RediSearch deleted a comment from github-actions bot Apr 15, 2025
@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

2 similar comments
@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Copy link
Collaborator

@raz-mon raz-mon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10/17 files reviewed (to be continued) - looking good 👍

@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

…ginal C implementation in deps/triemap

The latter is included as a baseline.
@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Copy link
Collaborator

@GuyAv46 GuyAv46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will finish going over trie_ops.rs tomorrow.

match longest_common_prefix(current.label(), key) {
Some((0, ordering)) => {
// The node's label and the key don't share a common prefix.
// This can only happen if the current node is the root node of the trie.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true only if insert_or_replace_with was called on the root node, right?
Shouldn't we prevent calling the API in this file for non-root nodes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct there, this is only true if insert_or_replace_with is called on the root node.

Shouldn't we prevent calling the API in this file for non-root nodes?

The Node type is not exposed through the public API of the trie_rs crate—its methods can only be invoked from inside this crate, which is why the comment reads as it does.

@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Copy link
Collaborator

@raz-mon raz-mon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general.
Hard to say about the nitty gritty Rust details (sorry) but the general functionality looks good.

// insert the old root as a child of the new root,
// and add a new child to the empty root.
current.map(|old_root| {
let new_child = Node::new_leaf(key, Some(f(None)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the usage of Some(f(None))?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f always returns a value, so we need to wrap it in Some in order to get an Option<Data>.
f is needed because, when invoked from C, we need to give C a way to free the previous value (if one existed), hence the callback. This will become more straightforward (i.e. no closures) once we don't have that requirement anymore.

@LukeMathWalker LukeMathWalker added this pull request to the merge queue Apr 23, 2025
Merged via the queue into master with commit fe7ab04 Apr 23, 2025
11 checks passed
@LukeMathWalker LukeMathWalker deleted the trie-review branch April 23, 2025 16:48
JoanFM pushed a commit that referenced this pull request May 27, 2025
* [MOD-9230] A Rust trie that's much closer to the structure of the original C implementation in deps/triemap

The latter is included as a baseline.

* Rename 'offset' to 'target' in 'shift_left'
JoanFM pushed a commit that referenced this pull request May 27, 2025
* [MOD-9230] A Rust trie that's much closer to the structure of the original C implementation in deps/triemap

The latter is included as a baseline.

* Rename 'offset' to 'target' in 'shift_left'
@LukeMathWalker LukeMathWalker restored the trie-review branch February 6, 2026 09:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants