Add fuzz testing for UTF8 LIKE pruning #13253
Closed
alamb wants to merge 5 commits into apache:main from
adriangb reviewed Nov 4, 2024
Comment on lines +29 to +34
```rust
/// Tests for `LIKE` with truncated statistics to validate incrementing logic
///
/// Create several 2-row batches and ensure that `LIKE` with the min and max values
/// is correctly pruned even when the "statistics" are truncated.
#[test]
fn test_prune_like_truncated_statistics() {
```
I think it's also worth having tests for `=` and maybe other operators; it was not immediately obvious to me that there wasn't a bug with those as well.
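To make the `=` concern concrete, here is a minimal sketch of equality pruning against min/max bounds. It is not DataFusion's pruning API: `Bounds` and `can_prune_eq` are hypothetical names. It assumes truncated statistics keep a char prefix as `min` (still a valid lower bound) and must increment the last kept char of `max`. Without that increment, the truncated `max` is no longer an upper bound, and a row group that contains the searched value is wrongly skipped.

```rust
/// Hypothetical row-group statistics; `min`/`max` may be truncated.
struct Bounds<'a> {
    min: &'a str,
    max: &'a str,
}

/// True if a row group with these bounds can be skipped for `col = v`.
/// Only sound if `min` <= every value and `max` >= every value in the group.
fn can_prune_eq(b: &Bounds, v: &str) -> bool {
    v < b.min || v > b.max
}

fn main() {
    let values = ["apple", "apricot"];
    // Exact statistics for the two values.
    let exact = Bounds { min: "apple", max: "apricot" };
    // Truncated to 2 chars: "ap" is still a lower bound, but "ap" is NOT an
    // upper bound ("apricot" > "ap"), so max must be incremented to "aq".
    let truncated_wrong = Bounds { min: "ap", max: "ap" };
    let truncated_ok = Bounds { min: "ap", max: "aq" };

    for v in values {
        assert!(!can_prune_eq(&exact, v));
        assert!(!can_prune_eq(&truncated_ok, v));
    }
    // Without incrementing, the group holding "apricot" would be pruned.
    assert!(can_prune_eq(&truncated_wrong, "apricot"));
    println!("ok");
}
```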
adriangb reviewed Nov 4, 2024
Comment on lines +35 to +37
```rust
    // Make 2 row random UTF-8 strings
    let mut rng = thread_rng();
    let statistics = TestPruningStatistics::new(&mut rng, 100);
```
I imagine a lot of the bugs are going to be around edge cases: empty strings, non-ASCII characters, etc. Is there any way we could inject those into the randomness? Maybe what we need here, more than random fuzzing, is a matrix-style test:
- Generate N full-length values, including some random ones?
- Arrange them into row groups in multiple orders and of multiple sizes
- Truncate the stats to lengths ranging from 1 up to something large
And make sure the results with and without pruning match?
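The matrix-style test suggested above could look roughly like the following self-contained sketch. It is hypothetical and independent of DataFusion's test utilities: `truncate_min`, `truncate_max`, `can_prune_prefix`, and `run_matrix` are illustrative names, and only `LIKE 'prefix%'` predicates are modeled. Instead of comparing full query results, it checks the equivalent property that pruning never skips a row group containing a match.

```rust
/// Lower bound: any char-prefix of `s` is <= `s`.
fn truncate_min(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

/// Upper bound: keep at most `n` chars and, if anything was cut, increment the
/// last kept char (skipping surrogates). Returns None when no bound exists
/// (every kept char is char::MAX), i.e. the max statistic is unknown.
fn truncate_max(s: &str, n: usize) -> Option<String> {
    let chars: Vec<char> = s.chars().collect();
    if chars.len() <= n {
        return Some(s.to_string());
    }
    let mut kept = chars[..n].to_vec();
    while let Some(last) = kept.pop() {
        // Next valid Unicode scalar value after `last`, if any.
        if let Some(next) = (last as u32 + 1..=char::MAX as u32).find_map(char::from_u32) {
            kept.push(next);
            return Some(kept.into_iter().collect());
        }
        // `last` was char::MAX: drop it and try to increment the previous char.
    }
    None
}

/// Can a row group with bounds [min, max] be skipped for `col LIKE 'prefix%'`?
fn can_prune_prefix(min: &str, max: Option<&str>, prefix: &str) -> bool {
    // Every value <= max < prefix, so none can start with `prefix`.
    let below = max.map_or(false, |m| m < prefix);
    // `min` sorts after every string that starts with `prefix`.
    let above = min > prefix && !min.starts_with(prefix);
    below || above
}

/// Returns (cases checked, cases pruned); panics if a match would be pruned.
fn run_matrix() -> (usize, usize) {
    // Edge cases from the review: empty string, non-ASCII, char::MAX, and the
    // char just below the surrogate gap.
    let values = ["", "a", "ab", "abc", "b", "é", "éa", "\u{D7FF}z", "\u{10FFFF}", "\u{10FFFF}a", "zzz"];
    let prefixes = ["", "a", "ab", "é", "\u{D7FF}", "\u{10FFFF}", "c"];
    let (mut checked, mut pruned) = (0, 0);
    for shift in 0..values.len() {
        // Multiple orders: rotate the values before splitting into row groups.
        let mut vals = values.to_vec();
        vals.rotate_left(shift);
        for group_size in 1..=4 {
            for group in vals.chunks(group_size) {
                let min = group.iter().min().unwrap();
                let max = group.iter().max().unwrap();
                for trunc in 1..=5 {
                    let tmin = truncate_min(min, trunc);
                    let tmax = truncate_max(max, trunc);
                    for p in prefixes {
                        let skip = can_prune_prefix(&tmin, tmax.as_deref(), p);
                        let has_match = group.iter().any(|v| v.starts_with(p));
                        assert!(!(skip && has_match), "pruned a match: group={group:?} trunc={trunc} prefix={p:?}");
                        checked += 1;
                        pruned += skip as usize;
                    }
                }
            }
        }
    }
    (checked, pruned)
}

fn main() {
    let (checked, pruned) = run_matrix();
    println!("checked {checked} cases, pruned {pruned}");
}
```

The check is deliberately one-sided: pruning may be conservative (keep a row group with no match) but must never drop one that has a match, which is what "results with and without pruning match" requires.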
This was referenced Jan 9, 2025
Draft, as it builds on #12978
Which issue does this PR close?
Part of #507
Rationale for this change
While working on #12978 with @adriangb and @findepi, I have been having nightmares about subtle bugs introduced by truncated statistics.
What changes are included in this PR?
Fuzz tests for pruning with truncated statistics / prefix values
Are these changes tested?
This PR only adds tests, which can be run with:
```shell
cargo test --test fuzz -- pruning
```
Are there any user-facing changes?
No, tests only