byte_pattern: share the TwoWaySearcher between byte and str #135931
Open
folkertdev wants to merge 1 commit intorust-lang:mainfrom
Open
byte_pattern: share the TwoWaySearcher between byte and str #135931folkertdev wants to merge 1 commit intorust-lang:mainfrom
byte_pattern: share the TwoWaySearcher between byte and str #135931folkertdev wants to merge 1 commit intorust-lang:mainfrom
Conversation
folkertdev
commented
Jan 23, 2025
Comment on lines
-1152
to
+1164
| SearchStep::Reject(a, mut b) => { | ||
| byte_pattern::SearchStep::Reject(a, mut b) => { | ||
| // skip to next char boundary | ||
| while !self.haystack.is_char_boundary(b) { | ||
| b += 1; | ||
| } | ||
| searcher.position = cmp::max(b, searcher.position); | ||
| SearchStep::Reject(a, b) | ||
| } | ||
| otherwise => otherwise, | ||
| byte_pattern::SearchStep::Match(a, b) => SearchStep::Match(a, b), | ||
| byte_pattern::SearchStep::Done => SearchStep::Done, |
Contributor
Author
There was a problem hiding this comment.
I duplicated SearchStep because it is a public type and its documentation refers to the Searcher trait. The byte_pattern module will have it's own Searcher trait (or ByteSearcher maybe) and so that documentation would be misleading one way or the other.
Comment on lines
+1000
to
+1001
| if let Some(result) = simd_contains(self, haystack) { | ||
| if let Some(result) = simd_contains(self.as_bytes(), haystack.as_bytes()) { |
Contributor
Author
There was a problem hiding this comment.
this function just takes a slice of bytes now. From what I can see the implementation does not rely on the input being UTF8 at all.
This comment has been minimized.
This comment has been minimized.
ea951c7 to
0b23d41
Compare
This comment has been minimized.
This comment has been minimized.
0b23d41 to
f3cb4ca
Compare
This comment has been minimized.
This comment has been minimized.
f3cb4ca to
c631191
Compare
Collaborator
|
☔ The latest upstream changes (presumably #144393) made this pull request unmergeable. Please resolve the merge conflicts. |
Member
|
Triage: Is it possible to break this PR up into even smaller PRs? That would make code review easier. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
tracking issue: #134149
An attempt to break up #134350 into more manageable pieces.
From what I can see, the
TwoWaySearcherimplementation does not have special logic for UTF8 boundaries, so it should work just as well on any&[u8]. So this PR just moves theTwoWaySearcherimplementation toslice/byte_pattern.rs, and then uses it fromstr/pattern.rs. No functional changes, no additional API surface.r? @BurntSushi