Skip to content

byte_pattern: share the TwoWaySearcher between byte and str #135931

Open
folkertdev wants to merge 1 commit intorust-lang:mainfrom
folkertdev:byte-pattern-two-way-searcher
Open

byte_pattern: share the TwoWaySearcher between byte and str #135931
folkertdev wants to merge 1 commit intorust-lang:mainfrom
folkertdev:byte-pattern-two-way-searcher

Conversation

@folkertdev
Copy link
Copy Markdown
Contributor

tracking issue: #134149

An attempt to break up #134350 into more manageable pieces.

From what I can see, the TwoWaySearcher implementation does not have special logic for UTF8 boundaries, so it should work just as well on any &[u8]. So this PR just moves the TwoWaySearcher implementation to slice/byte_pattern.rs, and then uses it from str/pattern.rs. No functional changes, no additional API surface.

r? @BurntSushi

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 23, 2025
Comment on lines -1152 to +1164
SearchStep::Reject(a, mut b) => {
byte_pattern::SearchStep::Reject(a, mut b) => {
// skip to next char boundary
while !self.haystack.is_char_boundary(b) {
b += 1;
}
searcher.position = cmp::max(b, searcher.position);
SearchStep::Reject(a, b)
}
otherwise => otherwise,
byte_pattern::SearchStep::Match(a, b) => SearchStep::Match(a, b),
byte_pattern::SearchStep::Done => SearchStep::Done,
Copy link
Copy Markdown
Contributor Author

@folkertdev folkertdev Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I duplicated SearchStep because it is a public type and its documentation refers to the Searcher trait. The byte_pattern module will have it's own Searcher trait (or ByteSearcher maybe) and so that documentation would be misleading one way or the other.

Comment thread library/core/src/str/pattern.rs Outdated
Comment on lines +1000 to +1001
if let Some(result) = simd_contains(self, haystack) {
if let Some(result) = simd_contains(self.as_bytes(), haystack.as_bytes()) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function just takes a slice of bytes now. From what I can see the implementation does not rely on the input being UTF8 at all.

@rust-log-analyzer

This comment has been minimized.

@folkertdev folkertdev force-pushed the byte-pattern-two-way-searcher branch from ea951c7 to 0b23d41 Compare January 23, 2025 12:54
@rust-log-analyzer

This comment has been minimized.

@folkertdev folkertdev force-pushed the byte-pattern-two-way-searcher branch from 0b23d41 to f3cb4ca Compare January 23, 2025 13:58
@rust-log-analyzer

This comment has been minimized.

@folkertdev folkertdev force-pushed the byte-pattern-two-way-searcher branch from f3cb4ca to c631191 Compare January 23, 2025 20:00
@bors
Copy link
Copy Markdown
Collaborator

bors commented Jul 30, 2025

☔ The latest upstream changes (presumably #144393) made this pull request unmergeable. Please resolve the merge conflicts.

@Enselic
Copy link
Copy Markdown
Member

Enselic commented Jan 20, 2026

Triage: Is it possible to break this PR up into even smaller PRs? That would make code review easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants