Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault during eval of a regex #10

Closed
sharksforarms opened this issue Sep 3, 2019 · 11 comments
Closed

Segfault during eval of a regex #10

sharksforarms opened this issue Sep 3, 2019 · 11 comments

Comments

@sharksforarms
Copy link
Contributor

Found this interesting segfault today! Seems to happen using this regex, possibly others... But if you remove a single x, it no longer segfaults...

Here is a barebone example

use pcre2::bytes::RegexBuilder;

fn main() {
    let rgx = RegexBuilder::new().jit_if_available(true).build("xxxx|xxxx|xxxx").unwrap();
    rgx.is_match(&Vec::new()).unwrap();
}
@BurntSushi
Copy link
Owner

Could you please do the debugging work required to confirm that it's a bug in this library and not PCRE2 itself? I can indeed reproduce this, however, tweaking the regex as you suggest makes the segfault go away. That strongly points toward PCRE2 itself.

@BurntSushi
Copy link
Owner

Hmmm, I wonder if this is related to the fact that you're passing an empty vec. Since it doesn't allocate, it probably points to something that can't be dereferenced. So this library probably needs to special case an empty slice.

@sharksforarms
Copy link
Contributor Author

That's a possibility, but I find it odd that the behavior would change with the regex, if it is upstream, possibly a optimization bug? I'll look into it further

@BurntSushi
Copy link
Owner

BurntSushi commented Sep 3, 2019

I think the empty slice issue is the more likely candidate.

But changing a regex and having it result in buggy behavior is by itself not odd at all. Lots of optimizations in production grade regex engines have thresholds and what not, which trigger different code paths. For example, in your regex, all branches have the same fixed length. So the regex engine can tell that there is no match by just inspecting the length. It could do the same thing when you remove an x (via a minimum fixed length), but there's no guarantee that that specific optimization is implemented. Of course, the reverse is happening here, but still, something like that could be going on.

@sharksforarms
Copy link
Contributor Author

Yes most likely rgx.is_match(&vec![0]).unwrap(); passes

@sharksforarms
Copy link
Contributor Author

sharksforarms commented Sep 4, 2019

    pub unsafe fn find(
        &mut self,
        code: &Code,
        subject: &[u8],
        start: usize,
        options: u32,
    ) -> Result<bool, Error> {
        println!("pointer: {:?}", subject.as_ptr());
        let rc = pcre2_match_8(
            code.as_ptr(),
            subject.as_ptr(),
            subject.len(),
            start,
            options,
            self.as_mut_ptr(),
            self.match_context,
        );

subject.as_ptr() seems to return an invalid pointer, an empty Vec does not allocate

@sharksforarms
Copy link
Contributor Author

sharksforarms commented Sep 4, 2019

Not exactly sure what the best solution is yet, please take a look at the pull-request! Happy to make changes.

@BurntSushi
Copy link
Owner

Closed by #11.

@sharksforarms
Copy link
Contributor Author

For completeness I also reported it to pcre2 maintainers: https://bugs.exim.org/show_bug.cgi?id=2440

On one hand: Yes, we're passing it an invalid pointer, who knows what will happen
On the other hand: The length is 0, the pointer shouldn't be read. Also only happens on certain regex.

@sharksforarms
Copy link
Contributor Author

@BurntSushi
Copy link
Owner

Awesome, thanks!

BurntSushi added a commit that referenced this issue Jul 30, 2024
To work around likely bugs in (older versions of) PCRE2. Namely, at one
point, PCRE2 would dereference the haystack pointer even when the length
was zero.

This was reported in #10 and we worked around this in #11 by passing a
pointer to a const `&[]`, with the (erroneous) presumption that this
would be a valid pointer to dereference. In retrospect though, this was
a little silly, because you should never be dereferencing a pointer to
an empty slice. It's not valid. Alas, at that time, Rust did actually
hand you a valid pointer that could be dereferenced. But [this
PR][rust-pull] changed that. And thus, we're back to where we started:
handing buggy versions of PCRE2 a zero length haystack with a dangling
pointer.

So we fix this once and for all by passing a slice of length 1, but with
a haystack length of 0, to the PCRE2 search routine when searching an
empty haystack. This will guarantee the provision of a dereferencable
pointer should PCRE2 decide to dereference it.

Fixes #42

[rust-pull]: rust-lang/rust#123936
BurntSushi added a commit that referenced this issue Jul 30, 2024
To work around likely bugs in (older versions of) PCRE2. Namely, at one
point, PCRE2 would dereference the haystack pointer even when the length
was zero.

This was reported in #10 and we worked around this in #11 by passing a
pointer to a const `&[]`, with the (erroneous) presumption that this
would be a valid pointer to dereference. In retrospect though, this was
a little silly, because you should never be dereferencing a pointer to
an empty slice. It's not valid. Alas, at that time, Rust did actually
hand you a valid pointer that could be dereferenced. But [this
PR][rust-pull] changed that. And thus, we're back to where we started:
handing buggy versions of PCRE2 a zero length haystack with a dangling
pointer.

So we fix this once and for all by passing a slice of length 1, but with
a haystack length of 0, to the PCRE2 search routine when searching an
empty haystack. This will guarantee the provision of a dereferencable
pointer should PCRE2 decide to dereference it.

Fixes #42

[rust-pull]: rust-lang/rust#123936
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants