Skip to content

use lazyregexp to compile regexes on first use#48166

Merged
thaJeztah merged 16 commits intomoby:masterfrom
thaJeztah:regex_oncevalue
Jan 2, 2025
Merged

use lazyregexp to compile regexes on first use#48166
thaJeztah merged 16 commits intomoby:masterfrom
thaJeztah:regex_oncevalue

Conversation

@thaJeztah
Copy link
Copy Markdown
Member

@thaJeztah thaJeztah commented Jul 15, 2024

implement lazyregexp package based on the "lazyregexp" package in golang.org/x/mod;
https://cs.opensource.google/go/x/mod/+/refs/tags/v0.19.0:internal/lazyregexp/lazyre.go;l=66-78

This package allows defining regular expressions that should not be
compiled until used, but still providing validation to prevent
invalid regular expressions from producing a panic at runtime.

This is largely a copy of the package from golang.org/x/mod,
with FindAllStringSubmatch and ReplaceAllLiteralString,
and ReplaceAllStringFunc added

- A picture of a cute animal (not mandatory but encouraged)

@thaJeztah thaJeztah added this to the 28.0.0 milestone Jul 15, 2024
@thaJeztah thaJeztah self-assigned this Jul 15, 2024
@thaJeztah thaJeztah marked this pull request as ready for review July 15, 2024 15:49
@thaJeztah
Copy link
Copy Markdown
Member Author

Ah, need a rebase for daemon/prune.go

@thaJeztah
Copy link
Copy Markdown
Member Author

Discussing in the maintainers call; one downside of this is that mistakes in regular expressions may now result in a panic at runtime. Such mistakes should of course be revealed as part of tests (i.e. if it doesn't trigger in tests, there's no coverage of the regular suppression 😇), so we were discussing approaches for having a Linter verify if any regex.MustCompile is covered by tests.

While discussing that @corhere round an interesting package in golang.org/x/mod that seems to handle most of this, but it's internal, so we'd have to copy it. We could rewrite parts of it to us sync.OnceValue() (as it was likely written before that was introduced);

https://pkg.go.dev/golang.org/x/mod/internal/lazyregexp

it has a special case when running under test

https://cs.opensource.google/go/x/mod/+/refs/tags/v0.19.0:internal/lazyregexp/lazyre.go;l=66-78

@corhere
Copy link
Copy Markdown
Contributor

corhere commented Jul 15, 2024

The special case when running under test is much simpler to achieve with Go 1.21: https://pkg.go.dev/testing#Testing

@thaJeztah
Copy link
Copy Markdown
Member Author

Oh! Just after I mentioned "I wish stdlib has something for this" 😂

@kolyshkin
Copy link
Copy Markdown
Contributor

Yet another alternative to this is https://github.com/containers/storage/tree/main/pkg/regexp (godoc: https://pkg.go.dev/github.com/containers/storage/pkg/regexp).

In particular, it solves the issue of checking all regexes by using a build tag (regexp_precompile), so all that's needed is a CI job to compile everything with this tag set.

@thaJeztah
Copy link
Copy Markdown
Member Author

so all that's needed is a CI job to compile everything with this tag set.

Ah, yes, that's an option. I think I want to avoid special build-tags though, and it looks like the testing.Testing() would provide an elegant solution to do this automatically.

Looking at the Lazyregexp approach, the only thing I'm not super fond of is that it's also wrapping the Regexp; I gave it a try to continue using a sync.OnceValue to keep it simple.

@thaJeztah thaJeztah changed the title use sync.OnceValue to compile regexes on first use use lazyregexp (sync.OnceValue) to compile regexes on first use Jul 16, 2024
@corhere
Copy link
Copy Markdown
Contributor

corhere commented Jul 16, 2024

the only thing I'm not super fond of is that it's also wrapping the Regexp; I gave it a try to continue using a sync.OnceValue to keep it simple.

laxyregexp wrapping the Regexp is one of the things I liked the most about it! Wrapping makes it a near drop-in replacement for regexp.MustCompile: none of the lines which reference the compiled regexp need to be touched. And it blocks the shared regexp from being mutated. While your implementation of lazyregexp is simplified, the complexity is pushed to the consumers.

Thoughts on adding a forbidigo lint rule to block direct uses of regexp.MustCompile outside of lazyregexp, like I did with old-style atomics?

@thaJeztah
Copy link
Copy Markdown
Member Author

laxyregexp wrapping the Regexp is one of the things I liked the most about it! Wrapping makes it a near drop-in replacement for regexp.MustCompile: none of the lines which reference the compiled regexp need to be touched.

Yeah, that's a good point, and I was going back-and-forth a bit on that. It's nice for it to be a drop-in replacement, but I was also somewhat concerned on that requiring us to keep it in sync with any changes / new features that may be introduced in regexp.Regexp. Overall it looks fairly stable, but I see for example that go1.21 introduced Regexp.MarshalText() (currently missing in the golang.org/x/mod/internal/lazyregexp implementation, although that implementation looks to be only a subset of regexp.Regexp).

Implementing all of those would not be too complicated, so if we want it to be a full drop-in replacement, I can have a look of that, but input welcome.

@kolyshkin
Copy link
Copy Markdown
Contributor

BTW it probably doesn't make any sense to touch any _test.go files -- those optimizations do not worth it per se. The only worthiness such changes may have is if you consider it a way to test the lazyregexp package, and/or to serve as lazyregexp usage examples.

One other thing is, some uses of regular expressions (in non-test code) may and should be eliminated, as NOT using regexp usually results in faster code and smaller binaries. Yet the task is somewhat hard; for a few examples, see opencontainers/runc#3460 and the linked PRs/CLs.

@corhere
Copy link
Copy Markdown
Contributor

corhere commented Jul 18, 2024

Overall it looks fairly stable, but I see for example that go1.21 introduced Regexp.MarshalText() (currently missing in the golang.org/x/mod/internal/lazyregexp implementation, although that implementation looks to be only a subset of regexp.Regexp).

Implementing only a subset of regexp.Regexp is a feature, not a bug. Regexp.UnmarshalText, like Regexp.Longest(), mutates the receiver! We need package-global regexp values to be immutable to avoid chaos.

if we want it to be a full drop-in replacement

It's in an internal package. We could trivially add wrapper methods on demand, as needed, just like golang.org/x/mod's lazyregexp package. Though I doubt it would take much time to whip up a full (minus methods which mutate the compiled regexp) wrapper with a bit of editor-fu.

@thaJeztah thaJeztah force-pushed the regex_oncevalue branch 3 times, most recently from f291414 to c25d705 Compare January 1, 2025 21:25
@thaJeztah thaJeztah requested a review from samuelkarp as a code owner January 1, 2025 21:25
@thaJeztah thaJeztah changed the title use lazyregexp (sync.OnceValue) to compile regexes on first use use lazyregexp to compile regexes on first use Jan 1, 2025
)

var stripOSC = regexp.MustCompile(`\x1b\][^\x1b\a]*(\x1b\\|\a)`)
var stripOSC = lazyregexp.New(`\x1b\][^\x1b\a]*(\x1b\\|\a)`)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread internal/lazyregexp/lazyregexp.go Outdated
Comment on lines +70 to +72
func (r *Regexp) ReplaceAllLiteralString(src, repl string) string {
return r.re().ReplaceAllLiteralString(src, repl)
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one was added for that fix as well, so we could drop it if that one's no longer needed (it's the only place this one was used currently).

@thaJeztah
Copy link
Copy Markdown
Member Author

@corhere @kolyshkin I updated this PR to use a fork of the lazyregexp package from golang.org/x/mod, and added a Linter rule to check for MustCompile() uses in non-test code. I already had some of the tests and test-utility packages updated, so I kept those for now, but let me know if you feel strongly about removing those (at least in _test.go files).

Based on the "lazyregexp" package in golang.org/x/mod;
https://cs.opensource.google/go/x/mod/+/refs/tags/v0.19.0:internal/lazyregexp/lazyre.go;l=66-78

This package allows defining regular expressions that should not be
compiled until used, but still providing validation to prevent
invalid regular expressions from producing a panic at runtime.

This is largely a copy of the package from golang.org/x/mod,
with FindAllStringSubmatch and ReplaceAllStringFunc added

Signed-off-by: Sebastiaan van Stijn <[email protected]>
Signed-off-by: Sebastiaan van Stijn <[email protected]>
@thaJeztah
Copy link
Copy Markdown
Member Author

Rebased after #49195 was merged (dropped one commit, and removed the now unused ReplaceAllLiteralString from lazyregexp.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/performance kind/refactor PR's that refactor, or clean-up code status/4-merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants