-
Notifications
You must be signed in to change notification settings - Fork 1.7k
OS string string-like interface #1309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
See rust-lang/rust#26499 (related discussion) |
|
Thanks. I'd missed that discussion.
That also suggests the question of whether these functions should take things like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When adding string-like APIs to OsStr, it may be best to try to stick to original str APIs as much as possible, while also allowing all kinds of fun functionality for OsStr. Along those lines, perhaps this API could look like:
fn starts_with<S: AsRef<OsStr>>(&self, prefix: S) -> bool;That should cover this use case as well as something like os_str.starts_with(&other_os_string).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reasonable. I'll make that change.
|
Thanks for the RFC @wthrowe! It does seem high time that we start expanding the API for My thoughts on this in the past have been primarily along the lines of:
|
|
I'll look into writing up equivalents of some of the more general pattern matching |
|
Yeah I agree that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading between lines I think you already know that, but WTF-8 to UTF-8 conversion can be done in place: https://simonsapin.github.io/wtf-8/#converting-wtf-8-utf-8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. In fact, the internal WTF-8 implementation in libstd already has an into_string_lossy that does the right thing. It just isn't exposed at the OsString level.
Also adds more explanation of how OS strings are interpreted.
This seems to not prevent anything actually useful and avoids confusion. Also matches `str` better.
|
New version, now with more patterns! The previous proposed functions have mostly been replaced with a new interface matching I did decide to not propose The main uncertainty in the new version is what to do with patterns that match the empty string. The non-iterator functions (like |
|
One more thought on patterns: With more complicated implementors of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may be a bit onerous to remember that contains and contains_os are both methods on an OsStr. I think we'd be in a better place (e.g. especially in mirroring str) if we could avoid extra methods like this. It may involve perhaps a new Pattern trait down below, but we may also be able to substitute AsRef<OsStr> for Pattern because str is in theory far more ubiquitous than OsStr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some kind of OsPattern trait implemented for P: Pattern and OsStr could probably work. (Implementing for AsRef<OsStr> as well would be forbidden by coherence, I believe.) There might be some trickiness due to starts_with and contains having different bounds. Have to think about it.
Edit: I think that can be dealt with by bounds on the trait methods.
No reason such a thing couldn't be used for replace as well, except that for some reason str doesn't accept patterns there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I forget the exact reason that replace on strings doesn't take a pattern, but I think there may be a good reason? (cc @Kimundi)
Otherwise yeah having an OsPattern trait seems not-too-bad here perhaps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a possible rough design:
Trait OsPattern<'a> is basically identical to Pattern<'a> except that the strs become OsStrs, the Searcher is bound on OsSearcher<'a>, and is_contained_in has an extra bound (more on that later).
Safe trait OsSearcher<'a> only has methods haystack (same as Searcher<'a>) and is_prefix_of.
Safe trait ReverseOsSearcher<'a> requires OsSearcher<'a> and adds is_suffix_of. OsStr::ends_with requires this bound.
Safe trait FullOsSearcher<'a> requires OsSearcher<'a> and adds is_contained_in. OsStr::contains requires this bound.
If we additionally want to support replace with patterns, then we also have:
Unsafe trait IndexedOsSearcher<'a> requires FullOsSearcher<'a> and adds the remainder of the Searcher<'a> methods, except that all the usize returns are changed to OsIndex<'a>s. OsStr::replace requires this bound.
Enum OsIndex<'a> has variants Unicode(&'a str, usize) and NonUnicode(&'a OsStr, NonUnicodeIndex). The first field in each variant is expected to be a substring of the haystack, and the second is an index into that substring. NonUnicodeIndex is a struct with basically no public interface. (OS-specific interfaces may be added later.)
If we decide to add methods like the split_prefix mentioned above we will likely need to add more variants on the OsSearcher trait. Ideally that would be figured out before any of this is stabilized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah in the limit I think this may end up duplicating the API surface area of the string pattern traits, but there's also a question as to whether this needs to be so full-blown just yet. In theory OS strings are used much more rarely than regular strings (especially for various text manipulation routines), so we may be able to get by with a much smaller API surface area.
I wonder if perhaps this could expose a relatively straightforward, if not as generic, API today which leaves room to this sort of expansion in the future but doesn't take the leap quite just yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reflecting on this, I think at the very least IndexedOsSearcher is silly, because I can't think of any case where one would implement FullOsSearcher but not it, so they can be merged.
The only reasonable simplification of the API that I can think of that can be generalized to something like this is to just bound on the Pattern family of traits. Then there are no new traits needed, and everything works except for starts_with, ends_with, contains, and replace with an OsStr. That's probably not too bad as a restriction, and I believe changing to OsPattern at some later time would be fully backwards compatible.
|
I've unfortunately not been able to find much time to allocate to this RFC, so @Kimundi (who originally developed the pattern API for strings) is gonna take over this. |
|
Hi, just wanted to say that haven't forgotten about this RFC, but I'm currently investigating how a fully general pattern API (for arbitrary slice types) would look like, and whether there might be incompatibilities or method name clashes with this RFC. |
|
@Kimundi Thanks for the update - could really do with this functionality! |
|
Update: I have a incomplete, undocumented prototype at https://github.com/Kimundi/rust_pattern_api_v2 that provides the same Pattern API + iterators for both I need to find time to do a proper writeup for my findings, but in regard to this RFC the cliff notes are:
|
|
The libs team discussed this RFC recently and the conclusion was that while we'd like to implement something along these lines the RFC will need much of a revamp now and we unfortunately haven't been able to reach the author, so we're going to close. We're of course quite willing to entertain RFCs in this area though! |
Rendered
Prototype implementation (also covers #1307)
Cc: #900, #1307