Add str::split_first and others #764

@jdahlstrom

Proposal

Problem statement

There are several ways to get or split off a prefix or suffix of a [_] with a given length, either as a slice, array, or a single element, and most of them work in const by now:

| method | since | const |
|---|---|---|
| {get,index}{,_mut} | 1.0 | no, uses a trait |
| first and last | 1.0 | 1.83 |
| split_at{_mut} | 1.0 | 1.71, 1.83 |
| split_{first,last}{,_mut} | 1.5 | 1.56, 1.83 |
| {,split_}{first,last}_chunk{_mut} | 1.77 | 1.77 |
| split_at{_mut}_{,un}checked | ~1.80 | ~1.80 |
| split_off{,_mut} | 1.87 | no, uses a trait |
| split_off_{first,last}{,_mut} | 1.87 | unstable |
| pattern-matching [fst, rst @ ..] etc | ? | yes |

Additionally, for some use cases {strip,trim}_{pre,suf}fix can also be used (not const; they use a trait).
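As a rough illustration of the slice situation, the sketch below splits off the first element of a byte slice in a const context in two ways: with a slice pattern and with the `split_first` method. The helper names `split_head` and `split_head_method` and the constant `HEAD` are made up for this example and are not part of any library.

```rust
/// Hypothetical helper: split off the first byte using a slice pattern.
const fn split_head(xs: &[u8]) -> Option<(u8, &[u8])> {
    match xs {
        [fst, rst @ ..] => Some((*fst, rst)),
        [] => None,
    }
}

/// Hypothetical helper: the same result via `<[u8]>::split_first`.
const fn split_head_method(xs: &[u8]) -> Option<(u8, &[u8])> {
    match xs.split_first() {
        Some((fst, rst)) => Some((*fst, rst)),
        None => None,
    }
}

// Both helpers can be evaluated at compile time.
const HEAD: Option<(u8, &'static [u8])> = split_head(b"#abc");
const HEAD_METHOD: Option<(u8, &'static [u8])> = split_head_method(b"#abc");
```

Neither helper needs a trait, which is why both are usable in const today.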

For str the situation is considerably worse, even though splitting a str correctly is actually more difficult and error-prone because its elements (characters) are variable-length:

| method | since | const | notes |
|---|---|---|---|
| {get,index}{,_mut} | 1.0 | no, uses a trait | byte offset |
| split_at{_mut} | 1.4 | 1.86 | byte offset |
| split_at{_mut}_checked | 1.80 | 1.86 | byte offset |

Plus {strip,trim}_{pre,suf}fix (not const, uses a trait).

And that's it, I believe. The slice methods are of course available via as_bytes(), but they are error-prone, and the results usually require conversion back to the string world (including a redundant UTF-8 check). Note that even the split_at_checked method is very recent, and even more recent in const.
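To make the as_bytes() round trip concrete, here is a minimal sketch of splitting off the first byte of a string through the slice API. The function name `split_first_byte` is invented for this example. Note how the remainder has to be re-validated as UTF-8, and how a multi-byte first character makes the whole operation fail.

```rust
/// Hypothetical helper: split off the first *byte* of a string via the
/// slice API, then convert the remainder back to `&str`.
fn split_first_byte(s: &str) -> Option<(u8, &str)> {
    let (first, rest) = s.as_bytes().split_first()?;
    // The remainder must be checked again, even though `s` was valid UTF-8.
    // If the first character was multi-byte, `rest` starts in the middle of
    // a character and this check fails.
    let rest = std::str::from_utf8(rest).ok()?;
    Some((*first, rest))
}
```

With ASCII input this behaves as expected, but a string such as "é" yields `None` because the split lands inside the two-byte character.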

Motivating examples or use cases

To optionally trim a single character from a &str in a const function I found myself having to jump through a few hoops. This is the best I could come up with:

    if let Some((fst, rst)) = s.split_at_checked(1)
        && fst.as_bytes()[0] == b'#'
    {
        s = rst;
    }

This compiles since 1.86 and also works if the string is empty or its first character is multi-byte. If the character(s) you want to trim may themselves be multi-byte, however, things become considerably more difficult.
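For reference, the same idea can be packaged as a complete const function. This is only a sketch: the name `trim_hash` and the constants are invented here, and it uses nested `if` instead of a let chain so that it does not depend on let-chain support. It assumes a toolchain where `str::split_at_checked` is usable in const (1.86 according to the table above).

```rust
/// Hypothetical helper: strip a single leading '#' if there is one.
const fn trim_hash(s: &str) -> &str {
    // `split_at_checked(1)` returns `None` for an empty string and when
    // byte offset 1 falls inside a multi-byte character.
    if let Some((fst, rst)) = s.split_at_checked(1) {
        // `fst` is exactly one byte long here, so indexing cannot panic.
        if fst.as_bytes()[0] == b'#' {
            return rst;
        }
    }
    s
}

// Evaluated at compile time.
const TRIMMED: &str = trim_hash("#ff8800");
const UNCHANGED: &str = trim_hash("ff8800");
```

Trimming a multi-byte character this way would require comparing several bytes and choosing the split offset by hand, which is the difficulty described above.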

All the other ways that I could think of require either const trait support or str->bytes->str ceremony.

Solution sketch

Existing PR rust-lang/rust#89603 proposed to add str::first, str::split_first, str::last, and str::split_last (a _char was added to the names after someone suggested it). The PR predates ACPs, I believe, and was closed due to lack of author activity after a few months. It adds

    pub fn first_char(&self) -> Option<char>
    pub fn last_char(&self) -> Option<char>
    pub fn split_first_char(&self) -> Option<(char, &str)>
    pub fn split_last_char(&self) -> Option<(char, &str)>

which is also what I propose. I'd prefer dropping the _char suffix, but that's a bikeshed problem. If this ACP is accepted, someone(TM) could simply revive #89603, making changes if any are requested by the team.
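To show the intended semantics of the four signatures, here is a minimal sketch written as an extension trait on stable Rust. The trait name `StrSplitExt` is made up, and the bodies are one possible implementation built on `str::chars` and `Chars::as_str`, not necessarily what #89603 does. Because it goes through a trait and an iterator, this version is not usable in const.

```rust
/// Hypothetical extension trait mirroring the proposed method signatures.
trait StrSplitExt {
    fn first_char(&self) -> Option<char>;
    fn last_char(&self) -> Option<char>;
    fn split_first_char(&self) -> Option<(char, &str)>;
    fn split_last_char(&self) -> Option<(char, &str)>;
}

impl StrSplitExt for str {
    fn first_char(&self) -> Option<char> {
        self.chars().next()
    }

    fn last_char(&self) -> Option<char> {
        self.chars().next_back()
    }

    fn split_first_char(&self) -> Option<(char, &str)> {
        let mut chars = self.chars();
        let first = chars.next()?;
        // `as_str` returns the part of the string not yet consumed.
        Some((first, chars.as_str()))
    }

    fn split_last_char(&self) -> Option<(char, &str)> {
        let mut chars = self.chars();
        let last = chars.next_back()?;
        Some((last, chars.as_str()))
    }
}
```

Unlike the byte-offset methods, these split on character boundaries, so a multi-byte first or last character is handled without any manual offset arithmetic.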

Alternatives

Wait until const traits are stable, unlocking trimming and indexing. Doesn't help with the byte offset issue.

Add a new pattern syntax for splitting strings the way slices can be split.

Implement these in a third-party crate as extension methods first.

Implement some other combination of the mentioned [_] methods missing in str.

Links and related work

Issue rust-lang/rust#48731 from 2017 (yeah) proposes adding str::split_first.

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

Labels

  • ACP-accepted: API Change Proposal is accepted (seconded with no objections)
  • T-libs-api
  • api-change-proposal: A proposal to add or alter unstable APIs in the standard libraries
