SIM905 fix doesn’t treat U+001C..U+001F as white space #19845

@dscorbett

Summary

str.split and str.rsplit categorize the four characters U+001C..U+001F as white space but split-static-string (SIM905) doesn’t. SIM905 is implemented using the Rust methods str::split_whitespace, str::trim_start, and str::trim_end, which define white space as Unicode [:White_Space:], but Python defines white space as [[:bc=B:][:bc=S:][:bc=WS:][:Zs:]]. This discrepancy can make the fix change a program’s behavior. Example:

$ cat >sim905.py <<'# EOF'
print("S\x1cP\x1dL\x1eI\x1fT".split())
print("\x1c\x1d\x1e\x1f>".split(maxsplit=0))
print("<\x1c\x1d\x1e\x1f".rsplit(maxsplit=0))
# EOF

$ python sim905.py
['S', 'P', 'L', 'I', 'T']
['>']
['<']

$ ruff --isolated check sim905.py --select SIM905 --fix
Found 3 errors (3 fixed, 0 remaining).

$ python sim905.py
['S\x1cP\x1dL\x1eI\x1fT']
['\x1c\x1d\x1e\x1f>']
['<\x1c\x1d\x1e\x1f']
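The mismatch between the two definitions can be checked from Python itself. The following is a minimal sketch, not part of Ruff: it assumes that `str.isspace` uses the same white-space test as `str.split()` with no separator, and it hard-codes the Unicode `White_Space` code points (the set name `UNICODE_WHITE_SPACE` is illustrative), because the standard library does not expose that property directly.

```python
import sys

# Code points with the Unicode White_Space property, hard-coded here as an
# assumption taken from the Unicode property list; this is the definition
# that Rust's str::split_whitespace follows.
UNICODE_WHITE_SPACE = {
    *range(0x0009, 0x000E),  # TAB, LF, VT, FF, CR
    0x0020, 0x0085, 0x00A0, 0x1680,
    *range(0x2000, 0x200B),  # EN QUAD .. HAIR SPACE
    0x2028, 0x2029, 0x202F, 0x205F, 0x3000,
}

# Code points that Python itself treats as white space.
python_white_space = {
    cp for cp in range(sys.maxunicode + 1) if chr(cp).isspace()
}

# Differences between the two definitions, in each direction.
only_python = sorted(python_white_space - UNICODE_WHITE_SPACE)
only_unicode = sorted(UNICODE_WHITE_SPACE - python_white_space)

print("Python only: ", [f"U+{cp:04X}" for cp in only_python])
print("Unicode only:", [f"U+{cp:04X}" for cp in only_unicode])
```

On a recent CPython, the "Python only" list should contain exactly U+001C through U+001F and the "Unicode only" list should be empty, which matches the four characters that the fix mishandles.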

Version

ruff 0.12.8 (f51a228 2025-08-07)
