[str] Add LiteralString overload for __getitem__#12714
[str] Add LiteralString overload for __getitem__#12714JelleZijlstra merged 3 commits intopython:mainfrom
LiteralString overload for __getitem__#12714Conversation
In PEP 675, @gbleaney and I had specified a list of `LiteralString`-preserving [overloads](https://peps.python.org/pep-0675/#appendix-c-str-methods-that-preserve-literalstring) for `str`. However, we didn't specify an overload for `__getitem__` and didn't give any rationale either. IIRC this was an edge case we didn't want to take a strong decision on unless users wanted it. @carljm brought this up yesterday, so I think it's worth discussing. Pro: `my_literal_string[i]` or `my_literal_string[i:j]` should technically be compatible with `LiteralString`, since it is a substring of a literal-derived string. Con: The main downside is that an attacker might control the indexes and try to access a specific substring from a literal string in the code. For example, they might narrow down the string to `rm foo` or `SELECT *`. It's true that `join` and other methods could also construct dangerous strings from `LiteralString`s, and we even call that out as an accepted tradeoff in the PEP: > 4. Trivial functions could be constructed to convert a str to a LiteralString: > > def make_literal(s: str) -> LiteralString: > letters: Dict[str, LiteralString] = { > "A": "A", > "B": "B", > ... > } > output: List[LiteralString] = [letters[c] for c in s] > return "".join(output) > > We could mitigate the above using linting, code review, etc., but ultimately a clever, malicious developer attempting to circumvent the protections offered by LiteralString will always succeed. The important thing to remember is that LiteralString is not intended to protect against malicious developers; it is meant to protect against benign developers accidentally using sensitive APIs in a dangerous way (without getting in their way otherwise). > > Without LiteralString, the best enforcement tool API authors have is documentation, which is easily ignored and often not seen. With LiteralString, API misuse requires conscious thought and artifacts in the code that reviewers and future developers can notice. > > -- [PEP 675 - Appendix B: Limitations](https://peps.python.org/pep-0675/#appendix-b-limitations) `__getitem__`, however, seems a bit different, because it (and `split`, `zfill`, etc.) accept an index or width that could be used to construct a dangerous query or a humongous string. So, we need to clarify the intent a little. What was the intent of these overloads? We wanted to forbid "arbitrary user-supplied strings" while allowing methods that preserved literal strings. We were not trying to prevent every possible exploit on the string. Since `__getitem__` forbids arbitrary user-supplied strings and preserves literal strings, I think we should add an overload for it. Thanks to @carljm for bringing this up! cc @gbleaney @JelleZijlstra in case they have different opinions. Would also love to hear from other users.
…load. This should fix the error: ``` /opt/hostedtoolcache/Python/3.10.15/x64/bin/python -m mypy.stubtest --check-typeshed --custom-typeshed-dir . --allowlist stdlib/@tests/stubtest_allowlists/common.txt --allowlist stdlib/@tests/stubtest_allowlists/linux.txt --allowlist stdlib/@tests/stubtest_allowlists/py310.txt --allowlist stdlib/@tests/stubtest_allowlists/linux-py310.txt error: not checking stubs due to mypy build errors: stdlib/builtins.pyi:595: error: Overloaded function signature 2 will never be matched: signature 1's parameter type(s) are the same or broader [misc] ``` I'm not sure why there is an error, because the overload seems valid to me. Other `LiteralString` overloads have the same `type: ignore[misc]`, so I'm just going to copy them.
This comment has been minimized.
This comment has been minimized.
AlexWaygood
left a comment
There was a problem hiding this comment.
To me this makes sense. I can't think of any way in which this would be any less safe from a security perspective than the existing LiteralString overload for str.__mul__, which also allows arbitrary integers and could also create a potentially huge string.
I think this can just be ignored, because mypy still doesn't support LiteralString, and patches their copy of builtins.pyi so that all LiteralString overloads are removed from it. |
|
This looks good to me, thanks @pradeep90 ! I don't understand the mypy-primer errors, but I'll take @AlexWaygood 's word for it that they aren't a problem. |
It's a thing mypy does so they can halfway-but-not-really support |
|
The mypy-primer errors seem to be because mypy issues a different error code for errors involving |
|
(technically mypy doesn't patch typeshed for it to treat LiteralString as str, it just goes ahead and does that. The patch is because all the extra overloads with a str alias get pretty confusing and result in false positives and worse diagnostics) But yes, either way the conclusion is that mypy_primer on this PR is meaningless. |
|
Thanks, everyone! @AlexWaygood or @JelleZijlstra, how to merge this PR? I don't think I have write access. (I was unfamiliar with GitHub's UI and pressed "Update branch", but I don't think that does it.) |
JelleZijlstra
left a comment
There was a problem hiding this comment.
I can merge it once CI passes.
|
Diff from mypy_primer, showing the effect of this PR on open source code: streamlit (https://github.com/streamlit/streamlit)
- lib/tests/streamlit/elements/lib/column_config_utils_test.py:385:30: error: Invalid index type "str" for "Union[ColumnConfig, str, None]"; expected type "Union[SupportsIndex, slice]" [index]
+ lib/tests/streamlit/elements/lib/column_config_utils_test.py:385:13: error: No overload variant of "__getitem__" of "str" matches argument type "str" [call-overload]
+ lib/tests/streamlit/elements/lib/column_config_utils_test.py:385:13: note: Possible overload variants:
+ lib/tests/streamlit/elements/lib/column_config_utils_test.py:385:13: note: def __getitem__(self, Union[SupportsIndex, slice], /) -> str
prefect (https://github.com/PrefectHQ/prefect)
- src/prefect/cli/cloud/__init__.py:326: error: Invalid index type "str" for "dict[UUID, list[Workspace]]"; expected type "UUID" [index]
- src/prefect/cli/cloud/__init__.py:326: error: Invalid index type "str" for "str"; expected type "SupportsIndex | slice" [index]
+ src/prefect/cli/cloud/__init__.py:326: error: No overload variant of "__getitem__" of "str" matches argument type "str" [call-overload]
+ src/prefect/cli/cloud/__init__.py:326: note: Possible overload variants:
+ src/prefect/cli/cloud/__init__.py:326: note: def __getitem__(self, SupportsIndex | slice, /) -> str
dd-trace-py (https://github.com/DataDog/dd-trace-py)
+ ddtrace/settings/config.py:840: error: Unused "type: ignore" comment [unused-ignore]
+ ddtrace/settings/config.py:840: error: No overload variant of "__getitem__" of "str" matches argument type "str" [call-overload]
+ ddtrace/settings/config.py:840: note: Error code "call-overload" not covered by "type: ignore" comment
+ ddtrace/settings/config.py:840: note: Possible overload variants:
+ ddtrace/settings/config.py:840: note: def __getitem__(self, SupportsIndex | slice, /) -> str
|
In PEP 675, @gbleaney and I had specified a list of
LiteralString-preserving overloads forstr. However, we didn't specify an overload for__getitem__and didn't give any rationale either. IIRC this was an edge case we didn't want to take a strong decision on unless users wanted it.@carljm brought this up yesterday, so I think it's worth discussing.
Pro:
my_literal_string[i]ormy_literal_string[i:j]should technically be compatible withLiteralString, since it is a substring of a literal-derived string.Con: The main downside is that an attacker might control the indexes and try to access a specific substring from a literal string in the code. For example, they might narrow down the string to
rm fooorSELECT *.It's true that
joinand other methods could also construct dangerous strings fromLiteralStrings, and we even call that out as an accepted tradeoff in the PEP:__getitem__, however, seems a bit different, because it (andsplit,zfill, etc.) accept an index or width that could be used to construct a dangerous query or a humongous string.So, we need to clarify the intent a little. What was the intent of these overloads? We wanted to forbid "arbitrary user-supplied strings" while allowing methods that preserved literal strings. We were not trying to prevent every possible exploit on the string. Since
__getitem__forbids arbitrary user-supplied strings and preserves literal strings, I think we should add an overload for it.Thanks to @carljm for bringing this up!
cc @gbleaney @JelleZijlstra in case they have different opinions. Would also love to hear from other users.