[ty] Replace strsim with CPython-based Levenshtein implementation#23291
Merged
AlexWaygood merged 7 commits into main on Feb 16, 2026
Replace the `strsim::damerau_levenshtein` dependency with a Levenshtein implementation ported from CPython's suggestion algorithm. The new implementation uses case-insensitive character matching with weighted costs (matching CPython's approach) and improved max-distance heuristics, rather than the simple Damerau-Levenshtein with lowercased comparison. This removes the `strsim` crate dependency from `ty_python_semantic`. https://claude.ai/code/session_014vwDhReNbswR4MXhYNcvBW
- Rename `unresolved_member` to `typo` and `member` to `candidate` in
levenshtein.rs since the module is used for TypedDict key suggestions
- Move `pub(crate) use diagnostic::levenshtein` to the use-statement
section in types.rs
- Update .md test files with typos that trigger suggestions under the
new Levenshtein algorithm ("nane" -> "name", "leg" -> "legs")
- Add intentional misspellings from test data to _typos.toml allowlist
https://claude.ai/code/session_014vwDhReNbswR4MXhYNcvBW
Exclude the entire levenshtein.rs test file from the typos spell checker (matching the approach in #18751) rather than adding individual misspelled words to the allowlist. https://claude.ai/code/session_014vwDhReNbswR4MXhYNcvBW
Typing conformance results: No changes detected ✅
Memory usage report: Memory usage unchanged ✅
| Lint rule | Added | Removed | Changed |
|---|---|---|---|
| invalid-key | 0 | 0 | 95 |
| invalid-argument-type | 1 | 1 | 31 |
| type-assertion-failure | 0 | 1 | 3 |
| unresolved-attribute | 0 | 0 | 3 |
| invalid-assignment | 0 | 0 | 2 |
| invalid-return-type | 0 | 1 | 0 |
| unsupported-operator | 0 | 0 | 1 |
| Total | 1 | 3 | 135 |
…nostic.rs`
- Move `levenshtein.rs` from `types::diagnostic` to `diagnostic`
(converting `diagnostic.rs` to `diagnostic/mod.rs`)
- Revert all changes to `types/diagnostic.rs`: restore `did_you_mean`
import and original callsite instead of calling levenshtein directly
- Remove the `levenshtein` re-export from `types.rs`
- Use `HideUnderscoredSuggestions::Yes` in `did_you_mean`
- Update `_typos.toml` exclusion path to match new file location
https://claude.ai/code/session_014vwDhReNbswR4MXhYNcvBW
cbca655 to
9ce308e
sharkdp (Contributor) approved these changes on Feb 16, 2026
You seem very eager to maintain your own implementation, so I'm not going to object this time 😛.
Thanks!
Member, Author
Haha. Well, with a higher-quality Levenshtein implementation, we can consider reviving #21780, for example, which wasn't viable with …
carljm added a commit that referenced this pull request on Feb 16, 2026
* main: (43 commits)
  [`ruff`] Suppress diagnostic for strings with backslashes in interpolations before Python 3.12 (`RUF027`) (#21069)
  [flake8-bugbear] Fix B023 false positive for immediately-invoked lambdas (#23294)
  [ty] Add `Final` mdtests for loops and redeclaration (#23331)
  [`flake8-pyi`] Also check string annotations (`PYI041`) (#19023)
  Remove AlexWaygood as a flake8-pyi codeowner (#23347)
  [ty] Add comments to clarify the purpose of `NominalInstanceType::class_name` and `NominalInstanceType::class_module_name` (#23339)
  Add attestations for release artifacts and Docker images (#23111)
  [ty] Fix `assert_type` diagnostic messages (#23342)
  [ty] Force-update all insta snapshots (#23343)
  Add Q004 to the list of conflicting rules (#23340)
  [ty] Fix `invalid-match-pattern` false positives (#23338)
  [ty] new diagnostic called-match-pattern-must-be-a-type (#22939)
  [ty] Update flaky projects (#23337)
  [ty] Increase timeout for ecosystem report to 40 min (#23336)
  Bump ecosystem-analyzer pin (#23335)
  [ty] Replace `strsim` with CPython-based Levenshtein implementation (#23291)
  [ty] Add mdtest for staticmethod assigned in class body (#23330)
  [ty] fix inferring type variable from string literal argument (#23326)
  [ty] bytes literal is a sequence of integers (#23329)
  Update rand and getrandom (#23333)
  ...
Summary
For a couple of diagnostics currently, we add a "Did you mean...?" diagnostic hint if it appears like there's an obvious typo that caused us to emit an error. The "Did you mean...?" suggestion is generated via the `strsim` Levenshtein implementation on crates.io.
This PR replaces the `strsim` implementation of Levenshtein used to create these hints with a custom Levenshtein implementation based on the one that CPython itself uses to create these hints. The added tests are also derived from CPython's test suite.
The motivation for copying CPython's implementation almost exactly is that CPython has had this feature for several Python versions now, and during that time many bug reports have been filed regarding incorrect suggestions, which have since been fixed. This implementation is thus very well "battle-tested" by this point; we can say with a reasonable degree of confidence that it gives good suggestions for typos in the Python context.
The ecosystem report on this PR bears out that this is an improvement. We see bad suggestions going away:
and good suggestions being added:
This Levenshtein implementation was originally proposed in #18705, and then again in #18751. However, those PRs also started using the Levenshtein implementation in certain other areas, where computing the list of suggestions to pass into the Levenshtein algorithm turned out to be prohibitively expensive. This PR therefore only updates the Levenshtein implementation being used for our existing subdiagnostics, rather than expanding the callsites of the Levenshtein implementation.
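For readers unfamiliar with the approach, here is a rough sketch of what "case-insensitive matching with weighted costs" plus a max-distance cutoff can look like. It is not the code in `levenshtein.rs`: the names `MOVE_COST`, `CASE_COST`, `weighted_distance` and `suggest` are hypothetical, and the cost values and threshold formula are assumptions modelled on CPython's suggestion logic rather than copied from this PR.

```rust
/// Assumed cost of inserting, deleting, or substituting an unrelated character.
const MOVE_COST: usize = 2;
/// Assumed cost of substituting a character that differs only by ASCII case.
const CASE_COST: usize = 1;

/// Substitution cost: free for identical characters, cheap for case-only differences.
fn substitution_cost(a: char, b: char) -> usize {
    if a == b {
        0
    } else if a.to_ascii_lowercase() == b.to_ascii_lowercase() {
        CASE_COST
    } else {
        MOVE_COST
    }
}

/// Weighted Levenshtein distance computed with a single rolling row.
fn weighted_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // row[j] holds the cost of turning the processed prefix of `a` into b[..j].
    let mut row: Vec<usize> = (0..=b.len()).map(|j| j * MOVE_COST).collect();
    for (i, &ca) in a.iter().enumerate() {
        // `diagonal` is the previous row's value at column j.
        let mut diagonal = row[0];
        row[0] = (i + 1) * MOVE_COST;
        for (j, &cb) in b.iter().enumerate() {
            let substitute = diagonal + substitution_cost(ca, cb);
            let delete = row[j + 1] + MOVE_COST; // drop `ca`
            let insert = row[j] + MOVE_COST; // add `cb`
            diagonal = row[j + 1];
            row[j + 1] = substitute.min(delete).min(insert);
        }
    }
    row[b.len()]
}

/// Pick the closest candidate, ignoring any whose distance exceeds a
/// length-dependent cutoff (the formula here is an assumption).
fn suggest<'a>(typo: &str, candidates: &[&'a str]) -> Option<&'a str> {
    let mut best: Option<(&'a str, usize)> = None;
    for &candidate in candidates {
        if candidate == typo {
            continue;
        }
        let max_distance =
            (typo.chars().count() + candidate.chars().count() + 3) * MOVE_COST / 6;
        let distance = weighted_distance(typo, candidate);
        if distance > max_distance {
            continue;
        }
        if best.map_or(true, |(_, d)| distance < d) {
            best = Some((candidate, distance));
        }
    }
    best.map(|(candidate, _)| candidate)
}
```

With these assumed weights, `suggest("nane", &["name", "age", "legs"])` picks `"name"` (one substitution), while a wholly unrelated string such as `"xyz"` gets no suggestion at all because every candidate exceeds the cutoff. The real implementation may differ in details such as early exits and string-length limits.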
Test plan
Unit tests have been added in `levenshtein.rs`. Some mdtests and snapshots were updated to ensure that they still test what they're meant to be testing, even with the new Levenshtein implementation.
Co-authored-by: Brent Westbrook