[ty] Add more and update existing projects in ty_benchmark#21536
MichaReiser merged 10 commits into main from
No, GitHub doesn't recognize the move :(
1f5303f to 4cb553b
d196985 to e10b1e9
AlexWaygood left a comment:
A few small things from skimming!
scripts/ty_benchmark/README.md (outdated)
The benchmark script supports snapshoting the results when running with `--snapshot` and `--accept`.
The goal of those snapshots is to catch accidental regressions. They are not intended
as a testing tool. E.g. the snapshot runner doesn't account for platform differences so that
Can/should we "pin" the platform to an arbitrary one, i.e. pass the equivalent of ty's `--python-platform=linux` to all type checkers?
It's tricky because, you know, Python.
Solving for one platform requires installing the dependencies for that platform, and that failed for me on macOS for at least some of the projects.
pandas-stubs/_typing.pyi:861:44: error[invalid-argument-type] Argument to class `dtype` is incorrect: Expected `generic[Any]`, found `typing.TypeVar`
pandas-stubs/_typing.pyi:865:48: error[invalid-argument-type] Argument to class `dtype` is incorrect: Expected `generic[Any]`, found `typing.TypeVar`
pandas-stubs/_typing.pyi:877:53: error[invalid-argument-type] Argument to class `dtype` is incorrect: Expected `generic[Any]`, found `typing.TypeVar`
# pyright exit codes: https://docs.basedpyright.com/v1.31.6/configuration/command-line/#pyright-exit-codes
# pyrefly exit codes: Not documented
# ty: https://docs.astral.sh/ty/reference/exit-codes/
"-i=1",
Minor: I prefer to always use the `--long-form` of options when calling tools from scripts.
- "-i=1",
+ "--ignore-failure=1",
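The point of `--ignore-failure=1` is to tolerate exit code 1 (the checker ran and reported type errors) while still treating any other non-zero exit as a real failure. A minimal Python sketch of that classification, assuming exit code 1 means "type errors found" for every benchmarked checker (this matches the linked ty and pyright exit-code docs; pyrefly's codes are undocumented, as the comment notes). `classify_exit` and `run_checker` are hypothetical helpers, not part of the benchmark script:

```python
import subprocess
import sys

# Assumption: exit code 1 means "ran successfully, found type errors".
TYPE_ERROR_EXIT_CODE = 1


def classify_exit(returncode: int) -> str:
    """Map a type checker's exit status to a benchmark outcome."""
    if returncode == 0:
        return "clean"
    if returncode == TYPE_ERROR_EXIT_CODE:
        return "type-errors"
    # Invalid CLI options, config errors, crashes, panics, ...
    return "tool-failure"


def run_checker(cmd: list[str]) -> str:
    """Run one checker command and raise on anything but 0 or 1."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    outcome = classify_exit(result.returncode)
    if outcome == "tool-failure":
        raise RuntimeError(
            f"{cmd[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return outcome


if __name__ == "__main__":
    # Simulate a checker that reports type errors (exit code 1).
    print(run_checker([sys.executable, "-c", "raise SystemExit(1)"]))
```

Failing loudly on codes other than 0 and 1 keeps a misconfigured checker from producing a deceptively fast timing.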
a3bdc54 to 4303563
* main:
  * [ty] Implement `typing.override` (astral-sh#21627)
  * [ty] Avoid expression reinference for diagnostics (astral-sh#21267)
  * [ty] Improve autocomplete suppressions of keywords in variable bindings
  * [ty] Only suggest completions based on text before the cursor
  * Implement goto-definition and find-references for global/nonlocal statements (astral-sh#21616)
  * [ty] Inlay Hint edit follow up (astral-sh#21621)
  * [ty] Implement lsp support for string annotations (astral-sh#21577)
  * [ty] Add 'remove unused ignore comment' code action (astral-sh#21582)
  * [ty] Refactor `CheckSuppressionContext` to use `DiagnosticGuard` (astral-sh#21587)
  * [ty] Improve several "Did you mean?" suggestions (astral-sh#21597)
  * [ty] Add more and update existing projects in `ty_benchmark` (astral-sh#21536)
  * [ty] fix ty playground initialization and vite optimization issues (astral-sh#21471)
Summary
This PR adds more projects to `ty_benchmark` and updates existing benchmarks. It also adds `pyrefly` as a benchmark target. I also made some improvements to result rendering and added a check that fails the command if any type checker exits due to an error other than type errors (requires hyperfine 1.20 or newer).
I don't consider this the final set of projects, and I'm happy to add (or remove) projects based on your feedback. Overall, it's fairly tricky to select a set of projects: any project that isn't a library tends to use a mypy plugin or non-strict type-checking options. That either results in a lot of diagnostics from type checkers other than the one the project uses, or it requires customizing each type checker's configuration to roughly the same settings, which I'm not convinced is worth the effort.
We should be careful about drawing early conclusions from the benchmark, especially when comparing ty and pyrefly: both type checkers are still missing crucial (but different) typing features, and ty is probably a little further behind, at least until the beta, when we add many of the larger missing features.
Closes astral-sh/ty#241
I'm not 100% convinced that we want the snapshotting mechanism, but it's nice to have some way of checking that the projects still behave as expected.
Test Plan