[ty] Detect flaky projects during ecosystem CI jobs #23178
Typing conformance results: No changes detected ✅
--flaky-runs 10 \
diff \
--profile=profiling \
--projects-old ruff/projects_old.txt \
--projects-new ruff/projects_new.txt \
--projects-flaky ruff/projects_flaky.txt \
This is the job that runs for every PR (that has the ecosystem-analyzer label). People are blocked waiting for it to finish, so we want to keep its runtime as short as possible. That is why we only run flake detection for a fixed set of projects. The 10 runs was chosen empirically: that's the number that gave stable results for the flaky projects mentioned below.
ecosystem-analyzer \
--verbose \
--repository ruff \
--flaky-runs 20 \
This is the weekly CI job that runs in the background and does not block anyone's progress on any given PR, so I'm having it run more aggressive flake detection (20 runs) against all projects. We can periodically check the results at https://ty-ecosystem-ext.pages.dev/ to see if we need to update the flaky project list.
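The split between the two jobs can be sketched roughly as follows. This is only an illustration of the policy described in the comments above, not `ecosystem-analyzer`'s code: the `plan_runs` function, its parameters, and the project names are hypothetical.

```python
from __future__ import annotations


def plan_runs(
    projects: list[str],
    flaky_projects: set[str] | None,
    flaky_runs: int,
) -> dict[str, int]:
    """Map each project to the number of times ty would be run on it.

    Hypothetical helper: passing flaky_projects=None stands in for the
    weekly job, where every project gets flake detection.
    """
    plan = {}
    for project in projects:
        if flaky_projects is None or project in flaky_projects:
            # Flake detection: repeat the run so results can be compared.
            plan[project] = flaky_runs
        else:
            # Assumed stable: a single run, nothing is marked flaky.
            plan[project] = 1
    return plan


# Per-PR job: only the known-flaky project is repeated (10 runs).
per_pr = plan_runs(["project-a", "project-b"], {"project-b"}, 10)

# Weekly job: every project is repeated (20 runs).
weekly = plan_runs(["project-a", "project-b"], None, 20)
```

With two projects of which one is known to be flaky, the per-PR plan costs 11 runs in total, while the weekly plan costs 40.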
* main:
  [ty] Detect flaky projects during ecosystem CI jobs (astral-sh#23178)
  [ty] Disallow TypeVars within ClassVars (astral-sh#23184)
This brings in an `ecosystem-analyzer` update that detects when we produce flaky results for a project.

We do this by running `ty` multiple times against each project, looking for when we produce a different diagnostic at the same file/line/column location. If we do, we consider that diagnostic flaky. The `diff` and `analyze` HTML reports have been updated to show flakiness information. (A diagnostic changing from "flaky" to "not flaky", or vice versa, is now considered a change, as is a different set of diagnostic codes/messages appearing for a flaky diagnostic.)

Since most projects aren't flaky 🤞, in the per-PR CI job, we only run flake detection against the projects that we know are flaky. For the others, we still run `ty` once and don't consider any of the diagnostics flaky. In the weekly background job, we run flake detection more aggressively, and on all projects, so that we can update the flaky project list when needed.
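As a rough illustration of the comparison described above, here is a minimal sketch that groups diagnostics by file/line/column across runs and flags locations where the runs disagree. It is not the actual `ecosystem-analyzer` implementation: the `Diagnostic` type and `find_flaky_locations` function are hypothetical, and treating a diagnostic that appears in only some runs as flaky is an assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class Diagnostic(NamedTuple):
    # Hypothetical shape of one diagnostic emitted by a single run.
    path: str
    line: int
    column: int
    code: str
    message: str


def find_flaky_locations(runs):
    """Return {(path, line, column): set of (code, message)} for every
    location whose diagnostics are not identical across all runs."""
    per_run = []
    for run in runs:
        by_location = defaultdict(set)
        for d in run:
            by_location[(d.path, d.line, d.column)].add((d.code, d.message))
        per_run.append(by_location)

    # Every location reported by at least one run.
    all_locations = set().union(*per_run)

    flaky = {}
    for location in all_locations:
        # One frozenset per run; a run with nothing at this location
        # contributes an empty set (assumption: that counts as a difference).
        variants = {frozenset(r.get(location, ())) for r in per_run}
        if len(variants) > 1:
            flaky[location] = set().union(*variants)
    return flaky


stable = Diagnostic("a.py", 3, 1, "code-a", "same every time")
runs = [
    [stable, Diagnostic("b.py", 7, 5, "code-b", "variant one")],
    [stable, Diagnostic("b.py", 7, 5, "code-b", "variant two")],
]
flaky = find_flaky_locations(runs)
```

Here only the `b.py` location is reported, along with both message variants seen there. Keeping the full set of variants per location is what would let a report notice when the set of codes/messages for an already-flaky diagnostic changes.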