[ty] Speedup project file discovery #19913
Conversation
Diagnostic diff on typing conformance testsChanges were detected when running ty on typing conformance tests--- old-output.txt 2025-08-14 18:13:06.319161224 +0000
+++ new-output.txt 2025-08-14 18:13:06.387161856 +0000
@@ -1,5 +1,5 @@
WARN ty is pre-release software and not ready for production use. Expect to encounter bugs, missing features, and fatal errors.
-fatal[panic] Panicked at /home/runner/.cargo/git/checkouts/salsa-e6f3bb7c2a062968/918d35d/src/function/execute.rs:215:25 when checking `/home/runner/work/ruff/ruff/typing/conformance/tests/aliases_typealiastype.py`: `infer_definition_types(Id(409cd)): execute: too many cycle iterations`
+fatal[panic] Panicked at /home/runner/.cargo/git/checkouts/salsa-e6f3bb7c2a062968/918d35d/src/function/execute.rs:215:25 when checking `/home/runner/work/ruff/ruff/typing/conformance/tests/aliases_typealiastype.py`: `infer_definition_types(Id(ac97)): execute: too many cycle iterations`
_directives_deprecated_library.py:15:31: error[invalid-return-type] Function always implicitly returns `None`, which is not assignable to return type `int`
_directives_deprecated_library.py:30:26: error[invalid-return-type] Function always implicitly returns `None`, which is not assignable to return type `str`
_directives_deprecated_library.py:36:41: error[invalid-return-type] Function always implicitly returns `None`, which is not assignable to return type `Self@__add__` |
|
BurntSushi
left a comment
There was a problem hiding this comment.
Nice! This makes sense to me.
crates/ruff_db/src/files.rs
Outdated
| /// | ||
| /// This function differs from [`system_path_to_file`] in that it doesn't add a dependency | ||
| /// on `File.status` and always returns a `File`, even | ||
| /// if `path` doesn't point to a file (e.g. if `path` is a directory). |
There was a problem hiding this comment.
How come? If metadata lets you cheaply query the file type, can you still return a Result here?
I mention this because it seems like we are otherwise careful not to allow a File to be backed by a directory.
There was a problem hiding this comment.
Yeah we could. I had a versio where I did just that.
I considered it to be fine because Files::try_system also doesn't return a Result but the file directly (even if the path maps to a directory). This where I consider the distinction between system_path_to_file which makes sure it adds a query dependency on status (so that your query re-runs if the status changes) and protects you from getting a File that points to a directory or doesn't exist.
Either way, I'm going to revert all of this thanks to @ibraheemdev's suggestion
crates/ty_project/src/walk.rs
Outdated
| paths.push(entry.into_path()); | ||
| // If this returns `Err`, then the file was deleted between now and when the walk callback was called. | ||
| // We can ignore this. | ||
| if let Ok(metadata) = system.path_metadata(entry.path()) { |
There was a problem hiding this comment.
I was wondering why you can't use ignore::DirEntry::file_type here, and I think it's because you also need the file permissions? Because directory entries (on Unix) will include the FileType free of charge. But not permissions.
There was a problem hiding this comment.
Yeah, I thought about using the walk dir entry directly, but we, unfortunately, also need the permissions and, more importantly, the last modified time
ibraheemdev
left a comment
There was a problem hiding this comment.
We also can't add a fork method to ty_project::Db that returns a cloned Self because the Db trait must be dyn compatible.
The usual workaround for this is to have a dyn_clone method that returns a Box<dyn Db>, but the solution here also seems perfectly fine.
Oh nice, that's exactly what I was looking for. |
* main: [ty] Add diagnostics for invalid `await` expressions (#19711) [ty] Synthesize read-only properties for all declared members on `NamedTuple` classes (#19899) [ty] Remove use of `ClassBase::try_from_type` from `super()` machinery (#19902) [ty] Speedup project file discovery (#19913) [`pyflakes`] Add secondary annotation showing previous definition (`F811`) (#19900) Bump 0.12.9 (#19917) [ty] support `kw_only=True` for `dataclass()` and `field()` (#19677)
Summary
The
ProjectFilesWalkerdiscovers all the files to check in a ty project by spawning multiple threads that walk the directory tree.The walker's output is a
VecorSetof all theFiles. The way this works today is that we first collect all file paths (multi threaded)into a single
Vec, and the main thread then interns the paths to theirFile's.The problem with this approach is that
system_path_to_fileis fairly expensive when called for many files because it does astatcall internally.The ideal solution would move the
system_path_to_filecall to the walker threads. That would also eliminate the sharedVecthat collects all paths. Unfortunately, this isn't possible because&dyn Dbisn'tSendnor does&dyn DbimplementClone(Salsa does have afork_dbfeature but this isn't publicly exposed as of today). We also can't add aforkmethod toty_project::Dbthat returns a clonedSelfbecause theDbtrait must be dyn compatible.The solution I've taken here is to move the stat files into the walker threads, which fixes most of the waterfall effect and is much simpler than exposingfork_dbproperly in salsa.I implemented the ideal solution thanks to @ibraheemdev's suggestion to add a
dyn_clone!Fixes astral-sh/ty#970
Test Plan
Before

After