Conversation
|
Hmm, the test failure here is very odd! Test definitely passes locally. Will have to look into this later. |
|
that is really weird. it passes for me locally too (and it's obviously passing on all but one of our CI jobs). That said, the statistic given by the one job that's failing is a statistic that makes more sense intuitively to me... |
|
The issue is that in a release build we (intentionally, for perf reasons) omit tracking of distinct messages / file-and-line information for Todo types. So this results in fewer different kinds of Todo types in a release build. Then this interacts with another bug, which is that we are using |
|
Pushed a commit that fixes statistics merging, and fixes the tests. It remains the case that we can't add any tests specifically testing that the statistics for "different" Todo types are differentiated, or those tests will fail on a release build. (We could switch any such tests off in release build.) We can still build features making use of this differentiation, but they will only work in debug build. |
I think it's okay to make some statistics-related features only available if debug assertions are enabled. Statistics collection doesn't need to be that fast. |
AlexWaygood
left a comment
There was a problem hiding this comment.
The bug we had in our use of HashMap::extend() makes me wonder if it is worth using a "proper" reimplementation of Python's collections.Counter class, like https://docs.rs/counter/latest/counter/, rather than rolling our own version. We could make the additional dependency conditional on a statistics optional feature.
But for now, it still may not be worth the additional dependency.
| use rustc_hash::FxHashMap; | ||
|
|
||
| /// Get type-coverage statistics for a file. | ||
| #[salsa::tracked(return_ref)] |
There was a problem hiding this comment.
Do we expect this method to be called multiple times in the same revision and file for it to be a salsa query?
There was a problem hiding this comment.
Perhaps roughly equally likely as check_types, which is a query? It's very parallel to check_types, so seems reasonable to handle them the same way.
MichaReiser
left a comment
There was a problem hiding this comment.
Should this be gated to debug only builds? How is it intended to be used?
|
Depending on what this is used for: have you considered using Countme? E.g. by adding a count flag to every relevant type? |
Two potential uses are as part of ecosystem check, and as a user-facing feature to measure static type coverage. Former could perhaps be debug-gated (though we may want to run ecosystem check on release build for speed), latter cannot be.
I don't think countme works for this. If we have 20 expressions in a file typed as |
|
I'm sort of leaning towards reverting this change until we have a concrete usage that informs the design and avoids this just being dead code:
|
|
The idea was just to land an MVP that wasn't too complicated. Perhaps you're right and this really is too minimal to even be worth keeping in the source code, though.
probably not
What's the downside of having this level of precision?
We're aware. Again, this was meant to be just a simple MVP. I think even just knowing the number of "pure TODO types" we infer when running on |
More granular statistics have a higher runtime cost (performance and storage). The most simple representation -- and probably sufficient for this MVP -- is to only count the number of todo types. This can be done with a single usize. The next step is to store the type kind: Literal, Unknown, any. This can be done with a fixed array or vector. Counting the unique usages of each type (which this PR implements) requires a hash map which requires expensive hashing and is probably large. We should only use the hash map solution if we have a need for it -- which I don't think we currently do |
Something Alex and I threw together during our 1:1 this morning. Allows us to collect statistics on the prevalence of various types in a file, most usefully TODO types or other dynamic types.