Skip to content

Comments

Run mypy-primer with TY_MAX_PARALLELISM=1#19059

Closed
ibraheemdev wants to merge 3 commits intomainfrom
ibraheem/mypy-primer-determinism
Closed

Run mypy-primer with TY_MAX_PARALLELISM=1#19059
ibraheemdev wants to merge 3 commits intomainfrom
ibraheem/mypy-primer-determinism

Conversation

@ibraheemdev
Copy link
Member

Summary

The memory usage numbers are currently flaky partly due to memory usage being non-deterministic in multi-threaded runs. Running ty in single-threaded mode in the mypy-primer mode should fix this. From what I understand, mypy-primer has parallelism internally so this might not be significantly slower, but if it is then we'll need a separate CI job for memory usage.

@ibraheemdev ibraheemdev added ci Related to internal CI tooling ty Multi-file analysis & type inference labels Jul 1, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Jul 1, 2025

mypy_primer results

Changes were detected when running on open source projects
freqtrade (https://github.com/freqtrade/freqtrade)
-     memo fields = ~304MB
+     memo fields = ~276MB

schemathesis (https://github.com/schemathesis/schemathesis)
- TOTAL MEMORY USAGE: ~156MB
+ TOTAL MEMORY USAGE: ~171MB

alerta (https://github.com/alerta/alerta)
- TOTAL MEMORY USAGE: ~117MB
+ TOTAL MEMORY USAGE: ~106MB

paasta (https://github.com/yelp/paasta)
-     memo fields = ~156MB
+     memo fields = ~171MB

scikit-learn (https://github.com/scikit-learn/scikit-learn)
- TOTAL MEMORY USAGE: ~652MB
+ TOTAL MEMORY USAGE: ~717MB

@ibraheemdev ibraheemdev marked this pull request as ready for review July 1, 2025 00:42
@ibraheemdev
Copy link
Member Author

ibraheemdev commented Jul 1, 2025

It looks like this is about a minute slower (4m as compared to 3m), which seems reasonable? A separate job would likely take more than a minute anyways.

Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is about a minute slower (4m as compared to 3m), which seems reasonable? A separate job would likely take more than a minute anyways.

The increase in time seems reasonable to me given that cargo test takes around 4 minutes on Linux.

@dhruvmanila
Copy link
Member

Running ty in single-threaded mode in the mypy-primer mode should fix this.

I'm assuming this can only be verified after merging this PR?

@AlexWaygood
Copy link
Member

It looks like this is about a minute slower (4m as compared to 3m), which seems reasonable? A separate job would likely take more than a minute anyways.

It looks more like ~1m30s slower to me -- it looks like primer generally takes around 3m10s on main, and it takes 4m50s with this PR.

I really love how quickly we get feedback from mypy_primer on PRs currently -- it makes it very easy to quickly spot bugs and iterate on PRs. From that perspective, I would selfishly sort-of prefer the memory report to be a separate primer job that runs concurrently with the existing primer job, so that we can still get feedback on the impact on diagnostics as quickly as possible. I'd expect ty PRs to change the diagnostics being emitted much more often than they have a significant impact on memory usage, anyway.

Neither CI job would be marked as "required", so we would still be able to merge PRs even if one or neither job hadn't yet completed and we wanted to merge a trivial PR quickly. The only cost would be additional CI resources -- but I actually think that having primer reports be as quick as possible might be worth that cost.

Curious if @sharkdp has opinions

@sharkdp
Copy link
Contributor

sharkdp commented Jul 1, 2025

Curious if @sharkdp has opinions

I was hesitant to bring it up, but I fully agree with you. mypy_primer feedback is also a crucial development tool for me. I am often waiting for its output. So in that sense, every saved minute is a developer-experience improvement. But it's probably also a bit annoying to create a completely separate job for this.

@ibraheemdev
Copy link
Member Author

I guess the problem with this approach is that we're bottlenecked on the largest project instead of evenly spreading the load. I'm fine to make this a separate job if the speed is important. There's also probably some things we could do to speed it up as is.

@ibraheemdev ibraheemdev closed this Jul 2, 2025
ibraheemdev added a commit that referenced this pull request Jul 7, 2025
AlexWaygood pushed a commit that referenced this pull request Jul 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci Related to internal CI tooling ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants