[caching] Add determinism test to inductor config hash.#171275
[caching] Add determinism test to inductor config hash.#171275
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/171275
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 1978158 with merge base 3d9ce8d ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
2bcb525 to
5956a09
Compare
5956a09 to
5e1c8f5
Compare
|
TBH, I'm not sure this test is gonna help that much for prevention. It just adds friction any time someone adds a new config option. They'll see a test failure, investigate, and very likely just run with |
@masnesral I tried to implement test with multi hosts. It's not clear whether it's justified to such a test with high complexity and questionable maintainability in the CI in the long term. I agree it's a friction. We could implement other detection mechanisms for nondeterminism with much higher cost. This is a compromise to surface the changes made to config in the short term. Open to ideas if this could be addressed systematically in the long term. I'm not sure whether dir path detection solves the problem of non-determinism in general. If something is dependent on the environment, it shouldn't matter if it's a dir path or not. In fact if we look at ignored factors here https://www.internalfb.com/code/fbsource/fbcode/caffe2/torch/_inductor/config.py only 1 out of 15 configs ignored there is a path, therefore I think the chance to directly address this SEV is very small if we only look at directory. If the major concern is about friction, we can roll this out only with string fields and see whether it works. |
68c6632 to
2308d60
Compare
|
@zhxchen17 has imported this pull request. If you are a Meta employee, you can view this in D90331815. |
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
2308d60 to
23910aa
Compare
Merge failedReason: New commits were pushed while merging. Please rerun the merge command. Details for Dev Infra teamRaised by workflow job |
No I'm mostly worried that we haven't actually solved the problem. It's possible we've made it worse: if someone "fixes" a future failing test by adding to the blocklist when they shouldn't then we risk serving illegal cache entries. I agree that detecting non-determinism across hosts is hard, so I wasn't advocating that. I was arguing it would be worthwhile to at least target a few patterns that we have experience as having been problematic: usernames in config options, hostnames in config options, paths in config names (specifically paths that look like they come from unpacking a .xar file). |
Summary: Following up a previous change #160174 where we accidentally include cutlass path to config hash and regressed the cache hit rate, we want to have some basic test added to pytorch to guard against accidental changes to configs like this happening again in the future. One way to solve this is to add detection for common patterns like paths, username and hostname in the config values. Test Plan: pytest test/test_torch_config_hash_determinism.py Reviewers: Subscribers: Tasks: Tags:
23910aa to
1978158
Compare
Yea this sounds a reasonable failure mode for this test.
Updated the PR to do this instead. |
|
This looks nice. Thanks! |
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Summary: Following up a previous change pytorch#160174 where we accidentally include cutlass path to config hash and regressed the cache hit rate, we want to have some basic test added to pytorch to guard against accidental changes to configs like this happening again in the future. One way to solve this is to add detection for common patterns like paths, username and hostname in the config values. Test Plan: pytest test/test_torch_config_hash_determinism.py Reviewers: Subscribers: Tasks: Tags: Fixes #ISSUE_NUMBER Pull Request resolved: pytorch#171275 Approved by: https://github.com/bobrenjc93, https://github.com/masnesral
Summary:
Following up a previous change #160174 where we
accidentally include cutlass path to config hash and regressed the cache hit rate, we
want to have some basic test added to pytorch to guard against accidental changes to
configs like this happening again in the future.
One way to solve this is to add detection for common patterns like paths, username
and hostname in the config values.
Test Plan:
pytest test/test_torch_config_hash_determinism.py
Reviewers:
Subscribers:
Tasks:
Tags:
Fixes #ISSUE_NUMBER