Skip to content

[caching] Add determinism test to inductor config hash.#171275

Closed
zhxchen17 wants to merge 1 commit intomainfrom
zhxchen17/config_hash_fixture
Closed

[caching] Add determinism test to inductor config hash.#171275
zhxchen17 wants to merge 1 commit intomainfrom
zhxchen17/config_hash_fixture

Conversation

@zhxchen17
Copy link
Contributor

@zhxchen17 zhxchen17 commented Dec 24, 2025

Summary:
Following up a previous change #160174 where we
accidentally include cutlass path to config hash and regressed the cache hit rate, we
want to have some basic test added to pytorch to guard against accidental changes to
configs like this happening again in the future.

One way to solve this is to add detection for common patterns like paths, username
and hostname in the config values.

Test Plan:
pytest test/test_torch_config_hash_determinism.py

Reviewers:

Subscribers:

Tasks:

Tags:

Fixes #ISSUE_NUMBER

@pytorch-bot
Copy link

pytorch-bot bot commented Dec 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/171275

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1978158 with merge base 3d9ce8d (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Dec 24, 2025
@zhxchen17 zhxchen17 marked this pull request as draft December 24, 2025 19:17
@zhxchen17 zhxchen17 force-pushed the zhxchen17/config_hash_fixture branch 2 times, most recently from 2bcb525 to 5956a09 Compare December 29, 2025 16:33
@zhxchen17 zhxchen17 force-pushed the zhxchen17/config_hash_fixture branch from 5956a09 to 5e1c8f5 Compare January 7, 2026 21:20
@zhxchen17 zhxchen17 marked this pull request as ready for review January 7, 2026 21:20
@masnesral
Copy link
Contributor

TBH, I'm not sure this test is gonna help that much for prevention. It just adds friction any time someone adds a new config option. They'll see a test failure, investigate, and very likely just run with EXPECTTEST_ACCEPT=1 without thinking through the implications (because they won't have the context). What we really want is to catch sources of non-determinism. In practice, those have always come from paths that vary among hosts. It would be nice if we could execute the test across machines and compare, but I doubt we can do that easily (either in OSS or internally). As an alternative, how would you feel about scanning the config options and just looking (heuristically) for any that look like a path or have the current machine or username somehow embedded in the config value?

@zhxchen17
Copy link
Contributor Author

zhxchen17 commented Jan 8, 2026

TBH, I'm not sure this test is gonna help that much for prevention. It just adds friction any time someone adds a new config option. They'll see a test failure, investigate, and very likely just run with EXPECTTEST_ACCEPT=1 without thinking through the implications (because they won't have the context). What we really want is to catch sources of non-determinism. In practice, those have always come from paths that vary among hosts. It would be nice if we could execute the test across machines and compare, but I doubt we can do that easily (either in OSS or internally). As an alternative, how would you feel about scanning the config options and just looking (heuristically) for any that look like a path or have the current machine or username somehow embedded in the config value?

@masnesral I tried to implement test with multi hosts. It's not clear whether it's justified to such a test with high complexity and questionable maintainability in the CI in the long term.

I agree it's a friction. We could implement other detection mechanisms for nondeterminism with much higher cost. This is a compromise to surface the changes made to config in the short term. Open to ideas if this could be addressed systematically in the long term.

I'm not sure whether dir path detection solves the problem of non-determinism in general. If something is dependent on the environment, it shouldn't matter if it's a dir path or not. In fact if we look at ignored factors here https://www.internalfb.com/code/fbsource/fbcode/caffe2/torch/_inductor/config.py only 1 out of 15 configs ignored there is a path, therefore I think the chance to directly address this SEV is very small if we only look at directory.

If the major concern is about friction, we can roll this out only with string fields and see whether it works.

@zhxchen17 zhxchen17 force-pushed the zhxchen17/config_hash_fixture branch 2 times, most recently from 68c6632 to 2308d60 Compare January 8, 2026 17:21
@meta-codesync
Copy link

meta-codesync bot commented Jan 8, 2026

@zhxchen17 has imported this pull request. If you are a Meta employee, you can view this in D90331815.

@zhxchen17
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 8, 2026
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@zhxchen17 zhxchen17 force-pushed the zhxchen17/config_hash_fixture branch from 2308d60 to 23910aa Compare January 8, 2026 19:33
@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: New commits were pushed while merging. Please rerun the merge command.

Details for Dev Infra team Raised by workflow job

@masnesral
Copy link
Contributor

If the major concern is about friction, we can roll this out only with string fields and see whether it works.

No I'm mostly worried that we haven't actually solved the problem. It's possible we've made it worse: if someone "fixes" a future failing test by adding to the blocklist when they shouldn't then we risk serving illegal cache entries. I agree that detecting non-determinism across hosts is hard, so I wasn't advocating that. I was arguing it would be worthwhile to at least target a few patterns that we have experience as having been problematic: usernames in config options, hostnames in config options, paths in config names (specifically paths that look like they come from unpacking a .xar file).

Summary:
Following up a previous change #160174 where we
accidentally include cutlass path to config hash and regressed the cache hit rate, we
want to have some basic test added to pytorch to guard against accidental changes to
configs like this happening again in the future.

One way to solve this is to add detection for common patterns like paths, username
and hostname in the config values.

Test Plan:
pytest test/test_torch_config_hash_determinism.py

Reviewers:

Subscribers:

Tasks:

Tags:
@zhxchen17 zhxchen17 force-pushed the zhxchen17/config_hash_fixture branch from 23910aa to 1978158 Compare January 9, 2026 04:27
@zhxchen17 zhxchen17 changed the title [caching] Add fixture test to inductor config hash. [caching] Add determinism test to inductor config hash. Jan 9, 2026
@zhxchen17
Copy link
Contributor Author

if someone "fixes" a future failing test by adding to the blocklist

Yea this sounds a reasonable failure mode for this test.

We have experience as having been problematic: usernames in config options, hostnames in config options, paths in config names (specifically paths that look like they come from unpacking a .xar file).

Updated the PR to do this instead.

@masnesral
Copy link
Contributor

This looks nice. Thanks!

@zhxchen17
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

hinriksnaer pushed a commit to hinriksnaer/pytorch that referenced this pull request Jan 12, 2026
Summary:
Following up a previous change pytorch#160174 where we
  accidentally include cutlass path to config hash and regressed the cache hit rate, we
  want to have some basic test added to pytorch to guard against accidental changes to
  configs like this happening again in the future.

One way to solve this is to add detection for common patterns like paths, username
and hostname in the config values.

Test Plan:
pytest test/test_torch_config_hash_determinism.py

Reviewers:

Subscribers:

Tasks:

Tags:

Fixes #ISSUE_NUMBER

Pull Request resolved: pytorch#171275
Approved by: https://github.com/bobrenjc93, https://github.com/masnesral
@github-actions github-actions bot deleted the zhxchen17/config_hash_fixture branch February 9, 2026 02:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants