
Conversation

@ezyang (Contributor) commented Jan 24, 2025

[ghstack-poisoned]
@pytorch-bot (bot) commented Jan 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145579

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 77abf9b with merge base d6bea39:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang added a commit that referenced this pull request Jan 24, 2025
Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: 28df7aa
Pull Request resolved: #145579
@pytorch-bot pytorch-bot bot temporarily deployed to upload-benchmark-results January 24, 2025 02:50 Inactive
@ezyang ezyang added the topic: not user facing topic category label Jan 24, 2025
@ezyang (Contributor Author) commented Jan 24, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 24, 2025
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorch-bot pytorch-bot bot temporarily deployed to upload-benchmark-results January 24, 2025 15:51 Inactive
@albanD albanD removed their request for review January 24, 2025 16:23
# Mode to emulate PyTorch eager numerics when doing lower precision compute.
# For multiple, fused pointwise nodes, Inductor will elide the intermediary
# upcasts and downcasts. Typically this should be closer to fp64 ref numerics;
# however, it can be useful for debugging to emulate the eager numerics.
Contributor commented:
To what extent do you think we should have a flag (this one or another) for Inductor to emulate all eager numerics, even if there is a potential perf cost?

One that came up recently was var_mean, which it looks like Inductor lowers in a way that gives slightly different (but more accurate?) numerics:

import torch
from torch._dynamo.testing import rand_strided

torch._inductor.config.emulate_precision_casts = True

class GraphModule(torch.nn.Module):
    def forward(self, inp):
        # var_mean over the last dim, calling the aten op directly
        return torch.ops.aten.var_mean.correction(inp, [3], correction=0, keepdim=True)

inp = rand_strided((1, 256, 256, 144), (9437184, 36864, 144, 1), device='cuda:0', dtype=torch.float32)

m = GraphModule()
out_eager = m(inp)
out_ref = m(inp.to(dtype=torch.float64))
out_compile = torch.compile(m)(inp)

print(torch.allclose(out_eager[0], out_compile[0]))
print(torch.allclose(out_ref[0], out_compile[0].to(dtype=torch.float64)))
print(torch.allclose(out_ref[0], out_eager[0].to(dtype=torch.float64)))

print(torch.max(torch.abs(out_eager[0] - out_compile[0])))
print(torch.max(torch.abs(out_ref[0] - out_compile[0])))
print(torch.max(torch.abs(out_ref[0] - out_eager[0])))

# prints:
True
True
True
tensor(3.5763e-07, device='cuda:0')
tensor(2.3738e-07, device='cuda:0', dtype=torch.float64)
tensor(2.7881e-07, device='cuda:0', dtype=torch.float64)

ezyang (Contributor Author) commented:

I think this would be very useful lol
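For concreteness, here is a minimal sketch of the behavior the flag controls, assuming a CUDA device and a pair of pointwise ops that Inductor fuses into one kernel (the function f is a hypothetical example, not from this PR); with emulate_precision_casts enabled, the compiled output should track eager's bf16 intermediate round-trip:

import torch

torch._inductor.config.emulate_precision_casts = True

def f(x, y):
    # Two pointwise ops; Inductor fuses them into one kernel. Without the
    # flag, the fused kernel keeps the x * y intermediate in fp32, while
    # eager truncates it to bf16 between the two ops.
    return torch.sin(x * y)

x = torch.randn(1024, device='cuda', dtype=torch.bfloat16)
y = torch.randn(1024, device='cuda', dtype=torch.bfloat16)

out_eager = f(x, y)
out_compiled = torch.compile(f)(x, y)

# With the flag set, this difference should be (close to) zero.
print(torch.max(torch.abs(out_eager.float() - out_compiled.float())))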

nWEIdia pushed a commit to nWEIdia/pytorch that referenced this pull request Jan 27, 2025
# Mode to emulate PyTorch eager numerics when doing lower precision compute
# (fp16, bf16). PyTorch eager computes bf16/fp16 by upcasting inputs to fp32
# and downcasting after. When two low precision operators are fused together,
# Inductor will elide the downcast-upcast pairs (effectively a precision
# truncation) that would occur between these two operators. Typically,
# Inductor's behavior should be closer to fp64 ref numerics; however, with
# this flag the casts are preserved, emulating eager numerics for debugging.
Contributor commented:

If we're going to expand this, maybe we should actually have a doc somewhere instead of just a longer config string.
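To make the elision concrete, a hand-written sketch of the two numerics the config comment describes (an illustration of the described semantics, not Inductor's actual codegen):

import torch

x = torch.randn(1024, dtype=torch.bfloat16)
y = torch.randn(1024, dtype=torch.bfloat16)

# Eager numerics: each op upcasts to fp32 and downcasts its result to bf16,
# so the intermediate is truncated to bf16 between the two ops.
eager = torch.sin((x.float() * y.float()).to(torch.bfloat16).float()).to(torch.bfloat16)

# Fused numerics without the flag: the downcast-upcast pair between the ops
# is elided, so the intermediate stays in fp32.
fused = torch.sin(x.float() * y.float()).to(torch.bfloat16)

# The gap is exactly the precision truncation the config comment mentions.
print(torch.max(torch.abs(eager.float() - fused.float())))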

@github-actions github-actions bot deleted the gh/ezyang/3071/head branch March 1, 2025 02:10