[NAdam] Add capturable API and tests + fix differentiable #106615

janeyx99 · 2023-08-04T15:18:14Z

This PR:

- adds a capturable API for NAdam, similar to Adam(W)
- adds tests accordingly
- fixes bugs discovered in the differentiable implementation (now tested through the capturable codepath)

cc @mlazos -- once this lands you should be able to build on top of this implementation.

Stack from ghstack (oldest at bottom):

-> [NAdam] Add capturable API and tests + fix differentiable #106615

pytorch-bot · 2023-08-04T15:18:17Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106615

✅ No Failures

As of commit b2558de:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 98bc871 Pull Request resolved: #106615

janeyx99 · 2023-08-05T01:47:09Z

torch/optim/nadam.py

            mu_product_next = mu_product * mu_next
            grad = grad * (-lr * (1. - mu) / (1. - mu_product))
-            exp_avg = grad * (-lr * (1. - mu_next) / (1. - mu_product_next))
+            exp_avg = exp_avg * (-lr * mu_next / (1. - mu_product_next))


This was actually just incorrect before. It is on my mind to add differentiable correctness tests as a part of the test revamp that I will get to one day.
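To illustrate why the removed line was wrong, here is a minimal scalar sketch in plain Python (floats instead of tensors). It assumes the default, non-differentiable path applies the same two coefficients at update time, and it assumes the momentum schedule with the 0.96 constant; neither appears in this diff. The name `nadam_scalar_step`, the `variant` argument, and the default hyperparameters are hypothetical and exist only for this illustration.

```python
import math


def nadam_scalar_step(param, grad, exp_avg, exp_avg_sq, mu_product, step,
                      lr=2e-3, beta1=0.9, beta2=0.999, eps=1e-8,
                      momentum_decay=4e-3, variant="fixed"):
    """One NAdam update on plain floats; returns the new parameter value."""
    bias_correction2 = 1.0 - beta2 ** step
    # Momentum schedule (assumed form, including the 0.96 constant).
    mu = beta1 * (1.0 - 0.5 * 0.96 ** (step * momentum_decay))
    mu_next = beta1 * (1.0 - 0.5 * 0.96 ** ((step + 1) * momentum_decay))
    mu_product = mu_product * mu
    mu_product_next = mu_product * mu_next

    # First and second moment estimates.
    exp_avg = exp_avg + (1.0 - beta1) * (grad - exp_avg)
    exp_avg_sq = beta2 * exp_avg_sq + (1.0 - beta2) * grad * grad
    denom = math.sqrt(exp_avg_sq / bias_correction2) + eps

    grad_coeff = -lr * (1.0 - mu) / (1.0 - mu_product)
    avg_coeff = -lr * mu_next / (1.0 - mu_product_next)

    if variant == "reference":
        # Coefficients applied at update time (assumed default path).
        return param + grad_coeff * (grad / denom) + avg_coeff * (exp_avg / denom)

    # Differentiable/capturable style: pre-scale, then divide by denom.
    scaled_grad = grad * grad_coeff
    if variant == "old":
        # Line removed by this PR: scales grad instead of exp_avg and
        # uses (1 - mu_next) instead of mu_next in the numerator.
        scaled_avg = grad * (-lr * (1.0 - mu_next) / (1.0 - mu_product_next))
    else:
        # Line added by this PR.
        scaled_avg = exp_avg * avg_coeff
    return param + scaled_grad / denom + scaled_avg / denom


state = dict(param=1.0, grad=0.5, exp_avg=0.1, exp_avg_sq=0.2,
             mu_product=1.0, step=1)
reference = nadam_scalar_step(**state, variant="reference")
fixed = nadam_scalar_step(**state, variant="fixed")
old = nadam_scalar_step(**state, variant="old")
print(reference, fixed, old)
```

Under these assumptions, `fixed` agrees with `reference` up to rounding, while `old` lands on a visibly different parameter value.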

crcrpar · 2023-08-05T12:08:32Z

torch/optim/nadam.py

                if len(state) == 0:
-                    state['step'] = torch.tensor(0.)
-                    state['mu_product'] = torch.tensor(1.)
+                    # note(crcrpar): [special device hosting for step]


Is note(crcrpar) a typo or something?

No, it's copied from here: https://github.com/pytorch/pytorch/blob/e35cb480f4df1cf440b8705c93546c1b15891a4b/torch/optim/adam.py#L88C1-L90C82

The same reasoning applies here.

Haha, we can remove the note if you don't want your authorship attached there, @crcrpar.

I was wondering whether something like "see note - [special device hosting for step]" would be a little clearer. I just found it a bit surprising to see my ID as the comment author when the comment itself was "author"ed by someone else.

albanD · 2023-08-07T14:17:45Z

test/optim/test_optim.py

+                if optimizer_constructor.__name__ == "NAdam":
+                    # with capturable in NAdam, we have 3 extra intermediates for the
+                    # bias_correction, mus, and mu_nexts
+                    nintermediates = 5


Hmmm, the comment above has 2 extra leading to 3, and this one has 3 extra leading to 5?

NAdam needs 2 intermediates to start.

Oh, OK. The comments are just a bit confusing then :D

albanD · 2023-08-07T14:20:29Z

torch/optim/nadam.py

+        if capturable:
+            step = step_t
+        else:
+            step = _get_value(step_t)


Oh, should _get_value handle capturable?

One follow-up from using capturable is that the non-capturable path should not use _get_value or any other PT2-specific code (there's also _dispatch_sqrt). I was thinking of doing that in another PR, in case the assumption that the PT2 path always goes through the capturable implementation does not actually hold. cc'ing @mlazos, who may have better context here!

albanD · 2023-08-07T14:25:28Z

torch/optim/nadam.py

+            denom = torch._foreach_sub(grouped_mu_products, 1.0)
+            torch._foreach_neg_(denom)


If only we had torch._foreach_rsub that matches the existing torch.rsub() :D
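As a rough illustration of the point above, the sub-then-negate pair computes `1 - x` for every element, which is what a single reversed subtraction would do. The sketch below uses plain Python lists; `foreach_sub`, `foreach_neg_`, and `foreach_rsub` are hypothetical stand-ins written for this example, not torch APIs.

```python
def foreach_sub(xs, scalar):
    """Return a new list with `scalar` subtracted from every element."""
    return [x - scalar for x in xs]


def foreach_neg_(xs):
    """Negate every element of `xs` in place."""
    for i, x in enumerate(xs):
        xs[i] = -x


def foreach_rsub(xs, scalar):
    """Hypothetical fused form: `scalar - x` for every element."""
    return [scalar - x for x in xs]


mu_products = [0.45, 0.2026, 0.0912]

# Two passes, mirroring the snippet under review.
denom = foreach_sub(mu_products, 1.0)
foreach_neg_(denom)

# One pass with the reversed subtraction.
fused = foreach_rsub(mu_products, 1.0)

print(denom == fused)
```

A fused version would save one pass over the list and one intermediate sign flip, which is presumably why it would be nice to have.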

albanD

All sounds good!

janeyx99 · 2023-08-07T17:54:22Z

@pytorchbot merge

pytorchmergebot · 2023-08-07T17:56:22Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

huydhn · 2023-08-09T01:01:06Z

@janeyx99 It looks like test_optim.py::TestOptim::test_multi_tensor_optimizers_with_varying_tensors is failing on multigpu with a "Tensor-likes are not close" error after this change: https://hud.pytorch.org/pytorch/pytorch/commit/0208574db95720a2569004114d323e922f46716d. Could you help take a look?

janeyx99 · 2023-08-09T14:07:50Z

Yes, working on a fix right now.

Forward fixes #106615 by increasing tolerance in the test. The capturable implementation for foreach simply varies due to a different order of operations when updating params. I had also attempted to compare against fp64, but that introduced more disparity in the other optimizer configs. It is worth trying the fp64 comparison at a later point, but let's get the test passing first.

Pull Request resolved: #106887
Approved by: https://github.com/izaitsevfb
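The order-of-operations effect is a general floating-point property. A minimal sketch in plain Python (float64 scalars, not the fp32 tensors or the tolerances used by the actual test) shows two mathematically equal expressions that differ in the last bits yet pass a tolerance-based comparison:

```python
import math

# Same three terms, grouped differently.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# Exact comparison fails because each grouping rounds differently.
bitwise_equal = left_to_right == right_to_left

# A tolerance-based comparison accepts the tiny discrepancy.
within_tolerance = math.isclose(left_to_right, right_to_left, rel_tol=1e-9)

print(bitwise_equal, within_tolerance)
```

The `rel_tol` value here is arbitrary; picking a tolerance for a real optimizer test depends on dtype and on how many operations are reordered.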

[NAdam] Add capturable API and tests

22486dd

janeyx99 requested a review from albanD as a code owner August 4, 2023 15:18

pytorch-bot bot added the release notes: optimizer label (relating to optimizers, torch.optim) Aug 4, 2023

janeyx99 marked this pull request as draft August 4, 2023 15:20

janeyx99 changed the title ~~[NAdam] Add capturable API and tests~~ [WIP][NAdam] Add capturable API and tests Aug 4, 2023

Update on "[WIP][NAdam] Add capturable API and tests"

b2558de

janeyx99 added a commit that referenced this pull request Aug 5, 2023

[NAdam] Add capturable API and tests

1e77c49

ghstack-source-id: 98bc871 Pull Request resolved: #106615

janeyx99 changed the title ~~[WIP][NAdam] Add capturable API and tests~~ [NAdam] Add capturable API and tests + fix differentiable Aug 5, 2023

janeyx99 added the topic: new features and topic: bug fixes labels (both topic category) Aug 5, 2023

janeyx99 commented Aug 5, 2023

janeyx99 marked this pull request as ready for review August 5, 2023 01:47

janeyx99 requested a review from mlazos August 5, 2023 01:51

crcrpar reviewed Aug 5, 2023

albanD reviewed Aug 7, 2023

janeyx99 added the ciflow/trunk label (Trigger trunk jobs on your pull request) Aug 7, 2023

albanD approved these changes Aug 7, 2023

pytorchmergebot added the merging label Aug 7, 2023

pytorchmergebot added Merged and removed merging labels Aug 7, 2023

pytorchmergebot closed this in 0208574 Aug 7, 2023

janeyx99 mentioned this pull request Aug 9, 2023

[forward-fix] Fix multigpu varying tensor optim tests #106887

Closed

facebook-github-bot deleted the gh/janeyx99/78/head branch August 11, 2023 14:17

janeyx99 mentioned this pull request Aug 25, 2023

Add capturable ASGD impl #107857

Closed

janeyx99 mentioned this pull request Oct 9, 2023

feat(optimizer): Adagrad will use device when capturable - True always when compiling with dynamo #110339

Closed

janeyx99 mentioned this pull request Jan 22, 2024

Enable all optimizers (except SparseAdam + LBFGS) to be capturable #118018

Open

7 participants
