[MPS] Fixes GELU, LeakyRELU and MISH on non-contiguous tensors #123049

jtang98 · 2024-03-31T11:35:15Z

Fixes the GELU, LeakyReLU and Mish activation functions on non-contiguous tensors (for instance, when a transpose was applied to the tensor before the MPS operator), for both the forward and backward passes.
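
For readers who want to reproduce the kind of case described above, here is a minimal sketch (not the PR's actual test code). It builds a non-contiguous input with a transpose and compares each activation against the CPU result. The helper name `matches_cpu`, the shape and the tolerance are illustrative assumptions, and the script falls back to CPU when MPS is unavailable so it still runs everywhere.

```python
import torch
import torch.nn.functional as F

# Use MPS when present; otherwise fall back to CPU so the sketch still runs.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")


def matches_cpu(activation, shape=(4, 6)) -> bool:
    """Hypothetical helper: compare an activation on a transposed input against CPU."""
    cpu_x = torch.randn(shape)
    dev_x = cpu_x.to(device)
    # .t() returns a view with swapped strides, so the input is non-contiguous.
    cpu_in, dev_in = cpu_x.t(), dev_x.t()
    assert not dev_in.is_contiguous()
    # Tolerance is an assumption chosen for float32 inputs.
    return torch.allclose(activation(cpu_in), activation(dev_in).cpu(), atol=1e-4)


results = {
    "gelu": matches_cpu(F.gelu),
    "leaky_relu": matches_cpu(F.leaky_relu),
    "mish": matches_cpu(F.mish),
}
print(results)
```

On a build without the fix, one would expect some of these comparisons to fail on MPS; on CPU they pass trivially.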

I also extended the tests for the three activation functions to cover full and half precision, contiguous and non-contiguous inputs, and several tensor shapes: scalars, 1D, empty, 2D, > 3D.

I had issues with the Mish and GELU activations when asserting gradients against the CPU using sum() in some cases, so I reverted to the previous setup, which passes an explicit gradient argument to .backward().
This PR also fixes an issue with LeakyReLU on empty tensors.
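
As a rough illustration of the gradient check mentioned above (a sketch, not the PR's test code), the snippet below passes an explicit upstream gradient to `.backward()` instead of reducing the output with `sum()` first. The helper name `grads_match`, the shape and the tolerance are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

# Use MPS when present; otherwise fall back to CPU so the sketch still runs.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")


def grads_match(activation, shape=(3, 5)) -> bool:
    """Hypothetical helper: compare input gradients between CPU and the device."""
    cpu_x = torch.randn(shape, requires_grad=True)
    dev_x = cpu_x.detach().clone().to(device).requires_grad_()

    # Apply the activation to a transposed (non-contiguous) view of each leaf.
    cpu_y = activation(cpu_x.t())
    dev_y = activation(dev_x.t())

    # Explicit upstream gradient, rather than calling .sum().backward().
    grad = torch.ones_like(cpu_y)
    cpu_y.backward(gradient=grad)
    dev_y.backward(gradient=grad.to(device))

    return torch.allclose(cpu_x.grad, dev_x.grad.cpu(), atol=1e-4)


grad_results = {name: grads_match(fn) for name, fn in [("gelu", F.gelu), ("mish", F.mish)]}
print(grad_results)
```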

Fixes #98212 huggingface/transformers#22468 huggingface/transformers#19353

pytorch-bot · 2024-03-31T11:35:18Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/123049

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit df59052 with merge base 14162ee:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

trunk / linux-focal-cuda12.1-py3.10-gcc9 / test (nogpu_AVX512, 1, 1, linux.2xlarge) (gh)
profiler/test_profiler.py::TestProfiler::test_source_multithreaded_multiple_preexisting_work_in_main_thread_False
trunk / linux-focal-cuda12.1-py3.10-gcc9 / test (nogpu_NO_AVX2, 1, 1, linux.2xlarge) (gh)
profiler/test_profiler.py::TestProfiler::test_source_multithreaded_multiple_preexisting_work_in_main_thread_False

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2024-03-31T11:35:19Z

The committers listed above are authorized under a signed CLA.

✅ login: jtang98 / name: Joel Tang (f06a2c1, fdf9049, 0bf1672, 76d3105, d93e71c, 5f3ec2b, ec8f70b, dca0754, 03d3eae, bd0721a, df59052, 60b48a2, e43d23d, 0f73b1f, 330394e, fc227c4)

jtang98 · 2024-04-08T09:37:51Z

@malfet @kulinseth Could you have a look? Thanks!

jtang98 · 2024-04-12T08:09:34Z

A test failed but it seems unrelated to this PR, should I rebase on viable/strict?

kulinseth · 2024-04-18T08:18:45Z

@pytorchbot --help

pytorch-bot · 2024-04-18T08:18:47Z

PyTorchBot Help

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

In order to invoke the bot on your PR, include a line that starts with
@pytorchbot anywhere in a comment. That line will form the command; no
multi-line commands are allowed. Some commands may be used on issues as specified below.

Example:
    Some extra context, blah blah, wow this PR looks awesome

    @pytorchbot merge

optional arguments:
  -h, --help            Show this help message and exit.

command:
  {merge,revert,rebase,label,drci,cherry-pick,close}
    merge               Merge a PR
    revert              Revert a PR
    rebase              Rebase a PR
    label               Add label to a PR
    drci                Update Dr. CI
    cherry-pick         Cherry pick a PR onto a release branch
    close               Close a PR

Merge

usage: @pytorchbot merge [-f MESSAGE | -i] [-ic] [-r [{viable/strict,main}]]

Merge an accepted PR, subject to the rules in .github/merge_rules.json.
By default, this will wait for all required checks (lint, pull) to succeed before merging.

optional arguments:
  -f MESSAGE, --force MESSAGE
                        Merge without checking anything. This requires a reason for auditting purpose, for example:
                        @pytorchbot merge -f 'Minor update to fix lint. Expecting all PR tests to pass'
                        
                        Please use `-f` as last resort, prefer `--ignore-current` to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.
  -i, --ignore-current  Merge while ignoring the currently failing jobs.  Behaves like -f if there are no pending jobs.
  -ic                   Old flag for --ignore-current. Deprecated in favor of -i.
  -r [{viable/strict,main}], --rebase [{viable/strict,main}]
                        Rebase the PR to re run checks before merging.  Accepts viable/strict or main as branch options and will default to viable/strict if not specified.

Revert

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Revert a merged PR. This requires that you are a Meta employee.

Example:
  @pytorchbot revert -m="This is breaking tests on trunk. hud.pytorch.org/" -c=nosignal

optional arguments:
  -m MESSAGE, --message MESSAGE
                        The reason you are reverting, will be put in the commit message. Must be longer than 3 words.
  -c {nosignal,ignoredsignal,landrace,weird,ghfirst}, --classification {nosignal,ignoredsignal,landrace,weird,ghfirst}
                        A machine-friendly classification of the revert reason.

Rebase

usage: @pytorchbot rebase [-s | -b BRANCH]

Rebase a PR. Rebasing defaults to the stable viable/strict branch of pytorch.
Repeat contributor may use this command to rebase their PR.

optional arguments:
  -s, --stable          [DEPRECATED] Rebase onto viable/strict
  -b BRANCH, --branch BRANCH
                        Branch you would like to rebase to

Label

usage: @pytorchbot label labels [labels ...]

Adds label to a PR or Issue [Can be used on Issues]

positional arguments:
  labels  Labels to add to given Pull Request or Issue [Can be used on Issues]

Dr CI

usage: @pytorchbot drci 

Update Dr. CI. Updates the Dr. CI comment on the PR in case it's gotten out of sync with actual CI results.

cherry-pick

usage: @pytorchbot cherry-pick --onto ONTO [--fixes FIXES] -c
                               {regression,critical,fixnewfeature,docs,release}

Cherry pick a pull request onto a release branch for inclusion in a release

optional arguments:
  --onto ONTO           Branch you would like to cherry pick onto (Example: release/2.1)
  --fixes FIXES         Link to the issue that your PR fixes (Example: https://github.com/pytorch/pytorch/issues/110666)
  -c {regression,critical,fixnewfeature,docs,release}, --classification {regression,critical,fixnewfeature,docs,release}
                        A machine-friendly classification of the cherry-pick reason.

Close

usage: @pytorchbot close

Close a PR [Can be used on issues]

kulinseth · 2024-04-18T08:19:11Z

@pytorchbot rebase

kulinseth · 2024-04-18T08:20:36Z

> A test failed but it seems unrelated to this PR, should I rebase on viable/strict?

The test is indeed unrelated; I have kicked off a rebase.

pytorchmergebot · 2024-04-18T08:21:01Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2024-04-18T08:21:11Z

Successfully rebased fix_non_contiguous_activations onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout fix_non_contiguous_activations && git pull --rebase)

kulinseth · 2024-04-18T08:25:01Z

> A test failed but it seems unrelated to this PR, should I rebase on viable/strict?

The test is indeed unrelated; I have kicked off a rebase.

I will keep an eye out. If there is still a problem, I will merge the PR; otherwise, if it looks green, please go ahead and merge, @jtang98.

kulinseth · 2024-04-20T20:47:45Z

@pytorchbot merge -f “all tests are green”

pytorch-bot · 2024-04-20T20:47:48Z

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: unrecognized arguments: tests are green”

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

Try @pytorchbot --help for more info.
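
The error above is consistent with the typographic quotes (“ ”) not being treated as quote characters, so the message was split on whitespace. The sketch below shows that behaviour with Python's `shlex` and `argparse`. It is only an illustration: the parser here is a hypothetical stand-in, and how the bot actually tokenizes commands is an assumption.

```python
import argparse
import shlex

# Hypothetical stand-in for the bot's parser; only -f/--force is modelled.
parser = argparse.ArgumentParser(prog="@pytorchbot")
parser.add_argument("command")
parser.add_argument("-f", "--force", metavar="MESSAGE")


def parse(line: str):
    """Tokenize shell-style, then return (namespace, leftover arguments)."""
    tokens = shlex.split(line)
    return parser.parse_known_args(tokens)


# Curly quotes are ordinary characters to shlex, so the message is split apart.
curly_args, curly_extra = parse("merge -f “all tests are green”")
# Straight ASCII quotes keep the message together as a single argument.
plain_args, plain_extra = parse('merge -f "all tests are green"')

print(curly_args.force, curly_extra)
print(plain_args.force, plain_extra)
```

Under these assumptions, the leftover tokens in the curly-quote case match the "unrecognized arguments" reported in the error, while the straight-quote form parses cleanly.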

kulinseth · 2024-04-20T20:48:30Z

@pytorchbot merge

pytorchmergebot · 2024-04-20T20:51:14Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


Similar to #123049: `SiLU` also produces random values, `0.0`, or `NaN` if the input tensor is not contiguous on macOS versions prior to 15.0. The problem was originally found in jy0205/Pyramid-Flow#113. Pull Request resolved: #139006. Approved by: https://github.com/malfet

pytorch-bot bot added the release notes: mps label Mar 31, 2024

pytorchbot added the open source label Mar 31, 2024

jtang98 force-pushed the fix_non_contiguous_activations branch 2 times, most recently from 0ff6d76 to 3933af5 March 31, 2024 15:07

jtang98 marked this pull request as ready for review April 2, 2024 09:19

jtang98 requested review from kulinseth and malfet as code owners April 2, 2024 09:19

jbschlosser added the triaged label Apr 3, 2024

jtang98 marked this pull request as draft April 3, 2024 18:24

jtang98 marked this pull request as ready for review April 3, 2024 18:26

kulinseth approved these changes Apr 10, 2024

pytorch deleted a comment from pytorch-bot bot Apr 18, 2024

jtang98 added 9 commits April 18, 2024 08:21

bd0721a Fixes GELU, LeakyRELU and MISH activation functions for MPS on non-contiguous tensors
fdf9049 Added MPS tests for transpose / non-contiguous inputs for GELU, LeakyRELU and Mish activations
d93e71c Tests WIP
dca0754 Tests pass but grads might be None
76d3105 LeakyRELU backward and forward OK with tests
60b48a2 Tests and backwards OK, empty shapes not supported in backwards
ec8f70b Complete tests for 1D on Leaky RELU
f06a2c1 wip
0f73b1f Target tests for LeakyRELU, GELU and Mish

jtang98 added 6 commits April 18, 2024 08:21

fc227c4 Fix extended tests Mish
e43d23d Fix GELU empty inputs
0bf1672 Non-passing tests cases for GELU and Mish, to be tested on main
03d3eae clean tests
5f3ec2b Lint tests and remove false-positives due to .sum gradient error
df59052 Harmonize LeakyRELU tests with GELU and Mish

pytorchmergebot force-pushed the fix_non_contiguous_activations branch from f0493f3 to df59052 April 18, 2024 08:21

This was referenced Apr 18, 2024

addcdiv computes incorrect results on MPS with noncontiguous tensors #118115

Closed

Add Torch Check for addcdiv input to be contiguous #120272

Closed

pytorch-bot bot added the ciflow/trunk label Apr 20, 2024

pytorchmergebot added the merging label Apr 20, 2024

pytorchmergebot added the Merged label Apr 21, 2024

pytorchmergebot closed this in a6a3f2e Apr 21, 2024

pytorchmergebot removed the merging label Apr 21, 2024

niw added a commit to niw/pytorch that referenced this pull request Oct 27, 2024

[MPS] Fixes SiLU on non-contiguous tensors

9152988

Similar to pytorch#123049: `SiLU` also produces random values, `0.0`, or `NaN` if the input tensor is not contiguous on macOS versions prior to 15.0.

niw mentioned this pull request Oct 27, 2024

[MPS] Fixes SiLU on non-contiguous tensors #139006

Closed
