Conversation

Collaborator

@jamesr66a jamesr66a commented Sep 21, 2018

-O0 is problematic for compiling sleef kernels since they consist of a bunch of vector intrinsics. At -O0, the compiler spills every intermediate value to the stack. In one example (TestEndToEndHybridFrontendModels.test_snli in test_jit.py), the function Sleef_tanhf8_u10avx2 would spill 30kB of AVX registers onto the stack and run two orders of magnitude slower than in opt mode, causing the test to take minutes rather than seconds. I've verified that this behavior is not present with -O1.
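The spilling described above can be reproduced on a much smaller scale. The following is a minimal sketch, not SLEEF code: `toy_tanhf8`, `splat`, and `v8f` are hypothetical names, the kernel is just a degree-7 Taylor polynomial for tanh (only reasonable for small inputs), and it uses the GCC/Clang `vector_size` extension instead of AVX2 intrinsics so that it builds without `-mavx2`. Each named intermediate below is a 32-byte vector that an unoptimized build typically gives its own stack slot.

```cpp
#include <cassert>
#include <cstring>

// GCC/Clang vector extension: eight packed floats, standing in for __m256.
typedef float v8f __attribute__((vector_size(32)));

// Fill every lane of v with the scalar s.
static inline void splat(v8f& v, float s) {
    for (int i = 0; i < 8; ++i) v[i] = s;
}

// Hypothetical kernel (an assumption for illustration, not the SLEEF routine):
// lane-wise x - x^3/3 + 2x^5/15 - 17x^7/315 over eight floats.
void toy_tanhf8(const float* in, float* out) {
    assert(in != nullptr && out != nullptr);

    v8f x, c3, c5, c7;
    std::memcpy(&x, in, sizeof(x));   // unaligned-safe load of 8 floats
    splat(c3, 1.0f / 3.0f);
    splat(c5, 2.0f / 15.0f);
    splat(c7, 17.0f / 315.0f);

    // Every intermediate here is a full vector value. An optimizing build
    // can keep them in registers; an unoptimized build generally stores and
    // reloads each one through the stack.
    const v8f x2 = x * x;
    const v8f x3 = x2 * x;
    const v8f x5 = x3 * x2;
    const v8f x7 = x5 * x2;
    const v8f r = x - c3 * x3 + c5 * x5 - c7 * x7;

    std::memcpy(out, &r, sizeof(r));  // unaligned-safe store of 8 floats
}
```

To compare, one could compile this file twice, for example with `g++ -O0 -S` and `g++ -O1 -S`, and look at how many stack loads and stores surround each multiply in the generated assembly. The exact numbers depend on the compiler and target, so no specific output is claimed here.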

Contributor

@cpuhrsch cpuhrsch left a comment

ok

@colesbury
Member

This seems like a very good idea. I think we may want to do the same thing for everything in aten/src/ATen/native/cpu/.

@cpuhrsch
Contributor

cpuhrsch commented Sep 21, 2018

@colesbury - this should include that, no?

Contributor

@facebook-github-bot facebook-github-bot left a comment

jamesr66a is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Sep 21, 2018
Summary:
`-O0` is problematic for compiling sleef kernels since they consist of a bunch of vector intrinsics. At `-O0`, the compiler spills *every* intermediate value to the stack. In one example (TestEndToEndHybridFrontendModels.test_snli in test_jit.py), the function `Sleef_tanhf8_u10avx2` would spill 30kB of AVX registers onto the stack and run two orders of magnitude slower than in opt mode, causing the test to take minutes rather than seconds. I've verified that this behavior is not present with `-O1`.
Pull Request resolved: pytorch/pytorch#11942

Differential Revision: D9994658

Pulled By: jamesr66a

fbshipit-source-id: cdd9474c6ae3aa9898d5715ac19a900f5f90468a
@ezyang ezyang added the merged label Jun 26, 2019