Skip to content

Conversation

@colesbury
Copy link
Member

Previously, we were only enabling Flush-To-Zero (FTZ) and
Denormals-Are-Zero (DAZ) when compiling with SSE3 enabled. After,
Christian's patch (#12109) we
won't be compiling core files with SSE3 or SSE4 enabled, to better
support older AMD processors.

This moves the FTZ and DAZ code behind a runtime CPU check in
preparation for that change.

Previously, we were only enabling Flush-To-Zero (FTZ) and
Denormals-Are-Zero (DAZ) when compiling with SSE3 enabled. After,
Christian's patch (pytorch#12109) we
won't be compiling core files with SSE3 or SSE4 enabled, to better
support older AMD processors.

This moves the FTZ and DAZ code behind a runtime CPU check in
preparation for that change.
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

colesbury has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

static constexpr unsigned int DENORMALS_ZERO = 0x0040;
static constexpr unsigned int FLUSH_ZERO = 0x8000;

bool set_flush_denormal(bool on) {

This comment was marked as off-topic.

This comment was marked as off-topic.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 5, 2018
Summary:
Previously, we were only enabling Flush-To-Zero (FTZ) and
Denormals-Are-Zero (DAZ) when compiling with SSE3 enabled. After,
Christian's patch (pytorch/pytorch#12109) we
won't be compiling core files with SSE3 or SSE4 enabled, to better
support older AMD processors.

This moves the FTZ and DAZ code behind a runtime CPU check in
preparation for that change.
Pull Request resolved: pytorch/pytorch#12386

Differential Revision: D10222237

Pulled By: colesbury

fbshipit-source-id: 7ffe32561ab965e1e5f9eb6e679602bbf4775538
@ezyang ezyang added the merged label Jun 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants