pytorch-bot bot commented Oct 29, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139208

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit a56927d with merge base 419a7e1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Oct 29, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D65155971

@swolchok changed the title from "[PyTorch] Build bf16 gemv fast path & entry points for non-ARM architectures too" to "Build bf16 gemv fast path & entry points for non-ARM architectures too" on Nov 2, 2024
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 2, 2024
…tectures too"


Very similar to #137917, but for bf16.

Differential Revision: [D65155971](https://our.internmc.facebook.com/intern/diff/D65155971/)

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]

// https://godbolt.org/z/z8P4Yncra
#define COMPILER_SUPPORTS_BF16_TARGET 1
#elif !defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 10
#elif defined(__aarch64__) && !defined(CPU_CAPABILITY_SVE) && !defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 10

Can this be moved to, say, a compiler_capabilities header that is included from here, with a table at the top explaining which compiler versions support what?
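
For illustration, a header along the lines the comment suggests might look like the sketch below. The header name comes from the comment; the macro name `EXAMPLE_HAS_BF16_TARGET`, the helper `example_has_bf16_target`, and the table layout are hypothetical. Only the GCC condition is copied from the snippet under review, and any other compiler is simply treated as unsupported here, which is an assumption rather than what the PR does.

```cpp
// Sketch of a hypothetical compiler_capabilities.h. All names in this file
// are illustrative assumptions, not identifiers from the PyTorch source tree.
//
// Capability table (one row per centralized check):
//
//   Macro                   | Compiler | Condition
//   ------------------------+----------+----------------------------------
//   EXAMPLE_HAS_BF16_TARGET | GCC      | >= 10, aarch64, not an SVE build
//   EXAMPLE_HAS_BF16_TARGET | other    | 0 in this sketch (assumption)
#pragma once

#if defined(__aarch64__) && !defined(CPU_CAPABILITY_SVE) && \
    !defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 10
#define EXAMPLE_HAS_BF16_TARGET 1
#else
#define EXAMPLE_HAS_BF16_TARGET 0
#endif

// Typed accessor so callers can branch with `if constexpr` instead of #if.
constexpr bool example_has_bf16_target() {
  return EXAMPLE_HAS_BF16_TARGET != 0;
}
```

Keeping the table next to the macro definitions means a reader can see the supported compiler versions without decoding each `#elif` chain.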

pytorchmergebot pushed a commit that referenced this pull request Nov 8, 2024
This is the big milestone for bf16 and should enable us to close pytorch/torchchat#1253 .

Testing: ran `python torchchat.py generate llama3.2-1b --dtype bf16 --device cpu` on an x86 machine with AVX512-bf16 and observed similar tokens/sec with and without the MKL path hand-disabled. Also observed a speedup from ~2.1 tok/sec to 7.4 tok/sec on an x86 machine with only AVX2.

Differential Revision: [D65170967](https://our.internmc.facebook.com/intern/diff/D65170967/)
Pull Request resolved: #139220
Approved by: https://github.com/malfet
ghstack dependencies: #139084, #139090, #139558, #139081, #139208
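
For readers unfamiliar with the operation being accelerated, the following is a minimal scalar sketch of a bf16 matrix-vector product (gemv) with fp32 accumulation. It is not the vectorized kernel from this PR stack: the function names (`bf16_to_float`, `float_to_bf16`, `bf16_gemv_reference`), the row-major layout, and the raw `uint16_t` storage are all assumptions chosen to keep the example self-contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// bf16 holds the upper 16 bits of an IEEE-754 binary32 value.
inline float bf16_to_float(std::uint16_t bits) {
  std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &wide, sizeof(out));
  return out;
}

// Round-to-nearest-even narrowing; NaN handling is omitted for brevity.
inline std::uint16_t float_to_bf16(float value) {
  std::uint32_t wide;
  std::memcpy(&wide, &value, sizeof(wide));
  std::uint32_t rounding_bias = 0x7FFFu + ((wide >> 16) & 1u);
  return static_cast<std::uint16_t>((wide + rounding_bias) >> 16);
}

// Computes y = A * x, where A is m x n, stored row-major as bf16 bit patterns.
// Products are accumulated in fp32 and narrowed to bf16 once per output row.
std::vector<std::uint16_t> bf16_gemv_reference(
    const std::vector<std::uint16_t>& a,
    const std::vector<std::uint16_t>& x,
    std::size_t m,
    std::size_t n) {
  std::vector<std::uint16_t> y(m);
  for (std::size_t row = 0; row < m; ++row) {
    float acc = 0.0f;
    for (std::size_t col = 0; col < n; ++col) {
      acc += bf16_to_float(a[row * n + col]) * bf16_to_float(x[col]);
    }
    y[row] = float_to_bf16(acc);
  }
  return y;
}
```

A reference like this is mainly useful as a correctness baseline when comparing against a SIMD fast path, since the inner loop is the part a vectorized kernel would replace.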
@github-actions github-actions bot deleted the gh/swolchok/683/head branch December 9, 2024 02:14

Labels

ciflow/linux-aarch64: linux aarch64 CI workflow
ciflow/mps: Run MPS tests (subset of trunk)
ciflow/trunk: Trigger trunk jobs on your pull request
fb-exported
Merged
module: cpu: CPU specific problem (e.g., perf, algorithm)
topic: performance: topic category
