Move bf16_gemv_trans to ReducedPrecisionFloatGemvFastPathKernel #139081
…rnel Following the previous move of fp16_gemv_trans. Differential Revision: [D64930872](https://our.internmc.facebook.com/intern/diff/D64930872/) [ghstack-poisoned]
Dr. CI: see artifacts and rendered test results at hud.pytorch.org/pr/139081.
✅ No failures as of commit 85dadbd with merge base 419a7e1.
This pull request was exported from Phabricator. Differential Revision: D64930872
…ernel" Following the previous move of fp16_gemv_trans. Testing: Checked for performance regression with llm_benchmarks' `python benchmarks/benchmark_torch_mm.py llm`, didn't find one Differential Revision: [D64930872](https://our.internmc.facebook.com/intern/diff/D64930872/) cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
Some tests are still in progress, but none of them are mps/aarch64, so I'm optimistic and re-requesting review.
#139208) Very similar to #137917, but for bf16. Differential Revision: [D65155971](https://our.internmc.facebook.com/intern/diff/D65155971/) Pull Request resolved: #139208 Approved by: https://github.com/malfet ghstack dependencies: #139084, #139090, #139558, #139081
This is the big milestone for bf16 and should enable us to close pytorch/torchchat#1253 . Testing: ran python torchchat.py generate llama3.2-1b --dtype bf16 --device cpu on x86 machine with AVX512-bf16. observed similar tokens/sec with and without MKL path hand-disabled. Also observed speedup from ~2.1 tok/sec to 7.4 tok/sec on x86 machine with only AVX2. Differential Revision: [D65170967](https://our.internmc.facebook.com/intern/diff/D65170967/) Pull Request resolved: #139220 Approved by: https://github.com/malfet ghstack dependencies: #139084, #139090, #139558, #139081, #139208
…rch#139081) Following the previous move of fp16_gemv_trans. Testing: Checked for performance regression with llm_benchmarks' `python benchmarks/benchmark_torch_mm.py llm`, didn't find one Differential Revision: [D64930872](https://our.internmc.facebook.com/intern/diff/D64930872/) Pull Request resolved: pytorch#139081 Approved by: https://github.com/malfet ghstack dependencies: pytorch#139084, pytorch#139090, pytorch#139558
Stack from ghstack (oldest at bottom):
Following the previous move of fp16_gemv_trans.
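For readers unfamiliar with the operation being moved, here is a minimal scalar sketch of what a transposed bf16 GEMV computes. It is an illustration only, not the kernel in this PR: the names `to_bf16` and `bf16_gemv_trans_reference` are hypothetical, the column-major layout with leading dimension `lda` is assumed from the usual BLAS `gemv` convention, and accumulation happens in Python floats, whereas a real kernel would more likely accumulate in fp32 with vectorized loads.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (round-to-nearest-even).

    bf16 keeps the top 16 bits of the fp32 encoding. NaN handling is omitted
    in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF plus the lowest kept bit so that ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def bf16_gemv_trans_reference(m, n, a, lda, x):
    """Hypothetical reference for y = A^T x.

    `a` is a flat, column-major m-by-n matrix with leading dimension `lda`,
    so element (i, j) lives at a[i + j * lda]. Each output y[j] is the dot
    product of one contiguous column of A with x.
    """
    y = []
    for j in range(n):
        acc = 0.0  # accumulate in higher precision than bf16
        for i in range(m):
            acc += to_bf16(a[i + j * lda]) * to_bf16(x[i])
        y.append(to_bf16(acc))  # round the result back to bf16
    return y


# 2x3 column-major matrix with columns (1, 2), (3, 4), (5, 6)
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 1.0]
print(bf16_gemv_trans_reference(2, 3, a, 2, x))  # [3.0, 7.0, 11.0]
```

In the transposed case each output element reads a contiguous column, which makes the operation a natural candidate for a vectorized fast path.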
Testing: Checked for performance regression with llm_benchmarks' `python benchmarks/benchmark_torch_mm.py llm`; didn't find one.
Differential Revision: D64930872
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10