[nvFuser] add torch.jit.fuser context manager #38993
Conversation
1. The nvFuser context manager facilitates switching to nvFuser instead of the legacy fuser. 2. Clean up the updated Python tests.
💊 CI failures summary (as of commit 9cefa09): ci.pytorch.org: 1 failed.
soumith left a comment:
I don't want to see torch.jit.nvFuser. The public details or names of our different fuser backends shouldn't be exposed to users.
One option: rename torch.jit.nvFuser to torch.jit.fuser('fuser0'), where fuser0 is the current default, fuser1 is nvFuser, and fuser2 is NNC.
Alternatively, if you just want this for development / testing purposes, use something like with torch.jit._fuser('nv').
… & nnc as well as nvfuser in API
Merged master again, hoping the ROCm build error will go away.
soumith left a comment:
One small change: I see you are using _nv_fuser; can you make it _nvfuser? It is one word.
The ROCm failure doesn't look related to me, and it doesn't want to go away after merging master. 😢
facebook-github-bot left a comment:
@soumith has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
FYI for future reference: the context manager was missing a docblock. Please make sure all added public APIs have docblocks.
Summary:
1. The `torch.jit.fuser(str)` context manager facilitates switching between backend fusers:
   - 'fuser0' enables only the legacy fuser;
   - 'fuser1' enables only NNC;
   - 'fuser2' enables only nvFuser.
2. Clean up the updated Python tests.

Pull Request resolved: pytorch#38993
Reviewed By: nairbv, pbelevich
Differential Revision: D21800620
Pulled By: soumith
fbshipit-source-id: 7fe855f5a5b97368e5e84c98c28d04b2e1276c85
* Simplify a few test cases: replace custom exception checks with ASSERT_THROW macros.
* ExpressionEvaluator
* Stricter EvaluationContext binding rules: 1. don't allow overwriting concrete values; 2. don't allow binding values to expression results.
* Fix clang-format errors
* Switch to Int::ScalarType: the expression evaluator now uses Int::ScalarType instead of plain int.
* Avoid a fight with clang-tidy
* Check the numbers of kernel input and output parameters
* Add an optional arc from TensorView to its root domain (generated for detail_level >= DetailLevel::Explicit)
* Check kernel arguments
* Prefer pointers over references
* Bug fix
* Fix accidental construction of IValue
* Use noReduction
* Add const to const pointer
* Make an integer tensor an error, as it is not yet supported
* clang-tidy
* Incorporate review feedback
* Added lerp support in parser
* Add missing addcmul parser and tests
* clang-format
* Return TensorView* from binary/compound/ternary ops
* clang-format
* Use TensorView* param in reductionOp and sum
* Prefer as instead of static_cast
* Transform replay refactor (#53): the goal of this work is to have the transformation history be specific to IterDomains instead of TensorDomains. This should make it much easier to match up IterDomains during replay, which can be complicated when taking reduction axes, rfactors, and broadcast axes into consideration. Co-authored-by: Jie <[email protected]> Co-authored-by: Kevin Stephano <[email protected]>
* Python test fixes (#52): 1. put Fusion inside cudaKernel to facilitate runtime arg checks; 2. relax rank check for broadcast support in integration; 3. add shape propagation for the newly added operations [addcmul, lerp]; 4. add a utility function to create a FusionGuard from a CudaKernel directly.
* [nvFuser] add torch.jit.fuser context manager (pytorch#38993) (#54). Summary: 1. The `torch.jit.fuser(str)` context manager facilitates switching between backend fusers: 'fuser0' enables only the legacy fuser; 'fuser1' enables only NNC; 'fuser2' enables only nvFuser. 2. Clean up the updated Python tests. Pull Request resolved: pytorch#38993 Reviewed By: nairbv, pbelevich Differential Revision: D21800620 Pulled By: soumith fbshipit-source-id: 7fe855f5a5b97368e5e84c98c28d04b2e1276c85
* Add another reduction example; change fusion printMath.
* Small test fix.
* Change Reduction4 test to use TIDx.x
* Minor cleanup.
* Clean up some noexcepts.
* More cleanup.
* Refactor computeAt; get first broadcast example working.
* Validate first non-trivial broadcast kernel.
* Fix replay when broadcast is merged with non-broadcast dim.
* Add constness in replay and index compute.
* Add another broadcast test. Rework index computation for producers, based on consumer computed indices.
* Val isConst fix.
* Add dot-product GEMM example.
* Clang.
* Minor bug fixes.
* Format and add comments to GEMM test.
* WIP: fix for enabling broadcast after reduction, plus a Softmax test (#66). Cleaner way of fixing checks for matching non-broadcast dims to non-reduction dims. Co-authored-by: Kevin Stephano <[email protected]> Co-authored-by: Christian Sarofeen <[email protected]>
* Back out bad merge conflict resolutions.
* More post-rebase cleanup.
* Re-fix a few tests, some from a bad rebase.
* Address comments.
* Missed some review comments.
* tmp

Co-authored-by: Lemo <[email protected]> Co-authored-by: Naoya Maruyama <[email protected]> Co-authored-by: Jie <[email protected]> Co-authored-by: Kevin Stephano <[email protected]> Co-authored-by: Kevin Stephano <[email protected]>
The `torch.jit.fuser(str)` context manager facilitates switching between backend fusers:
- 'fuser0' enables only the legacy fuser;
- 'fuser1' enables only NNC;
- 'fuser2' enables only nvFuser.