@masahi
Member

@masahi masahi commented Jul 14, 2023

  • Allow capturing input buffers passed in by the user if the num_input attribute is set. The last func->params.size() - num_inputs parameters are assumed to be fixed, so they can be captured into a CUDA graph. Users need to be careful if they intend to pass different parameter tensors between calls, for example in LoRA deployment.

  • Cache the instantiated exec object rather than the captured graph, since we assume that the graph is fixed anyway. I found that calling cudaGraphInstantiate on every launch is very expensive.

  • Support CUDA graph for CUTLASS BYOC.
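
To make the first two points concrete, here is a minimal pure-Python sketch. It is not the code in this PR and does not call TVM or CUDA: `split_params` and `ExecCache` are hypothetical names, and the `instantiate` callback only stands in for graph capture plus `cudaGraphInstantiate`. It shows the assumed split of a function's parameters into runtime inputs and fixed parameters, and a cache that instantiates once and only launches afterwards.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def split_params(params: Sequence[str], num_input: int) -> Tuple[List[str], List[str]]:
    """Split parameters into (runtime inputs, fixed parameters).

    Assumption for illustration: the first `num_input` parameters change
    between calls, and the remaining len(params) - num_input are fixed.
    """
    if not 0 <= num_input <= len(params):
        raise ValueError("num_input must be between 0 and len(params)")
    return list(params[:num_input]), list(params[num_input:])


class ExecCache:
    """Hypothetical cache of instantiated executables, keyed by captured region."""

    def __init__(self, instantiate: Callable[[str], Callable[[], None]]):
        # `instantiate` stands in for capture + cudaGraphInstantiate.
        self._instantiate = instantiate
        self._execs: Dict[str, Callable[[], None]] = {}
        self.instantiate_calls = 0

    def launch(self, key: str) -> None:
        exec_ = self._execs.get(key)
        if exec_ is None:
            # Pay the expensive instantiation cost only on the first call.
            self.instantiate_calls += 1
            exec_ = self._instantiate(key)
            self._execs[key] = exec_
        # Calling the cached executable stands in for cudaGraphLaunch.
        exec_()
```

In this model, launching the same key twice triggers a single instantiation. The cache is only safe under the same assumption stated above: the fixed parameters must be the same buffers on every call.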

@vinx13

@tvm-bot
Collaborator

tvm-bot commented Jul 14, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@masahi masahi merged commit 783b467 into apache:unity Jul 14, 2023
junrushao pushed a commit that referenced this pull request Jul 18, 2023
* allow capturing input parameters in a cuda graph

* remove unnecessary cudaGraphLaunch

* support cuda graph for cutlass

* add test

* add test for cutlass

* revert LiftTransformParams change

* comment

* update test

* update builtin

* update

* delete exec properly

* run cuda graph twice in the test to make sure cached launch works