- Allow capturing input buffers passed from a user if the `num_input` attribute is set. The last `func->params.size() - num_inputs` inputs are assumed to be fixed, and thus they can be captured into a CUDA graph. Users need to be careful if they intend to pass different parameter tensors, for example in LoRA deployment.
- Cache the instantiated `exec` object rather than the captured graph, since we assume that the graph is fixed anyway. I found that calling `cudaGraphLaunch` on every launch is super expensive.
- Support CUDA graph for CUTLASS BYOC.
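For readers unfamiliar with the caching pattern described above, here is a minimal sketch in plain C++. It is not the code in this PR and does not call the CUDA runtime: `FakeGraphExec`, `GraphExecCache`, and `NumFixedParams` are hypothetical names, and `FakeGraphExec` merely stands in for an instantiated graph handle. It only illustrates the two ideas: the trailing `params.size() - num_inputs` parameters are treated as fixed, and the instantiated object is built once per key and reused on later launches.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an instantiated graph handle.
// Real code would hold a CUDA graph exec object here instead.
struct FakeGraphExec {
  int id;
};

// Number of trailing parameters assumed to be fixed (e.g. weights):
// params.size() - num_inputs, as described in the PR text.
inline std::size_t NumFixedParams(std::size_t num_params, std::size_t num_inputs) {
  assert(num_inputs <= num_params);
  return num_params - num_inputs;
}

// Caches one instantiated exec per key, so the (assumed expensive)
// instantiation callback runs only on the first request for that key.
class GraphExecCache {
 public:
  explicit GraphExecCache(std::function<FakeGraphExec()> instantiate)
      : instantiate_(std::move(instantiate)) {}

  // Returns the cached exec for `key`, instantiating it on first use only.
  const FakeGraphExec& GetOrInstantiate(const std::string& key) {
    auto it = cache_.find(key);
    if (it == cache_.end()) {
      ++num_instantiations_;
      it = cache_.emplace(key, instantiate_()).first;
    }
    return it->second;
  }

  // How many times the instantiation callback has actually run.
  int num_instantiations() const { return num_instantiations_; }

 private:
  std::function<FakeGraphExec()> instantiate_;
  std::unordered_map<std::string, FakeGraphExec> cache_;
  int num_instantiations_ = 0;
};
```

Note that a cache like this is only safe under the assumption stated above, namely that the captured graph and its fixed parameter buffers do not change between launches. The sketch makes no attempt to detect a caller that passes different parameter tensors.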
@vinx13