@MasterJH5574
Contributor

This PR fixes a bug in the existing decode-GeMV dlight scheduling.

Previously, when the inner dimension of the largest tensor was spatial, the fused epilogue block ended up not bound to any thread axis, so the generated GPU code was incorrect and produced wrong numerical results. The cause is that after reverse-compute-at is applied to the epilogue block, at least one spatial axis remains, and that axis is supposed to be bound to threadIdx.

This PR fixes the issue and adds three test cases that cover both the reduction-inner and spatial-inner cases, with or without broadcasting.
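
To illustrate why an epilogue loop that is not bound to a thread axis can give wrong numbers, here is a toy model in plain Python. It is not TVM or dlight code and it does not reproduce the actual generated kernel. The function name `gemv_with_epilogue`, the `bind_epilogue` flag, and the assumption that each simulated thread keeps one output column in a thread-local accumulator are all hypothetical, chosen only to show one plausible failure mode.

```python
# Toy model (NOT TVM code): a spatial-inner GeMV where each simulated GPU
# thread owns one output column and holds its partial result in a
# thread-local accumulator. All names here are hypothetical.

def gemv_with_epilogue(x, w, bias, bind_epilogue):
    """Compute y[j] = sum_k x[k] * w[k][j] + bias[j] with len(bias) simulated threads."""
    n_threads = len(bias)
    out = [0] * n_threads
    for tx in range(n_threads):  # stands in for threadIdx.x
        # Main block: thread tx reduces over k for its own column j == tx.
        acc = sum(x[k] * w[k][tx] for k in range(len(x)))
        if bind_epilogue:
            # Epilogue axis bound to the thread axis: one element per thread.
            out[tx] = acc + bias[tx]
        else:
            # Epilogue axis left as a serial loop: every thread walks all j
            # and writes its own accumulator everywhere, so the last writer wins.
            for j in range(n_threads):
                out[j] = acc + bias[j]
    return out


x = [1, 2]
w = [[1, 2, 3], [4, 5, 6]]
bias = [10, 20, 30]

print(gemv_with_epilogue(x, w, bias, bind_epilogue=True))   # [19, 32, 45]
print(gemv_with_epilogue(x, w, bias, bind_epilogue=False))  # [25, 35, 45]
```

In this sketch only the bound variant matches the reference result; the unbound variant is correct only in the column owned by the last simulated thread.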

@tvm-bot
Collaborator

tvm-bot commented Jul 16, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@MasterJH5574 MasterJH5574 force-pushed the unity-dev/2023-07-15-dlight-decode-gemv-spatial-inner branch from 3a31781 to 4c04c40 Compare July 16, 2023 10:12
@tqchen tqchen merged commit cf401bc into apache:unity Jul 16, 2023
junrushao pushed a commit that referenced this pull request Jul 18, 2023
…casting (#15330)
