Conversation

@tenpercent (Collaborator) commented Aug 13, 2024

This PR enables dynamic shapes in the CK backend for gemm max-autotune (see #125453).

This is achieved by removing the hardcoded problem sizes from the template body and passing them as kernel parameters instead.

The problem sizes are passed both at the kernel call site and at the benchmark call site, as sketched below.
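A hypothetical sketch of the idea (invented names, not the actual template code):

# Hypothetical sketch: move the problem sizes out of the rendered template
# body and into the generated kernel's runtime signature.
size_params = ["M", "N", "K", "LDA", "LDB", "LDC", "LDD"]

# Before (static shapes): sizes were literals baked in at render time, e.g.
static_body = "constexpr int M = 2048; constexpr int N = 2048; constexpr int K = 1024;"

# After (dynamic shapes): the signature gains size/stride parameters that the
# wrapper passes on every call (and the benchmark harness passes size hints).
params = ", ".join(f"int32_t {p}" for p in size_params)
print(f"void gemm_kernel(const void* X, const void* W, void* Y, {params});")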

Testing

pytest test/inductor/test_ck_backend.py [-k dynamic]

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @zjing14

@pytorch-bot bot commented Aug 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133285

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 1ef26dc with merge base 3965f11:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot bot added the ciflow/inductor, ciflow/rocm (Trigger "default" config CI on ROCm), module: inductor, and module: rocm (AMD GPU support for Pytorch) labels Aug 13, 2024
@tenpercent changed the title from "[ROCm][Indoctor][Draft] enable dynamic shapes" to "[ROCm][Inductor][Draft] enable dynamic shapes" Aug 13, 2024
@tenpercent force-pushed the ck-unhardcode-mm-size branch 4 times, most recently from 92ab278 to 34035a1, August 13, 2024 01:56
@tenpercent (Collaborator, Author):

@pytorchbot rebase -s

@pytorchmergebot (Collaborator):

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot (Collaborator):

Successfully rebased ck-unhardcode-mm-size onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout ck-unhardcode-mm-size && git pull --rebase)

Comment on lines 135 to 136
Contributor:

Might be worth testing with a differently sized a, like this:

new_a = torch.randn(2345, 256, **tensor_options)
Y = mm(new_a, b)

Contributor:

qq: what does sympy.expand do? I'm wondering if there's a better alternative in our big dynamic shape library =p

for example, there's this from the PT2 core library:

def simplify(self, expr: "sympy.Expr") -> "sympy.Expr":

there's also this in Inductor which is very similar to above:

def simplify(self, expr: Expr):
    return sympy.expand(expr).xreplace(self.replacements)

@tenpercent (Collaborator, Author) commented Aug 14, 2024:

I think I took expand from some part of the Triton kernel code. Not sure which method is actually correct here. From the docs, it transforms polynomial expressions into their canonical form: https://docs.sympy.org/latest/tutorials/intro-tutorial/simplification.html#expand
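For illustration, a minimal example of what expand does (plain sympy, independent of this PR):

import sympy

s0 = sympy.Symbol("s0")
expr = (s0 + 1) * (s0 + 2)
# expand() multiplies everything out into a canonical polynomial form
print(sympy.expand(expr))  # s0**2 + 3*s0 + 2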

Contributor:

Okay I think it's okay to use sympy.expand then.

If this is used by codegen, then we may benefit from sizevars' simplify, which substitutes all symbols in self.replacements to make expr more canonical.
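A small sketch of what the substitution adds on top of expand (hypothetical replacements dict, mirroring the simplify body quoted above):

import sympy

s0, s1 = sympy.symbols("s0 s1")
replacements = {s1: s0}  # e.g. two sizes the compiler has proven equal
expr = s1 * (s0 + 1)
# expand first, then substitute known-equal symbols, as sizevars' simplify does
print(sympy.expand(expr).xreplace(replacements))  # s0**2 + s0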

Collaborator (Author):

Looks like it would make sense to move the simplification to the kernel call site only.

@ColinPeppler (Contributor) commented Aug 13, 2024:

qq: what should size_args look like? Does it look like something like c_int(s0), or like c_int(123)?

I think with the size_hint call here, size_args would always be a scalar even when it's symbolic?

extra_args = V.graph.sizevars.size_hints(
    map(sympy.expand, call_args[len(expected_args) :])
)
# create the BenchmarkRequest
bmreq = ROCmBenchmarkRequest(
    kernel_name=kernel_name,
    input_tensor_meta=TensorMeta.from_irnodes(self.input_nodes),
    output_tensor_meta=TensorMeta.from_irnodes(self.output_node),
    extra_args=extra_args,

@tenpercent (Collaborator, Author) commented Aug 14, 2024:

size_args is a list of [M, N, K, LDA, LDB, LDC, LDD]. They all need to be scalars. For the kernel call, the scalar may look like c_int(s0), as s0 is obtained in the wrapper from one of the call args. For the benchmark call it should look like c_int(123), where 123 is the result of evaluating the size hint. Not sure if hinting always produces a scalar; we can provide a fallback in case it doesn't. See the sketch below.
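A rough illustration of the two forms (hypothetical values; in reality the wrapper generates this code):

import ctypes

# Kernel-call path: the wrapper binds the symbol s0 from one of the call args,
# so the generated argument reads like c_int(s0).
s0 = 2345  # hypothetical runtime value taken from an input tensor's shape
m_kernel = ctypes.c_int(s0)

# Benchmark-call path: the size hint has already been evaluated to a literal,
# so the argument reads like c_int(123).
m_bench = ctypes.c_int(123)

print(m_kernel.value, m_bench.value)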

Contributor:

I see, thanks for explaining; so we need a scalar here.

Then this makes sense! size_hint(x) should produce a scalar if x is a scalar or a backed symint. It won't if x is an unbacked symint, but that's less common.

@tenpercent force-pushed the ck-unhardcode-mm-size branch from 686936c to 2a40ba0, August 14, 2024 03:59
@tenpercent (Collaborator, Author):

@pytorchbot rebase -s

@pytorchmergebot (Collaborator):

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot (Collaborator):

Successfully rebased ck-unhardcode-mm-size onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout ck-unhardcode-mm-size && git pull --rebase)

    self.size_args() if hasattr(self, "size_args") else ()
)  # subclass should define def size_args()
size_args_ints = [
    V.graph.sizevars.symbolic_hint(arg) for arg in size_args
Contributor:

I think sizevars.size_hint is the preferred method here since it's more unbacked-symint friendly =)

@ColinPeppler (Contributor) commented:

Looks pretty good!

@tenpercent marked this pull request as ready for review August 15, 2024 03:00
@tenpercent changed the title from "[ROCm][Inductor][Draft] enable dynamic shapes" to "[ROCm][CK][Inductor] enable dynamic shapes for CK backend to gemm max autotune" Aug 15, 2024
@tenpercent (Collaborator, Author):

@pytorchbot rebase -s

@pytorchmergebot (Collaborator):

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot (Collaborator):

Successfully rebased ck-unhardcode-mm-size onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout ck-unhardcode-mm-size && git pull --rebase)

        Test matmul with dynamic shapes
        """

        torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False
Contributor:

Just wondering if this was disabled to avoid certain ROCm kernels.

Collaborator (Author):

Well, it's here to avoid numeric mismatches. Since the CK kernels do the computation with fp32 dtype, I just hope this setting enables fp32 accumulation in the aten counterpart.

Contributor:

I think you're right; if tf32 is enabled, then it should do fp32 accumulation.

Collaborator:

> Well, it's here to avoid numeric mismatches. Since the CK kernels do the computation with fp32 dtype, I just hope this setting enables fp32 accumulation in the aten counterpart.

Btw, this setting doesn't do what you think it does. Aten always does accumulation in fp32. This setting simply allows aten to truncate intermediate reductions to fp16 for fp16 gemms.
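For reference, a quick way to observe the flag's effect (illustrative; needs a CUDA/ROCm device):

import torch

a = torch.randn(256, 256, device="cuda", dtype=torch.float16)
b = torch.randn(256, 256, device="cuda", dtype=torch.float16)

# Reduced-precision path: the backend may keep intermediate sums in fp16.
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = True
y_reduced = a @ b

# Full-precision path: accumulation stays in fp32 throughout.
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False
y_full = a @ b

# Any difference comes from the reduced-precision reductions.
print((y_reduced - y_full).abs().max())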

@tenpercent (Collaborator, Author):

@pytorchbot merge

@pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Aug 15, 2024
@pytorchmergebot (Collaborator):

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team: raised by workflow job.

@tenpercent (Collaborator, Author):

@pytorchbot label "topic: not user facing"

@pytorch-bot bot added the topic: not user facing (topic category) label Aug 16, 2024
@tenpercent (Collaborator, Author):

@pytorchbot merge

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.


Labels

ciflow/inductor, ciflow/rocm (Trigger "default" config CI on ROCm), ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: inductor, module: rocm (AMD GPU support for Pytorch), open source, topic: not user facing (topic category)
