Skip to content

rocm: autodetect default amdgpu_target#35233

Open
cgmb wants to merge 1 commit intospack:developfrom
cgmb:amdgpu-target-default
Open

rocm: autodetect default amdgpu_target#35233
cgmb wants to merge 1 commit intospack:developfrom
cgmb:amdgpu-target-default

Conversation

@cgmb
Copy link
Copy Markdown
Contributor

@cgmb cgmb commented Jan 30, 2023

We can query the driver to see what hardware is available and use that to inform the default variant value. If the query fails for any reason, then the default will be "none", just like it was before hardware autodetect was added.

This should hold us over until archspec can properly detect the amdgpu_target (which was mentioned in the related issue). Setting the default amdgpu_target variant value with a reasonable guess is not a perfect solution, but it seems like a strict improvement over defaulting to "none". I'm not the right person for work on archspec, so I hope you don't mind too much that I'm only contributing a stopgap.

We can query the driver to see what hardware is available and use that
to inform the default variant value. If the query fails for any reason,
then the default will be "none", just like it was before hardware
autodetect was added.

This is a bit of a holdover until archspec can properly detect the
amdgpu_target [1]. Setting the default amdgpu_target variant value with
a reasonable guess is not a perfect solution, but it seems like a strict
improvement over defaulting to "none".

[1]: spack#31581
@spackbot-app spackbot-app bot added build-systems core PR affects Spack core functionality labels Jan 30, 2023
Copy link
Copy Markdown
Member

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. My only concern is that it could be slow and may be run any time any Spack command is run (can someone confirm or deny this?). We had a similar issue when detecting a default variant for MPI a while back.

@cgmb
Copy link
Copy Markdown
Contributor Author

cgmb commented Feb 1, 2023

My only concern is that it could be slow and may be run any time any Spack command is run (can someone confirm or deny this?).

On my workstation (Ubuntu 20.04 with 1x Radeon VII), timeit.timeit(lambda: _query_kfd_gfx_targets(*amdgpu_targets), number=1) takes 305 us. Increasing that to number=100000 gives an average duration of 71 us.

I would expect the performance to scale linearly with the number of GPUs installed, so I'd estimate that this would add roughly 0.5 ms to 2.4 ms of startup overhead to a machine with 8 GPUs.

@cgmb
Copy link
Copy Markdown
Contributor Author

cgmb commented Feb 2, 2023

On the same machine but without the GPU, the query takes roughly two thirds the time (40-200 us). On a server with 8 AMD GPUs, it takes 300-1300 us. In both cases, the higher value is what is reported by timeit with 1 call and the lower value is the average of 100000 calls.

The query runs pretty much whenever a command uses a spec. For example, the query will be executed when you run spack spec zlib or spack external find. However, the query will not be executed for commands like spack --help or spack env create myenv.

IMO, the overhead seems reasonable. spack info zlib takes 700000 us.

@adamjstewart
Copy link
Copy Markdown
Member

@alalazo do you remember how we handled the MPI case?

@cgmb
Copy link
Copy Markdown
Contributor Author

cgmb commented Feb 13, 2023

@adamjstuwart, @alalazo: have you had a chance to consider this?

@adamjstewart
Copy link
Copy Markdown
Member

I think I'm fine with it but we'll likely want to cache the result. Curious what @alalazo thinks.

@haampie
Copy link
Copy Markdown
Member

haampie commented Feb 20, 2023

Sorry, missed this PR. I like it. What about these suffixes -knack or w/e they are?

I think you can get away with a regex, might be faster than a loop in Python.

I agree with caching this somewhere, we probably do the same for the host cpu arch.

@cgmb
Copy link
Copy Markdown
Contributor Author

cgmb commented Feb 20, 2023

What about these suffixes -knack or w/e they are?

When the target id omits any reference to xnack, the compiler will generate code that is compatible with both the xnack- and xnack+ modes. So, it is safe to ignore those target features.

@cgmb
Copy link
Copy Markdown
Contributor Author

cgmb commented Dec 8, 2023

What would be required to get this merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

build-systems core PR affects Spack core functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants