Conversation
We can query the driver to see what hardware is available and use that to inform the default variant value. If the query fails for any reason, then the default will be "none", just like it was before hardware autodetect was added. This is a bit of a holdover until archspec can properly detect the amdgpu_target [1]. Setting the default amdgpu_target variant value with a reasonable guess is not a perfect solution, but it seems like a strict improvement over defaulting to "none". [1]: spack#31581
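The detection flow described in the PR can be sketched roughly as below. Note this is an illustrative sketch only: the use of the `rocminfo` CLI and the `gfx*` parsing are assumptions, not necessarily the interface the PR actually queries.

```python
import re
import subprocess

def detect_amdgpu_target(output=None):
    """Return the first gfx target reported by the driver, or "none".

    If the query fails for any reason, fall back to "none", matching the
    behavior before hardware autodetect was added.
    """
    try:
        if output is None:
            # Hypothetical driver query via the rocminfo CLI (assumption).
            output = subprocess.run(
                ["rocminfo"], capture_output=True, text=True, timeout=5
            ).stdout
    except (OSError, subprocess.SubprocessError):
        return "none"  # query failed: behave as before autodetect
    match = re.search(r"\bgfx\d+[a-z]*\b", output)
    return match.group(0) if match else "none"
```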
adamjstewart
left a comment
I like it. My only concern is that it could be slow and may be run any time any Spack command is run (can someone confirm or deny this?). We had a similar issue when detecting a default variant for MPI a while back.
On my workstation (Ubuntu 20.04 with 1x Radeon VII), I would expect the performance to scale linearly with the number of GPUs installed, so I'd estimate that this would add roughly 0.5 ms to 2.4 ms of startup overhead to a machine with 8 GPUs.

On the same machine but without the GPU, the query takes roughly two thirds the time (40-200 us). On a server with 8 AMD GPUs, it takes 300-1300 us. In both cases, the higher value is what is reported by timeit with 1 call and the lower value is the average of 100,000 calls.

The query runs pretty much whenever a command uses a spec; for example, the query will be executed when you run. IMO, the overhead seems reasonable.
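The measurement methodology described in the comment (one cold call vs. the average of 100,000 calls) can be reproduced with `timeit`; the function below is a stand-in placeholder, not Spack's actual detection code:

```python
import timeit

def query():
    # Placeholder for the driver query being timed (hypothetical).
    return "none"

N = 100_000
one_call = timeit.timeit(query, number=1)  # includes any one-time cost
avg_call = timeit.timeit(query, number=N) / N  # amortized per-call cost
```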
@alalazo do you remember how we handled the MPI case?

@adamjstewart, @alalazo: have you had a chance to consider this?

I think I'm fine with it, but we'll likely want to cache the result. Curious what @alalazo thinks.
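Caching the result for the lifetime of the process, as suggested here, could look like the sketch below (using stdlib `functools`; Spack may well use its own caching utilities instead):

```python
import functools

calls = []  # counts how many times the (hypothetical) query actually runs

def expensive_detect():
    calls.append(1)
    return "gfx906"  # stands in for the real driver query result

@functools.lru_cache(maxsize=None)
def cached_target():
    # The driver is queried on the first call only; later calls hit the cache,
    # so repeated spec handling in one Spack invocation pays the cost once.
    return expensive_detect()
```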
Sorry, missed this PR. I like it. What about these xnack suffixes, or whatever they are? I think you can get away with a regex; it might be faster than a loop in Python. I agree with caching this somewhere; we probably do the same for the host CPU arch.
When the target ID omits any reference to xnack, the compiler will generate code that is compatible with both the xnack+ and xnack- modes.
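A regex along the lines suggested above might look like this; the `:xnack+`/`:xnack-` (and `:sramecc`) feature-suffix syntax is an assumption about the target-ID format being parsed:

```python
import re

# Feature suffixes such as ":xnack-" or ":sramecc+" (assumed format).
_FEATURE_SUFFIX = re.compile(r":(xnack|sramecc)[+-]")

def strip_features(target_id):
    """Drop feature suffixes so e.g. "gfx90a:xnack-" maps to plain "gfx90a"."""
    return _FEATURE_SUFFIX.sub("", target_id)
```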
What would be required to get this merged? |
We can query the driver to see what hardware is available and use that to inform the default variant value. If the query fails for any reason, then the default will be "none", just like it was before hardware autodetect was added.
This should hold us over until archspec can properly detect the amdgpu_target (as mentioned in the related issue). Setting the default amdgpu_target variant value with a reasonable guess is not a perfect solution, but it seems like a strict improvement over defaulting to "none". I'm not the right person to work on archspec, so I hope you don't mind too much that I'm only contributing a stopgap.