Compute-Specific Architecture Chains #2379
@alalazo I asked you this question at SC16, but I honestly forgot exactly how the workflow goes, or whether it is already implemented (the latest docs say no).
In our scenario, we have a heterogeneous cluster with three queues, each with its own compute hardware:

- login nodes (ideally used as compile nodes): AMD Opteron 6376 (gcc: `bdver2`)
- compute 1: AMD Opteron 6276 (gcc: `bdver1`)
- compute 2: Intel Xeon E5-2609 (gcc: `corei7-avx`) + K20 (nvcc: `sm_35`)
- compute 3: Intel Xeon E5-2630 (gcc: `core-avx2`) + K80 (nvcc: `sm_37`)
We see a dramatic performance increase in our CPU code(s) when we compile on-node (interactively) with `-march=native` (see the queue-specific architecture flags above).
Will there be a way to configure user-specific architectures, finer-grained than the rather coarse `x86_64`, that correctly honor flags such as `-march` for all (cross-compiled) builds? That way we could generate a perfectly tailored set of vector instructions in our binaries for each of these queues.
Ideally, GPU architectures (`sm_XY`) would be taken into account here directly, too.
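To make the request concrete, here is a minimal sketch of the per-queue information such a configuration would need to capture. Because `-march=native` on a login node resolves to `bdver2`, binaries for the other queues have to be built with an explicit target instead. The queue keys (`login`, `compute1`, ...), the `QUEUE_TARGETS` table, and the `compile_flags` helper are hypothetical illustrations, not an existing Spack API. The gcc targets and `sm_XY` values are taken from the list above, and `-arch=sm_XX` is plain nvcc shorthand.

```python
from __future__ import annotations

# Hypothetical per-queue target table; values mirror the hardware list above.
QUEUE_TARGETS = {
    "login": {"march": "bdver2", "sm": None},
    "compute1": {"march": "bdver1", "sm": None},
    "compute2": {"march": "corei7-avx", "sm": "35"},
    "compute3": {"march": "core-avx2", "sm": "37"},
}


def compile_flags(queue: str) -> dict[str, list[str]]:
    """Return hypothetical gcc and nvcc flag lists for one queue."""
    target = QUEUE_TARGETS[queue]
    # Explicit -march instead of -march=native, so builds can run on any node.
    flags = {"gcc": [f"-march={target['march']}"], "nvcc": []}
    if target["sm"] is not None:
        # Only queues with a GPU get an nvcc architecture flag.
        flags["nvcc"] = [f"-arch=sm_{target['sm']}"]
    return flags


if __name__ == "__main__":
    for queue in QUEUE_TARGETS:
        print(queue, compile_flags(queue))
```

A build tool honoring such a table could then produce one binary per queue from a single compile node, rather than requiring interactive on-node builds.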
CCing @tgamblin