
Compute-Specific Architecture Chains #2379

@ax3l

@alalazo I asked you this question at SC16, but I honestly forgot the exact workflow, or whether it is already implemented (the latest docs suggest it is not).

In our scenario, we have a heterogeneous cluster with login nodes and three queues, each with its own compute hardware:

  • login nodes (ideally as compile nodes) with AMD Opteron 6376 (gcc: bdver2)
  • compute 1: AMD Opteron 6276 (gcc: bdver1)
  • compute 2: Intel Xeon E5-2609 (gcc: corei7-avx) + K20 (nvcc: sm_35)
  • compute 3: Intel Xeon E5-2630 (gcc: core-avx2) + K80 (nvcc: sm_37)
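To make the request concrete, here is a minimal sketch of the per-queue mapping we would like a build tool to honor. It is only an illustration: the queue keys (`login`, `compute1`, ...), the `QUEUE_TARGETS` table and the `compile_flags` helper are hypothetical names, not an existing Spack API. The flags themselves are standard GCC (`-march=`) and nvcc (`-arch=`, `-Xcompiler` to forward host flags) options.

```python
# Hypothetical per-queue targets; values taken from the hardware list above.
QUEUE_TARGETS = {
    "login": {"march": "bdver2", "cuda_arch": None},
    "compute1": {"march": "bdver1", "cuda_arch": None},
    "compute2": {"march": "corei7-avx", "cuda_arch": "sm_35"},
    "compute3": {"march": "core-avx2", "cuda_arch": "sm_37"},
}


def compile_flags(queue):
    """Return (gcc_flags, nvcc_flags) for a queue (hypothetical helper)."""
    target = QUEUE_TARGETS[queue]
    gcc_flags = [f"-march={target['march']}"]
    nvcc_flags = []
    if target["cuda_arch"] is not None:
        # Device code for the GPU generation, and the same -march for the
        # host code that nvcc hands to the host compiler.
        nvcc_flags = [
            f"-arch={target['cuda_arch']}",
            "-Xcompiler",
            f"-march={target['march']}",
        ]
    return gcc_flags, nvcc_flags


if __name__ == "__main__":
    for name in QUEUE_TARGETS:
        print(name, compile_flags(name))
```

The point is that the target is a property of the queue, not of the login node where the build happens, so a cross-compile from the login node would still pick the right flags.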

We see a dramatic performance increase in our CPU code(s) when we compile on-node (interactively) with -march=native, i.e. with the queue-specific architecture flags listed above.
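A possible way to find the named target that `-march=native` corresponds to on each queue is to run `gcc -march=native -Q --help=target` on a compute node and read its `-march=` line. The sketch below assumes a recent GCC that prints the resolved name there (older versions may differ); `parse_march` and `native_march` are hypothetical helper names.

```python
import subprocess


def parse_march(help_output):
    """Extract the -march= value from `gcc -Q --help=target` output."""
    for line in help_output.splitlines():
        fields = line.split()
        # The relevant line looks like: "  -march=      bdver2"
        if len(fields) == 2 and fields[0] == "-march=":
            return fields[1]
    return None


def native_march(gcc="gcc"):
    """Ask the local GCC what -march=native resolves to (run on the node)."""
    result = subprocess.run(
        [gcc, "-march=native", "-Q", "--help=target"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_march(result.stdout)
```

The resulting name (e.g. `bdver1` on compute 1) could then be used with an explicit `-march=` when cross-compiling from the login nodes.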

Will there be a way to configure user-specific architectures, finer-grained than the rather coarse x86_64, that correctly honor e.g. -march for all (cross-compiled) builds? That would let us generate a perfectly tailored set of vector instructions in our binaries for each of those queues.

Ideally, one might want to take GPU architectures (sm_XY) directly into account here, too.

CCing @tgamblin
