Compute-Specific Architecture Chains

@alalazo I asked you this question at SC16, but I honestly forgot how exactly the workflow goes, or whether it is already implemented (the latest docs say no).

In our scenario, we have a heterogeneous cluster with three queues, each with its own compute hardware, plus the login nodes:
- login nodes (ideally as compile nodes) with AMD Opteron 6376 (gcc: `bdver2`)
- compute 1: AMD Opteron 6276 (gcc: `bdver1`)
- compute 2: Intel Xeon E5-2609 (gcc: `corei7-avx`) + K20 (nvcc: `sm_35`)
- compute 3: Intel Xeon E5-2630 (gcc: `core-avx2`) + K80 (nvcc: `sm_37`)

We see a dramatic performance increase in our [CPU code(s)](https://github.com/ComputationalRadiationPhysics/alpaka) if we compile on-node (interactively) with `-march=native` (see the queue-specific architecture flags above).

Will there be a way to configure user-specific architectures beyond the rather coarse `x86_64`, one that honors flags such as `-march` correctly for all (cross-compiled) builds, so that we can generate a perfectly tailored set of vector instructions in our binaries for each of those queues?

Ideally, one might want to take GPU architectures (`sm_XY`) directly into account here, too.
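
To make the request concrete, here is a minimal sketch of the per-queue flag sets described above, written as plain Python data. This is illustration only: the queue names, `QueueTarget`, `gcc_flags`, and `nvcc_flags` are hypothetical and are not part of any existing Spack API. Only the `-march` and `sm_XY` values come from the hardware list in this issue.

```python
# Hypothetical illustration only: none of these names are Spack APIs.
from typing import Dict, List, NamedTuple, Optional


class QueueTarget(NamedTuple):
    cpu: str                         # CPU model in that queue
    gcc_march: str                   # value passed to gcc's -march=
    nvcc_arch: Optional[str] = None  # value passed to nvcc's -arch=, if the queue has GPUs


# Flag values are taken from the hardware list above; the queue names are made up.
QUEUES: Dict[str, QueueTarget] = {
    "login": QueueTarget("AMD Opteron 6376", "bdver2"),
    "compute1": QueueTarget("AMD Opteron 6276", "bdver1"),
    "compute2": QueueTarget("Intel Xeon E5-2609", "corei7-avx", "sm_35"),
    "compute3": QueueTarget("Intel Xeon E5-2630", "core-avx2", "sm_37"),
}


def gcc_flags(queue: str) -> List[str]:
    """Return the gcc architecture flag for a queue."""
    return [f"-march={QUEUES[queue].gcc_march}"]


def nvcc_flags(queue: str) -> List[str]:
    """Return the nvcc architecture flag for a queue, or nothing for CPU-only queues."""
    arch = QUEUES[queue].nvcc_arch
    return [f"-arch={arch}"] if arch else []


if __name__ == "__main__":
    # Print the flag set each queue would need when building on the login nodes.
    for name in QUEUES:
        print(name, gcc_flags(name), nvcc_flags(name))
```

The point of the sketch is that each queue needs its own CPU target and, where present, its own GPU target, selected at build time on the login nodes rather than by compiling interactively on each node type.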

CCing @tgamblin 

Compute-Specific Architecture Chains #2379

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Compute-Specific Architecture Chains #2379

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions