Status check based on counting the number of files #1342
Description
While developing tests for HPCToolkit's CUDA support in the E4S deployment on Perlmutter (NERSC), we came up with several ideas on new features in buildtest that would be useful for validation of the collected HPCToolkit measurement data and post-mortem analysis, in order to determine whether this specific HPCToolkit test passes or fails.
For reference, the HPCToolkit workflow for a GPU-accelerated application consists of 4 phases:
- Performance measurement phase: using `hpcrun` to collect profiles and/or traces. This phase results in a "measurement directory".
- Binary analysis phase: using `hpcstruct` to analyze the program structure of executables and libraries loaded in the address space during application execution. This phase modifies the measurement directory.
- Attribution phase: using `hpcprof` or `hpcprof-mpi` to correlate the profiling samples and trace records in the measurement directory into a performance database. This phase takes the measurement directory as input and generates a "database directory" as output.
- Analysis phase: using the `hpcviewer` graphical user interface (GUI) to view the performance database and conduct performance analysis. This phase uses the database directory.
Our first feature request to support the aforementioned HPCToolkit CUDA tests was file/directory existence checks #1327. This has been implemented in #1329 and further enhanced by #1331. The specific use cases are:
- validating the performance measurement (`hpcrun`) phase via the existence of `*.hpcrun` profiles and `*.hpctrace` trace files in the measurement directory.
- validating the binary analysis (`hpcstruct`) phase via the existence of one specific `*.hpcstruct` program structure file in the measurement directory.
- validating the attribution (`hpcprof`/`hpcprof-mpi`) phase via the existence of performance database components in the database directory.
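For illustration, the existence-based validation described above amounts to something like the following Python sketch. This is not buildtest's implementation; `missing_patterns` is a hypothetical helper, and the directory is whatever measurement or database directory a given test produces.

```python
from pathlib import Path


def missing_patterns(directory, patterns):
    """Return the glob patterns that match nothing inside `directory`.

    Hypothetical helper for illustration only; an empty result means
    every expected kind of file is present.
    """
    root = Path(directory)
    return [pattern for pattern in patterns if not any(root.glob(pattern))]
```

For the `hpcrun` phase with tracing enabled, one would call it with `["*.hpcrun", "*.hpctrace"]` and treat a non-empty result as a failure.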
This issue contains our second feature request for a status check based on counting the number of files. The specific use case is validating the `hpcrun` phase by verifying the correct number of `*.hpcrun` profiles and `*.hpctrace` trace files in the measurement directory. The "correct" numbers depend on the command-line flags to `hpcrun` as well as the Slurm `srun` parameters. At the moment the HPCToolkit CUDA tests contain two buildspec YAML files:
- `hpctoolkit_cuda_vecadd_perlmutter.yml` for a coarse-grained profiling (`-e gpu=nvidia`) and tracing (`-t`) run using the `cuda_vecadd` test case.
  Correct = 2 `*.hpcrun` files and 2 `*.hpctrace` files.
- `hpctoolkit_cuda_vecadd_pc_perlmutter.yml` for a fine-grained profiling run with `-e gpu=nvidia,pc` and no tracing, using the `cuda_vecadd` test case.
  Correct = 1 `*.hpcrun` file and 0 `*.hpctrace` files.
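To make the request concrete, the check we have in mind behaves roughly like the Python sketch below. The names `count_matches`, `check_file_counts`, and `EXPECTED` are hypothetical and do not refer to any existing buildtest API; the expected counts are the ones listed above for the two buildspecs.

```python
from pathlib import Path


def count_matches(directory, pattern):
    """Count regular files in `directory` matching a glob pattern."""
    return sum(1 for p in Path(directory).glob(pattern) if p.is_file())


def check_file_counts(directory, expected):
    """Compare actual file counts against expected ones.

    `expected` maps a glob pattern to the required number of files.
    Returns a dict of mismatches: pattern -> (expected, found).
    An empty dict means the check passes.
    """
    mismatches = {}
    for pattern, want in expected.items():
        found = count_matches(directory, pattern)
        if found != want:
            mismatches[pattern] = (want, found)
    return mismatches


# Expected counts per buildspec, taken from the list above.
EXPECTED = {
    "hpctoolkit_cuda_vecadd_perlmutter.yml": {"*.hpcrun": 2, "*.hpctrace": 2},
    "hpctoolkit_cuda_vecadd_pc_perlmutter.yml": {"*.hpcrun": 1, "*.hpctrace": 0},
}
```

An exact-count comparison also covers the "no tracing" case: an expected count of 0 for `*.hpctrace` fails the test if any trace file is unexpectedly present, which a pure existence check cannot express.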
Our third feature request for grep into log files is detailed in #1343.