Status check based on grep or regex into log files produced during the test #1343
Description
While developing tests for HPCToolkit's CUDA support in the E4S deployment on Perlmutter (NERSC), we came up with several ideas on new features in buildtest that would be useful for validation of the collected HPCToolkit measurement data and post-mortem analysis, in order to determine whether this specific HPCToolkit test passes or fails.
For reference, the HPCToolkit workflow for a GPU-accelerated application consists of 4 phases:
- Performance measurement phase: using `hpcrun` to collect profiles and/or traces. This phase results in a "measurement directory".
- Binary analysis phase: using `hpcstruct` to analyze the program structure of executables and libraries loaded in the address space during application execution. This phase modifies the measurement directory.
- Attribution phase: using `hpcprof` or `hpcprof-mpi` to correlate the profiling samples and trace records in the measurement directory into a performance database. This phase takes the measurement directory as input and generates a "database directory" as output.
- Analysis phase: using the `hpcviewer` graphical user interface (GUI) to view the performance database and conduct performance analysis. This phase uses the database directory.
Our first feature request to support the aforementioned HPCToolkit CUDA tests was file/directory existence checks #1327. This has been implemented in #1329 and further enhanced by #1331. The specific use cases are:
- validating the performance measurement (`hpcrun`) phase via the existence of `*.hpcrun` profiles and `*.hpctrace` trace files in the measurement directory.
- validating the binary analysis (`hpcstruct`) phase via the existence of one specific `*.hpcstruct` program structure file in the measurement directory.
- validating the attribution (`hpcprof`/`hpcprof-mpi`) phase via the existence of performance database components in the database directory.
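For illustration only, the first two existence checks above could be approximated outside buildtest with a few lines of Python. This is a minimal sketch and not how #1329 implements the feature: the function name `phase_outputs_exist` and the dictionary keys are hypothetical, and it checks for any `*.hpcstruct` file (searched recursively, which is an assumption about the directory layout) rather than one specific file.

```python
from pathlib import Path


def phase_outputs_exist(measurement_dir):
    """Report which expected outputs are present in a measurement directory.

    Hypothetical helper: the file patterns come from the use cases above,
    everything else (name, return shape) is an illustrative assumption.
    """
    root = Path(measurement_dir)
    return {
        # hpcrun phase: profiles and trace files
        "hpcrun_profiles": any(root.glob("*.hpcrun")),
        "hpcrun_traces": any(root.glob("*.hpctrace")),
        # hpcstruct phase: program structure file(s); searched recursively
        # because the exact location inside the directory is assumed here
        "hpcstruct_files": any(root.rglob("*.hpcstruct")),
    }
```

A test would pass only if every value in the returned dictionary is `True`; a run without tracing would simply ignore the `hpcrun_traces` entry.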
Our second feature request for file counts is detailed in #1342.
This issue contains our third feature request: a status check based on a regular expression search of the log files produced during the run. The specific use case is validating the `hpcrun` phase by searching the log file for the number of samples recorded during the run. If the number of GPU samples is above a certain threshold, then the run collected worthwhile measurements that can be further processed post-mortem with `hpcstruct` and `hpcprof`/`hpcprof-mpi`. The current implementations use a chained `grep | grep | awk`:
- `hpctoolkit_cuda_vecadd_perlmutter.yml` for a coarse-grained profiling (`-e gpu=nvidia`) and tracing (`-t`) run using the `cuda_vecadd` test case.
  https://github.com/buildtesters/buildtest-nersc/blob/53cdc7d0d7aa9684b89b15704527962384973acc/buildspecs/apps/hpctoolkit/hpctoolkit_cuda_vecadd_perlmutter.yml#L35
- `hpctoolkit_cuda_vecadd_pc_perlmutter.yml` for a fine-grained profiling run with `-e gpu=nvidia,pc` and no tracing, using the `cuda_vecadd` test case.
  https://github.com/buildtesters/buildtest-nersc/blob/53cdc7d0d7aa9684b89b15704527962384973acc/buildspecs/apps/hpctoolkit/hpctoolkit_cuda_vecadd_pc_perlmutter.yml#L32
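To make the request concrete, here is a rough Python sketch of the logic we would like the status check to perform in place of the shell pipeline. It is only an illustration of the idea, not a proposed buildtest API: the function name `samples_above_threshold`, its parameters, and the log line format used in the usage note are all hypothetical and do not reproduce the actual `hpcrun` log format.

```python
import re
from pathlib import Path


def samples_above_threshold(log_path, pattern, threshold):
    """Return True if the first regex match in the log exceeds `threshold`.

    `pattern` must contain one capturing group that matches an integer
    sample count. All names here are illustrative assumptions.
    """
    text = Path(log_path).read_text(errors="replace")
    # MULTILINE lets ^ and $ anchor to individual log lines
    match = re.search(pattern, text, flags=re.MULTILINE)
    if match is None:
        # No matching line in the log: treat the check as failed
        return False
    return int(match.group(1)) > threshold
```

With a hypothetical log line such as `GPU samples: 120`, the call `samples_above_threshold("run.log", r"^GPU samples:\s+(\d+)", 100)` would return `True`, and the test would pass. Treating a missing match as a failure is a design choice in this sketch; a real implementation might want to make that configurable.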