Status check based on grep or regex into log files produced during the test #1343

@wyphan

While developing tests for HPCToolkit's CUDA support in the E4S deployment on Perlmutter (NERSC), we came up with several ideas on new features in buildtest that would be useful for validation of the collected HPCToolkit measurement data and post-mortem analysis, in order to determine whether this specific HPCToolkit test passes or fails.

For reference, the HPCToolkit workflow for a GPU-accelerated application consists of 4 phases:

  1. Performance measurement phase: using hpcrun to collect profiles and/or traces. This phase results in a "measurement directory".
  2. Binary analysis phase: using hpcstruct to analyze the program structure of executables and libraries loaded in the address space during application execution. This phase modifies the measurement directory.
  3. Attribution phase: using hpcprof or hpcprof-mpi to correlate the profiling samples and trace records in the measurement directory into a performance database. This phase takes the measurement directory as input and generates a "database directory" as output.
  4. Analysis phase: using the hpcviewer graphical user interface (GUI) to view the performance database and conduct performance analysis. This phase uses the database directory.

Our first feature request to support the aforementioned HPCToolkit CUDA tests was file/directory existence checks #1327. This has been implemented in #1329 and further enhanced by #1331. The specific use cases are:

  • validating the performance measurement (hpcrun) phase via the existence of *.hpcrun profiles and *.hpctrace trace files in the measurement directory.
  • validating the binary analysis (hpcstruct) phase via the existence of one specific *.hpcstruct program structure file in the measurement directory.
  • validating the attribution (hpcprof/hpcprof-mpi) phase via the existence of performance database components in the database directory.
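
As a rough illustration of the existence checks listed above, here is a minimal Python sketch. It is not buildtest's implementation or schema; the function name `phase_outputs_present` and the file names in the demo are hypothetical, and only the `*.hpcrun`, `*.hpctrace`, and `*.hpcstruct` suffixes come from this issue.

```python
import tempfile
from pathlib import Path


def phase_outputs_present(directory, patterns):
    """Map each glob pattern to True if at least one match exists in `directory`.

    Hypothetical helper for illustration; not part of buildtest or HPCToolkit.
    """
    root = Path(directory)
    if not root.is_dir():
        # A missing directory means no pattern can match.
        return {pattern: False for pattern in patterns}
    return {pattern: any(root.glob(pattern)) for pattern in patterns}


# Demo with a throwaway directory and made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    measurements = Path(tmp)
    (measurements / "app-000000.hpcrun").touch()
    (measurements / "app-000000.hpctrace").touch()

    result = phase_outputs_present(
        measurements, ["*.hpcrun", "*.hpctrace", "*.hpcstruct"]
    )
    for pattern, found in result.items():
        print(f"{pattern}: {'found' if found else 'missing'}")
```

In this demo the hpcrun-phase patterns match while `*.hpcstruct` does not, which corresponds to a run where the binary analysis phase has not yet produced its output.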

Our second feature request for file counts is detailed in #1342.

This issue contains our third feature request: a status check based on a regular expression search of the log files produced during the run. The specific use case is validating the hpcrun phase by searching the log file for the number of samples recorded during the run. If the number of GPU samples is above a certain threshold, then the run collected worthwhile measurements that can be further processed post-mortem with hpcstruct and hpcprof or hpcprof-mpi. The current implementation does this with a chained grep | grep | awk pipeline.
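
For illustration only, the kind of check requested here could be sketched in Python as follows. This is neither the grep | grep | awk pipeline mentioned above nor buildtest's API. The function name, the sample log line, and the pattern are all hypothetical, and the sketch makes no assumption about the real hpcrun log format.

```python
import re


def count_exceeds_threshold(log_text, pattern, threshold):
    """Return True if the integer captured by group 1 of `pattern` is above `threshold`.

    Returns False when the pattern does not match at all.
    Hypothetical helper for illustration; not part of buildtest.
    """
    match = re.search(pattern, log_text, flags=re.MULTILINE)
    if match is None:
        return False
    return int(match.group(1)) > threshold


# Made-up log content; the actual hpcrun log lines may look different.
sample_log = "starting run\ngpu samples recorded: 1500\nrun finished\n"
sample_pattern = r"^gpu samples recorded:\s*(\d+)"

if count_exceeds_threshold(sample_log, sample_pattern, threshold=1000):
    print("PASS: enough samples collected")
else:
    print("FAIL: too few samples, or no matching line in the log")
```

Treating a missing match as a failure is a design choice in this sketch: a log without the expected line most likely means the measurement phase did not complete.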
