Conversation
@karkhaz karkhaz commented Apr 28, 2023

This PR adds the ability to add extra columns to the markdown tables
generated by the dump_markdown_results_table visualization. The
following is taken from the visualization's documentation:

'extra_columns' can be an empty dict. The sample configuration below
assumes that each benchmark result has a 'success' and 'runtime'
metric for both variants, 'variant_1' and 'variant_2'. It adds a
'ratio' column to the table for the 'runtime' metric, and a 'change'
column to the table for the 'success' metric. The 'text' lambda is
called once for each benchmark; it accepts a single argument---a dict
that maps variant names to that variant's value for the metric---and
returns a string that is rendered in the benchmark's row in the new
column. This allows you to emit arbitrary text or markdown formatting
in response to particular combinations of values for different
variants, such as regressions or performance improvements.

Sample configuration:

```
visualize:
- type: dump_markdown_results_table
  out_file: "-"
  extra_columns:
    runtime:
    - column_name: ratio
      text: >
        lambda b: str(b["variant_2"]/b["variant_1"])
        if b["variant_2"] < (1.5 * b["variant_1"])
        else "**" + str(b["variant_2"]/b["variant_1"])
    success:
    - column_name: change
      text: >
        lambda b: "" if b["variant_2"] == b["variant_1"]
        else "newly passing" if b["variant_2"]
        else "regressed"
```

Example output:

```
## runtime

| Benchmark | variant_1 | variant_2 | ratio |
| --- | --- | --- | --- |
| bench_1 | 5 | 10 | **2.0** |
| bench_2 | 10 | 5 | 0.5 |

## success

| Benchmark | variant_1 | variant_2 | change |
| --- | --- | --- | --- |
| bench_1 | True | True |  |
| bench_2 | True | False | regressed |
| bench_3 | False | True | newly passing |
```
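
For example, for bench_1 in the runtime table above, the 'ratio' lambda receives the dict {"variant_1": 5, "variant_2": 10} and returns "**2.0**", since 10 exceeds 1.5 × 5; for bench_2 it receives {"variant_1": 10, "variant_2": 5} and returns "0.5".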

This PR also adds extra columns to each of the benchcomp perf metrics, to make it
easier to spot interesting benchmarks directly from the rendered Markdown summary.
Screenshots from an example Action Summary are below.

Runtime metrics have a "% change" extra column whose value is rendered in bold if the % change is greater than 50% in either direction, which is the same as the current threshold for regression alerting.

[Screenshot: runtime table with the "% change" column]
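
For illustration, such a column can be expressed with the extra_columns mechanism described above. The sketch below is not the exact configuration added by this PR, and 'variant_1'/'variant_2' are placeholders for the old and new Kani variants:

```
extra_columns:
  runtime:
  - column_name: "% change"
    text: >
      lambda b: ("**{:+.1f}%**" if abs(b["variant_2"] - b["variant_1"])
      > 0.5 * b["variant_1"] else "{:+.1f}%").format(
      (b["variant_2"] - b["variant_1"]) * 100.0 / b["variant_1"])
```

With variant_1 = 10 and variant_2 = 16, for example, this renders "**+60.0%**"; with variant_2 = 12 it renders "+20.0%" without the bold markers.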

The 'success' metric has a 'change' column that is blank if the result is the same under the old and the new Kani (both passed or both failed). If the result changed between versions, the column shows "❌ newly failing" or "✅ newly passing" for that benchmark.
[Screenshot: success table with the 'change' column]
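
Again for illustration only (a sketch using placeholder variant names, not necessarily the configuration this PR ships), such a column could be written as:

```
extra_columns:
  success:
  - column_name: change
    text: >
      lambda b: "" if b["variant_2"] == b["variant_1"]
      else "✅ newly passing" if b["variant_2"]
      else "❌ newly failing"
```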

Numerical metrics like 'number_program_steps' display the difference between the old and new values when it is non-zero. The value is rendered in bold if the difference exceeds 10% of the old value.
[Screenshot: number_program_steps table with the difference column]
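
A sketch of such a column for 'number_program_steps', again with placeholder variant names and a hypothetical column name 'diff' (the exact configuration in the PR may differ):

```
extra_columns:
  number_program_steps:
  - column_name: diff
    text: >
      lambda b: "" if b["variant_2"] == b["variant_1"]
      else "**" + str(b["variant_2"] - b["variant_1"]) + "**"
      if abs(b["variant_2"] - b["variant_1"]) > 0.1 * b["variant_1"]
      else str(b["variant_2"] - b["variant_1"])
```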

Testing:

  • How is this change tested? Extended an existing regression test.

  • Is this a refactor change?

Checklist

  • Each commit message has a non-empty body, explaining why the change was made
  • Methods or procedures are documented
  • Regression or unit tests are included, or existing tests cover the modified code
  • My PR is restricted to a single feature or bugfix

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

karkhaz added 2 commits April 28, 2023 05:48
This commit adds the ability to add extra columns to the markdown tables generated by the dump_markdown_results_table visualization.
This commit adds columns to each of the benchcomp perf metrics that help
to identify benchmarks that are interesting from the rendered Markdown
summary.
@karkhaz karkhaz requested a review from a team as a code owner April 28, 2023 06:25
@tautschnig tautschnig enabled auto-merge (squash) April 28, 2023 07:43
@tautschnig tautschnig merged commit 47afab5 into model-checking:main Apr 28, 2023