
Benchmark complex_layout_scroll_perf__memory needs better metrics to avoid reporting phantom regressions #40406

@flar

Description

The main issue (after some discussion) is that this benchmark does not report metrics that exactly match what it is meant to detect. As a result, situations such as the one described below can make it indicate a regression when a change has merely altered its underlying assumptions. We need to develop better metrics so that it flags only actual memory regressions.

Until then, the benchmark acts mostly as a "canary": it will often catch a change that might negatively affect app performance, but it will sometimes flag changes that have no real-world impact. Changes in its reported metrics should be noted, but further investigation is needed to determine whether a given change really represents a regression.

Originally reported issue:

I'm investigating the historic slowdowns in the complex_layout_scroll_perf__memory benchmark. It looks like we were at around 5000 in late July and since then we've been stuck around 9000+.

It appears that commit 9b150f1 is responsible for that change. Here are 3 runs of the benchmark on a Moto G4, first on commit 0d0af31 (which is just before the indicated commit) and then on commit 9b150f1:

Stats for hash 0d0af3:

On G4:

  "data": {
    "start-min": 34210,
    "start-max": 40838,
    "start-median": 39220,
    "end-min": 44052,
    "end-max": 46190,
    "end-median": 45034,
    "diff-min": 4464,
    "diff-max": 11980,
    "diff-median": 5033
  },
  "data": {
    "start-min": 35164,
    "start-max": 41233,
    "start-median": 40473,
    "end-min": 44973,
    "end-max": 45653,
    "end-median": 45321,
    "diff-min": 4125,
    "diff-max": 10322,
    "diff-median": 4487
  },
  "data": {
    "start-min": 32907,
    "start-max": 40356,
    "start-median": 39693,
    "end-min": 44393,
    "end-max": 48066,
    "end-median": 44879,
    "diff-min": 4099,
    "diff-max": 15159,
    "diff-median": 4907
  },

Stats for hash 9b150f:

On G4:

  "data": {
    "start-min": 31106,
    "start-max": 33471,
    "start-median": 31184,
    "end-min": 40494,
    "end-max": 45019,
    "end-median": 40980,
    "diff-min": 7793,
    "diff-max": 11548,
    "diff-median": 9319
  },
  "data": {
    "start-min": 31144,
    "start-max": 32886,
    "start-median": 31240,
    "end-min": 40506,
    "end-max": 41104,
    "end-median": 40906,
    "diff-min": 7932,
    "diff-max": 9798,
    "diff-median": 9690
  },
  "data": {
    "start-min": 30958,
    "start-max": 32838,
    "start-median": 31365,
    "end-min": 40332,
    "end-max": 41224,
    "end-median": 40892,
    "diff-min": 8054,
    "diff-max": 9874,
    "diff-median": 9501
  },
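One way to read the six runs above is to average the per-run medians for each commit and compare them side by side. The following is a minimal Python sketch of that comparison. The `RUNS` layout and the `mean_of` helper are assumptions made for illustration and are not part of the benchmark harness; only the median figures quoted above are used.

```python
from statistics import mean

# Per-run medians copied from the six Moto G4 runs quoted above.
# The dictionary layout is an assumption made for this sketch.
RUNS = {
    "0d0af3": [
        {"start-median": 39220, "end-median": 45034, "diff-median": 5033},
        {"start-median": 40473, "end-median": 45321, "diff-median": 4487},
        {"start-median": 39693, "end-median": 44879, "diff-median": 4907},
    ],
    "9b150f": [
        {"start-median": 31184, "end-median": 40980, "diff-median": 9319},
        {"start-median": 31240, "end-median": 40906, "diff-median": 9690},
        {"start-median": 31365, "end-median": 40892, "diff-median": 9501},
    ],
}


def mean_of(runs, key):
    """Average one metric across the runs of a single commit (hypothetical helper)."""
    return mean(run[key] for run in runs)


# Average every recorded median per commit.
summary = {
    commit: {key: mean_of(runs, key) for key in runs[0]}
    for commit, runs in RUNS.items()
}

for commit, metrics in summary.items():
    print(commit, {key: round(value) for key, value in metrics.items()})
```

With these figures, the averaged diff-median rises after 9b150f while the averaged start-median and end-median both fall. That seems consistent with the concern that the diff metric can move sharply without the end-of-run number getting worse.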

Metadata

    Labels

    P3 - Issues that are less important to the Flutter project
    c: contributor-productivity - Team-specific productivity, code health, technical debt.
    c: performance - Relates to speed or footprint issues (see "perf:" labels)
    engine - flutter/engine related. See also e: labels.
    perf: memory - Performance issues related to memory
    team-engine - Owned by Engine team
    triaged-engine - Triaged by Engine team
