Allow marking a golden check as flaky #111595

Conversation
mdebbar left a comment
LGTM but I'll let @Piinks make the final call.
I think you wanted to do the opposite?
D'oh!
Done.
I'm leaning towards giving it a more self-explanatory name like `alwaysPass`, but I don't feel strongly about it.
I wanted to give it a name that implies some technical debt. I'm not sure if `alwaysPass` conveys that. But I also don't feel strongly about it.
What about `falsePositive`? Or `disable`? I am not a great namer. 🙃
`alwaysPassDisabledFalsePositiveFlakyTest` 😛

Here's my evaluation:

`isFlaky`:
- Pros: clear, implies technical debt
- Cons: could be too specific; perhaps you want to use it for something other than flakiness

`disable`, `skip`:
- Pros: clear, implies technical debt; `skip` in particular is already a common name used in `package:test`
- Cons: makes it sound as if the golden is not generated at all; it may be surprising that a disabled/skipped check throws an error because the act of screenshotting the widget fails for some reason ("why is it failing? I skipped it!")

`falsePositive`:
- Pros: implies technical debt
- Cons: makes me think 🤔 (which thing is positive? what's false about it?), especially when it can take on both `true` and `false` values, and especially if used in conjunction with `isNot` (i.e. used for false negatives), but `isNot(...falsePositive = false)` makes me think even harder

`alwaysPass`:
- Pros: the most accurate description of the functionality (the check always passes no matter what's in the screenshot)
- Cons: does not imply technical debt

I think `isFlaky` might be fine, but maybe `skip` or `alwaysPass` are better. `skip` is already a standard name used in the test framework, and `alwaysPass` is so accurate it's hard to resist.
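For context on the `skip` point, here is a minimal sketch of how `skip` already behaves in `package:test` (the test name and skip reason are placeholders): a skipped body never runs, so no golden would be produced at all.

```dart
import 'package:test/test.dart';

void main() {
  test('golden check for the submit button', () {
    // This body never executes while the test is skipped, so no screenshot
    // is taken and no golden is generated.
  }, skip: 'Marked flaky; tracking issue pending.');
}
```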
I think `skip`, especially since it is so widely used in the framework already, indicates the test won't run at all, so this would defy a pretty baked-in connotation.
I don't have an opinion otherwise, I am not particularly skilled in crafting names. 🙃
Piinks left a comment
Considering again the public API that regular Flutter developers will have to work with for their own applications, I don't think that it is appropriate to expose `isFlaky` to them. I still think this is really specific to our use case.
Can this print statement be within the block of code for when the test fails/flakes? If the test passes, there's no need to add console output.
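A minimal sketch of what this suggestion would look like, assuming the `isFlaky` flag from this PR (the helper name and the message text are hypothetical; the diff under review is not shown here):

```dart
import 'dart:typed_data';

import 'package:flutter/foundation.dart';
import 'package:flutter_test/flutter_test.dart';

// Console output lives only in the failure/flake branch; a passing
// comparison stays silent.
Future<void> checkGolden(Uint8List bytes, Uri key, {bool isFlaky = false}) async {
  final bool matched = await goldenFileComparator.compare(bytes, key);
  if (matched) {
    return; // No console output on success.
  }
  if (isFlaky) {
    print('Golden "$key" mismatched, but this check is marked as flaky.');
    return;
  }
  throw FlutterError('Golden "$key" mismatched.');
}
```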
Same here
Nit:
- /// should produce a screenshot and make it available for human review but not
+ /// will produce a screenshot and make it available for human review but not
This class does not provide an implementation of `goldenFileComparator`, so it cannot guarantee this behavior. I meant this paragraph as a recommendation for implementors. It's up to the implementation to decide whether to respect this flag and what "human review" means (e.g. it may be something other than Skia Gold or local files).
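To illustrate the implementor's freedom being described here, a minimal sketch (the class, the backend call, and the shape of the `isFlaky` parameter are all hypothetical) of a comparator that chooses to respect the flag:

```dart
import 'dart:typed_data';

import 'package:flutter_test/flutter_test.dart';

// Hypothetical comparator that respects the flag: the screenshot is still
// produced and submitted for human review, but a mismatch on a flaky check
// is reported as a pass instead of failing the test.
class ReviewOnlyComparator extends GoldenFileComparator {
  @override
  Future<bool> compare(Uint8List imageBytes, Uri golden, {bool isFlaky = false}) async {
    final bool matched = await _submitForReview(imageBytes, golden);
    return matched || isFlaky;
  }

  @override
  Future<void> update(Uri golden, Uint8List imageBytes) async {
    // Accept the new bytes as the golden (stubbed out in this sketch).
  }

  // Stand-in for whatever "human review" means to this implementation,
  // e.g. uploading to a review service rather than comparing local files.
  Future<bool> _submitForReview(Uint8List imageBytes, Uri golden) async => true;
}
```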
I am still not sure about exposing this to all Flutter developers... If a test fails, ideally it would be fixed, right? For our case, the cause of the failure is way more nuanced; a Flutter developer would not be digging into the depths of the engine code like this, and would not be able to individually address their specific failure.
If a test fails and the diff is acceptable to the user, they can just run `flutter test --update-goldens` and move on.
I don't know that the public API should have an opinion about flakiness.
If the cause of flakiness is in the Flutter SDK, then sure. But flakiness can be caused by app code (see the sketch after this list) because:
- it depends on the current date-time.
- it depends on `Math.random`.
- it contains a race condition due to asynchrony.
- it runs in a non-uniform environment (different OS versions, different hardware).
- it has non-hermetic dependencies (e.g. pub dependencies shifting versions behind the scenes).
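A hypothetical widget illustrating the first cause: its pixels change on every run because it renders the current time, so any golden of it is flaky even though the framework behaves correctly.

```dart
import 'package:flutter/material.dart';

// Every build renders a different timestamp, so two screenshots of this
// widget taken at different times will never match pixel-for-pixel.
class LastUpdatedLabel extends StatelessWidget {
  const LastUpdatedLabel({super.key});

  @override
  Widget build(BuildContext context) {
    return Text('Last updated: ${DateTime.now()}');
  }
}
```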
Having said that, "flakiness" may indeed be too specific. There may be other reasons a developer might want to label a test to pass but still generate a golden, such as:
- You see that a generated golden is completely nonsensical, but at the same time you don't know what it should look like. So you don't want to update the golden to a nonsensical one, but you still want to generate it for human evaluation.
- You spot a test that's too brittle. It's trying to test the look of one button, but it's taking a golden of the entire screen (i.e. overtesting). It breaks too often, so you want to disable it, but alert the owner that this test's scope should be reduced.
- You want to divide your tests into two categories. One category is like a fully automated unit test; it is expected to generate stable goldens and prevent code submissions that change them (this is the system Flutter uses). You also want a second category of tests that generate goldens but never fail; the system you submit the goldens to alerts humans (e.g. a QA team) to review any deltas and either approve them or file issues.
So I think this option would be useful to developers. OTOH, all this is speculative. Do you have a recommendation for how to expose this flag to tests, if not through `matchesGoldenFile`?
I think soliciting feedback from someone other than just me would be beneficial here. I think in all of these cases, the developer would want to just fix their test, not accept it as broken and have it fail silently.
No, if these arguments do not immediately come across as sound, I'm totally fine with keeping it private.
We're already agnostic and un-opinionated about this as this doc comment points out, which makes me think we should preserve that.
Force-pushed d55d634 to f343f06
Force-pushed f343f06 to e360777
Customer errors are related to
Piinks left a comment
@yjbanov I saw your ping for a second review, is this still up for review, or will you be making changes because of the failing customer test? In its current state, this can't land AFAICT.

I cannot make changes in customer tests before we agree on the API design. If we think that the API changes in the comparator are the right thing to do, then I'll first fix downstream tests before landing this PR. Alternatively, if we think a different design is better, I will first change this PR, see what's broken by the new approach, and go from there. One thing that's not resolved is #111595 (comment). @Piinks I shared some thoughts in that discussion. I feel like an API change in
Piinks left a comment
> I cannot make changes in customer tests before we agree on the API design.

If this is the path you would like to take with the API, what is the plan for migration? Since this is a breaking change, that should be part of the discussion. :)

> One thing that's not resolved is #111595 (comment)

Sorry, this being a breaking change takes precedence over this for me. 🤷 Can you please share your plan for folks that are broken by this change? It's a moot point if this is going to be re-worked to be non-breaking or otherwise.
@Piinks Please treat this PR only as a design proposal. I'm not attached to code written here. It's just a starting point for a discussion. I can change it to whatever works for you.
It definitely should. The plan right now is to roughly follow https://github.com/flutter/flutter/wiki/Tree-hygiene#handling-breaking-changes (more on why "roughly" below). We are currently doing this part: https://github.com/flutter/flutter/wiki/Tree-hygiene#2-evaluate-the-breaking-change-proposal. I'm not sure if a design doc is necessary. This PR already contains every single detail of the change. In particular, I'm not sure if a breaking change is necessary, so I'm soliciting alternative proposals that are some combination of the following:
Here I am asking for your guidance, and also why I'd like to finish this discussion: #111595 (comment). In particular, you say:
I am making some arguments in favor of exposing the API. What are your thoughts on that? I'm curious if you have something in mind that doesn't change the public API. Perhaps a private API? Is there prior art I could look at?

Yes, I think it's a bit moot to go in depth with the breaking change process until we agree on an approach (which may end up being non-breaking). This is why I'd like to conclude the discussion of the proposal and alternatives first. However, in a nutshell: the breaking part is the
I have shared them. :) Perhaps soliciting feedback from more than one person will help indicate the appropriate direction? That is typical when making breaking changes.
We already discussed this, which is why I did not comment further repeating myself. If we were to not expose this to all Flutter developers, it would instead require writing a custom implementation of `GoldenFileComparator`.
Can you please say how you plan to do this? This change cannot land before cocoon is fixed, and cocoon cannot be fixed without this change. It is not clear how this will be done.
Thanks, @Piinks! This helps. Here's my understanding of your proposal (precise names and API shape TBD):
This change would be non-breaking, so further discussion of breaking changes is not necessary here, but I'll answer your question in case it's useful in the future:
Methods of an implementation of an interface are allowed to add optional parameters not specified in the interface. So I can first update the implementations, then submit this PR.
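A minimal sketch of this migration trick (the interface is trimmed down here, and `isFlaky` is the proposed addition):

```dart
import 'dart:typed_data';

// Trimmed-down stand-in for the real interface; its signature is unchanged.
abstract class GoldenFileComparator {
  Future<bool> compare(Uint8List imageBytes, Uri golden);
}

// In Dart, an override may declare optional parameters that the interface
// does not. So each implementation can gain the new parameter first, and
// the interface (and its callers) can be migrated afterwards.
class LocalFileComparator implements GoldenFileComparator {
  @override
  Future<bool> compare(Uint8List imageBytes, Uri golden, {bool isFlaky = false}) async {
    // ... compare imageBytes against the golden file on disk ...
    return true; // Stubbed out in this sketch.
  }
}
```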
Sounds good! Thanks for entertaining different ideas on this. :)
This pull request executed golden file tests, but it has not been updated in a while (20+ days). Test results from Gold expire after as many days, so this pull request will need to be updated with a fresh commit in order to get results from Gold. For more guidance, visit Writing a golden file test for flutter/flutter.

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.
Closing in favor of #114450
Support marking a golden check as flaky. A flaky check will still generate a golden and report it to the current `GoldenFileComparator`, but it will not fail the test.

The local file comparator was updated to simply not fail on mismatch.

The Skia Gold comparator was updated to submit the golden to Skia Gold with a virtually unlimited failure threshold. The effect is that the test doesn't fail and does not generate a triage request, but it still allows Skia Gold to track generated screenshots over time. When a fix for the flakiness lands, it's easy to go to the Skia Gold UI to confirm it and remove the `isFlaky` argument from the test.

Fixes #111325
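A sketch of how a test would opt in under this proposal (`MyWidget` and the golden file name are placeholders; `isFlaky` is the parameter this PR proposes adding to `matchesGoldenFile`):

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Placeholder widget standing in for whatever the test screenshots.
class MyWidget extends StatelessWidget {
  const MyWidget({super.key});

  @override
  Widget build(BuildContext context) => const Placeholder();
}

void main() {
  testWidgets('MyWidget golden', (WidgetTester tester) async {
    await tester.pumpWidget(const MyWidget());
    // The golden is still generated and submitted to the comparator, but a
    // mismatch will not fail the test while isFlaky is true.
    await expectLater(
      find.byType(MyWidget),
      matchesGoldenFile('my_widget.png', isFlaky: true),
    );
  });
}
```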