-
Notifications
You must be signed in to change notification settings - Fork 29.7k
Adding a Cirrus check for Gold Tryjob Status #49370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Looks like the gold_tryjob_status timed out on this PR. |
Yeah, I have left the new test untriaged for now until this can get more feedback from the engine team. I think the 1 hour time out is unavoidable on cirrus. It will prevent the autoroller from closing right away though. |
| script: | ||
| - ./dev/bots/firebase_testlab.sh | ||
|
|
||
| - name: gold_tryjob_status # linux- pre-submit- only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think the convention is to postfix this with -linux in the job name so it's obvious from the PR what OS it's running on.
| // Continue checking until there are no untriaged digests. | ||
| // TODO(Piinks): This should also have the opportunity to fail when new images | ||
| // have been marked negative. This may be possible after | ||
| // https://bugs.chromium.org/p/skia/issues/detail?id=9783 is done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: file a flutter/flutter bug tracking updating this.
dnfield
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that without some notification back to GitHub, this will just generally fail after an hour of retrying and people will think it's just another running job until it fails.
|
Cirrus tasks can be increased to up to 2 hours, but now that I think about it that probably won't be enough time for the autorollers (e.g. a PR opened at 02:00 PT). It might be better to just do this as a cron job in cocoon instead - similar to the push_build_status_to_github jobs. WDYT? |
|
Another advantage of Cocoon is it's already configured to be able to post stuff to GitHub. |
|
I think using cocoon will be a much better approach. Thanks @dnfield. I am going to close this and see about setting it up there. :) |
Description
This PR adds a cirrus task to manage golden file changes more seamlessly.
Currently, when a golden file change is introduced, the associated cirrus shard will fail. Contributors can then visit https://flutter-gold.skia.org/changelists to view the images, diffs and approve the change on the Flutter Gold dashboard. A re-run of the test shard will then pass and a change can land.
This can be time consuming for long test shards that need to fully re-execute, but more importantly creates a blocker for the engine auto roller. When tests fail on an auto roll PR, the PR is automatically closed and tries again later. This does not allow the opportunity for changes to be triaged and re-tested, and results in the need for a manual engine roll to land golden file changes in the engine.
A number of different options were discussed in the related issue #48744, with this approach being the most popular. This change removes the fail condition of separate test shards for golden file testing and adds a cirrus check that queries the status of the given gold tryjob. This shard will wait for all golden file tests to be uploaded to gold and then begin querying the status of the PR's tryjob. If there is nothing to triage (no new images/changes), the status check will turn green and be good to go. If there are golden file changes, the check will remain running, checking every three minutes until the images have been triaged.
This will prevent the engine auto roller from closing right away due to a golden file change. The current engine sheriff can triage the image and the roller will proceed. This will become even easier when https://bugs.chromium.org/p/skia/issues/detail?id=9006 lands, which will comment on PRs with golden file changes, alerting the engine sheriff.
One caveat - the gold status check will time out per Cirrus after one hour. So the check will eventually fail if the images are not triaged within one hour, which would lead to the roller closing the PR and starting again. If the change is not an auto roller, contributors can restart the check, and it will take a very small amount of time ~< 1 minute to update the status, rather than re-running all of the tests.
Related Issues
Fixes #48744
Checklist
Before you create this PR confirm that it meets all requirements listed below by checking the relevant checkboxes (
[x]). This will ensure a smooth and quick review process.///).flutter analyze --flutter-repo) does not report any problems on my PR.Breaking Change
Did any tests fail when you ran them? Please read Handling breaking changes.