-
Notifications
You must be signed in to change notification settings - Fork 29.7k
[O] Remove many timeouts. #23531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[O] Remove many timeouts. #23531
Conversation
|
cc @tvolkert |
|
The tests will likely fail until #23529 lands. They may be broken after that too, I can't run them locally until that is fixed. |
8436e7e to
b138fae
Compare
|
In many (most?) cases, these timeouts were added to alert the developer that something is likely wrong. That use case is solved by logging trace statements (or warnings), but this PR doesn't always ensure that such logging will take place. What if we introduced something like the following: typedef LogFunction = void Function(int attempt, Duration interval);
void _defaultMonitorLog(int attempt, Duration interval) {
final Duration elapsed = attempt & interval;
print('Future is taking too long ($elapsed)');
}
/// Monitors the specified [future], logging a message (via [log])
/// periodically (according to [interval]) if the future is taking too
/// long to complete.
Future<T> monitor(
Future<T> future,
Duration interval, {
LogFunction log = _defaultMonitorLog,
}) async {
assert(future != null);
assert(interval != null);
assert(interval > Duration.zero);
int attempt = 1;
while (true) {
try {
return await future.timeout(interval);
} on TimeoutException {
log(attempt);
attempt++;
}
}
}Then in cases where logging could be useful (most cases I'm guessing), we could call: Before: await foo.timeout(const Duration(minutes: 1));After: await monitor(foo, const Duration(minutes: 1)); |
|
What are the cases where it doesn't happen? Any time where we're timing out but not showing a progress indicator, we have another bug, namely, that we're not showing a progress indicator. I'd rather fix that bug than only fix the part where we're not showing a message when it times out. |
c93ec39 to
0cc9fc5
Compare
e5ef756 to
70c695c
Compare
|
There might still be some issues, I haven't rereviewed the PR myself yet, but if you are curious to review it, go ahead. It should be reasonably solid at this point. There's one or two random delays that I haven't tried to remove yet, might do that in another PR though as this has gotten quite big already. |
|
ok, still some flakiness i need to shake out. |
DanTup
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some minor questions/comments, but otherwise LGTM (assuming the builds pass and are not flaky, ofc :-))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to return future to avoid completing when the warning is written? (there are things that await the return of _warnIfSlow)
Scratch that, didn't see the cascade ..!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bit confusing:
Failures should be indicated by completing the future with a string
It sounds like the future will complete successfully with a String but the return type is Future<void>. Should it say that they complete with an error/throw a String?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Clarified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any significance to the 2 worth commenting (or making a const)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the exit code for the process (documented in the superclass, which declares the abstract method implemented here). We seem to use 1 for user error and 2 for failure. I'll add a comment to the superclass about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for changing from printError to printTrace? Could this stop the user from getting feedback that it failed? (Seems like in the Daemon's logger we just drop printTrace events and only send printErrors- "This is a lot of traffic to send over the wire").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole function is run in the context of a Status progress indicator, so we can't print anything to the console until it's stopped. Also, we should never show a stack trace to the user.
The message is still reported, just after the progress indicator is stopped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if it matters for any of the tests we have, but this code looks racy if the code with the breakpoint only runs once. Is it possible the code for the breakpoint has already run by the time the breakpoint is added, so it never pauses?
(the original code used reloads/restarts to avoid this, but it was kinda clumsy!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment about this. It's certainly possible to use it badly.
|
LGTM with the changes (though some of the attach tests seem to be timing out). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a documentation block about when the completed future returns and the fact that it explicitly will not time out, so if a developer wants that, they need to do it explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the point of this delay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spoke to the person who added it, and they weren't sure but seemed to think it was prophylactic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I support this PR, but it's worth understanding the context in which this timeout was added (in #16762) -- we were seeing cases where the CI container was timing out after indefinitely waiting for a future to complete, and when that happened, the driver test harness wasn't getting a chance to clean up and write out its logs, so we had no context as to what was hanging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add flushing or something in that case, to make sure we're writing the logs promptly. A crash would presumably have the same problem, and a timeout won't help there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another unexplained delay. This was a smell to begin with, but I'm curious why it existed and if we're going to re-introduce an old problem by removing it without correspondingly fixing some underlying issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite possible, but I've no idea what to do about it. Leaving the delay in doesn't seem reasonable either, it's just making the original bug harder to track down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not guaranteed that timeout is set here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just remove the .timeout() call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed (also removed the argument to the method).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timeout may not be set - looks like we should just remove the .timeout() call here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed (also removed the argument to the method).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(timeout != null)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
tvolkert
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Thanks for the review. Will land on green. |
|
Looks like |
|
The failures are all "Test timed out after 30 seconds.". I think this is just because these tests can be slow - they used to have longer timeouts but I think this changes them. I think we need to put back the increased timeouts for these tests. (I can't repro any failures locally, but I presume my machine is faster what we're getting on the CI bots). |
These are essentially self-inflicted race conditions. Instead of timeouts we're going to try a more verbose logging mechanism that points out when things are taking a long time.
|
@DanTup Yeah, I reproduced the errors locally by running a lot of them at once. I've added a comment next to the extended timeouts explaining why, so that I'm not tempted to remove them again. We really should make devfs sync faster (that's the long pole here). Tests seem good now. Will land on green. |
* flt_master: (143 commits) Roll engine 215ca15..d8c5ec0 (12 commits) (flutter#25728) Provide some more locations for the FAB. (flutter#24736) Undeprecated BigInteger support, but document what it actually does. (flutter#24511) ClipPath.shape and related fixes (flutter#24816) Handle errors in `compute()` by propagating them to the Future. (flutter#24848) Fix merge conflict. (flutter#25718) Some minor tweaks to InputDecoration (mainly docs). (flutter#24643) Expose font fallback API in TextStyle, Roll engine 54a3577..215ca15 (8 commits) (flutter#25585) Updated Shrine demo (flutter#25674) Pin the goldens repo to a specific commit in the android_views test. (flutter#25678) Friendlier flags for Dart compilation training. (flutter#25645) Revert dependency upgrade to see if it helps with build times and APK size (flutter#25642) Depend on the goldens repo through git. (flutter#25479) no period after an alone link in see also section (flutter#25604) Update links for China help (flutter#25238) Roll engine 6a90418..54a3577 (23 commits) (flutter#25649) Roll engine e859296..6a90418 (4 commits) (flutter#25643) Adding support for android app bundle - Issue flutter#17829 (flutter#24440) Revert "[O] Remove many timeouts. (flutter#23531)" (flutter#25646) [O] Remove many timeouts. (flutter#23531) ...
* Remove many timeouts. These are essentially self-inflicted race conditions. Instead of timeouts we're going to try a more verbose logging mechanism that points out when things are taking a long time. * Get the attach tests to pass. * Apply review comments from Todd * More review comment fixes * Put back the extended timeouts here now that I know why we have them...
These are essentially self-inflicted race conditions. Instead of timeouts we're going to try a more verbose logging mechanism that points out when things are taking a long time.