-
Notifications
You must be signed in to change notification settings - Fork 29.7k
[Windows] Fix stack corruption issues #113135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me.
I wonder if there's a way to prevent this from happening. Other developers might run into this issue if they use platform channels.
Was this easy to debug? if not then extra emphasis on that last point.
This was hard to track, it took me >2 days to find the issue. Debugging was harder since this only affected optimized builds (values are optimized away, debugger skips large code paths, etc...). The crashes happened well after the stack was corrupted, making it difficult to track the source. Also, adding
Agreed. I added this bit to the PR description: |
|
I'd like to hold off on landing this until we have established whether engine changes are needed, because this isn't supposed to be necessary. As pointed out above, lots of actual plugins use this structure, so if there's an engine-level problem this is a band-aid that masks it, rather than a fix. |
This is expected; the behavior here was modeled on the existing iOS behavior (I forget whether Android behaves the same way) where the channel serves as a convenience to set up the handler, but doesn't own it. (I think that's a slightly odd structure, but I explicitly wanted to minimize unnecessary divergence between platform APIs for ease of understanding by clients working on plugins across platforms.)
I had a reply mostly written up confused about this because I was looking at the message/method channel code since I'm more familiar with that, and the PR description suggests this is a problem with both, but now I see the problem with event channels. I'm not sure what the problem with message/method channels is though; are you sure you saw corruption from those? If so, I'm still missing a piece of this. The context here is that I wrote the message/method channel code to match iOS, and the design is explicitly that the handlers are not in any conceptual way owned by the channel objects; they shouldn't ever have references to them (so modifying them later shouldn't be possible). The channel code was a community contribution quite a while later, and I apparently didn't have enough of that context paged back in when I reviewed it. I was comparing the PR to iOS, apparently without thinking through the original constraint of not having self-referencing handlers (which doesn't apply to iOS due to ref counting), so missed the problematic Can we fix this in the engine by just replacing
We should avoid this if at all possible. It would make the Window API confusingly inconsistent with other platforms, and would silently break a ton of plugins. |
|
Thanks for the background Stuart, that's very helpful! I agree that structure is odd, but I also agree we should minimize unnecessary divergence. Given the
The crash only happens with
Interesting idea, this would fix the ecosystem without needing any changes. I'll close this off and try that instead! |
I remembered a potential problem with auto-unregistration, which may be why iOS channels were designed that way: registration for a channel name in the engine is last-wins, so if for some reason you ever had two channel objects for the same channel name, you'd get surprising results. E.g., create A, create B, destroy A would result in B's handler being unregistered. (If I were designing these APIs from scratch I would probably make them lifetime-bound, and use an identifier for registration/unregisterion to avoid that problem, but I don't think the oddities here are with the churn of introducing a replacement API surface across all platforms.) |
flutter/engine#36538 attempted to update the engine to Visual Studio 2019, however this was reverted as it broke the framework tree.
Crash...
The crash is unrelated to the underlying issue, which made debugging this hard 😅
Crash reason:
Access violationCall stack:
The underlying issue is that the Windows samples and tests create stack-allocated
MethodChannels andEventChannels inFlutterWindow::OnCreatemethods. Once theFlutterWindow::OnCreatemethod exits, theMethodChannelandEventChannelare destroyed. However, the desktop embedder keeps pointers to these stack-allocated objects' handlers. These handlers would then corrupt the stack if they attempted to modify the deallocated channel. This corrupted stack may cause crashes later depending on the app's behavior.This fix updates the samples to tie the lifetime of
MethodChannels andEventChannels to the apps'FlutterWindow. In the future, we should reconsider this API to prevent lifetime issues. Perhaps destroyingMethodChannels/EventChannels should deregister the handlers.Part of #110948
These codebases may also be affected (found using this):
Pre-launch Checklist
///).If you need help, consider asking for advice on the #hackers-new channel on Discord.