Skip to content

Conversation

@loic-sharma
Copy link
Member

@loic-sharma loic-sharma commented Oct 7, 2022

flutter/engine#36538 attempted to update the engine to Visual Studio 2019, however this was reverted as it broke the framework tree.

Crash...

The crash is unrelated to the underlying issue, which made debugging this hard 😅

Crash reason: Access violation

Call stack:

flutter_windows.dll!std::_Find_unchecked1(...)
flutter_windows.dll!std::_Find_unchecked(...)
flutter_windows.dll!std::find(...)
flutter_windows.dll!flutter::TaskRunnerWindow::ProcessTasks()
flutter_windows.dll!flutter::TaskRunnerWindow::HandleMessage(...)
user32.dll!UserCallWinProcCheckWow()
user32.dll!DispatchMessageWorker()
platform_channel.exe!wWinMain(...)
platform_channel.exe!invoke_main()
platform_channel.exe!__scrt_common_main_seh()
platform_channel.exe!__scrt_common_main()
platform_channel.exe!wWinMainCRTStartup(...)
kernel32.dll!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart�()

The underlying issue is that the Windows samples and tests create stack-allocated MethodChannels and EventChannels in FlutterWindow::OnCreate methods. Once the FlutterWindow::OnCreate method exits, the MethodChannel and EventChannel are destroyed. However, the desktop embedder keeps pointers to these stack-allocated objects' handlers. These handlers would then corrupt the stack if they attempted to modify the deallocated channel. This corrupted stack may cause crashes later depending on the app's behavior.

This fix updates the samples to tie the lifetime of MethodChannels and EventChannels to the apps' FlutterWindow. In the future, we should reconsider this API to prevent lifetime issues. Perhaps destroying MethodChannels/EventChannels should deregister the handlers.

Part of #110948

These codebases may also be affected (found using this):

  1. https://github.com/flutter/plugins/blob/028e7494bb0c0391ba3e3a4b065af9f52a30cde4/packages/url_launcher/url_launcher_windows/windows/url_launcher_plugin.cpp#L65-L75
  2. https://github.com/flutter/samples/blob/dc50c46ef13c08feb4e4b588101ca70acf840305/experimental/federated_plugin/federated_plugin_windows/windows/federated_plugin_windows_plugin.cpp#L37-L47
  3. https://github.com/flutter/plugins/blob/028e7494bb0c0391ba3e3a4b065af9f52a30cde4/packages/camera/camera_windows/windows/camera_plugin.cpp#L195-L205
  4. https://github.com/flutter/plugins/blob/028e7494bb0c0391ba3e3a4b065af9f52a30cde4/packages/local_auth/local_auth_windows/windows/local_auth_plugin.cpp#L113-L124
  5. https://github.com/flutter/codelabs/blob/b4c9c2cb23db3744395f93f6e4e75cbd5dce5b16/github-client/window_to_front/windows/window_to_front_plugin.cpp#L31-L43
  6. https://github.com/flutter-webrtc/flutter-webrtc/blob/3e4ef91d6aa1a975ce38abf9cdcebf8e4632b5b6/common/cpp/flutter_webrtc_plugin.cc#L15-L30
  7. https://github.com/juicycleff/flutter-unity-view-widget/blob/799e0ca16510422f8ed6939e9fc2ad3aa7e61eb8/windows/flutter_unity_widget_plugin.cpp#L37-L49
  8. https://github.com/getsentry/sentry-dart/blob/fe217baa64bf65dac702f8ef5acd8a267f02b46c/flutter/windows/sentry_flutter_plugin.cpp#L37-L49
  9. And a bunch more :(

Pre-launch Checklist

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • I read the Tree Hygiene wiki page, which explains my responsibilities.
  • I read and followed the Flutter Style Guide, including Features we expect every widget to implement.
  • I signed the CLA.
  • I listed at least one issue that this PR fixes in the description above.
  • I updated/added relevant documentation (doc comments with ///).
  • I added new tests to check the change I am making, or this PR is test-exempt.
  • All existing and new tests are passing.

If you need help, consider asking for advice on the #hackers-new channel on Discord.

@flutter-dashboard flutter-dashboard bot added d: examples Sample code and demos c: contributor-productivity Team-specific productivity, code health, technical debt. labels Oct 7, 2022
Copy link
Contributor

@a-wallen a-wallen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

I wonder if there's a way to prevent this from happening. Other developers might run into this issue if they use platform channels.

Was this easy to debug? if not then extra emphasis on that last point.

@loic-sharma
Copy link
Member Author

Was this easy to debug...

This was hard to track, it took me >2 days to find the issue. Debugging was harder since this only affected optimized builds (values are optimized away, debugger skips large code paths, etc...). The crashes happened well after the stack was corrupted, making it difficult to track the source. Also, adding std::couts would fix the issue (likely because stack was changed in such a way to no longer crash).

I wonder if there's a way to prevent this from happening.

Agreed. I added this bit to the PR description: In the future, we should reconsider this API to prevent lifetime issues. Perhaps destroying MethodChannels/EventChannels should deregister the handlers. Thoughts @cbracken or @stuartmorgan?

@loic-sharma loic-sharma marked this pull request as ready for review October 8, 2022 00:26
@stuartmorgan-g
Copy link
Contributor

I'd like to hold off on landing this until we have established whether engine changes are needed, because this isn't supposed to be necessary. As pointed out above, lots of actual plugins use this structure, so if there's an engine-level problem this is a band-aid that masks it, rather than a fix.

@stuartmorgan-g
Copy link
Contributor

stuartmorgan-g commented Oct 8, 2022

Once the FlutterWindow::OnCreate method exits, the MethodChannel and EventChannel are destroyed.

This is expected; the behavior here was modeled on the existing iOS behavior (I forget whether Android behaves the same way) where the channel serves as a convenience to set up the handler, but doesn't own it. (I think that's a slightly odd structure, but I explicitly wanted to minimize unnecessary divergence between platform APIs for ease of understanding by clients working on plugins across platforms.)

However, the desktop embedder keeps pointers to these stack-allocated objects' handlers. These handlers would then corrupt the stack if they attempted to modify the deallocated channel.

I had a reply mostly written up confused about this because I was looking at the message/method channel code since I'm more familiar with that, and the PR description suggests this is a problem with both, but now I see the problem with event channels. I'm not sure what the problem with message/method channels is though; are you sure you saw corruption from those? If so, I'm still missing a piece of this.

The context here is that I wrote the message/method channel code to match iOS, and the design is explicitly that the handlers are not in any conceptual way owned by the channel objects; they shouldn't ever have references to them (so modifying them later shouldn't be possible). The channel code was a community contribution quite a while later, and I apparently didn't have enough of that context paged back in when I reviewed it. I was comparing the PR to iOS, apparently without thinking through the original constraint of not having self-referencing handlers (which doesn't apply to iOS due to ref counting), so missed the problematic this usage.

Can we fix this in the engine by just replacing is_listening_ with a stateful mutable lambda variable? We're using a new enough version of C++ now (which might not have been true at the time? I forget), but I'm not sure without trying it if there's something else about how we're using the lambda that prevents that.

Perhaps destroying MethodChannels/EventChannels should deregister the handlers.

We should avoid this if at all possible. It would make the Window API confusingly inconsistent with other platforms, and would silently break a ton of plugins.

@loic-sharma
Copy link
Member Author

loic-sharma commented Oct 8, 2022

Thanks for the background Stuart, that's very helpful! I agree that structure is odd, but I also agree we should minimize unnecessary divergence.

Given the EventChannel structure, I'm worried that years from now we'll forget about these constraints again and reintroduce a similar bug. Is there a clean way to enforce that handlers don't reference the channel? Perhaps a test that fails if EventChannel or MethodChannel had such a bug? I'll experiment making a test that fails if stack corruption is detected. In any case, I'll add comments to MethodChannel::SetMethodCallHandler and EventChannel::SetStreamHandler to call out these constraints.

I'm not sure what the problem with message/method channels is though; are you sure you saw corruption from those?

The crash only happens with EventChannels today. This change included both EventChannels and MethodChannels as their implementations are very similar, the mistake in EventChannel could easily be made in MethodChannel as well.

Can we fix this in the engine by just replacing is_listening_ with a stateful mutable lambda variable?

Interesting idea, this would fix the ecosystem without needing any changes. I'll close this off and try that instead!

@loic-sharma loic-sharma closed this Oct 8, 2022
@stuartmorgan-g
Copy link
Contributor

I agree that structure is odd, but I also agree we should minimize unnecessary divergence.

I remembered a potential problem with auto-unregistration, which may be why iOS channels were designed that way: registration for a channel name in the engine is last-wins, so if for some reason you ever had two channel objects for the same channel name, you'd get surprising results. E.g., create A, create B, destroy A would result in B's handler being unregistered.

(If I were designing these APIs from scratch I would probably make them lifetime-bound, and use an identifier for registration/unregisterion to avoid that problem, but I don't think the oddities here are with the churn of introducing a replacement API surface across all platforms.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c: contributor-productivity Team-specific productivity, code health, technical debt. d: examples Sample code and demos

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants