Skip to content
This repository was archived by the owner on Feb 25, 2025. It is now read-only.

Conversation

@gaaclarke
Copy link
Member

@gaaclarke gaaclarke commented Aug 17, 2022

description

This constitutes a 30% reduction in the 1MB test and little change in the other tests.

This will affect platform channel communications, including loading assets from the asset directory.

This is an alternative to the PR that eliminates copies if we have a modifiable buffer: #34018

breaking change notice

This can break users at runtime in the following cases:

  1. usage of a BinaryCodec that edits the ByteData returned from platform message calls
  2. custom implementations of MessageCodec that edit the ByteData returned from platform message calls

It isn't technically a breaking change since it doesn't appear to break any existing tests.

local testing results

Local iOS platform channel benchmark results with the framework pr flutter/flutter#109707:

## before ##
BasicMessageChannel/StandardMessageCodec/Flutter->Host/Small: 47.8 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/Large: 647.6 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/Large: 52.8 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/1MB: 485.9 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/SmallParallel3: 25.5 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/Small: 39.2 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/SmallParallel3: 22.5 µs

## after ##
BasicMessageChannel/StandardMessageCodec/Flutter->Host/Small: 43.0 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/Large: 660.3 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/Large: 47.9 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/1MB: 342.4 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/SmallParallel3: 26.4 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/Small: 38.6 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/SmallParallel3: 21.4 µs

Pre-launch Checklist

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • I read the Tree Hygiene wiki page, which explains my responsibilities.
  • I read and followed the Flutter Style Guide and the C++, Objective-C, Java style guides.
  • I listed at least one issue that this PR fixes in the description above.
  • I added new tests to check the change I am making or feature I am adding, or Hixie said the PR is test-exempt. See testing the engine for instructions on
    writing and running engine tests.
  • I updated/added relevant documentation (doc comments with ///).
  • I signed the CLA.
  • All existing and new tests are passing.

If you need help, consider asking for advice on the #hackers-new channel on Discord.

@gaaclarke
Copy link
Member Author

@a-siva @rmacnak-google I tested the performance of the UnmodifiableByteDataView, looks good to me. See the results in the description. Thanks! I'll see if I can get the UnmodifiableByteDataView constructor called from c++ to avoid the framework change and put this up for review.

@gaaclarke
Copy link
Member Author

gaaclarke commented Aug 17, 2022

I've tried to eliminate the change in the framework by allocating the UnmodifiableByteDataView in the c++ code but Dart_New crashes with "unreachable" in release builds.

edit: This is because UnmodifiableByteDataView is getting tree shaken.

void PlatformMessageResponseDart::Complete(std::unique_ptr<fml::Mapping> data) {
  PostCompletion(
      std::move(callback_), ui_task_runner_, &is_complete_, channel_,
      [data = std::move(data)]() mutable {
        Dart_Handle byte_buffer;
        intptr_t size = data->GetSize();
        if (data->GetSize() > 1000) {
          // The ByteData will be wrapped with an
          // UnmodifiableByteDataView to protect constness in Dart.
          void* mapping = const_cast<uint8_t*>(data->GetMapping());
          byte_buffer = Dart_NewExternalTypedDataWithFinalizer(
              /*type=*/Dart_TypedData_kByteData,
              /*data=*/mapping,
              /*length=*/size,
              /*peer=*/data.release(),
              /*external_allocation_size=*/size,
              /*callback=*/MappingFinalizer);
        } else {
          byte_buffer =
              tonic::DartByteData::Create(data->GetMapping(), data->GetSize());
        }

        Dart_Handle type_data = Dart_LookupLibrary(
            tonic::DartConverter<std::string>().ToDart("dart:typed_data"));
        FML_DCHECK(!(Dart_IsNull(type_data) || Dart_IsError(type_data)));
        Dart_Handle unmodifiable_byte_data_type =
            Dart_GetNullableType(type_data,
                                 tonic::DartConverter<std::string>().ToDart(
                                     "UnmodifiableByteDataView"),
                                 0, nullptr);
        FML_DCHECK(!(Dart_IsNull(unmodifiable_byte_data_type) ||
                     Dart_IsError(unmodifiable_byte_data_type)));
        return Dart_New(unmodifiable_byte_data_type, Dart_Null(), 1,
                        &byte_buffer);
      });
}

@gaaclarke
Copy link
Member Author

I've also attempted adding the unmodifiedbytedataview allocation in the PlatformDispatcher, but it incurs a higher cost.

BasicMessageChannel/StandardMessageCodec/Flutter->Host/Small: 51.0 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/Large: 665.4 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/Large: 49.8 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/1MB: 348.0 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/SmallParallel3: 24.9 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/Small: 40.0 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/SmallParallel3: 22.7 µs
--- a/lib/ui/platform_dispatcher.dart
+++ b/lib/ui/platform_dispatcher.dart
@@ -521,14 +521,17 @@ class PlatformDispatcher {
   /// The framework invokes [callback] in the same zone in which this method was
   /// called.
   void sendPlatformMessage(String name, ByteData? data, PlatformMessageResponseCallback? callback) {
+    PlatformMessageResponseCallback? mutableCallback = callback == null ? null : (ByteData? mutableData) {
+      callback(mutableData == null ? null : UnmodifiableByteDataView(mutableData));
+    };
     final String? error =
-        _sendPlatformMessage(name, _zonedPlatformMessageResponseCallback(callback), data);
+        _sendPlatformMessage(name, _zonedPlatformMessageResponseCallback(mutableCallback), data);
     if (error != null) {
       throw Exception(error);
     }
   }

@gaaclarke
Copy link
Member Author

I tried wrapping the constructor for UnmodifiableByteDataView into a function in dart:ui, this also provides tiny overhead for the standard case as well.

BasicMessageChannel/StandardMessageCodec/Flutter->Host/Small: 53.4 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/Large: 667.3 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/Large: 50.7 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/1MB: 347.7 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/SmallParallel3: 24.9 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/Small: 44.4 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/SmallParallel3: 22.7 µs
void PlatformMessageResponseDart::Complete(std::unique_ptr<fml::Mapping> data) {
  PostCompletion(std::move(callback_), ui_task_runner_, &is_complete_, channel_,
                 [data = std::move(data)]() mutable {
                   Dart_Handle byte_buffer;
                   intptr_t size = data->GetSize();
                   if (data->GetSize() > 1000) {
                     // The ByteData will be wrapped with an
                     // UnmodifiableByteDataView to protect constness in Dart,
                     // see //lib/ui/platform_dispatcher.dart
                     void* mapping = const_cast<uint8_t*>(data->GetMapping());
                     byte_buffer = Dart_NewExternalTypedDataWithFinalizer(
                         /*type=*/Dart_TypedData_kByteData,
                         /*data=*/mapping,
                         /*length=*/size,
                         /*peer=*/data.release(),
                         /*external_allocation_size=*/size,
                         /*callback=*/MappingFinalizer);
                   } else {
                     byte_buffer = tonic::DartByteData::Create(
                         data->GetMapping(), data->GetSize());
                   }
                   Dart_Handle ui_lib = Dart_LookupLibrary(
                       tonic::DartConverter<std::string>().ToDart("dart:ui"));
                   FML_DCHECK(!(Dart_IsNull(ui_lib) || Dart_IsError(ui_lib)));
                   Dart_Handle result =
                       Dart_Invoke(ui_lib,
                                   tonic::DartConverter<std::string>().ToDart(
                                       "_wrapUnmodifiableByteData"),
                                   1, &byte_buffer);
                   FML_DCHECK(!(Dart_IsNull(result) || Dart_IsError(result)));
                   return result;
                 });

@gaaclarke
Copy link
Member Author

Here are the results of selectively wrapping only when we are are using external typed data:

BasicMessageChannel/StandardMessageCodec/Flutter->Host/Small: 50.5 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/Large: 670.4 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/Large: 52.0 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/1MB: 340.2 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/SmallParallel3: 24.6 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/Small: 39.2 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/SmallParallel3: 22.2 µs

@gaaclarke
Copy link
Member Author

The results between always making the UnmodifiableByteDataView and selectively doing it are so so similar I'm going to go with always creating it to make the behavior more reliable.

@gaaclarke gaaclarke changed the title Eliminated the response copy for large buffers Eliminates Dart->Host response copy for large buffers Aug 17, 2022
@gaaclarke gaaclarke force-pushed the eliminate-copies-unmodifiable branch from 32da9e6 to 09acb28 Compare August 17, 2022 22:01
@gaaclarke gaaclarke marked this pull request as ready for review August 17, 2022 22:23
@rmacnak-google
Copy link
Contributor

Note the existence of _wrapUnmodifiableByteData as an entry point means the unmodifiable views cannot be tree-shaken in all Flutter programs. This in turn means the AOT compiler will not be able to eliminate the is-mutable checks before typed data writes if unmodifiable views are never used (an optimization that does not currently exist). If you find the level of performance without this optimization acceptable, the VM could offer a Dart_NewUnmodifiableTypedData variant of Dart_NewExternalTypedDataWithFinalizer, which would prevent shaking the unmodifiable views for all Dart programs and forgo the possibility of this optimization. The VM would be able to label unmodifiable views created in this was as pass-by-reference within an isolate group.

@mkustermann

@gaaclarke
Copy link
Member Author

This in turn means the AOT compiler will not be able to eliminate the is-mutable checks before typed data writes if unmodifiable views are never used

Platform Channels are fundamental to regular Flutter execution so I don't believe there will ever be a Flutter app that doesn't use unmodifiable views. Even apps that don't use the public platform channel api are using it indirectly inside of the engine. The benchmarks I used for platform channels never writes to the ByteData. You are saying that potentially this will be detrimental to an existing app that is doing a lot of writing to a ByteData though, correct? Do we have some idea about that impact from the Dart benchmarks?

@mkustermann
Copy link
Member

... You are saying that potentially this will be detrimental to an existing app that is doing a lot of writing
to a ByteData though, correct?

The CL that introduces immutability checks has already landed (see dart-lang/sdk@c1e67ac). Applications that didn't use Unmodifiable*View were regressed by it if the compiler cannot determine the exact typed data class (measurements indicate tight loops writing bytes may have become ~ 2x slower). Applications that did use the unmodifiable views were very slow before and become much faster with that change. It is the right change to make.

This PR prevents tree shaking of the unmodifiable class. As a consequence we cannot add a future "If unmodifiable views are tree shaken, never perform immutability checks" - optimization. Though if it's common to have unmodifiable views (some existing g3 apps already use them) - that doesn't matter.

We have other (more heavy compiler optimization) opportunities to minimize impact of immutability checks (e.g. by cloning loops for immutable/non-immutable cases)

If you find the level of performance without this optimization acceptable, the VM could offer a Dart_NewUnmodifiableTypedData variant of Dart_NewExternalTypedDataWithFinalizer, which would prevent shaking the unmodifiable views for all Dart programs and forgo the possibility of this optimization. The VM would be able to label unmodifiable views created in this was as pass-by-reference within an isolate group.

Agree.

Compared with

@pragma('vm:entry-point')
ByteData? _wrapUnmodifiableByteData(ByteData? byteData) =>
    byteData == null ? null : UnmodifiableByteDataView(byteData);

it's much better to make VM API: It will allow safe sharing of immutable bytes across isolates (with this PR we cannot do that because there is mutable access to the backing store of the view (i.e. byteData here).

@gaaclarke
Copy link
Member Author

it's much better to make VM API: It will allow safe sharing of immutable bytes across isolates

FWIW that isn't necessary for our usage since these ByteData objects are temporal objects used to marshal from the host platform to Dart. I can't guarantee that's always the case since usage of the public api could subvert that, but probably always.

The CL that introduces immutability checks has already landed (see dart-lang/sdk@c1e67ac). Applications that didn't use Unmodifiable*View were regressed by it if the compiler cannot determine the exact typed data class

Also FWIW, I went back to the benchmark to make sure there wasn't some regression to platform channels when that CL landed. We are writing to buffers when encoding messages and I didn't see any regression.

@zanderso
Copy link
Member

Is there an issue filed to add Dart_NewUnmodifiableTypedData? If not, could you file one @gaaclarke?

@gaaclarke
Copy link
Member Author

Is there an issue filed to add Dart_NewUnmodifiableTypedData? If not, could you file one @gaaclarke?

done: dart-lang/sdk#49784

@dnfield
Copy link
Contributor

dnfield commented Aug 23, 2022

@mkustermann
Copy link
Member

Looks like https://dart-review.googlesource.com/c/sdk/+/255816 landed the new API, can we just use that here?

👍 yes

@gaaclarke gaaclarke force-pushed the eliminate-copies-unmodifiable branch from 09acb28 to 62ae7e8 Compare August 24, 2022 17:24
@gaaclarke
Copy link
Member Author

Looks like https://dart-review.googlesource.com/c/sdk/+/255816 landed the new API, can we just use that here?

Done PTAL. I didn't add the new api to tonic::DartByteData since we don't want to use it without knowing the data is actually const. It don't have the proper context at that point to know if that is the case.

@dnfield
Copy link
Contributor

dnfield commented Aug 24, 2022

Great!

Can you add a test? I'm assuming one that follows this code path and asserts that the byte data is read only or something?

@gaaclarke
Copy link
Member Author

Great!

Can you add a test? I'm assuming one that follows this code path and asserts that the byte data is read only or something?

We already have tests for this this. It doesn't add new functionality, just changes performance. There is a benchmark that tests the new performance in the framework repo: https://github.com/flutter/flutter/blob/5bc4a7612f7fd430933b046aeb9e36a5c7f3916e/dev/benchmarks/platform_channels_benchmarks/lib/main.dart#L236 I'll request an exception from hixie.

@Hixie
Copy link
Contributor

Hixie commented Aug 24, 2022

Looking at the code, am I right in understanding that the buffer you get in Dart is sometimes mutable and sometimes not, based on how big it is?

@gaaclarke
Copy link
Member Author

Looking at the code, am I right in understanding that the buffer you get in Dart is sometimes mutable and sometimes not, based on how big it is?

Yep, a previous version made that consistent but got lost when I adopted the new c api. The only guarantee that the Dart API makes is that a ByteData will be sent, the UnmodifiableByteDataView is a ByteData so there is no problem syntacticly. It is a potential runtime breaking change, but isn't a breaking change since in practice there is no test that cares. These buffers are ephemeral and usually hidden from users but it's not impossible for a user to grab them and do whatever they want with them.

@gaaclarke
Copy link
Member Author

I thought about it a bit more and I think it makes sense to bring back what we had previously and what @Hixie noted. Conditionally being unmodifiable is difficult to test and reason about and by my measurements on iOS the extra cost to wrapping the result is small enough to get lost in the noise of the microbenchmark.

@Hixie
Copy link
Contributor

Hixie commented Aug 24, 2022

FWIW my worry was that people would depend on it being mutable, and then in some rare edge cases the data would be one byte bigger and their code would throw. I can imagine that would be really difficult to debug. Having the API act consistently across axes the developer isn't thinking of tends to lead to a better developer experience.

This suggests a possible test, though: that the buffer behaves the same for small sizes and big sizes.

@gaaclarke gaaclarke force-pushed the eliminate-copies-unmodifiable branch from a15fab1 to 087b3bd Compare August 24, 2022 21:51
@gaaclarke gaaclarke requested a review from dnfield August 24, 2022 22:03
Copy link
Contributor

@dnfield dnfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gaaclarke gaaclarke added the autosubmit Merge PR when tree becomes green via auto submit App label Aug 24, 2022
@auto-submit auto-submit bot merged commit f10ffb4 into flutter:main Aug 24, 2022
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Aug 24, 2022
Copy link
Member

@zanderso zanderso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see this land. Just a suggestion for speeding up small messages.

} else {
Dart_Handle mutable_byte_buffer =
tonic::DartByteData::Create(data->GetMapping(), data->GetSize());
Dart_Handle ui_lib = Dart_LookupLibrary(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DBC

You may be able to speed this up by using the Dart_PersistentHandle for the library that is already cached in the tonic::DartState object in its tonic::DartClassLibrary field. Then you won't have to allocate the "dart:ui" string on every small message.

I'd suggest also caching the string "_wrapUnmodifiableByteData" in a Dart_PersistentHandle instead of allocating it on every small message. Not sure if tonic has a convenience class for that already, but you can always add a field to flutter::UIDartState if there's a speedup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yeah allocating these strings does add up if this gets chatty. We store a bunch of things like this in platform_configuration.cc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I considered it but didn't know where to put it. FWIW I don't think the wrapping code had a big impact on the microbenchmark, less than 1% of the round trip time for a message. I expected looking up the library to be expensive too but seemed fast. I'll double check everything this afternoon when i go back and look at the official performance numbers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lookup is fast but the string creation adds up.

@gaaclarke
Copy link
Member Author

Here are the performance results:

For the 1MB payload on Android we have a 22% drop in round trip time, on iOS we have a 36% drop ( https://flutter-flutter-perf.skia.org/e/?numCommits=50&queries=sub_result%3Dplatform_channel_basic_binary_2host_1MB&begin=1661272772 ).

For the small payload tests it was hard to tell without better statistics. Now that we have a few runs, comparing the median time for the last 40 runs to the 10 following runs it looks like there is a regression of 11.5% on Android, 7% on iOS. (https://flutter-flutter-perf.skia.org/e/?begin=1661272772&queries=sub_result%3Dplatform_channel_basic_standard_2host_small&requestType=0) I think it's worth looking into caching the wrapping function as Zach suggested.

GaryQian pushed a commit to GaryQian/engine that referenced this pull request Aug 25, 2022
@gaaclarke
Copy link
Member Author

I tried to cache the closure to invoke but didn't notice any difference in the performance. I looked at the ui_benchmarks target and the platform channel benchmarks on iOS. I just didn't see any difference in the performance. Furthermore I couldn't reproduce the regression I see in the skiaperf results locally.

Here is the caching code I was testing: 70f545f

I'm going to stop fiddling with it since I can't measure any difference locally.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

autosubmit Merge PR when tree becomes green via auto submit App

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants