[web][dart2wasm] High-frequency integer boxing & GC churn in Skwasm text layout (paragraph.dart & line_breaker.dart) #186972

@kevmoo

While profiling an animating, heavy widget-churn web benchmark under the WebAssembly backend (dart2wasm + skwasm), we identified a critical performance bottleneck and significant garbage collection (GC) churn in the text measurement and rendering pipeline.

The issue stems from how primitive integers are dynamically boxed into heap-allocated objects ($BoxedInt / struct allocations) within high-frequency, character-by-character loops during paragraph layout and line-breaking. These operations are executed on every single character of text laid out during invalidations, multiplying allocation pressure and forcing continuous GC sweeps that introduce severe rendering jank.

This issue relates closely to:

  • flutter/flutter#170889 (Picture cleanup (we think) causes jank on animating web apps), where we tracked high-overhead SkwasmFinalizationRegistry finalizer loops crossing expensive JS-to-Wasm boundaries on GC sweeps.
  • flutter/flutter#172187 ([skwasm] Decrease reliance on finalizers/GC), which restructured layout and style classes so they explicitly dispose of native resources during build cycles rather than waiting for GC finalizers.
  • flutter/flutter#184858 (memory access out of bounds in paragraphBuilder_dispose during heavy widget rebuilds), which tracked a related out-of-bounds memory trap in the native heap.
  • dart-lang/sdk#52714 (dart2wasm List<int> is using fully boxed integer representation), which tracks the absence of unboxed lists for generic collection stores.

📈 Telemetry Performance Profile

In a 5-second profiling trace of an animated widget-churn benchmark on a 120 Hz display, capturing main-thread CPU slices and heap allocation telemetry, the impact of this text layout churn is clearly visible:

  • Average Frame Work Duration: 11.00 ms (above the roughly 8.33 ms frame budget of a 120 Hz display).
  • Frame Drop Rate: 32.72% (Continuous visible rendering stutter and jank).
  • Dynamic Memory Allocation Churn: 17.40 MB allocated in 5 seconds (~3.48 MB/s).
  • GC Compaction Cost: 280.64 ms lost exclusively to main-thread garbage collection sweeps.
  • Phase Distribution: Layout operations dominated the frame timeline, consuming 3,772.37 ms (nearly 97% of all JavaScript scripting time).
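
The headline figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported in this list; the function names are illustrative and not part of any Flutter or engine API.

```dart
/// Frame budget in milliseconds for a display refreshing at [hz].
double frameBudgetMs(int hz) => 1000 / hz;

/// Allocation rate in MB/s for [megabytes] allocated over [seconds].
double allocationRateMbPerSec(double megabytes, double seconds) =>
    megabytes / seconds;

/// Percentage of a trace lasting [traceMs] that was spent in [gcMs] of GC.
double gcSharePercent(double gcMs, double traceMs) => gcMs / traceMs * 100;

void main() {
  final budget = frameBudgetMs(120);
  // Frame budget at 120 Hz: 8.33 ms
  print('Frame budget:    ${budget.toStringAsFixed(2)} ms');
  // Reported 11.00 ms average minus the budget: 2.67 ms
  print('Over budget by:  ${(11.00 - budget).toStringAsFixed(2)} ms');
  // 17.40 MB over 5 s: 3.48 MB/s
  print('Allocation rate: '
      '${allocationRateMbPerSec(17.40, 5).toStringAsFixed(2)} MB/s');
  // 280.64 ms of GC in a 5000 ms trace: 5.61 %
  print('GC share:        '
      '${gcSharePercent(280.64, 5000).toStringAsFixed(2)} %');
}
```

By this arithmetic, average frame work overshoots the 120 Hz budget by about 2.67 ms, and GC accounts for roughly 5.6% of the 5-second trace.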

🔍 Deep-Dive Allocation Churn & Boxing Analysis

Statically auditing and scanning the WebAssembly Text (WAT) disassemblies for target functions reveals two specific hot loops where primitive boxing is introduced.


Hotspot #1: Closure-driven Generic Allocations in _addSegmenterData

During layout invalidations, paragraph widgets re-measure, executing _addSegmenterData under the Skwasm builder wrapper.

📍 Source Location

In engine/src/flutter/lib/web_ui/lib/src/engine/skwasm/skwasm_impl/paragraph.dart:

  void _addSegmenterData() => withStackScope((StackScope scope) {
    ...
    final Pointer<Uint32> outSize = scope.allocUint32Array(1);
    final Pointer<Uint8> utf8Data = paragraphBuilderGetUtf8Text(handle, outSize);

    final String text;
    final JSString jsText;
    if (utf8Data == nullptr) {
      text = '';
      jsText = ''.toJS;
    } else {
      // 🚨 The Boxing Churn Engine: List.generate generic callback
      final codeUnitList = List<int>.generate(outSize.value, (int index) => utf8Data[index]);
      text = utf8.decode(codeUnitList);
      ...

⚙️ The Wasm Translation Defect

List<E>.generate is a generic factory constructor that takes a generator callback (E generator(int index)). In dart2wasm, a value that flows through the type parameter E is represented as a boxed reference on the heap. When the closure (int index) => utf8Data[index] is compiled, the byte read from native memory is therefore boxed as a $BoxedInt heap struct before it is stored in the list, which costs at least one heap allocation per byte of UTF-8 text laid out.

This triggers high-frequency $BoxedInt (struct.new 7) instructions inside the tight iteration block:

(func $_addSegmenterData (;3342;) (type 22) ...
  ...
  loop ;; label = @4
    ...
    struct.get 2 2
    local.get 14
    i32.wrap_i64
    call 14
    i64.extend_i32_u
    local.tee 19
    struct.new $BoxedInt     ;; 🚨 ALLOCATION: Boxing the byte value returned by the generic callback
    call 1880
    if ;; label = @6
      ...
    else
      global.get 10480
      i32.const 354
      local.get 19
      struct.new $BoxedInt   ;; 🚨 ALLOCATION: Generic list packer boxing
      call 1880

💡 Workaround / Proposed Fix

The generic closure boundary can be bypassed by replacing List<int>.generate with a Uint8List that is filled by a plain indexed loop. Uint8List is non-generic and stores unboxed bytes directly, so dart2wasm can compile the copy into direct byte stores with no per-element allocation:

// ⚡ Zero-Allocation Copy Block
final length = outSize.value;
final codeUnitList = Uint8List(length);
for (var i = 0; i < length; i++) {
  codeUnitList[i] = utf8Data[i];
}
text = utf8.decode(codeUnitList);
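
As a sanity check that the rewrite preserves behavior, here is a minimal standalone sketch. It assumes a Uint8List named nativeBytes as a stand-in for the native buffer behind utf8Data, and copyBytes is a hypothetical helper, not an engine function. It only confirms that both copies decode to the same string; it does not measure allocations.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Copies the first [length] bytes of [nativeBytes] with a plain indexed
/// loop, mirroring the proposed replacement for List<int>.generate.
/// [nativeBytes] is a stand-in for the native buffer read via utf8Data[i].
Uint8List copyBytes(Uint8List nativeBytes, int length) {
  final bytes = Uint8List(length);
  for (var i = 0; i < length; i++) {
    bytes[i] = nativeBytes[i];
  }
  return bytes;
}

void main() {
  // Sample text with multi-byte UTF-8 sequences.
  final nativeBytes = Uint8List.fromList(utf8.encode('héllo, wasm ✓'));

  // Original approach: generic factory with a generator closure.
  final viaGenerate =
      List<int>.generate(nativeBytes.length, (int i) => nativeBytes[i]);

  // Proposed approach: typed-data buffer filled by an indexed loop.
  final viaLoop = copyBytes(nativeBytes, nativeBytes.length);

  print(utf8.decode(viaGenerate) == utf8.decode(viaLoop)); // true
  print(utf8.decode(viaLoop)); // héllo, wasm ✓
}
```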

Hotspot #2: Generic Set Contains Lookups in breakLinesUsingV8BreakIterator

Inside the line breaker, the inner loop of breakLinesUsingV8BreakIterator checks every code unit of each fragment against sets of newline and space characters.

📍 Source Location

In engine/src/flutter/lib/web_ui/lib/src/engine/text/line_breaker.dart:

const Set<int> _kNewlines = <int>{
  0x000A, // LF
  0x000B, // BK
  ...
};

List<LineBreakFragment> breakLinesUsingV8BreakIterator(...) {
  ...
  while (iterator.next() != -1) {
    final int fragmentEnd = iterator.current().toInt();
    ...
    for (var i = fragmentStart; i < fragmentEnd; i++) {
      final int codeUnit = text.codeUnitAt(i);
      if (_kNewlines.contains(codeUnit)) { // 🚨 The Boxing Churn Engine: Set<int>.contains
        trailingNewlines++;
        trailingSpaces++;
      ...

⚙️ The Wasm Translation Defect

_kNewlines is typed as a generic Set<int>. In Dart's standard library, Set.contains takes an argument typed as Object? value:

bool contains(Object? value);

Because codeUnit is an unboxed primitive integer and the parameter is declared as Object?, dart2wasm has to box codeUnit into a $BoxedInt heap wrapper to pass it across the call boundary on every character check.

This can cause up to two heap allocations per character scanned (one for the _kNewlines.contains lookup and another for the _kSpaces.contains lookup), generating substantial GC overhead in the layout hot path.

💡 Workaround / Proposed Fix

By replacing the generic Set<int> collections with small helper functions built on switch statements, the checks compile to plain integer comparisons (or branch tables, at the compiler's discretion) that operate on unboxed values with no heap allocations:

// ⚡ Zero-Allocation Inline Jump Tables
bool isNewline(int codeUnit) {
  switch (codeUnit) {
    case 0x000A: // LF
    case 0x000B: // BK
    case 0x000C: // BK
    case 0x000D: // CR
    case 0x0085: // NL
    case 0x2028: // BK
    case 0x2029: // BK
      return true;
    default:
      return false;
  }
}

bool isSpace(int codeUnit) {
  switch (codeUnit) {
    case 0x0020: // SP
    case 0x200B: // ZW
      return true;
    default:
      return false;
  }
}
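
Before swapping the sets for switch helpers, it may be worth confirming that both agree on every UTF-16 code unit. The sketch below does that exhaustively. Note the assumptions: referenceNewlines and referenceSpaces are reconstructed from the switch cases shown above, since the real _kNewlines and _kSpaces contents are elided in this issue and may differ, and mismatches is a hypothetical helper.

```dart
// Assumed set contents, reconstructed from the switch cases in this issue.
const Set<int> referenceNewlines = <int>{
  0x000A, 0x000B, 0x000C, 0x000D, 0x0085, 0x2028, 0x2029,
};
const Set<int> referenceSpaces = <int>{0x0020, 0x200B};

bool isNewline(int codeUnit) {
  switch (codeUnit) {
    case 0x000A:
    case 0x000B:
    case 0x000C:
    case 0x000D:
    case 0x0085:
    case 0x2028:
    case 0x2029:
      return true;
    default:
      return false;
  }
}

bool isSpace(int codeUnit) {
  switch (codeUnit) {
    case 0x0020:
    case 0x200B:
      return true;
    default:
      return false;
  }
}

/// Returns every UTF-16 code unit on which [helper] disagrees with
/// membership in [reference].
List<int> mismatches(bool Function(int) helper, Set<int> reference) {
  final result = <int>[];
  for (var codeUnit = 0; codeUnit <= 0xFFFF; codeUnit++) {
    if (helper(codeUnit) != reference.contains(codeUnit)) {
      result.add(codeUnit);
    }
  }
  return result;
}

void main() {
  print(mismatches(isNewline, referenceNewlines)); // []
  print(mismatches(isSpace, referenceSpaces)); // []
}
```

An empty list for both helpers means the switch-based checks are drop-in replacements for the assumed sets.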

🚀 Steps to Reproduce & Context

  1. Build any app with an animating, widget-churning layout that contains paragraphs or text:
    flutter build web --wasm --release
  2. Profile in headless Chrome with CPU and heap allocation sampling enabled.
  3. Audit the allocations: note the large number of $BoxedInt allocations and the frequent GC activity traced back to _addSegmenterData and breakLinesUsingV8BreakIterator.
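
For a quicker signal outside a full Flutter build, a plain Dart micro-benchmark along these lines could be compiled to WebAssembly and profiled the same way. This is a rough sketch, not the benchmark used for the trace above: newlineSet is an assumed stand-in for _kNewlines, the function names are hypothetical, and Stopwatch timings of a single pass are noisy, so treat the numbers as indicative only.

```dart
// Assumed stand-in for _kNewlines; the real set contents are elided above.
const Set<int> newlineSet = <int>{
  0x000A, 0x000B, 0x000C, 0x000D, 0x0085, 0x2028, 0x2029,
};

bool isNewlineSwitch(int codeUnit) {
  switch (codeUnit) {
    case 0x000A:
    case 0x000B:
    case 0x000C:
    case 0x000D:
    case 0x0085:
    case 0x2028:
    case 0x2029:
      return true;
    default:
      return false;
  }
}

/// Counts newline code units via Set<int>.contains (Object? parameter).
int countWithSet(String text) {
  var count = 0;
  for (var i = 0; i < text.length; i++) {
    if (newlineSet.contains(text.codeUnitAt(i))) count++;
  }
  return count;
}

/// Counts newline code units via the switch-based helper.
int countWithSwitch(String text) {
  var count = 0;
  for (var i = 0; i < text.length; i++) {
    if (isNewlineSwitch(text.codeUnitAt(i))) count++;
  }
  return count;
}

void main() {
  final text = 'The quick brown fox\n' * 50000;
  final variants = <String, int Function(String)>{
    'Set.contains': countWithSet,
    'switch': countWithSwitch,
  };
  for (final entry in variants.entries) {
    final stopwatch = Stopwatch()..start();
    final count = entry.value(text);
    stopwatch.stop();
    print('${entry.key}: $count newlines in '
        '${stopwatch.elapsedMicroseconds} us');
  }
}
```

Both variants should report the same newline count; any difference in time or allocation profile between them is what this issue is about.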

Labels: P1, e: wasm, e: web_skwasm, engine, platform-web, team-web, triaged-web
