While profiling an animating, heavy widget-churn web benchmark under the WebAssembly backend (dart2wasm + skwasm), we identified a critical performance bottleneck and significant garbage collection (GC) churn in the text measurement and rendering pipeline.
The issue stems from how primitive integers are dynamically boxed into heap-allocated objects ($BoxedInt / struct allocations) within high-frequency, character-by-character loops during paragraph layout and line-breaking. These operations are executed on every single character of text laid out during invalidations, multiplying allocation pressure and forcing continuous GC sweeps that introduce severe rendering jank.
This issue relates closely to:
- flutter/flutter#170889 (Picture cleanup (we think) causes jank on animating web apps), where we tracked high-overhead SkwasmFinalizationRegistry finalizer loops crossing expensive JS-to-Wasm boundaries on GC sweeps.
- flutter/flutter#172187 ([skwasm] Decrease reliance on finalizers/GC), which restructured layout and style classes so they explicitly dispose of native resources during build cycles rather than waiting for GC finalizers.
- flutter/flutter#184858 (memory access out of bounds in paragraphBuilder_dispose during heavy widget rebuilds), which tracked sibling native heap corruption traps.
- dart-lang/sdk#52714 (dart2wasm List<int> is using fully boxed integer representation), which tracks the absence of unboxed lists for generic collection stores.
📈 Telemetry Performance Profile
In a 5-second profiling trace of the animated widget-churn benchmark on a 120 Hz display, capturing main-thread CPU slices and heap allocation telemetry, the impact of this text rendering churn is clearly visible:
- Average Frame Work Duration: 11.00 ms (well above the roughly 8.33 ms per-frame budget of a 120 Hz display).
- Frame Drop Rate: 32.72% (continuous visible rendering stutter and jank).
- Dynamic Memory Allocation Churn: 17.40 MB allocated in 5 seconds (~3.48 MB/s).
- GC Compaction Cost: 280.64 ms lost exclusively to main-thread garbage collection sweeps.
- Phase Distribution: Layout operations dominated the frame timeline, consuming 3,772.37 ms (nearly 97% of all JavaScript scripting time).
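The figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the constants are copied from the trace summary, the 120 Hz frame budget is derived as 1000 ms / 120, and the helper names are hypothetical rather than part of any profiling tool.

```dart
// Back-of-the-envelope checks on the trace numbers quoted above.
// All inputs are copied from the report; nothing here is newly measured.
const double traceSeconds = 5.0;
const double refreshHz = 120.0;
const double avgFrameMs = 11.00;
const double allocatedMb = 17.40;
const double gcMs = 280.64;

/// Per-frame time budget implied by the display refresh rate.
double frameBudgetMs() => 1000.0 / refreshHz;

/// Average allocation rate over the trace.
double allocationRateMbPerSec() => allocatedMb / traceSeconds;

/// Share of the trace's wall-clock time spent in main-thread GC.
double gcShareOfTracePercent() => gcMs / (traceSeconds * 1000.0) * 100.0;

void main() {
  final budget = frameBudgetMs();
  print('frame budget: ${budget.toStringAsFixed(2)} ms');
  print('over budget by: ${(avgFrameMs - budget).toStringAsFixed(2)} ms');
  print('allocation rate: ${allocationRateMbPerSec().toStringAsFixed(2)} MB/s');
  print('GC share of trace: ${gcShareOfTracePercent().toStringAsFixed(2)} %');
}
```

Under these inputs the average frame overshoots the 120 Hz budget by about 2.67 ms, and GC alone accounts for roughly 5.6% of the 5-second trace.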
🔍 Deep-Dive Allocation Churn & Boxing Analysis
A static audit of the WebAssembly Text (WAT) disassembly of the target functions reveals two specific hot loops where primitive boxing is introduced.
Hotspot #1: Closure-driven Generic Allocations in _addSegmenterData
During layout invalidations, paragraph widgets re-measure, executing _addSegmenterData under the Skwasm builder wrapper.
📍 Source Location
In engine/src/flutter/lib/web_ui/lib/src/engine/skwasm/skwasm_impl/paragraph.dart:
void _addSegmenterData() => withStackScope((StackScope scope) {
  ...
  final Pointer<Uint32> outSize = scope.allocUint32Array(1);
  final Pointer<Uint8> utf8Data = paragraphBuilderGetUtf8Text(handle, outSize);
  final String text;
  final JSString jsText;
  if (utf8Data == nullptr) {
    text = '';
    jsText = ''.toJS;
  } else {
    // 🚨 The Boxing Churn Engine: List.generate generic callback
    final codeUnitList = List<int>.generate(outSize.value, (int index) => utf8Data[index]);
    text = utf8.decode(codeUnitList);
    ...
⚙️ The Wasm Translation Defect
Because List<T>.generate is a generic factory method taking a generic callback function closure (T generator(int index)), in dart2wasm the generic value placeholder T must represent a boxed reference type on the heap. When translating the closure (int index) => utf8Data[index], the byte value read from native memory must be boxed as a $BoxedInt heap structure before it is packed into the array storage, resulting in one heap allocation per character laid out.
This triggers high-frequency $BoxedInt (struct.new 7) instructions inside the tight iteration block:
(func $_addSegmenterData (;3342;) (type 22) ...
...
loop ;; label = @4
...
struct.get 2 2
local.get 14
i32.wrap_i64
call 14
i64.extend_i32_u
local.tee 19
struct.new $BoxedInt ;; 🚨 ALLOCATION: Boxing the byte value returned by the generic callback
call 1880
if ;; label = @6
...
else
global.get 10480
i32.const 354
local.get 19
struct.new $BoxedInt ;; 🚨 ALLOCATION: Generic list packer boxing
call 1880
💡 Workaround / Proposed Fix
We can bypass the generic closure boundary entirely by replacing List<int>.generate with a non-generic Uint8List. Because Uint8List stores unboxed bytes directly and needs no generic callback, dart2wasm can translate the copy loop into direct primitive stores, with one allocation for the buffer and no per-byte boxing:
// ⚡ Zero-Allocation Copy Block
final length = outSize.value;
final codeUnitList = Uint8List(length);
for (var i = 0; i < length; i++) {
  codeUnitList[i] = utf8Data[i];
}
text = utf8.decode(codeUnitList);
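To confirm that the replacement is behavior-preserving, the two copy strategies can be compared outside the engine. The following is a minimal sketch, not engine code: `ByteReader` and both `decodeVia...` functions are hypothetical names, and a plain Dart closure stands in for the native `utf8Data[index]` read, so it checks equivalence of the decoded output only and says nothing about allocation counts under dart2wasm.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical stand-in for the native buffer read `utf8Data[index]`.
typedef ByteReader = int Function(int index);

/// Mirrors the original shape: a generic List<int> filled via a callback.
String decodeViaGenerate(int length, ByteReader readByte) {
  final codeUnits = List<int>.generate(length, (int index) => readByte(index));
  return utf8.decode(codeUnits);
}

/// Mirrors the proposed shape: a Uint8List filled by an explicit loop.
String decodeViaUint8List(int length, ByteReader readByte) {
  final bytes = Uint8List(length);
  for (var i = 0; i < length; i++) {
    bytes[i] = readByte(i);
  }
  return utf8.decode(bytes);
}

void main() {
  // Includes a multi-byte character so the UTF-8 path is exercised.
  final source = utf8.encode('héllo wörld\n');
  int readByte(int i) => source[i];

  final viaGenerate = decodeViaGenerate(source.length, readByte);
  final viaUint8List = decodeViaUint8List(source.length, readByte);
  print(viaGenerate == viaUint8List);
}
```

Both paths hand the same byte sequence to utf8.decode, so the decoded strings should match for any valid UTF-8 input. The allocation difference itself would still need to be verified in the dart2wasm output.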
Hotspot #2: Generic Set Contains Lookups in breakLinesUsingV8BreakIterator
Line segmentation loops execute character-level metadata matching checks inside the text breaker layout scope.
📍 Source Location
In engine/src/flutter/lib/web_ui/lib/src/engine/text/line_breaker.dart:
const Set<int> _kNewlines = <int>{
  0x000A, // LF
  0x000B, // BK
  ...
};
List<LineBreakFragment> breakLinesUsingV8BreakIterator(...) {
  ...
  while (iterator.next() != -1) {
    final int fragmentEnd = iterator.current().toInt();
    ...
    for (var i = fragmentStart; i < fragmentEnd; i++) {
      final int codeUnit = text.codeUnitAt(i);
      if (_kNewlines.contains(codeUnit)) { // 🚨 The Boxing Churn Engine: Set<int>.contains
        trailingNewlines++;
        trailingSpaces++;
        ...
⚙️ The Wasm Translation Defect
_kNewlines is typed as a generic Set<int>. In Dart's standard library, Set.contains takes an argument typed as Object? value:
bool contains(Object? value);
Because the lookup target is an unboxed primitive integer (codeUnit) and the lookup parameter is declared as Object?, dart2wasm is forced to box codeUnit into a $BoxedInt heap wrapper just to cross the parameter boundary on every single character check.
This causes up to two heap allocations per character scanned (one for the _kNewlines.contains lookup and another for the _kSpaces.contains lookup), generating enormous GC overhead in the layout hot path.
💡 Workaround / Proposed Fix
By replacing the generic Set<int> collections with small helper functions built on Dart switch statements, the compiler can translate the metadata checks directly into WebAssembly control flow (comparisons or jump tables). These run on unboxed primitive values with no heap allocations:
// ⚡ Zero-Allocation Inline Jump Tables
bool isNewline(int codeUnit) {
  switch (codeUnit) {
    case 0x000A: // LF
    case 0x000B: // BK
    case 0x000C: // BK
    case 0x000D: // CR
    case 0x0085: // NL
    case 0x2028: // BK
    case 0x2029: // BK
      return true;
    default:
      return false;
  }
}
bool isSpace(int codeUnit) {
  switch (codeUnit) {
    case 0x0020: // SP
    case 0x200B: // ZW
      return true;
    default:
      return false;
  }
}
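Since the switch helpers are meant to be drop-in replacements for the set lookups, an exhaustive comparison over every UTF-16 code unit is a cheap safety net. The sketch below is hypothetical: the set literals in line_breaker.dart are elided in this report, so `kNewlinesAssumed` and `kSpacesAssumed` are assumptions reconstructed from the case labels of the proposed helpers, and `countMismatches` is an illustrative name. A real regression test would compare against the engine's actual `_kNewlines` and `_kSpaces`.

```dart
// ASSUMPTION: these sets mirror the case labels of the proposed helpers.
// The engine's real _kNewlines / _kSpaces literals are elided in the report.
const Set<int> kNewlinesAssumed = <int>{
  0x000A, 0x000B, 0x000C, 0x000D, 0x0085, 0x2028, 0x2029,
};
const Set<int> kSpacesAssumed = <int>{0x0020, 0x200B};

bool isNewline(int codeUnit) {
  switch (codeUnit) {
    case 0x000A:
    case 0x000B:
    case 0x000C:
    case 0x000D:
    case 0x0085:
    case 0x2028:
    case 0x2029:
      return true;
    default:
      return false;
  }
}

bool isSpace(int codeUnit) {
  switch (codeUnit) {
    case 0x0020:
    case 0x200B:
      return true;
    default:
      return false;
  }
}

/// Counts code units where a switch helper disagrees with its set lookup.
int countMismatches() {
  var mismatches = 0;
  for (var codeUnit = 0; codeUnit <= 0xFFFF; codeUnit++) {
    if (isNewline(codeUnit) != kNewlinesAssumed.contains(codeUnit)) mismatches++;
    if (isSpace(codeUnit) != kSpacesAssumed.contains(codeUnit)) mismatches++;
  }
  return mismatches;
}

void main() {
  print('mismatches: ${countMismatches()}');
}
```

A mismatch count of zero only shows that the helpers agree with the assumed sets. It does not by itself demonstrate that the boxing is gone, which still has to be confirmed in the generated WAT.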
🚀 Steps to Reproduce & Context
- Build any animating widget-churn layout containing paragraphs or text:
  flutter build web --wasm --release
- Profile under headless Chrome, enabling CPU and heap allocation sampling.
- Audit the allocations: notice the large number of $BoxedInt allocations and the frequent GC compactions traced back to _addSegmenterData and breakLinesUsingV8BreakIterator.