Skip to content

VM/ia32: Bad register allocation in _StringBase._createOneByteString if it is inlined into createFromCharCodes #37800

@alexmarkov

Description

@alexmarkov

On IA32, Base64Encoder golem benchmark:

Base64Encoder(RunTime): 47.596634102808196 us.

If _StringBase._createOneByteString is inlined into _StringBase.createFromCharCodes, this slows down benchmark:

Base64Encoder(RunTime): 63.26622740644672 us.

Hot loop in _StringBase._createOneByteString:

0xf68add7b |   | 0.95% (25) | cmp esp,[esi+0x24] |  
0xf68add7e |   |            | jna 0xf68ade4e |  
0xf68add84 |   | 1.68% (44) | cmp ebx,edi |  
0xf68add86 |   |            | jge 0xf68addaa |  
0xf68add8c |   | 0.38% (10) | mov eax,[ebp+0xc] |  
0xf68add8f |   | 0.91% (24) | add eax,ebx |  
0xf68add91 |   | 1.52% (40) | mov edi,eax |  
0xf68add93 |   | 1.68% (44) | sar edi,1 |  
0xf68add95 |   | 0.65% (17) | movzxb eax,[ecx+edi*1+0xb] |  
0xf68add9a |   | 2.48% (65) | mov edi,ebx |  
0xf68add9c |   | 0.84% (22) | sar edi,1 |  
0xf68add9e |   | 1.03% (27) | movb [edx+edi*1+0xb],eax |  
0xf68adda2 |   | 1.60% (42) | add ebx,2 |  
0xf68adda5 |   | 0.42% (11) | mov edi,[ebp+0x8] |  
0xf68adda8 |   | 0.72% (19) | jmp 0xf68add7b

The same loop if _StringBase._createOneByteString is inlined:

0xf686f001 |   | 2.86% (74)  | cmp esp,[esi+0x24] |  
0xf686f004 |   |             | jna 0xf686f4ec |  
0xf686f00a |   |             | cmp eax,ebx |  
0xf686f00c |   |             | jge 0xf686f09c |  
0xf686f012 |   |             | mov ecx,[ebp+0x10] |  
0xf686f015 |   | 0.04% (1)   | add ecx,eax |  
0xf686f017 |   | 2.74% (71)  | mov ebx,ecx |  
0xf686f019 |   |             | sar ebx,1 |  
0xf686f01b |   |             | movzxb ecx,[edi+ebx*1+0xb] |  
0xf686f020 |   | 0.23% (6)   | xchg ecx,eax |  
0xf686f022 |   | 3.98% (103) | mov ebx,ecx |  
0xf686f024 |   |             | sar ebx,1 |  
0xf686f026 |   | 2.12% (55)  | movb [edx+ebx*1+0xb],eax |  
0xf686f02a |   | 14.90% (386)| add ecx,2 |  
0xf686f02d |   | 0.08% (2)   | mov eax,ecx |  
0xf686f02f |   |             | mov ebx,[ebp-0xc] |  
0xf686f032 |   |             | jmp 0xf686f001

The following patch can be used to reproduce the problem:

diff --git a/runtime/lib/string_patch.dart b/runtime/lib/string_patch.dart
index 6bf950f911..48c53770bd 100644
--- a/runtime/lib/string_patch.dart
+++ b/runtime/lib/string_patch.dart
@@ -219,6 +219,7 @@ abstract class _StringBase implements String {
     return createFromCharCodes(charCodeList, 0, length, bits);
   }
 
+  @pragma("vm:prefer-inline")
   static String _createOneByteString(List<int> charCodes, int start, int len) {
     // It's always faster to do this in Dart than to call into the runtime.
     var s = _OneByteString._allocate(len);

Note that _StringBase._createOneByteString satisfies our inlining_callee_call_sites_threshold heuristic if all integer operations are recognized. Currently it is not inlined because not all integer operations are recognized (there is a remaining unrecognized call). However, with bytecode all integer operations are replaced with BinarySmiOp and _StringBase._createOneByteString is inlined causing performance regression.

/cc @mraleph @mkustermann @aartbik

Metadata

Metadata

Assignees

No one assigned

    Labels

    area-vmUse area-vm for VM related issues, including code coverage, and the AOT and JIT backends.closed-not-plannedClosed as we don't intend to take action on the reported issuetype-performanceIssue relates to performance or code size

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions