feat: optimize frame layout for tail-call-only functions #11608
|
@cfallin What do you think of something like this? I have only looked into aarch64 for the moment, since other ISAs such as x64 and s390x look quite different and more complex to implement. |
|
Unfortunately I don't think this is going to work: the stack pointer has to be 16-aligned, and aarch64 will actually trap if memory accesses occur with a misaligned SP. Furthermore, the savings I would expect are not "only push FP, not LR", but "don't push anything at all if the frame is zero-size". This should be the case for tail-calling functions with no stack storage (spillslots, stackslots or clobbers) and no outgoing argument space. |
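To make the alignment point above concrete, here is a minimal sketch of the arithmetic. The helper names `keeps_sp_aligned` and `align_frame_size` are hypothetical and are not part of Cranelift; the only fact assumed is the 16-byte SP alignment requirement stated in the comment.

```rust
/// Alignment the stack pointer must keep on aarch64, per the comment above.
const SP_ALIGNMENT: u32 = 16;

/// Hypothetical helper: true if adjusting SP by `bytes` keeps it 16-aligned.
fn keeps_sp_aligned(bytes: u32) -> bool {
    bytes % SP_ALIGNMENT == 0
}

/// Hypothetical helper: round a frame size up to the next multiple of 16.
fn align_frame_size(bytes: u32) -> u32 {
    (bytes + SP_ALIGNMENT - 1) & !(SP_ALIGNMENT - 1)
}
```

Under this sketch an 8-byte setup area fails the alignment check and rounds back up to 16, so the only sizes worth distinguishing are 0 (no frame at all) and 16 (FP and LR).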
|
Don't debuggers rely on frame pointers for stack traces? Could setting the frame size to 0 hurt debugging/unwinding? |
|
Debuggers and profilers should already handle missing stack frames for leaf functions. Besides, debuggers generally use .eh_frame for stack unwinding, falling back to frame pointers only when .eh_frame is not available. |
|
Right -- we already omit frame pointers for functions that are truly leaf functions (no calls at all, with no frame storage); this is a common optimization. In Wasmtime, where we use our own stack-walking logic and unwinder and want simplicity/robustness, we configure Cranelift never to omit frame pointers; so this optimization largely applies to other uses of Cranelift, like bjorn3's |
|
Then could it be safe to have something like this?
// Compute linkage frame size.
let setup_area_size = if flags.preserve_frame_pointers()
// The function arguments that are passed on the stack are addressed
// relative to the Frame Pointer.
|| flags.unwind_info()
|| incoming_args_size > 0
|| clobber_size > 0
|| fixed_frame_storage_size > 0
{
16 // FP, LR
} else {
match function_calls {
FunctionCalls::Regular => 16,
FunctionCalls::None => 0,
- FunctionCalls::TailOnly => 8,
+ FunctionCalls::TailOnly => 0,
}
}; |
|
I think you'll want to check the tail args and outgoing args size as well (the other parameters to |
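A self-contained sketch of what the suggested check might look like once tail-argument and outgoing-argument sizes are included. This is an illustration under stated assumptions, not the code that was merged: `FrameInputs` and its field names are hypothetical stand-ins for the real function's parameters and flags, and `FunctionCalls` is redeclared locally so the snippet compiles on its own.

```rust
/// Local mirror of the enum used in the snippet above (redeclared for the sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FunctionCalls {
    None,
    TailOnly,
    Regular,
}

/// Hypothetical bundle of the inputs to the frame layout decision.
#[derive(Clone, Copy, Debug, Default)]
struct FrameInputs {
    preserve_frame_pointers: bool,
    unwind_info: bool,
    incoming_args_size: u32,
    tail_args_size: u32,
    outgoing_args_size: u32,
    clobber_size: u32,
    fixed_frame_storage_size: u32,
}

/// Size in bytes of the FP/LR setup area: either 0 or 16, never 8,
/// so that SP stays 16-byte aligned.
fn setup_area_size(inputs: &FrameInputs, calls: FunctionCalls) -> u32 {
    let needs_frame = inputs.preserve_frame_pointers
        || inputs.unwind_info
        || inputs.incoming_args_size > 0
        || inputs.tail_args_size > 0
        || inputs.outgoing_args_size > 0
        || inputs.clobber_size > 0
        || inputs.fixed_frame_storage_size > 0;
    if needs_frame {
        return 16; // FP, LR
    }
    match calls {
        FunctionCalls::Regular => 16,
        // With no stack storage of any kind, a tail-call-only function can be
        // treated like a leaf function and skip the setup area entirely.
        FunctionCalls::None | FunctionCalls::TailOnly => 0,
    }
}
```

With all-zero inputs, `setup_area_size(&FrameInputs::default(), FunctionCalls::TailOnly)` yields 0, while any non-zero tail or outgoing argument size forces the full 16-byte setup area.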
|
Gonna re-roll review to @cfallin as he's got more context on this area than me |
Reduce frame size from 16 to 8 bytes for functions that only make tail calls (FunctionCalls::TailOnly). This optimization:
- Uses single register operations (str/ldr fp) instead of pair operations (stp/ldp fp,lr)
- Applies when no other frame requirements exist (no frame pointers, stack args, etc.)
- Is instruction-based: functions containing only return_call instructions get optimized
- Maintains ABI compatibility and includes comprehensive test coverage
Update AArch64 return-call and pointer-auth filetest outputs to match frameless tail-call lowering from the previous frame-layout change. This restores arm64 filetest stability by checking the emitted direct branch sequences.
cfallin
left a comment
This looks like a reasonable optimization to me; thanks.