feat: optimize frame layout for tail-call-only functions #11608

Merged
cfallin merged 2 commits into bytecodealliance:main from pnodet:pnodet-11
Apr 10, 2026

@pnodet

@pnodet pnodet commented Sep 4, 2025

Contributor

Reduce frame size from 16 to 8 bytes for functions that only make tail calls (FunctionCalls::TailOnly). This optimization:

@pnodet

pnodet commented Sep 4, 2025

Contributor Author

@cfallin What do you think of something like this? I only looked into aarch64 for the moment, since other ISAs such as x64 and s390x look quite different and seem more complex to implement.

@cfallin

cfallin commented Sep 4, 2025

Member

Unfortunately I don't think this is going to work: the stack pointer has to be 16-aligned, and aarch64 will actually trap if memory accesses occur with a misaligned SP.

Furthermore, the saving I would expect is not "only push FP, not LR", but "don't push anything at all if the frame is zero-size". This should be the case for tail-calling functions with no stack storage (spillslots, stackslots or clobbers) and no outgoing argument space.
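
To make the alignment constraint above concrete, here is a minimal sketch (not Cranelift code; the constant and both helper names are hypothetical and exist only for illustration). It shows why an 8-byte frame cannot work when SP must stay 16-byte aligned, and why the usable setup-area sizes are 0 or 16.

```rust
/// Alignment the thread says AArch64 requires of SP, in bytes.
const SP_ALIGN: u64 = 16;

/// Returns true if `sp` is a multiple of 16 bytes.
/// (Hypothetical helper, for illustration only.)
fn is_sp_aligned(sp: u64) -> bool {
    sp % SP_ALIGN == 0
}

/// Rounds a requested frame size up to the next multiple of 16 bytes.
/// (Hypothetical helper, for illustration only.)
fn round_up_frame_size(bytes: u64) -> u64 {
    (bytes + SP_ALIGN - 1) & !(SP_ALIGN - 1)
}
```

With an aligned entry SP such as `0x7fff_0000`, `is_sp_aligned(sp - 8)` is false, while `is_sp_aligned(sp - round_up_frame_size(8))` is true because the 8-byte request is rounded up to 16. A zero-size frame leaves SP untouched, so it stays aligned.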

@pnodet

pnodet commented Sep 4, 2025

Contributor Author

Don't debuggers rely on frame pointers for stack traces? Could setting the frame size to 0 hurt debugging/unwinding?

@bjorn3

bjorn3 commented Sep 4, 2025

Contributor

Debuggers and profilers should already handle missing stack frames for leaf functions. Besides, debuggers generally use .eh_frame for stack unwinding, only falling back to frame pointers when .eh_frame is not available.

@cfallin

cfallin commented Sep 4, 2025

Member

Right -- we already omit frame pointers for functions that are truly leaf functions (no calls at all, with no frame storage); this is a common optimization.

In Wasmtime, where we use our own stack-walking logic and unwinder and want simplicity/robustness, we configure Cranelift never to omit frame pointers; so this optimization largely applies to other uses of Cranelift, like bjorn3's cg_clif.
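
For readers unfamiliar with frame-pointer-based stack walking, the following is a rough sketch over a simulated memory. It is not Wasmtime's unwinder. It assumes the AArch64 frame-record convention in which FP points at a pair holding the caller's FP followed by the return address; the function name and the `HashMap` memory model are hypothetical.

```rust
use std::collections::HashMap;

/// Walks a chain of frame records in a simulated memory.
/// Each record stores the caller's FP at `fp` and the return address at `fp + 8`.
/// Returns the return addresses found, innermost first.
/// (Hypothetical helper, for illustration only.)
fn walk_frame_records(memory: &HashMap<u64, u64>, mut fp: u64) -> Vec<u64> {
    let mut return_addresses = Vec::new();
    // In this model a zero FP terminates the chain.
    while fp != 0 {
        let (Some(&prev_fp), Some(&lr)) =
            (memory.get(&fp), memory.get(&fp.wrapping_add(8)))
        else {
            break; // unreadable record: stop rather than guess
        };
        return_addresses.push(lr);
        // Caller records must live at higher addresses; this also prevents cycles.
        if prev_fp != 0 && prev_fp <= fp {
            break;
        }
        fp = prev_fp;
    }
    return_addresses
}
```

A frameless function pushes no record of its own, so a walker like this starts from its caller's record and relies on the current PC to identify the frameless function itself. Keeping a record in every function avoids that special case.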

@pnodet

pnodet commented Sep 4, 2025

Contributor Author

Then could it be safe to have something like this?

        // Compute linkage frame size.
        let setup_area_size = if flags.preserve_frame_pointers()
            // The function arguments that are passed on the stack are addressed
            // relative to the Frame Pointer.
            || flags.unwind_info()
            || incoming_args_size > 0
            || clobber_size > 0
            || fixed_frame_storage_size > 0
        {
            16 // FP, LR
        } else {
            match function_calls {
                FunctionCalls::Regular => 16,
                FunctionCalls::None => 0,
-               FunctionCalls::TailOnly => 8,
+               FunctionCalls::TailOnly => 0,
            }
        };

@cfallin

cfallin commented Sep 4, 2025

Member

I think you'll want to check the tail args and outgoing args size as well (the other parameters to compute_frame_layout) -- basically, if any part of the frame needs to exist, then we need to do the FP setup even if we only have tail calls.
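
The rule suggested here can be sketched as a standalone model. This is an illustration under assumptions, not the merged change: `FrameInputs`, its field names, and `setup_area_size` are hypothetical stand-ins for the parameters of `compute_frame_layout`, and only `FunctionCalls` and its variants come from the snippet above.

```rust
/// Stand-in for the call classification used in the snippet above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FunctionCalls {
    None,
    TailOnly,
    Regular,
}

/// Hypothetical bundle of the inputs discussed in this thread.
/// The field names are illustrative and do not mirror the real signature.
#[derive(Clone, Copy, Debug, Default)]
struct FrameInputs {
    preserve_frame_pointers: bool,
    unwind_info: bool,
    incoming_args_size: u32,
    tail_args_size: u32,
    clobber_size: u32,
    fixed_frame_storage_size: u32,
    outgoing_args_size: u32,
}

/// Size in bytes of the FP/LR setup area: 16 if any part of the frame
/// must exist, otherwise 0 unless the function makes regular calls.
fn setup_area_size(inputs: &FrameInputs, calls: FunctionCalls) -> u32 {
    let frame_needed = inputs.preserve_frame_pointers
        || inputs.unwind_info
        || inputs.incoming_args_size > 0
        || inputs.tail_args_size > 0
        || inputs.clobber_size > 0
        || inputs.fixed_frame_storage_size > 0
        || inputs.outgoing_args_size > 0;

    if frame_needed {
        return 16; // FP, LR
    }
    match calls {
        FunctionCalls::Regular => 16,
        // No frame contents and no regular calls: skip the setup entirely.
        FunctionCalls::None | FunctionCalls::TailOnly => 0,
    }
}
```

In this model the result is only ever 0 or 16, so SP stays 16-byte aligned, and a tail-call-only function with, say, `outgoing_args_size: 16` still gets the full FP/LR setup.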

@github-actions github-actions Bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. labels Sep 4, 2025
@github-actions github-actions Bot added the isle Related to the ISLE domain-specific language label Apr 8, 2026
@github-actions

github-actions Bot commented Apr 8, 2026


Subscribe to Label Action

cc @cfallin, @fitzgen

Details This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.


@pnodet pnodet marked this pull request as ready for review April 9, 2026 08:29
@pnodet pnodet requested a review from a team as a code owner April 9, 2026 08:29
@pnodet pnodet requested review from alexcrichton and removed request for a team April 9, 2026 08:29
@alexcrichton alexcrichton requested review from cfallin and fitzgen and removed request for alexcrichton and fitzgen April 9, 2026 14:49
@alexcrichton

Member

Gonna re-roll review to @cfallin as he's got more context on this area than me

pnodet added 2 commits April 10, 2026 14:37
Reduce frame size from 16 to 8 bytes for functions that only make tail
calls (FunctionCalls::TailOnly). This optimization:

- Uses single register operations (str/ldr fp) instead of pair
operations (stp/ldp fp,lr)
- Applies when no other frame requirements exist (no frame pointers,
stack args, etc.)
- Is instruction-based: functions containing only return_call
instructions get optimized
- Maintains ABI compatibility and includes comprehensive test coverage

Update AArch64 return-call and pointer-auth filetest outputs to match
frameless tail-call lowering from the previous frame-layout change. This
restores arm64 filetest stability by checking the emitted direct branch
sequences.

@cfallin cfallin left a comment

Member

This looks like a reasonable optimization to me; thanks.

@cfallin cfallin added this pull request to the merge queue Apr 10, 2026
Merged via the queue into bytecodealliance:main with commit a591e3b Apr 10, 2026
48 checks passed
@pnodet pnodet deleted the pnodet-11 branch April 11, 2026 01:40

4 participants