-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic support for profiling Pulley #10034
Add basic support for profiling Pulley #10034
Conversation
This commit adds basic support for profiling the Pulley interpreter. This is partially achievable previously through the use of native profilers, but the downside of that approach is that you can find hot instructions but it's not clear in what context the hot instructions are being executed nor what functions are hot. The goal of this profiler is to show pulley bytecode and time spent in bytecode itself to better understand the shape of code around a hot instruction to identify new macro opcodes for example. The general structure of this new profiler is: * There is a compile-time feature for Pulley which is off-by-default where, when enabled, Pulley will record its current program counter into an `AtomicUsize` before each instruction. * When the CLI has `--profile pulley` Wasmtime will spawn a sampling thread in the same process which will periodically read from this `AtomicUsize` to record where the program is currently executing. * The Pulley profiler additionally records all bytecode through the use of the `ProfilingAgent` trait to ensure that the recording has access to all bytecode as well. * Samples are taken throughout the process and emitted to a `pulley-$pid.data` file. This file is then interpreted and printed by an "example" program `profiler-html.rs` in the `pulley/examples` directory. The end result is that hot functions of Pulley bytecode can be seen and instructions are annotated with how frequently they were executed. This enables finding hot loops and understanding more about the whole loop, bytecodes that were selected, and such.
type Handler = fn(Interpreter<'_>) -> Done; | ||
/// ABI signature of each opcode handler. | ||
/// | ||
/// Note that this "explodes" the internals of `Interpreter` to individual | ||
/// arguments to help get them all into registers. | ||
type Handler = fn(&mut MachineState, UnsafeBytecodeStream, ExecutingPcRef<'_>) -> Done; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll note that this change was done to ensure/guarantee that these three components of Interpreter
are passed in registers. I was worried about crossing a threshold where if Interpeter
got too big it would be passed by-ref instead of "exploded" into components like we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll also note that ExecutingPcRef
is a zero-sized-type when profile
is disabled, otherwise it's a pointer-large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, thanks for the explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super excited for this!
type Handler = fn(Interpreter<'_>) -> Done; | ||
/// ABI signature of each opcode handler. | ||
/// | ||
/// Note that this "explodes" the internals of `Interpreter` to individual | ||
/// arguments to help get them all into registers. | ||
type Handler = fn(&mut MachineState, UnsafeBytecodeStream, ExecutingPcRef<'_>) -> Done; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, thanks for the explanation.
* Add basic support for profiling Pulley This commit adds basic support for profiling the Pulley interpreter. This is partially achievable previously through the use of native profilers, but the downside of that approach is that you can find hot instructions but it's not clear in what context the hot instructions are being executed nor what functions are hot. The goal of this profiler is to show pulley bytecode and time spent in bytecode itself to better understand the shape of code around a hot instruction to identify new macro opcodes for example. The general structure of this new profiler is: * There is a compile-time feature for Pulley which is off-by-default where, when enabled, Pulley will record its current program counter into an `AtomicUsize` before each instruction. * When the CLI has `--profile pulley` Wasmtime will spawn a sampling thread in the same process which will periodically read from this `AtomicUsize` to record where the program is currently executing. * The Pulley profiler additionally records all bytecode through the use of the `ProfilingAgent` trait to ensure that the recording has access to all bytecode as well. * Samples are taken throughout the process and emitted to a `pulley-$pid.data` file. This file is then interpreted and printed by an "example" program `profiler-html.rs` in the `pulley/examples` directory. The end result is that hot functions of Pulley bytecode can be seen and instructions are annotated with how frequently they were executed. This enables finding hot loops and understanding more about the whole loop, bytecodes that were selected, and such. * Add missing source file * Check the profile-pulley feature in CI * Miscellaneous fixes for CI * Fix type-checking of `become` on nightly Rust * Fix more misc CI issues * Fix dispatch in tail loop * Update test expectations * Review comments
This commit adds basic support for profiling the Pulley interpreter. This is partially achievable previously through the use of native profilers, but the downside of that approach is that you can find hot instructions but it's not clear in what context the hot instructions are being executed nor what functions are hot. The goal of this profiler is to show pulley bytecode and time spent in bytecode itself to better understand the shape of code around a hot instruction to identify new macro opcodes for example.
The general structure of this new profiler is:
There is a compile-time feature for Pulley which is off-by-default where, when enabled, Pulley will record its current program counter into an
AtomicUsize
before each instruction.When the CLI has
--profile pulley
Wasmtime will spawn a sampling thread in the same process which will periodically read from thisAtomicUsize
to record where the program is currently executing.The Pulley profiler additionally records all bytecode through the use of the
ProfilingAgent
trait to ensure that the recording has access to all bytecode as well.Samples are taken throughout the process and emitted to a
pulley-$pid.data
file. This file is then interpreted and printed by an "example" programprofiler-html.rs
in thepulley/examples
directory.The end result is that hot functions of Pulley bytecode can be seen and instructions are annotated with how frequently they were executed. This enables finding hot loops and understanding more about the whole loop, bytecodes that were selected, and such.