
@mcourteaux
Contributor

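A small cosmetic change to reduce clutter in the lowering pass timing report: times are now right-aligned with three decimal places, and the total is printed at the end. The output went from looking like this: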

Lowering pass runtimes:
 0.00251 ms : Lowering after asserting that all split factors are positive:
 0.00689 ms : Lowering after removing extern loops:
 0.01272 ms : Lowering after adding atomic mutex allocation:
 0.01365 ms : Lowering after allocation bounds inference:
 0.01827 ms : Lowering after sliding window:
 0.01913 ms : Lowering after injecting tracing:
 0.027019 ms : Lowering after injecting parameter checks:
 0.03292 ms : Lowering after injecting early frees:
 0.03449 ms : Lowering after selecting a GPU API for extern stages:
 0.03472 ms : Lowering after selecting fast math functions:
 0.03795 ms : Lowering after selecting a GPU API:
 0.03798 ms : Lowering after discarding safe promises:
 0.03849 ms : Lowering after bounding constant extent loops:
 0.04144 ms : Lowering after dynamically skipping stages:
 0.04768 ms : Lowering after removing code that depends on undef values:
 0.04792 ms : Lowering after lowering unsafe promises:
 0.04816 ms : Lowering after loop trimming:
 0.05688 ms : Lowering after storage folding:
 0.064049 ms : Lowering after flattening nested ramps:
 0.064189 ms : Lowering after hoisting loop invariant if statements:
 0.0659 ms : Lowering after hoisting prefetches:
 0.07299 ms : Lowering after destructuring tuple-valued realizations:
 0.08953 ms : Lowering after injecting debug_to_file calls:
 0.09103 ms : Lowering after reduce prefetch dimension:
 0.0963 ms : Lowering after uniquifying variable names:
 0.098219 ms : Lowering after unpacking buffer arguments:
 0.10503 ms : Lowering after simplifying correlated differences:
 0.11927 ms : Lowering after storage flattening:
 0.160359 ms : Lowering after forking asynchronous producers:
 0.18092 ms : Lowering after injecting per-block gpu synchronization:
 0.20263 ms : Lowering after canonicalizing GPU var names:
 0.20281 ms : Lowering after injecting host <-> dev buffer copies:
 0.25389 ms : Lowering after unrolling:
 0.260409 ms : Lowering after simplifying correlated differences:
 0.269899 ms : Lowering after first simplification:
 0.289159 ms : Lowering after injecting profiling:
 0.334129 ms : Lowering after bounding small allocations:
 0.375589 ms : Lowering after bounding small realizations:
 0.380779 ms : Lowering after injecting prefetches:
 0.40241 ms : Lowering after clamping unsafe data-dependent accesses
 0.411829 ms : Lowering after staging strided loads:
 0.494578 ms : Lowering after rewriting vector interleavings:
 0.541228 ms : Lowering after computation bounds inference:
 0.671148 ms : Lowering after vectorizing:
 0.691018 ms : Lowering after injecting image checks:
 0.695249 ms : Lowering after creating initial loop nests:
 0.727429 ms : Lowering after simplifying correlated differences:
 0.811168 ms : Lowering after legalizing vectors:
 1.03359 ms : Lowering after finding intrinsics:
 1.04435 ms : Lowering after CSE:
 1.10909 ms : Lowering after second simplification:
 1.16942 ms : Lowering after partitioning loops:
 2.4466 ms : Lowering after removing dead allocations and hoisting loop invariants:

to looking like this:

Lowering pass runtimes:
     0.003 ms : Lowering after asserting that all split factors are positive:
     0.008 ms : Lowering after removing extern loops:
     0.015 ms : Lowering after adding atomic mutex allocation:
     0.019 ms : Lowering after allocation bounds inference:
     0.022 ms : Lowering after injecting tracing:
     0.023 ms : Lowering after sliding window:
     0.030 ms : Lowering after bounding constant extent loops:
     0.032 ms : Lowering after legalizing vectors:
     0.037 ms : Lowering after injecting early frees:
     0.039 ms : Lowering after selecting fast math functions:
     0.044 ms : Lowering after injecting parameter checks:
     0.048 ms : Lowering after lowering unsafe promises:
     0.052 ms : Lowering after discarding safe promises:
     0.055 ms : Lowering after storage folding:
     0.056 ms : Lowering after dynamically skipping stages:
     0.062 ms : Lowering after removing code that depends on undef values:
     0.063 ms : Lowering after reduce prefetch dimension:
     0.067 ms : Lowering after flattening nested ramps:
     0.068 ms : Lowering after hoisting prefetches:
     0.078 ms : Lowering after hoisting loop invariant if statements:
     0.091 ms : Lowering after destructuring tuple-valued realizations:
     0.098 ms : Lowering after injecting debug_to_file calls:
     0.106 ms : Lowering after unpacking buffer arguments:
     0.130 ms : Lowering after storage flattening:
     0.141 ms : Lowering after uniquifying variable names:
     0.146 ms : Lowering after simplifying correlated differences:
     0.181 ms : Lowering after forking asynchronous producers:
     0.254 ms : Lowering after unrolling:
     0.308 ms : Lowering after simplifying correlated differences:
     0.360 ms : Lowering after injecting profiling:
     0.370 ms : Lowering after first simplification:
     0.393 ms : Lowering after clamping unsafe data-dependent accesses
     0.425 ms : Lowering after injecting prefetches:
     0.504 ms : Lowering after bounding small realizations:
     0.538 ms : Lowering after rewriting vector interleavings:
     0.545 ms : Lowering after computation bounds inference:
     0.570 ms : Lowering after bounding small allocations:
     0.726 ms : Lowering after staging strided loads:
     0.801 ms : Lowering after simplifying correlated differences:
     0.814 ms : Lowering after vectorizing:
     0.827 ms : Lowering after creating initial loop nests:
     1.074 ms : Lowering after second simplification:
     1.172 ms : Lowering after finding intrinsics:
     1.252 ms : Lowering after CSE:
     1.328 ms : Lowering after injecting image checks:
     1.805 ms : Lowering after partitioning loops:
     2.386 ms : Lowering after loop trimming:
     2.998 ms : Lowering after removing dead allocations and hoisting loop invariants:
    21.164 ms in total
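For readers curious how the new layout can be produced, here is a minimal C++ sketch using standard iostream manipulators (`std::fixed`, `std::setprecision(3)`, `std::setw(10)`). The `PassTiming` struct and `format_pass_timings` function are hypothetical names chosen for illustration, not Halide's actual implementation. The sketch assumes timings are collected as (pass name, milliseconds) pairs and sorts them in ascending order, as in the new output.

```cpp
#include <algorithm>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for the time spent in one lowering pass.
struct PassTiming {
    std::string name;
    double ms;
};

// Formats the timings as a right-aligned column with three decimals,
// sorted from fastest to slowest, followed by the summed total.
std::string format_pass_timings(std::vector<PassTiming> timings) {
    std::sort(timings.begin(), timings.end(),
              [](const PassTiming &a, const PassTiming &b) { return a.ms < b.ms; });

    std::ostringstream out;
    out << "Lowering pass runtimes:\n";
    // std::fixed and std::setprecision persist; std::setw applies to the next field only.
    out << std::fixed << std::setprecision(3);

    double total = 0.0;
    for (const PassTiming &t : timings) {
        out << std::setw(10) << t.ms << " ms : " << t.name << "\n";
        total += t.ms;
    }
    out << std::setw(10) << total << " ms in total\n";
    return out.str();
}
```

Called with the CSE, unrolling and removing-extern-loops timings from the first log (1.04435, 0.25389 and 0.00689 ms), it prints them as 0.007, 0.254 and 1.044 ms in a single right-aligned column, followed by 1.305 ms in total. A fixed field width keeps the decimal points lined up as long as no pass exceeds the width.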

@alexreinking (Member) left a comment


I like it!

@alexreinking added the code_cleanup label (No functional changes. Reformatting, reorganizing, or refactoring existing code.) on Aug 14, 2025
@mcourteaux merged commit 045fbd4 into halide:main on Aug 14, 2025
3 checks passed