
Transform Propagation Optimization: Static Subtree Marking #18589

Merged
alice-i-cecile merged 15 commits into bevyengine:main from aevyrie:dirty-trees on Mar 30, 2025

@aevyrie
Member

@aevyrie aevyrie commented Mar 28, 2025

Objective

Solution

  • Mark hierarchy subtrees with dirty bits to avoid transform propagation where not needed
  • This causes a performance regression when spawning many entities, or when the scene is entirely dynamic.
  • This results in massive speedups for largely static scenes.
  • In the future we could allow the user to change this behavior, or add some threshold based on how dynamic the scene is?
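To make the dirty-bit idea concrete, here is a minimal sketch under simplifying assumptions: `Hierarchy` and `Node` are hypothetical types (not Bevy's ECS, `Transform`, or `GlobalTransform`), and transforms are reduced to 1D offsets. A marking pass walks up from every changed node and flags its ancestors as having a dirty subtree; propagation then descends only into flagged subtrees, or into any subtree whose parent changed. This does not reflect how Bevy actually stores the flag or parallelizes propagation.

```rust
/// Hypothetical node in a simplified hierarchy; transforms are 1D offsets.
pub struct Node {
    pub parent: Option<usize>,
    pub children: Vec<usize>,
    pub local: f32,       // stand-in for the local `Transform`
    pub global: f32,      // stand-in for the computed `GlobalTransform`
    pub changed: bool,    // local value modified since the last propagation
    pub dirty_tree: bool, // this node or some descendant changed
}

#[derive(Default)]
pub struct Hierarchy {
    pub nodes: Vec<Node>,
}

impl Hierarchy {
    /// Adds a node; new nodes start as `changed` so the first pass computes them.
    pub fn add(&mut self, parent: Option<usize>, local: f32) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node {
            parent,
            children: Vec::new(),
            local,
            global: 0.0,
            changed: true,
            dirty_tree: false,
        });
        if let Some(p) = parent {
            self.nodes[p].children.push(id);
        }
        id
    }

    pub fn set_local(&mut self, id: usize, local: f32) {
        self.nodes[id].local = local;
        self.nodes[id].changed = true;
    }

    /// Marks every ancestor of a changed node as having a dirty subtree.
    /// The upward walk stops at an already-marked node, because that node's
    /// ancestors were marked at the same time it was.
    pub fn mark_dirty_trees(&mut self) {
        for id in 0..self.nodes.len() {
            if !self.nodes[id].changed {
                continue;
            }
            let mut cur = Some(id);
            while let Some(c) = cur {
                if self.nodes[c].dirty_tree {
                    break;
                }
                self.nodes[c].dirty_tree = true;
                cur = self.nodes[c].parent;
            }
        }
    }

    /// Recomputes global values, skipping clean subtrees. Must run after
    /// `mark_dirty_trees`. Returns the number of nodes visited.
    pub fn propagate(&mut self) -> usize {
        let roots: Vec<usize> = (0..self.nodes.len())
            .filter(|&i| self.nodes[i].parent.is_none())
            .collect();
        roots.into_iter().map(|r| self.propagate_from(r, 0.0, false)).sum()
    }

    fn propagate_from(&mut self, id: usize, parent_global: f32, parent_changed: bool) -> usize {
        // A clean subtree under an unchanged parent cannot need any updates.
        if !self.nodes[id].dirty_tree && !parent_changed {
            return 0;
        }
        let changed = parent_changed || self.nodes[id].changed;
        if changed {
            self.nodes[id].global = parent_global + self.nodes[id].local;
        }
        let global = self.nodes[id].global;
        // Clear the flags so the next frame starts clean.
        self.nodes[id].changed = false;
        self.nodes[id].dirty_tree = false;
        let children = self.nodes[id].children.clone();
        1 + children
            .into_iter()
            .map(|c| self.propagate_from(c, global, changed))
            .sum::<usize>()
    }
}
```

Each frame, call `mark_dirty_trees` and then `propagate`; a frame with no changes visits zero nodes, while a single changed leaf only visits its ancestor chain. The marking pass is extra work proportional to the changed nodes and their ancestors, which is one plausible contributor to the regression noted above for fully dynamic scenes.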

Testing

  • Caldera Hotel scene

@alice-i-cecile alice-i-cecile added this to the 0.16 milestone Mar 28, 2025
@ThierryBerger
Member

🤩

@IceSentry IceSentry added C-Performance A change motivated by improving speed, memory usage or compile times A-Transform Translations, rotations and scales S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Mar 28, 2025
Member

@alice-i-cecile alice-i-cecile left a comment

Thanks for tracking down the bug and getting the fix in <3. I wasn't pleased to have to revert this!

@alice-i-cecile alice-i-cecile added this pull request to the merge queue Mar 30, 2025
@alice-i-cecile alice-i-cecile added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Mar 30, 2025
Merged via the queue into bevyengine:main with commit 8130b22 Mar 30, 2025
40 checks passed
mockersf pushed a commit that referenced this pull request Mar 30, 2025
# Objective

- Optimize static scene performance by marking unchanged subtrees.
- [bef0209](bef0209) fixes #18255 and #18363.
- Closes #18365 
- Includes change from #18321

## Solution

- Mark hierarchy subtrees with dirty bits to avoid transform propagation
where not needed
- This causes a performance regression when spawning many entities, or
when the scene is entirely dynamic.
- This results in massive speedups for largely static scenes.
- In the future we could allow the user to change this behavior, or add
some threshold based on how dynamic the scene is?

## Testing

- Caldera Hotel scene
mockersf added a commit to mockersf/bevy that referenced this pull request Apr 6, 2025
github-merge-queue bot pushed a commit that referenced this pull request Dec 30, 2025
# Objective

- Follow-up to the previous transform optimization (#18589): make the
`mark_dirty_trees` system smarter by not running this expensive
static scene optimization for dynamic scenes.
- Using a threshold was mentioned as a follow-up in that PR, and we also
want this threshold to be user-configurable.
- This was not implemented previously because the optimizations were
still large improvements even in dynamic scenes, thanks to the improved
parallelism from #17840.

## Solution

- Don't run the static scene optimization (dirty tree tracking) for very
dynamic scenes, defined here as scenes where more than 30% of objects
have their `Transform` updated.
- This is configurable with a percentage threshold, or the optimization can be
unconditionally enabled or disabled by setting the threshold to `0.0` or `1.0`,
which avoids the cost of computing the threshold.
- For dynamic scenes, this makes transform propagation much faster: twice as
fast in the stress tests shown here.
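The threshold check can be sketched as below. This is an illustration under assumptions, not Bevy's API: `StaticOptimizationThreshold` and `should_mark_dirty_trees` are hypothetical names, and treating `0.0` as "always disabled" and `1.0` as "always enabled" is an assumed mapping. The changed-entity count is passed as a closure so that, at the two extremes, it is never computed.

```rust
/// Hypothetical configuration: the maximum fraction of changed entities for
/// which dirty-tree tracking still runs.
#[derive(Clone, Copy, Debug)]
pub struct StaticOptimizationThreshold(pub f32);

impl Default for StaticOptimizationThreshold {
    /// 30%, matching the cutoff described in the PR.
    fn default() -> Self {
        Self(0.3)
    }
}

/// Decides whether to run dirty-tree marking this frame.
/// `count_changed` is only called when the threshold lies strictly between
/// 0.0 and 1.0, so the extremes skip the cost of counting.
pub fn should_mark_dirty_trees(
    threshold: StaticOptimizationThreshold,
    total: usize,
    count_changed: impl FnOnce() -> usize,
) -> bool {
    let t = threshold.0;
    // Assumed mapping: 0.0 disables the optimization, 1.0 always enables it.
    if t <= 0.0 {
        return false;
    }
    if t >= 1.0 {
        return true;
    }
    if total == 0 {
        return true;
    }
    let changed_fraction = count_changed() as f32 / total as f32;
    // Skip static-scene tracking when "more than" the threshold changed.
    changed_fraction <= t
}
```

Taking the count lazily keeps the short-circuit cases free of any per-entity work, which is the point of offering fixed `0.0`/`1.0` settings.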

## Testing

`transform_hierarchy` stress tests; each of these cases spawns about a
quarter million entities:

- humanoids_active - dynamic scene that should be faster than `main`:
<img width="609" height="395" alt="image"
src="https://github.com/user-attachments/assets/bf3d6b93-aa09-4440-b8ac-18af7e46a00f"
/>

- humanoids_inactive - static scene that should be unchanged from
`main`:
<img width="631" height="377" alt="image"
src="https://github.com/user-attachments/assets/a0306109-600b-4cdd-a217-5cc15e269bca"
/>

- humanoids_mixed - half-dynamic scene that should be faster than `main`:
<img width="604" height="372" alt="image"
src="https://github.com/user-attachments/assets/2751ece2-d4b9-4daa-af24-fe379eaf75b2"
/>

- large_tree - dynamic scene (50% of entities are moved) where we expect to
see improvements:
<img width="665" height="371" alt="image"
src="https://github.com/user-attachments/assets/c6b08abe-eb1d-44fb-be36-457f9d5ba78e"
/>
mockersf pushed a commit that referenced this pull request Dec 30, 2025

Labels

A-Transform Translations, rotations and scales
C-Performance A change motivated by improving speed, memory usage or compile times
S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it

Development

Successfully merging this pull request may close these issues.

Investigate static tree marking for optimizing transform propagation
Transform propagation ignoring earlier branches

5 participants