Parallel Transform Propagation#17840
Conversation
|
Getting this up now that it seems to function, marking as draft because it still needs cleanup. |
|
FYI, I'm merging #17815. The methods there are probably worth benching here. |
|
I'd love a before / after tracy histogram BTW. I've seen enough tests on Discord to be convinced that this is markedly (3x or so) faster, but it would be lovely to show in the release notes. |
|
Alright, I am finally going to be able to give this a look. Right out of the gate I find the increase in run variance concerning, but I'm in favor of merging and leaving that as follow up if we can't easily identify the cause. |
If you are referring to my comment, that was about bevy's existing system. It swings wildly +- 1ms. Variance (timing distribution) within a run is consistent for both. Variance between runs in this PR seems better than main. My guess is this is caused by E-cores vs P-cores. The old implementation usually ended up spending most time in a single core, whereas the new one is spread across all available cores in the thread pool. I think this explains why inter-run variance has improved, the work is spread across a mix of P and E cores. |
|
The CI failure was not caused by this PR, seems to be some sort of MESA failure according to discord. |
|
Thank you to everyone involved with the authoring or reviewing of this PR! This work is relatively important and needs release notes! Head over to bevyengine/bevy-website#1995 if you'd like to help out. |
# Objective - Follow up from previous transform optimization (#18589), make the `mark_dirty_trees` system more intelligent - don't run this expensive static scene optimization for dynamic scenes. - Using a threshold was mentioned as a follow up in that PR, and we also want this threshold to be user-configurable. - This was not implemented previously because the optimizations were still large improvements even in dynamic scenes thanks to the improved parallelism #17840 ## Solution - Don't run static scene optimization (dirty tree tracking) for very dynamic scenes - defined here as scenes where more than 30% of objects have their `Transform` updated. - This is configurable with a percentage threshold, or it can be unconditionally enabled or disabled when setting to `0.0` or `1.0` to avoid the cost of computing the threshold. - For dynamic scenes, this makes transform prop much faster, twice as fast in the stress tests shown here. ## Testing transform_hierarchy stress tests, all of these cases spawn about a quarter million entities: - humanoids_active - dynamic scene that should be faster than `main`: <img width="609" height="395" alt="image" src="https://github.com/user-attachments/assets/bf3d6b93-aa09-4440-b8ac-18af7e46a00f" /> - humanoids_inactive - static scene that should be unchanged from `main`: <img width="631" height="377" alt="image" src="https://github.com/user-attachments/assets/a0306109-600b-4cdd-a217-5cc15e269bca" /> - humanoids_mixed - half dynamic scene that should be faster than `main` <img width="604" height="372" alt="image" src="https://github.com/user-attachments/assets/2751ece2-d4b9-4daa-af24-fe379eaf75b2" /> - large_tree - dynamic scene (50% of entities are moved) we expect to see improvements <img width="665" height="371" alt="image" src="https://github.com/user-attachments/assets/c6b08abe-eb1d-44fb-be36-457f9d5ba78e" />
# Objective - Follow up from previous transform optimization (#18589), make the `mark_dirty_trees` system more intelligent - don't run this expensive static scene optimization for dynamic scenes. - Using a threshold was mentioned as a follow up in that PR, and we also want this threshold to be user-configurable. - This was not implemented previously because the optimizations were still large improvements even in dynamic scenes thanks to the improved parallelism #17840 ## Solution - Don't run static scene optimization (dirty tree tracking) for very dynamic scenes - defined here as scenes where more than 30% of objects have their `Transform` updated. - This is configurable with a percentage threshold, or it can be unconditionally enabled or disabled when setting to `0.0` or `1.0` to avoid the cost of computing the threshold. - For dynamic scenes, this makes transform prop much faster, twice as fast in the stress tests shown here. ## Testing transform_hierarchy stress tests, all of these cases spawn about a quarter million entities: - humanoids_active - dynamic scene that should be faster than `main`: <img width="609" height="395" alt="image" src="https://github.com/user-attachments/assets/bf3d6b93-aa09-4440-b8ac-18af7e46a00f" /> - humanoids_inactive - static scene that should be unchanged from `main`: <img width="631" height="377" alt="image" src="https://github.com/user-attachments/assets/a0306109-600b-4cdd-a217-5cc15e269bca" /> - humanoids_mixed - half dynamic scene that should be faster than `main` <img width="604" height="372" alt="image" src="https://github.com/user-attachments/assets/2751ece2-d4b9-4daa-af24-fe379eaf75b2" /> - large_tree - dynamic scene (50% of entities are moved) we expect to see improvements <img width="665" height="371" alt="image" src="https://github.com/user-attachments/assets/c6b08abe-eb1d-44fb-be36-457f9d5ba78e" />
# Objective - Follow up from previous transform optimization (#18589), make the `mark_dirty_trees` system more intelligent - don't run this expensive static scene optimization for dynamic scenes. - Using a threshold was mentioned as a follow up in that PR, and we also want this threshold to be user-configurable. - This was not implemented previously because the optimizations were still large improvements even in dynamic scenes thanks to the improved parallelism #17840 ## Solution - Don't run static scene optimization (dirty tree tracking) for very dynamic scenes - defined here as scenes where more than 30% of objects have their `Transform` updated. - This is configurable with a percentage threshold, or it can be unconditionally enabled or disabled when setting to `0.0` or `1.0` to avoid the cost of computing the threshold. - For dynamic scenes, this makes transform prop much faster, twice as fast in the stress tests shown here. ## Testing transform_hierarchy stress tests, all of these cases spawn about a quarter million entities: - humanoids_active - dynamic scene that should be faster than `main`: <img width="609" height="395" alt="image" src="https://github.com/user-attachments/assets/bf3d6b93-aa09-4440-b8ac-18af7e46a00f" /> - humanoids_inactive - static scene that should be unchanged from `main`: <img width="631" height="377" alt="image" src="https://github.com/user-attachments/assets/a0306109-600b-4cdd-a217-5cc15e269bca" /> - humanoids_mixed - half dynamic scene that should be faster than `main` <img width="604" height="372" alt="image" src="https://github.com/user-attachments/assets/2751ece2-d4b9-4daa-af24-fe379eaf75b2" /> - large_tree - dynamic scene (50% of entities are moved) we expect to see improvements <img width="665" height="371" alt="image" src="https://github.com/user-attachments/assets/c6b08abe-eb1d-44fb-be36-457f9d5ba78e" />
# Objective - Follow up from previous transform optimization (#18589), make the `mark_dirty_trees` system more intelligent - don't run this expensive static scene optimization for dynamic scenes. - Using a threshold was mentioned as a follow up in that PR, and we also want this threshold to be user-configurable. - This was not implemented previously because the optimizations were still large improvements even in dynamic scenes thanks to the improved parallelism #17840 ## Solution - Don't run static scene optimization (dirty tree tracking) for very dynamic scenes - defined here as scenes where more than 30% of objects have their `Transform` updated. - This is configurable with a percentage threshold, or it can be unconditionally enabled or disabled when setting to `0.0` or `1.0` to avoid the cost of computing the threshold. - For dynamic scenes, this makes transform prop much faster, twice as fast in the stress tests shown here. ## Testing transform_hierarchy stress tests, all of these cases spawn about a quarter million entities: - humanoids_active - dynamic scene that should be faster than `main`: <img width="609" height="395" alt="image" src="https://github.com/user-attachments/assets/bf3d6b93-aa09-4440-b8ac-18af7e46a00f" /> - humanoids_inactive - static scene that should be unchanged from `main`: <img width="631" height="377" alt="image" src="https://github.com/user-attachments/assets/a0306109-600b-4cdd-a217-5cc15e269bca" /> - humanoids_mixed - half dynamic scene that should be faster than `main` <img width="604" height="372" alt="image" src="https://github.com/user-attachments/assets/2751ece2-d4b9-4daa-af24-fe379eaf75b2" /> - large_tree - dynamic scene (50% of entities are moved) we expect to see improvements <img width="665" height="371" alt="image" src="https://github.com/user-attachments/assets/c6b08abe-eb1d-44fb-be36-457f9d5ba78e" />
# Objective - Follow up from previous transform optimization (#18589), make the `mark_dirty_trees` system more intelligent - don't run this expensive static scene optimization for dynamic scenes. - Using a threshold was mentioned as a follow up in that PR, and we also want this threshold to be user-configurable. - This was not implemented previously because the optimizations were still large improvements even in dynamic scenes thanks to the improved parallelism #17840 ## Solution - Don't run static scene optimization (dirty tree tracking) for very dynamic scenes - defined here as scenes where more than 30% of objects have their `Transform` updated. - This is configurable with a percentage threshold, or it can be unconditionally enabled or disabled when setting to `0.0` or `1.0` to avoid the cost of computing the threshold. - For dynamic scenes, this makes transform prop much faster, twice as fast in the stress tests shown here. ## Testing transform_hierarchy stress tests, all of these cases spawn about a quarter million entities: - humanoids_active - dynamic scene that should be faster than `main`: <img width="609" height="395" alt="image" src="https://github.com/user-attachments/assets/bf3d6b93-aa09-4440-b8ac-18af7e46a00f" /> - humanoids_inactive - static scene that should be unchanged from `main`: <img width="631" height="377" alt="image" src="https://github.com/user-attachments/assets/a0306109-600b-4cdd-a217-5cc15e269bca" /> - humanoids_mixed - half dynamic scene that should be faster than `main` <img width="604" height="372" alt="image" src="https://github.com/user-attachments/assets/2751ece2-d4b9-4daa-af24-fe379eaf75b2" /> - large_tree - dynamic scene (50% of entities are moved) we expect to see improvements <img width="665" height="371" alt="image" src="https://github.com/user-attachments/assets/c6b08abe-eb1d-44fb-be36-457f9d5ba78e" />
Objective
Solution
Testing