Opportunistically use dense iteration for archetypal iteration #14049

cart merged 13 commits into bevyengine:main from
Conversation
This PR also unblocks some compiler auto-SIMD; consider the following code:
    accum =
    // SAFETY: Matched table IDs are guaranteed to still exist.
    let table = unsafe { self.tables.get(archetype.table_id()).debug_checked_unwrap() };
    if table.entity_count() == archetype.len() {
I think it would be worthwhile to add a comment explaining that this is an optimization for hybrid queries where the archetype is actually dense.
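To illustrate the check being discussed, here is a hedged toy sketch (plain structs, not Bevy's actual `Table`/`Archetype` types): when every row in the table belongs to the matched archetype, the archetype is dense over that table and iteration can walk the contiguous column directly instead of indirecting through the archetype's entity list.

```rust
// Toy model illustrating the dense fast path. `Table` and `Archetype`
// here are hypothetical simplifications, not Bevy's real types.

struct Table {
    rows: Vec<f32>, // one component column, stored contiguously
}

struct Archetype {
    // Row indices into the table for the entities this archetype owns.
    table_rows: Vec<usize>,
}

fn sum_query(table: &Table, archetype: &Archetype) -> f32 {
    if table.rows.len() == archetype.table_rows.len() {
        // Dense fast path: the archetype covers the whole table, so the
        // matched rows are exactly 0..len and we can walk the slice in order.
        table.rows.iter().sum()
    } else {
        // Sparse path: gather through the archetype's row indices.
        archetype.table_rows.iter().map(|&r| table.rows[r]).sum()
    }
}

fn main() {
    let table = Table { rows: vec![1.0, 2.0, 3.0] };
    let dense = Archetype { table_rows: vec![0, 1, 2] };
    let sparse = Archetype { table_rows: vec![0, 2] };
    assert_eq!(sum_query(&table, &dense), 6.0);
    assert_eq!(sum_query(&table, &sparse), 4.0);
    println!("ok");
}
```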
Does this fully fix #2144? I suppose not, since it only optimizes `for_each`.
Yeah, optimizing …
Sounds good. Let's leave that to follow-up then.
    v.push(world.spawn(TableData(0.)).id());
    }

    v.shuffle(&mut deterministic_rand());
    v.shuffle(&mut deterministic_rand());
I think this should have a comment to explain why you're doing it.
If I understand correctly you have archetypes
A: [TableData, SparseData]
B: [TableData]
Entities are added in this order:
e0: A
e1: B
e2: A
e3: B
...
So that the order of entities is
Table: [e0, e1, e2, ...]
A: [e0, e2, ..]
B: [e1, e3, ..]
Then if you just despawned all the entities of B in order, you would remove e1, then e3, etc.
Each despawn call would do a swap_remove in both the archetype and the table.
So the order of entities in the table would be somewhat jumbled compared with the order in A, which is still [e0, e2, ...]
By adding the shuffling, you want to guarantee that their orders are very different, which maximizes the benefit of your PR (which keeps cache locality by iterating in table order instead of archetype order).
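The jumbling described above can be demonstrated with plain `Vec`s (a toy stand-in for Bevy's table storage, not its real implementation): removing B's entities via `swap_remove` moves the last element into each vacated slot, so the surviving entities end up in a different order than archetype A's list.

```rust
// Toy demonstration of why despawning via swap_remove leaves the table
// order different from the archetype order.

fn main() {
    // Table holds entities of both archetypes, interleaved:
    let mut table = vec!["e0", "e1", "e2", "e3", "e4", "e5"];
    // Archetype A owns the even entities, in insertion order:
    let archetype_a = vec!["e0", "e2", "e4"];

    // Despawn B's entities (e1, e3, e5) from the table with swap_remove:
    for victim in ["e1", "e3", "e5"] {
        let pos = table.iter().position(|&e| e == victim).unwrap();
        table.swap_remove(pos); // moves the last element into `pos`
    }

    // The table now holds only A's entities, but jumbled relative to A:
    println!("{:?}", table); // prints ["e0", "e4", "e2"]
    assert_ne!(table, archetype_a); // table order != archetype order
}
```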
Could you please explain why this code is faster? In this case the entities should be iterated in the same order before/after your change, no?
It would theoretically have the same cache locality. However, dense iteration tells the compiler that we are iterating over contiguous memory, which can enable automatic SIMD optimizations. This could potentially make the operation nearly 4 times faster.
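The two iteration shapes being contrasted can be sketched as follows (a standalone illustration, not Bevy's code): over a contiguous slice the compiler can prove the accesses are sequential and is free to auto-vectorize the loop, whereas gathering through an index list produces the same result but is much harder to vectorize.

```rust
// Dense, contiguous iteration: a candidate for automatic SIMD.
fn sum_dense(data: &[f32]) -> f32 {
    data.iter().sum()
}

// Indirect gather through an index list: same result, but the
// compiler generally cannot prove the accesses are sequential.
fn sum_sparse(data: &[f32], indices: &[usize]) -> f32 {
    indices.iter().map(|&i| data[i]).sum()
}

fn main() {
    let data: Vec<f32> = (0..8).map(|i| i as f32).collect();
    let indices: Vec<usize> = (0..8).collect();
    assert_eq!(sum_dense(&data), sum_sparse(&data, &indices));
    println!("sum = {}", sum_dense(&data)); // sum = 28
}
```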
ItsDoot
left a comment
Not an ECS expert, but the code quality overall looks fine.
Co-authored-by: Alice Cecile <[email protected]>
Co-authored-by: Christian Hughes <[email protected]>
…#14673)

# Objective

- Follow-up to #14049: we can use the same dense fast path in our Parallel Iterator. This PR also unifies the function used by both regular and parallel iteration.

## Performance

No performance regression for regular iteration; 3.5x faster in hybrid parallel iteration. This number is far greater than the benefit obtained in regular iteration (~1.81x) because mutable iteration over contiguous memory effectively reduces the cost of maintaining core cache coherence.
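The cache-coherence point above can be sketched with standard-library threads (a hypothetical stand-in for Bevy's parallel iterator, which works differently): splitting a contiguous table into disjoint chunks lets each worker scan its own cache lines sequentially, so cores do not contend over shared cache lines.

```rust
// Sketch of chunked parallel iteration over contiguous data using
// std::thread::scope. This is an illustration of the idea only, not
// Bevy's QueryParIter implementation.
use std::thread;

fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let chunk = data.len().div_ceil(workers);
    thread::scope(|s| {
        data.chunks(chunk)
            // Each worker sums its own contiguous chunk.
            .map(|c| s.spawn(move || c.iter().sum::<u64>()))
            .collect::<Vec<_>>()
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum()
    })
}

fn main() {
    let data: Vec<u64> = (1..=100).collect();
    assert_eq!(parallel_sum(&data, 4), 5050);
    println!("ok");
}
```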
|
Thank you to everyone involved with the authoring or reviewing of this PR! This work is relatively important and needs release notes! Head over to bevyengine/bevy-website#1667 if you'd like to help out.

Objective

Solution

- … `for_each`-style iteration.

Performance

Nearly 2x win in specific scenarios, and no performance degradation in other test cases.