column/table level change detection #11120
Conversation
Okay, really neat stuff. Thanks for trying this out and presenting those benchmark results! Initial thoughts:
Definitely excited to have this, as it opens up an opportunity for my reactive UI crate to skip regenerating the UI when a query has no changed data at the archetype/column level.
The current impl of this PR covers both tables (built from their owned table-storage columns) and every column (both table-storage and sparse-set-storage). The schedule cannot guarantee that there is only one mutable access to a table and a sparse-set column at a time. If we avoid using an atomic type, the only viable option would be to implement column-level change detection specifically for table-storage columns. (Of course, I think avoiding atomics is the right way, since it would greatly reduce overhead.)
Yeah, I think I can add more benchmarks related to change detection in another PR.
```rust
pub fn read_last_mutable_access_tick(&self) -> Tick {
    Tick::new(
        self.last_mutable_access_tick
            .load(std::sync::atomic::Ordering::Relaxed),
    )
}

pub fn update_last_mutable_access_tick(&self, tick: Tick) {
    // self.last_mutable_access_tick
    //     .store(tick.get(), std::sync::atomic::Ordering::Relaxed);
    self.last_mutable_access_tick
        .fetch_max(tick.get(), std::sync::atomic::Ordering::AcqRel);
}
```
Could you explain the choice of memory ordering? Naively, I would have expected the `fetch_max` to be `Release` and the `load` to be `Acquire`, so that we cannot miss changes. However, I'm not confident in my understanding of atomic orderings.
I haven't deeply considered this aspect yet (so please correct me if I am wrong). My thought was that there might not be any other data flow related to the access tick, so a relaxed load operation would be acceptable. Bevy's scheduler may run systems with the same table access in parallel, but when two systems write to the same tick concurrently, we cannot guarantee at what point each system loads or stores the corresponding tick. Store operations must be observed; but for the load operation, even if changes are missed it is not a big deal, since our focus is on comparing the loaded value against the system's `last_change_tick`.
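To make the ordering question concrete, here is a minimal, self-contained sketch of a per-column tick updated with `fetch_max` (the `ColumnTick` type and method names are illustrative stand-ins, not Bevy's actual API). The key property is that `fetch_max` is a single atomic read-modify-write, so the maximum can never be lost regardless of interleaving, even under `Relaxed` ordering; `Acquire`/`Release` would only matter if readers had to also observe the component data written before the tick was bumped.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for the PR's per-column tick storage.
struct ColumnTick {
    last_mutable_access_tick: AtomicU32,
}

impl ColumnTick {
    fn new() -> Self {
        Self {
            last_mutable_access_tick: AtomicU32::new(0),
        }
    }

    // `fetch_max` keeps the counter monotonic under concurrent writers;
    // Relaxed is sufficient for the counter value itself.
    fn update(&self, tick: u32) {
        self.last_mutable_access_tick.fetch_max(tick, Ordering::Relaxed);
    }

    fn read(&self) -> u32 {
        self.last_mutable_access_tick.load(Ordering::Relaxed)
    }
}

fn main() {
    let col = Arc::new(ColumnTick::new());
    let handles: Vec<_> = (1..=8u32)
        .map(|t| {
            let col = Arc::clone(&col);
            thread::spawn(move || col.update(t))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // The maximum tick survives regardless of thread interleaving.
    assert_eq!(col.read(), 8);
    println!("max tick = {}", col.read());
}
```

Note that this sketch ignores tick wraparound, which Bevy's real `Tick` comparison has to handle.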
The only time we need to update change ticks in parallel is when we're using [...]. Also, adding branching inside the iterator is very bad for vectorization. You probably want to somehow update the matched archetypes instead. Probably needs benchmarking to be sure.
Yeah, this impl is just an attempt to identify a method that balances ergonomics and performance. It still needs more benchmarks; my current thoughts are:
This is a critical issue, as the current change-tick design prevents the use of SIMD instructions and significantly reduces effective memory bandwidth. This throws the ECS back to the age of the 8086, let alone competing with seriously optimized OOP designs. In this context, atomic operations could use relaxed memory ordering, since strict ordering is unnecessary as long as system-level access to the table is already properly sequenced. I hope this fix can be implemented soon, since having to write and read massive amounts of unnecessary data would be a rather unfortunate and inefficient outcome for everyone.
In most cases I don't even care about the change ticks. If someone wants to do it the old way, why not put a `Changed` marker component on those entities and do the ticking in their own tables? Why should our fast and clean table pay for this crazy O(n) operation?
Objective
Solution
- An `AtomicU32` to track the last mutable access for tables and columns, updated in the `set_table` and `set_archetype` methods.
- An `archetype_filter_fetch` method on `QueryFilter` to filter out tables or columns that have not been subject to mutable access since the system last ran.

Notes
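The table-level early-out this describes can be sketched minimally as follows. All names here are hypothetical stand-ins for the PR's actual types, and real tick comparison in Bevy must handle `u32` wraparound, which this sketch omits for clarity.

```rust
// Hypothetical per-table metadata: the newest tick at which any column in
// this table was mutably accessed.
struct TableMeta {
    last_mutable_access_tick: u32,
}

// Sketch of the filter's early-out: keep the table only if it may contain
// changes newer than the system's last run; otherwise skip iterating its
// rows entirely, which is the whole point of table-level change detection.
fn archetype_filter_fetch(table: &TableMeta, system_last_run: u32) -> bool {
    table.last_mutable_access_tick > system_last_run
}

fn main() {
    // Mutated at tick 5, system last ran at tick 3: must be visited.
    assert!(archetype_filter_fetch(
        &TableMeta { last_mutable_access_tick: 5 },
        3
    ));
    // Mutated at tick 2, system last ran at tick 4: safe to skip.
    assert!(!archetype_filter_fetch(
        &TableMeta { last_mutable_access_tick: 2 },
        4
    ));
    println!("ok");
}
```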
I must express reservations about the current implementation due to its involvement in multiple data flows, which potentially makes the code more fragile and expensive. I would highly appreciate it if someone could propose alternative implementation details with less complexity and better safety considerations.
Performance
Intel 13400kf NV4070ti
To comprehensively evaluate the change detection mechanism, additional benchmarking across various scenarios is necessary.

Regression:

The regression in spawn/insert is attributed to the introduction of an atomic operation for each table per component.
- The `iter_fragmented_*` methods appear to iterate over only a few entities, making the maintenance of the tick costly.
- `query.get_mut_component/unchecked(entity)` is expensive because it sets the archetype for each entity retrieved.

It is worth mentioning that many_foxes frame_time regressed from 3.06 ms to 4.21 ms.

Yellow is main, red is this PR.
The hotspot is:


In `animation_player` and `propagate_transform`, the parallel use of `get_unchecked` for each entity seems to cause numerous cache invalidations across CPU cores.

Migration Guide