Skip to content

Refactor moving entity to new table#23151

Merged
alice-i-cecile merged 4 commits intobevyengine:mainfrom
JaySpruce:table_move_refactor
Mar 23, 2026
Merged

Refactor moving entity to new table#23151
alice-i-cecile merged 4 commits intobevyengine:mainfrom
JaySpruce:table_move_refactor

Conversation

@JaySpruce
Copy link
Copy Markdown
Member

@JaySpruce JaySpruce commented Feb 25, 2026

Objective

When moving an entity from one table to another, we have 3 methods available:

  • move_to_superset_unchecked, for moving during insertion.
  • move_to_and_drop_missing_unchecked, for moving during normal removal.
  • move_to_and_forget_missing_unchecked, for moving during take removal.

However, when looking at flecs (to steal ideas, naturally), I noticed it just has one function for moving during insertion and removal: flecs_table_move. It also handles things a bit differently internally (details on that below).

Solution

There are essentially 3 somewhat independent changes here:

  • Combining the 3 methods into 1 called move_row.
  • Moving the move method to Tables and taking TableIds (rather than being Table::move_to(&mut self, other: &mut Table)).
  • Changing the internals of the move method to be more like flecs.

Combining the methods

The 3 methods were already very similar; move_to_superset just didn't check if components were missing. They could be combined without any of the other changes, but it's possible that insertion performance would suffer.

TableId instead of &mut Table

I believe the idea behind using references instead of IDs was to be able to cache the pointers and avoid fetching the tables more than once. However, the only callers of the move_to methods, BundleInserter and BundleRemover, only used the pointers for the move_to methods and didn't need them for anything else, so there isn't much point in caching them. And being able to just use IDs makes the code nicer, since we don't have to dance around the borrow checker.

Internals

flecs_table_move requires that tables' columns are sorted (by component ID, low-to-high). It iterates the source and destination tables at the same time (i.e. their component IDs and corresponding columns), but pauses one iterator when the other "falls behind" (has a lower component ID). If the source table falls behind, the source table's current component ID was removed, and if the destination table falls behind, the destination table's current component ID was added.

Bevy's move_to methods don't rely on tables' columns being in any certain order; they just iterate the source table and fetch each matching column from the destination table individually.

As it turns out, Bevy's tables are sorted in practice, we just never guaranteed or relied on it. A table's list of component IDs are sorted to use as the key in a HashMap, and the same list is used to build the table. So we can just make it official by asserting that a table's components are sorted when the table is finalized (TableBuilder::build).

Evidently, iterating the destination table is faster than fetching from it repeatedly:

Screenshot_20260224_135012

Misc

There's a PERF comment that has survived since 2021 (#2673) and I'm not really sure what it means:

// PERF: store "non bundle" components in edge, then just move those to avoid
// redundant copies

I didn't want to remove it without saying anything, but I think it's lived long enough.

@JaySpruce JaySpruce added A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times C-Code-Quality A section of code that is hard to understand or change S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Feb 25, 2026
@github-project-automation github-project-automation bot moved this to Needs SME Triage in ECS Feb 25, 2026
@JaySpruce JaySpruce added the D-Unsafe Touches with unsafe code in some way label Feb 26, 2026
@alice-i-cecile alice-i-cecile added the X-Uncontroversial This work is generally agreed upon label Mar 16, 2026
Copy link
Copy Markdown
Contributor

@chescock chescock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this approach seems much nicer! I think it's right, but I want to check a little more carefully before approving, and I'm out of time for today :).

new_table_id: TableId,
row: TableRow,
) -> TableMoveResult<'_> {
assert!(old_table_id != new_table_id);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I suspect we'll eventually want to make this a safety condition and only debug_assert! it, but since this PR already improves performance I don't think we need to micro-optimize too much :).)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't super rigorous, but I did run benchmarks before and after adding the assert!s and didn't see any difference.

let mut src_next = src_iter.next();
let mut dst_next = dst_iter.next();

while let Some((src_component_id, src_column)) = src_next.take() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a little trouble following the control flow here. I think this is supposed to affect each source column exactly once, and either copy it or drop/forget it. Is that right? Could we rework it to use a for loop over the source columns so that it's more clear that it touches each one once?

I think that would look something like this:

let mut dst_iter = dst_table.columns.iter_mut();
let mut dst_next = dst_iter.next();
for (src_component_id, src_column) in src_table.columns.iter_mut() {
    // Skip past any destination columns that don't exist in the source table.
    // The caller is responsible for initializing those columns.
    while dst_next
        .as_ref()
        .is_some_and(|(dst_component_id, _dst_column)| *dst_component_id < src_component_id)
    {
        dst_next = dst_iter.next();
    }
    // Then move the value in the source column if it exists in the destination table,
    // or remove it if it does not.
    if let Some((dst_component_id, dst_column)) = dst_next.as_mut()
        && *dst_component_id == src_component_id
    {
        unsafe {
            dst_column.initialize_from_unchecked(src_column, last_index, row, dst_row);
        }
        dst_next = dst_iter.next();
    } else {
        unsafe {
            src_column.swap_remove_unchecked::<DROP>(last_index, row);
        }
    }
}

If that's equivalent, then I think it will be easier to follow.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely an improvement, and even a bit faster! (being compared to the same main as before):

Screenshot_20260316_201546

@alice-i-cecile alice-i-cecile added S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Mar 16, 2026
Co-Authored-By: Chris Russell <[email protected]>
Co-Authored-By: eugineerd <[email protected]>
@JaySpruce JaySpruce added S-Needs-Review Needs reviewer attention (from anyone!) to move forward and removed S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged labels Mar 17, 2026
Copy link
Copy Markdown
Contributor

@ElliottjPierce ElliottjPierce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything wrong with this; the tests seem to agree, and the perf improvement is very nice! Some thoughts on things to look at next:

I don't like panics in unsafe functions. I'd rather have more safety requirements than more asserts, but that's not a big deal and can be saved for later.

I think this moving algorithm is vastly superior to what we had before. It may be doing unnecessary work though if it moves a component value that is going to be overwritten anyway. So there might be room to make this even faster.

I'm pretty indifferent to using table ids instead of pointers, but I don't see what the change buys us. And using ids may loose us some optimization opportunities in the future, but IDK.

There are a few small functions here and there that you might want to inline, but it's probably happening anyway, so up to you.

Anyway, I think this is a great change. Looking forward to seeing where this goes next.

/// If `DROP` is `false`, removed components will be forgotten,
/// allowing ownership to be relinquished to the caller.
///
/// # Panics
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it might be unsound for this to panic...

It looks like this is called after moving the entity between archetypes and stuff. If it unwinds, it leaves the entity in the wrong archetype, which could potentially cause UB. So this might need to be a safety requirement.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't panicking always be safer than continuing with violated safety requirements?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are both the same here IIUC. If the safety requirement is not fulfilled, it's UB. If it panics, it unwinds with the world in an invalid state. The entity thinks it's in the new archetype, even though it isn't initialize fully yet. If something catches the unwind, like bevy's error handler, then the world will continue as is, so a query could return references to uninitialized memory which is UB. It would also probably try to drop uninitialized memory which is UB. So either way, if the caller is wrong, it's UB, so it should be unsafe to reflect that.

(All of that is from memory from where bevy catches unwinds. But I think it's right.) You could also make the argument that documenting the panic makes it implicitly unsafe when called in these contexts, but I don't think that's a patter that bevy addopts.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it panics, it unwinds with the world in an invalid state.

Yeah, see #20368 for another example of it being unsound to unwind during component insertion.

Hmm, we also run arbitrary user code in the drop impls within this. Does that mean it's already unsound to remove a component with a panicking drop? ... Well, it's definitely buggy, since it will swap components between entities while doing swap_remove, but I haven't actually been able to trigger UB since it doesn't update the EntityLocation until afterwards. I might try to put together a clean example and make an issue.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of those things that's really hard to get right. Especially with drop logic, allocator panics, hook panics (which could leave some hooks not run), etc. All of these leave the world in an invalid state, which could be totally fine or cause UB depending on how the world is used next.

From my research in building my own ecs, I find the simplest thing to do is to declare a variable that panics when it's dropped and forget it afterward.

let panic_bomb = PanicOnDrop("Unacceptable panic in structural world change");

// Structural change here (ex: spawn, insert, etc.)

core::mem::forget(panic_bomb);

If something does panic inside, it will cause a non-unwinding panic when it hits the panic_bomb, aborting the process. IMO, this is the best balance of minimal safety requirements that doesn't require catching unwinds in component drops, hooks, and such.

There are other things you can do, too. For example, you can make sure the hook and drop function pointers include a catch_unwind or something themselves, which rust can make 0 cost if it can verify that a panic is impossible. But this gets messy IMO.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of those things that's really hard to get right.

Agreed!

I find the simplest thing to do is to declare a variable that panics when it's dropped and forget it afterward.

Yeah, this is a good pattern to ensure safety!

But we also want to reduce the number of ways that users can cause Bevy to abort the process, of course :).

I think it should be possible to at least make sure we aren't running arbitrary user code while the panic_bomb is armed. If we move the drop calls to happen last, after all the EntityLocation bookkeeping is finished, then any panics there will only cause other values to leak. Conversely, if we initialize required components after reserving capacity in their tables, but before doing anything else, then any panics there will only cause other required component values to leak.

(I also wonder whether we could move the drop calls out far enough that they can happen where we have static types. That could eliminate the indirect calls and let the optimizer inline them.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting. I definitely think this is worth splitting out into its own issue to discuss. There are lots of paths forward here. I think the simplest thing for this PR is to just make it unsafe and call it good for now.

But we should try to solidify a "Bevy way" for these "unwinding in unsafe code" questions and patch all the existing soundness holes, ideally before any users start running into aborts.

Copy link
Copy Markdown
Contributor

@chescock chescock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked over all the safety requirements, and I'm pretty convinced this is sound, except for the panic that @ElliottjPierce noted. And it's simpler and faster, so it seems like a great change!

We should update the safety comments on move_row and get rid of the asserts there before merging, since unwinding is UB anyway.

It would also be good to fix the safety requirements on BlobArray, but those were wrong before this PR so I wouldn't block on that.

/// If `DROP` is `false`, removed components will be forgotten,
/// allowing ownership to be relinquished to the caller.
///
/// # Panics
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it panics, it unwinds with the world in an invalid state.

Yeah, see #20368 for another example of it being unsound to unwind during component insertion.

Hmm, we also run arbitrary user code in the drop impls within this. Does that mean it's already unsound to remove a component with a panicking drop? ... Well, it's definitely buggy, since it will swap components between entities while doing swap_remove, but I haven't actually been able to trigger UB since it doesn't update the EntityLocation until afterwards. I might try to put together a clean example and make an issue.

@JaySpruce JaySpruce added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Mar 20, 2026
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Mar 23, 2026
Merged via the queue into bevyengine:main with commit 4725267 Mar 23, 2026
40 checks passed
@github-project-automation github-project-automation bot moved this from Needs SME Triage to Done in ECS Mar 23, 2026
@JaySpruce JaySpruce deleted the table_move_refactor branch March 31, 2026 02:34
splo pushed a commit to splo/bevy that referenced this pull request Mar 31, 2026
## Objective

When moving an entity from one table to another, we have 3 methods
available:
- `move_to_superset_unchecked`, for moving during insertion.
- `move_to_and_drop_missing_unchecked`, for moving during normal
removal.
- `move_to_and_forget_missing_unchecked`, for moving during `take`
removal.

However, when looking at flecs (to steal ideas, naturally), I noticed it
just has one function for moving during insertion and removal:
[`flecs_table_move`](https://github.com/SanderMertens/flecs/blob/689adf8f6c1d069aa51fa059b8a2cbb4fb761f1d/src/storage/table.c#L1961).
It also handles things a bit differently internally (details on that
below).

## Solution

There are essentially 3 somewhat independent changes here:
- Combining the 3 methods into 1 called `move_row`.
- Moving the `move` method to `Tables` and taking `TableId`s (rather
than being `Table::move_to(&mut self, other: &mut Table)`).
- Changing the internals of the `move` method to be more like flecs.

### Combining the methods

The 3 methods were already very similar; `move_to_superset` just didn't
check if components were missing. They could be combined without any of
the other changes, but it's possible that insertion performance would
suffer.

### `TableId` instead of `&mut Table`

I believe the idea behind using references instead of IDs was to be able
to cache the pointers and avoid fetching the tables more than once.
However, the only callers of the `move_to` methods, `BundleInserter` and
`BundleRemover`, only used the pointers for the `move_to` methods and
didn't need them for anything else, so there isn't much point in caching
them. And being able to just use IDs makes the code nicer, since we
don't have to dance around the borrow checker.

### Internals

`flecs_table_move` requires that tables' columns are sorted (by
component ID, low-to-high). It iterates the source and destination
tables at the same time (i.e. their component IDs and corresponding
columns), but pauses one iterator when the other "falls behind" (has a
lower component ID). If the source table falls behind, the source
table's current component ID was removed, and if the destination table
falls behind, the destination table's current component ID was added.

Bevy's `move_to` methods don't rely on tables' columns being in any
certain order; they just iterate the source table and fetch each
matching column from the destination table individually.

As it turns out, Bevy's tables *are* sorted in practice, we just never
guaranteed or relied on it. A table's list of component IDs are sorted
to use as the key in a `HashMap`, and the same list is used to build the
table. So we can just make it official by `assert`ing that a table's
components are sorted when the table is finalized
(`TableBuilder::build`).

Evidently, iterating the destination table is faster than fetching from
it repeatedly:

<img width="644" height="848" alt="Screenshot_20260224_135012"
src="https://github.com/user-attachments/assets/118cb78a-3267-411e-aa8e-3d1d7ea4a274"
/>

### Misc

There's a `PERF` comment that has survived since 2021 (bevyengine#2673) and I'm
not really sure what it means:

```
// PERF: store "non bundle" components in edge, then just move those to avoid
// redundant copies
```

I didn't want to remove it without saying anything, but I think it's
lived long enough.

---------

Co-authored-by: Chris Russell <[email protected]>
Co-authored-by: eugineerd <[email protected]>
Co-authored-by: Eagster <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-ECS Entities, components, systems, and events C-Code-Quality A section of code that is hard to understand or change C-Performance A change motivated by improving speed, memory usage or compile times D-Unsafe Touches with unsafe code in some way S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it X-Uncontroversial This work is generally agreed upon

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants