Skip to content

[Merged by Bors] - [ecs] Improve Commands performance#2332

Closed
NathanSWard wants to merge 25 commits intobevyengine:mainfrom
NathanSWard:nward/commands-no-alloc
Closed

[Merged by Bors] - [ecs] Improve Commands performance#2332
NathanSWard wants to merge 25 commits intobevyengine:mainfrom
NathanSWard:nward/commands-no-alloc

Conversation

@NathanSWard
Copy link
Copy Markdown
Contributor

@NathanSWard NathanSWard commented Jun 11, 2021

Objective

  • Currently Commands are quite slow due to the need to allocate for each command and wrap it in a Box<dyn Command>.
  • For example:
fn my_system(mut cmds: Commands) {
    cmds.spawn().insert(42).insert(3.14);
}

will have 3 separate Box<dyn Command> that need to be allocated and ran.

Solution

  • Utilize a specialized data structure keyed CommandQueueInner.
  • The purpose of CommandQueueInner is to hold a collection of commands in contiguous memory.
  • This allows us to store each Command type contiguously in memory and quickly iterate through them and apply the Command::write trait function to each element.

@github-actions github-actions bot added the S-Needs-Triage This issue needs to be labelled label Jun 11, 2021
@NathanSWard NathanSWard added A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times and removed S-Needs-Triage This issue needs to be labelled labels Jun 11, 2021
@NathanSWard
Copy link
Copy Markdown
Contributor Author

Here is a list of the minimal performance tests I created.
It seems that the AnyStack has a nice size performance improvement from the currently implementation on main.

group                                                         commands-anystack                      commands-main
-----                                                         -----------------                      -------------
empty_commands/Commands: empty                                1.00      6.9±0.10ns        ? ?/sec    1.24      8.5±0.16ns        ? ?/sec
spawning_commands/Commands: Spawn entities with components    1.00      2.4±0.29ms        ? ?/sec    1.77      4.3±0.29ms        ? ?/sec

@NathanSWard
Copy link
Copy Markdown
Contributor Author

I have a feeling that since this is a pretty specialized structure I should probably just move it into bevy_ecs from bevy_util.
This will also allow me to remove some of the unsafe code around the user_data: *mut u8.

@TheRawMeatball
Copy link
Copy Markdown
Member

Hmm, should we maybe pull the commands benchmark into a separate PR and merge it earlier? It's far less controversial than the unsafe anystack impl, and reducing the scope of PR s with unsafe is usually a good thing

Copy link
Copy Markdown
Member

@DJMcNab DJMcNab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea of using an fn here. Apart from the alignment issue you have pointed out, this looks good.

Copy link
Copy Markdown
Member

@DJMcNab DJMcNab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't checked over the tests in detail, but if this passes miri, I can believe it's good to go.

@NathanSWard
Copy link
Copy Markdown
Contributor Author

I haven't checked over the tests in detail, but if this passes miri, I can believe it's good to go.

I'll go ahead and run miri with this sometime today/tomorrow and report back the results.

@NathanSWard NathanSWard requested a review from DJMcNab June 13, 2021 02:44
Copy link
Copy Markdown
Member

@DJMcNab DJMcNab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a few docs thoughts

@NathanSWard NathanSWard force-pushed the nward/commands-no-alloc branch from 51b2fe0 to 0d75c6b Compare June 14, 2021 00:55
@NathanSWard
Copy link
Copy Markdown
Contributor Author

NathanSWard commented Jun 14, 2021

Here are the most recent benchmarks (taken from #2334).
It seems this new implementation still has a decent performance gain.

group                                  COMMANDS-MAIN               COMMANDS-NEW
-----                                  -------------               ------------
commands/2000_entities                 1.68   818.8±13.29µs        1.00   486.1±13.52µs
commands/4000_entities                 1.71  1654.4±38.16µs        1.00   967.2±21.10µs
commands/6000_entities                 1.68      2.4±0.04ms        1.00  1449.6±59.79µs
commands/8000_entities                 1.71      3.3±0.05ms        1.00  1912.0±53.50µs
empty_commands/0_entities              1.33      8.4±0.15ns        1.00      6.4±0.12ns
rng_commands/2000_entities             1.61   493.1±27.43µs        1.00    306.3±8.56µs
rng_commands/4000_entities             1.64   984.3±24.50µs        1.00   599.8±14.73µs
rng_commands/6000_entities             1.60  1458.4±44.39µs        1.00   908.9±17.67µs
rng_commands/8000_entities             1.59  1917.4±39.45µs        1.00  1209.5±28.52µs

@NathanSWard NathanSWard force-pushed the nward/commands-no-alloc branch from 543522c to 2fb8a5a Compare June 14, 2021 05:36
@NathanSWard NathanSWard requested a review from mockersf June 14, 2021 16:53
@NathanSWard
Copy link
Copy Markdown
Contributor Author

NathanSWard commented Jun 14, 2021

As @mockersf suggested I created a benchmark that utilized FakeCommands (e.g. commands that don't actually do anything to the world.
Here are the results for that test.

group                                  COMMANDS-MAIN                          COMMANDS-NEW
-----                                  -------------                          ------------
empty_commands/0_entities              1.32      8.4±0.18ns        ? ?/sec    1.00      6.4±0.11ns        ? ?/sec
fake_commands/2000_commands            3.26     66.4±1.15µs        ? ?/sec    1.00     20.3±0.35µs        ? ?/sec
fake_commands/4000_commands            3.24    132.0±2.17µs        ? ?/sec    1.00     40.7±0.78µs        ? ?/sec
fake_commands/6000_commands            3.22    195.1±3.33µs        ? ?/sec    1.00     60.6±1.07µs        ? ?/sec
fake_commands/8000_commands            3.22    261.0±4.17µs        ? ?/sec    1.00     81.0±1.45µs        ? ?/sec
spawn_commands/2000_entities           1.59   489.5±10.41µs        ? ?/sec    1.00   307.5±10.07µs        ? ?/sec
spawn_commands/4000_entities           1.56   946.9±27.91µs        ? ?/sec    1.00   605.3±16.40µs        ? ?/sec
spawn_commands/6000_entities           1.60  1428.0±36.34µs        ? ?/sec    1.00   893.3±17.30µs        ? ?/sec
spawn_commands/8000_entities           1.58  1874.1±39.38µs        ? ?/sec    1.00  1187.7±28.82µs        ? ?/sec

@NathanSWard NathanSWard force-pushed the nward/commands-no-alloc branch 3 times, most recently from ae297f1 to 2a614b8 Compare June 15, 2021 04:40
@NathanSWard
Copy link
Copy Markdown
Contributor Author

NathanSWard commented Jun 15, 2021

benchmarks for the most recent version. (with special casing for 0 sized commands and ensuring the vector isn't null).
This special casing for 0 sized seemed to squeeze out a bit more perf 🤷

group                           COMMANDS-MAIN                          COMMANDS-NEW
-----                           -------------                          -------------
empty_commands/0_entities       1.34      8.4±0.18ns        ? ?/sec    1.00      6.3±0.12ns        ? ?/sec
fake_commands/2000_commands     3.49     66.4±1.15µs        ? ?/sec    1.00     19.0±0.39µs        ? ?/sec
fake_commands/4000_commands     3.51    132.0±2.17µs        ? ?/sec    1.00     37.6±0.39µs        ? ?/sec
fake_commands/6000_commands     3.46    195.1±3.33µs        ? ?/sec    1.00     56.3±0.40µs        ? ?/sec
fake_commands/8000_commands     3.47    261.0±4.17µs        ? ?/sec    1.00     75.3±1.02µs        ? ?/sec
spawn_commands/2000_entities    1.63   489.5±10.41µs        ? ?/sec    1.00    300.8±5.55µs        ? ?/sec
spawn_commands/4000_entities    1.59   946.9±27.91µs        ? ?/sec    1.00   595.7±11.86µs        ? ?/sec
spawn_commands/6000_entities    1.65  1428.0±36.34µs        ? ?/sec    1.00   863.3±16.51µs        ? ?/sec
spawn_commands/8000_entities    1.61  1874.1±39.38µs        ? ?/sec    1.00  1167.6±27.94µs        ? ?/sec

@NathanSWard NathanSWard force-pushed the nward/commands-no-alloc branch from 9dbe32f to a9be103 Compare July 16, 2021 16:36
@NathanSWard
Copy link
Copy Markdown
Contributor Author

@cart this should be good to go now :)

@cart
Copy link
Copy Markdown
Member

cart commented Jul 16, 2021

bors r+

bors bot pushed a commit that referenced this pull request Jul 16, 2021
# Objective

- Currently `Commands` are quite slow due to the need to allocate for each command and wrap it in a `Box<dyn Command>`.
- For example:
```rust
fn my_system(mut cmds: Commands) {
    cmds.spawn().insert(42).insert(3.14);
}
```
will have 3 separate `Box<dyn Command>` that need to be allocated and ran.

## Solution

- Utilize a specialized data structure keyed `CommandQueueInner`. 
- The purpose of `CommandQueueInner` is to hold a collection of commands in contiguous memory. 
- This allows us to store each `Command` type contiguously in memory and quickly iterate through them and apply the `Command::write` trait function to each element.
@bors bors bot changed the title [ecs] Improve Commands performance [Merged by Bors] - [ecs] Improve Commands performance Jul 16, 2021
@bors bors bot closed this Jul 16, 2021
ostwilkens pushed a commit to ostwilkens/bevy that referenced this pull request Jul 27, 2021
# Objective

- Currently `Commands` are quite slow due to the need to allocate for each command and wrap it in a `Box<dyn Command>`.
- For example:
```rust
fn my_system(mut cmds: Commands) {
    cmds.spawn().insert(42).insert(3.14);
}
```
will have 3 separate `Box<dyn Command>` that need to be allocated and ran.

## Solution

- Utilize a specialized data structure keyed `CommandQueueInner`. 
- The purpose of `CommandQueueInner` is to hold a collection of commands in contiguous memory. 
- This allows us to store each `Command` type contiguously in memory and quickly iterate through them and apply the `Command::write` trait function to each element.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants