map: avoid allocations during PerCPUMap batch lookups #1823

Merged
lmb merged 1 commit into cilium:main from aibor:feat-per-cpu-batch-zero-alloc
Jul 19, 2025

Conversation

@aibor (Contributor) commented Jul 16, 2025

Extend the optimization for batch lookups introduced in b89a4cb to PerCPUMaps as well.

The change adapts the flow used for single-value maps. For values objects that cannot be used as backing memory, the behavior is unchanged.

```
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf
cpu: AMD Ryzen 5 5600G with Radeon Graphics
                                          │   old.txt   │               new.txt               │
                                          │   sec/op    │   sec/op     vs base                │
Iterate/PerCPUHash/BatchLookup-8            551.3µ ± 1%   129.7µ ± 1%  -76.47% (p=0.000 n=10)
Iterate/PerCPUHash/BatchLookupAndDelete-8   568.2µ ± 1%   148.4µ ± 1%  -73.88% (p=0.000 n=10)
geomean                                     559.7µ        138.7µ       -75.21%

                                          │   old.txt    │              new.txt               │
                                          │     B/op     │    B/op     vs base                │
Iterate/PerCPUHash/BatchLookup-8            89709.0 ± 0%   136.0 ± 0%  -99.85% (p=0.000 n=10)
Iterate/PerCPUHash/BatchLookupAndDelete-8   89799.0 ± 0%   224.0 ± 1%  -99.75% (p=0.000 n=10)
geomean                                     87.65Ki        174.5       -99.81%

                                          │    old.txt    │              new.txt               │
                                          │   allocs/op   │ allocs/op   vs base                │
Iterate/PerCPUHash/BatchLookup-8            1006.000 ± 0%   5.000 ± 0%  -99.50% (p=0.000 n=10)
Iterate/PerCPUHash/BatchLookupAndDelete-8   1006.000 ± 0%   5.000 ± 0%  -99.50% (p=0.000 n=10)
geomean                                       1.006k        5.000       -99.50%
```

@aibor aibor requested a review from a team as a code owner July 16, 2025 21:45
@aibor aibor force-pushed the feat-per-cpu-batch-zero-alloc branch from 251da1f to b794663 on July 16, 2025 22:05
@lmb (Contributor) commented Jul 18, 2025

IIRC this will only work for types which do not have any implicit padding, like uint64.

```go
struct {
  a uint64
  b uint32
}
```

will still go via the slow path because there is a 4-byte trailer. This is because we don't want to allow reading into memory that is not "visible" in the Go type system. Users can work around this with explicit padding.

```go
struct {
  a uint64
  b uint32
  _ uint32
}
```

Do you still want to add this in that case? If yes please update the commit message to explain this and probably drop a comment in the source code as well.

Extend the optimization from b89a4cb ("map: avoid allocations during
batch lookup of common types") to BatchLookup with PerCPUMaps. This
means that types without implicit padding will not incur allocations.
See 4609dc7 ("map: zero-allocation operations for common types") for a
detailed description of the behavior.

```
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf
cpu: AMD Ryzen 5 5600G with Radeon Graphics
                                          │   old.txt   │               new.txt               │
                                          │   sec/op    │   sec/op     vs base                │
Iterate/PerCPUHash/BatchLookup-8            551.3µ ± 1%   129.7µ ± 1%  -76.47% (p=0.000 n=10)
Iterate/PerCPUHash/BatchLookupAndDelete-8   568.2µ ± 1%   148.4µ ± 1%  -73.88% (p=0.000 n=10)
geomean                                     559.7µ        138.7µ       -75.21%

                                          │   old.txt    │              new.txt               │
                                          │     B/op     │    B/op     vs base                │
Iterate/PerCPUHash/BatchLookup-8            89709.0 ± 0%   136.0 ± 0%  -99.85% (p=0.000 n=10)
Iterate/PerCPUHash/BatchLookupAndDelete-8   89799.0 ± 0%   224.0 ± 1%  -99.75% (p=0.000 n=10)
geomean                                     87.65Ki        174.5       -99.81%

                                          │    old.txt    │              new.txt               │
                                          │   allocs/op   │ allocs/op   vs base                │
Iterate/PerCPUHash/BatchLookup-8            1006.000 ± 0%   5.000 ± 0%  -99.50% (p=0.000 n=10)
Iterate/PerCPUHash/BatchLookupAndDelete-8   1006.000 ± 0%   5.000 ± 0%  -99.50% (p=0.000 n=10)
geomean                                       1.006k        5.000       -99.50%
```

Signed-off-by: Tobias Böhm <[email protected]>
@aibor aibor force-pushed the feat-per-cpu-batch-zero-alloc branch from b794663 to 70075be on July 18, 2025 17:50
@aibor (Contributor, Author) commented Jul 18, 2025

Yes, I am aware of the behavior and would like to have the zero-allocation behavior for types that satisfy the requirements you described.

I have updated the commit message and referenced the original commit that has a very detailed description of the behavior.

How would you like it documented in the code? As far as I can see, the behavior is not documented on the Map methods directly yet, or have I missed it?

@lmb (Contributor) commented Jul 18, 2025

Maybe having it in the commit message is fine then!

@lmb lmb merged commit 49ae13c into cilium:main Jul 19, 2025
30 of 34 checks passed
@aibor aibor deleted the feat-per-cpu-batch-zero-alloc branch July 19, 2025 10:30