-
Notifications
You must be signed in to change notification settings - Fork 565
virtio-devices: net: reduce vec allocations for iovec conversion #6636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| let mut next_desc = Some(desc); | ||
|
|
||
| let mut iovecs = Vec::new(); | ||
| let mut iovecs = self.iovecs.borrow(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution - but maybe we could just use smallvec instead - like we do elsewhere in the codebase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To use SmallVec efficiently we need to know beforehand the optimum size, otherwise it will allocate and deallocate. Not sure how feasible this is with the code here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some debugging (e.g. adding info!("iocount {}", iovecs.len())) and running various workloads shows that the vector never exceeds length 4 (and that only occurs when copying a very large file). I think a smallvec with reserved capacity of 4 is the best solution.
|
How much improvement do you see with this change? |
Haven't benchmarked. Could you provide some docs about how to bench it? Virtio-net is a component that is used very frequently in real scenarios. Reducing one allocation when sending and receiving each single frame should have a relatively good performance benefit. Using smallvec can work here too if the desc chain's length is small enough(but it is unpredictable). However, since we make a syscall immediately after constructing iovecs, there won't be multiple iovecs at the same time, so it's more appropriate to use a shared buffer to cache iovecs and store the buffer in a long-lived structure. |
The best case is to run your target workload or the synthetic workload you care about, and show that it increases the throughput or decreases the latency. Alternatively, there is this In generally I think this is a good change. The code is simple enough. I was just curious to know if there is any directly visible benefit. Somewhat related -- recently I observed that CH's block perf has more to be desired. I don't have any concrete ideas yet, but if I end up tuning the internal buffer size, it may make sense to move away from SmallVec to this sort of persistent buffer, too. Lastly, please properly sign off your patch. |
3188b91 to
2f81849
Compare
Ok, it is added now.
I can confirm it won't be more than 4 for most cases if I understood it right(I searched the reference to SmallVec works here based on the current kernel's driver implementation, using a persistent buffer works too. |
Assuming the kernel driver is not change very frequently, I think using SmallVec is better. You can even use 8 to leave some room to grow in the future. In the commit message, please say why a specific value is chosen. |
It was my mistake. I'm not familiar with kernel's code yet. I did a little bit more investigation on it and found there would be at most 19 descriptors linked in a single chain.
The maximum possibility of desc chain size is Though at most cases the desc chain won't be very long and using smallvec with N=4 is enough. Considering the possibility above, I still think using a persistent buffer is always a better way as it can definitely avoid allocation at packet level and saves copy cost of smallvec. Comparing to smallvec I don't see any drawbacks since in our case there would not be more than one instance exist at the same time. However, I'm open to change it as smallvec if we decide to, it is better than current Vec anyway. @rbradford @liuw , would love to hear your opinion! |
I don't feel too strongly either way. Like I said, the code you have is simple to reason about. One request I have is if we're to use persistent buffers, you should allocate the vectors to an appropriate size (for example 16) from the get go. |
| #[derive(Default, Clone)] | ||
| struct IovecBuffer(Vec<libc::iovec>); | ||
|
|
||
| unsafe impl Send for IovecBuffer {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to document these unsafes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Send and Sync are usually auto implemented, but not for pointer(libc::iovec contains pointer).
As an empty container, it can be safely move ownership and safely shared between threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, but without a comment your code won't pass the CI check. We require safety comments for all unsafes.
| struct IovecBuffer(Vec<libc::iovec>); | ||
|
|
||
| unsafe impl Send for IovecBuffer {} | ||
| unsafe impl Sync for IovecBuffer {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this really need to be Sync? What happens if you drop it?
|
|
||
| impl Drop for IovecBufferBorrowed<'_> { | ||
| fn drop(&mut self) { | ||
| self.0.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My Rust knowledge on this part is a bit rusty (pun intended), but why clear here?
Do you even need to implement Drop? When IovecBuffer is dropped, its only member will also get dropped, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here IovecBufferBorrowed is a zero-cost wrapper for a mutable borrowed Vec. It is introduced to impl drop manually(instead of just returning &mut Vec<_>).
Clear the Vec when drop can make sure there's no old value inside for the next user. With the Rust borrow checker, we can ensure that the buffer can not be touched by anyone else between a single {write, syscall, clear}(otherwise it failed compiling), and that no old value remains when it is used next time.
This make it works just like creating a new Vec.
IovecBuffer::borrow->Vec::newIovecBufferBorrowed::some_fn->Vec::some_fnbecause of Deref/DerefMutIovecBufferBorrowed::drop->Vec::drop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I misread the code. I thought you were implementing Drop for IovecBuffer.
2f81849 to
5e8ce28
Compare
|
Typo in commit message convertion -> conversion |
5e8ce28 to
f169548
Compare
Thank you, it has been updated. |
net_util/src/queue_pair.rs
Outdated
|
|
||
| impl IovecBuffer { | ||
| const fn new() -> Self { | ||
| IovecBuffer(Vec::new()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well do Vec::with_capacity(4) as from experimentation that is a sensible starting number to use (and avoid the initial reallocations.)
I'm still not 100% convinced this is better than smallvec in terms of maintenance overhead (avoiding heap allocations like this is exactly why we started using smallvec)
But I can see you've put a lot of effort into this work - so with the change I propose above lgtm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I've updated it as 4 as you suggested.
But I can see you've put a lot of effort into this work - so with the change I propose above lgtm.
Please don't do it just because this reason. I believe I can convince you of my point of view. This is a small change and didn't cost much time, but in order to justify this change I read some kernel code and did benchmarks, which was very rewarding and I think it is good.
I did a simple benchmark(but only simulated not with vmm) here: https://gist.github.com/ihciah/dcfaf13393bfada1ff8f372b4fa16e3f
And the result shows even for len=3, borrowed vec only cost 48% CPU time of smallvec(N=4). I believe it is because some stack copying cost is saved. For longer data, borrowed vec saves more. In the link above there's more results and reproducing guide.
Signed-off-by: ihciah <[email protected]>
f169548 to
2bd078d
Compare
Currently when converting desc chain to iovecs, it uses
Vec::new(). To save allocation, this PR proposed to using a long lifetime buffer to save the result.A simulated benchmark about whether use persistent buffer or smallvec: https://gist.github.com/ihciah/dcfaf13393bfada1ff8f372b4fa16e3f