-
Notifications
You must be signed in to change notification settings - Fork 565
Add support for cross-host live migration over TCP #6850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
vmm/src/vm.rs
Outdated
| impl Pausable for Vm { | ||
| fn pause(&mut self) -> std::result::Result<(), MigratableError> { | ||
| event!("vm", "pausing"); | ||
| info!("Pause VM for migration"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log may not be right, 'cause vm may pause from api request also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new version has fixed this problem:)
Thanks
a8dbb4c to
8425ec4
Compare
rbradford
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add integration tests.
Integration tests have been added in the new pushed PR:) Thanks |
|
@ljrcore Would you mind rebasing to apply the new clippy fix? |
Done. Thanks |
|
@blitz The live migration documentation has been updated. |
vmm/src/lib.rs
Outdated
| vm.send_memory_fds(unix_socket)?; | ||
| } | ||
| SocketStream::Tcp(tcp_socket) => { | ||
| vm.send_memory_fds_tcp(tcp_socket)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of --local is to not need to read/write the memory and instead use SCM_RIGHTS to move the fd across the process boundaries. --local makes no sense with TCP and should generate an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I modified the code as you suggested and if the user performs a local migration over TCP they will get an error message:)
Thanks
vmm/src/lib.rs
Outdated
| )) | ||
| })?; | ||
| } | ||
| SocketStream::Tcp(tcp_socket) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case should be an error - MemoryFd command should not be sent over TCP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case should be an error -
MemoryFdcommand should not be sent over TCP.
Yes, TCP does not use MemoryFd. The TCP related code has been removed here:)
Thanks
vmm/src/vm.rs
Outdated
|
|
||
| impl Pausable for Vm { | ||
| fn pause(&mut self) -> std::result::Result<(), MigratableError> { | ||
| info!("VM has been paused"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO here is not the right place for this and contradicts just the line below, would line 2476 and lines after be more appropriate 🧐
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO here is not the right place for this and contradicts just the line below, would line 2476 and lines after be more appropriate 🧐
I agree:)
vmm/src/lib.rs
Outdated
| // For TCP sockets, we cannot transfer file descriptors | ||
| warn!( | ||
| "MemoryFd command received over TCP socket, which is not supported" | ||
| ); | ||
| Response::error().write_to(&mut socket).map_err(|e| { | ||
| MigratableError::MigrateReceive(anyhow!( | ||
| "Error sending response: {}", | ||
| e | ||
| )) | ||
| })?; | ||
| return Err(MigratableError::MigrateReceive(anyhow!( | ||
| "MemoryFd command not supported over TCP sockets" | ||
| ))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since send_memory_fd would (and should) never be invoked on TcpSocket, should we change the whole section to unreachable!("MemoryFd command not supported over TCP sockets")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since
send_memory_fdwould (and should) never be invoked onTcpSocket, should we change the whole section tounreachable!("MemoryFd command not supported over TCP sockets")
You are right. TCP does not use MemoryFd, so I will remove the TCP related code here:)
e0c509f to
c3ea7fa
Compare
|
Hi, @rbradford A check failed, how can I fix it? |
rbradford
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good - just some refactoring I think would be helpful!
vmm/src/lib.rs
Outdated
| // Set up the socket connection | ||
| let socket = if send_data_migration.destination_url.starts_with("tcp:") { | ||
| let address = &send_data_migration.destination_url[4..]; | ||
| info!("Connecting to TCP socket at {}", address); | ||
| let socket = TcpStream::connect(address).map_err(|e| { | ||
| MigratableError::MigrateSend(anyhow!("Error connecting to TCP socket: {}", e)) | ||
| })?; | ||
| SocketStream::Tcp(socket) | ||
| } else { | ||
| let path = Self::socket_url_to_path(&send_data_migration.destination_url)?; | ||
| info!("Connecting to UNIX socket at {:?}", path); | ||
| let socket = UnixStream::connect(path).map_err(|e| { | ||
| MigratableError::MigrateSend(anyhow!("Error connecting to UNIX socket: {}", e)) | ||
| })?; | ||
| SocketStream::Unix(socket) | ||
| }; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please extract this into a method that returns Result<SocketStream,...>
vmm/src/lib.rs
Outdated
| // Determine whether to use TCP or UNIX listener based on receiver_url | ||
| let listener = if receive_data_migration.receiver_url.starts_with("tcp:") { | ||
| let address = &receive_data_migration.receiver_url[4..]; | ||
| let listener = TcpListener::bind(address).map_err(|e| { | ||
| MigratableError::MigrateReceive(anyhow!("Error binding to TCP socket: {}", e)) | ||
| })?; | ||
| SocketListener::Tcp(listener) | ||
| } else { | ||
| let path = Self::socket_url_to_path(&receive_data_migration.receiver_url)?; | ||
| let listener = UnixListener::bind(&path).map_err(|e| { | ||
| MigratableError::MigrateReceive(anyhow!("Error binding to UNIX socket: {}", e)) | ||
| })?; | ||
| SocketListener::Unix(listener, path) | ||
| }; | ||
|
|
||
| Response::ok().write_to(&mut socket)?; | ||
| } | ||
| Command::Complete => { | ||
| info!("Complete Command Received"); | ||
| if let Some(ref mut vm) = self.vm.as_mut() { | ||
| vm.resume()?; | ||
| Response::ok().write_to(&mut socket)?; | ||
| } else { | ||
| warn!("VM not created yet"); | ||
| Response::error().write_to(&mut socket)?; | ||
| } | ||
| break; | ||
| } | ||
| Command::Abandon => { | ||
| info!("Abandon Command Received"); | ||
| self.vm = None; | ||
| self.vm_config = None; | ||
| Response::ok().write_to(&mut socket).ok(); | ||
| break; | ||
| } | ||
| // Accept the connection and get the socket | ||
| let socket = match listener { | ||
| SocketListener::Unix(listener, path) => { | ||
| let (socket, _addr) = listener.accept().map_err(|e| { | ||
| MigratableError::MigrateReceive(anyhow!("Error unlinking UNIX socket: {}", e)) | ||
| })?; | ||
| std::fs::remove_file(path).map_err(|e| { | ||
| MigratableError::MigrateReceive(anyhow!("Error accepting on TCP socket: {}", e)) | ||
| })?; | ||
| SocketStream::Unix(socket) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto, I think this could be extracted to a function.
vmm/src/lib.rs
Outdated
| let socket = match listener { | ||
| SocketListener::Unix(listener, path) => { | ||
| let (socket, _addr) = listener.accept().map_err(|e| { | ||
| MigratableError::MigrateReceive(anyhow!("Error unlinking UNIX socket: {}", e)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this error message?
It could be flakes, you could pin me to restart the failed job to find out |
Got it, thanks:) |
Thanks for your suggestion, the code has been updated. |
likebreath
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution. Generally looking good to me. Some comments below.
One note, I can see that using TCP socket can reduce the live migration downtime over using Unix socket, but I won't draw the conclusion that it reduces the downtime by 50%. The live migration downtime depends a lot on the workload being running.
Thanks for your review. The live migration downtime does depend a lot on the workload being run. I got this data when I applied memory pressure using the following command. If the memory pressure changes, the data will change accordingly. Maybe it is also related to the test environment. I used a 4c8g virtual machine for testing. |
likebreath
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you accidentally dropped the commit that adds an integration test.
I took a closer look at the PR and have some additional comments below.
Bump vm-memory from 0.16.0 to 0.16.1 to include the implementations of ReadVolatile and WriteVolatile for TcpStream. Signed-off-by: Jinrong Liang <[email protected]>
|
Thanks for your patience and careful review of my code. I have modified and updated the code according to your suggestions. I appreciate your time and effort in this process. |
@TimePrinciple Hi, Ruoqing Could you restart failed jobs? |
likebreath
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost forgot this last one issue. Other than this, I think this PR is ready to be landed.
Add support for cross-host live migration over TCP, which significantly improves performance in cross-host live migration scenarios compared to the Unix socket forwarding method. Signed-off-by: Jinrong Liang <[email protected]>
Updated the live migration documentation to include instructions for performing cross-host live migrations over TCP connections. Signed-off-by: Jinrong Liang <[email protected]>
Add integration test for tcp live migration to ensure live migration functions as expected. Signed-off-by: Jinrong Liang <[email protected]>
Summary
This pull request updates the vm-memory crate from version 0.16.0 to 0.16.1, incorporating the necessary implementations for TcpStream. It introduces TCP-based cross-host live migration support in Cloud Hypervisor, enhancing migration performance and simplifying the process. Additionally, the live migration documentation has been updated to provide instructions for executing cross-host live migrations over TCP connections.
Changes Made
build: Bumpvm-memoryfrom0.16.0to0.16.1vm-memorycrate dependency to version0.16.1to include the implementations ofReadVolatileandWriteVolatileforTcpStream, which are necessary for TCP-based live migration.vmm: Add support for cross-host live migration over TCPvm_send_migrationandvm_receive_migrationfunctions to supportdestination_urlandreceiver_urlstarting withtcp:, allowing live migration over TCP sockets.vm_send_migrationandvm_receive_migration, simplifying the migration functions.docs: Add documentation for cross-host TCP live migrationTest
We conducted tests to compare the performance of the new TCP-based cross-host live migration with the existing Unix socket migration using
socatfor cross-host communication. The key metrics measured were the Total Migration Time and Downtime experienced during the migration process.Testing Environment
Host Machines: Both the source and destination hosts are equipped with identical hardware specifications.
Virtual Machine Under Test:
Memory Pressure Application
Inside the guest VM, we applied memory pressure using the following command:
Estimating Migration Time and Downtime from Logs
The migration time and downtime were estimated using timestamps from the Cloud Hypervisor logs. The methodology is as follows:
VmSendMigrationAPI request is received.Migration completelog entry).VM has been pausedlog entry).Log Examples and Calculations
TCP Migration Example
Calculations:
Test Results
TCP Migration Results
Unix Socket Migration with Socat Results
Analysis
socattakes longer, averaging approximately 11,566 ms.The implementation of TCP-based cross-host live migration significantly improves migration performance when compared to Unix socket migration using
socat. By reducing both the total migration time and the downtime by over 50%, this enhancement increases the efficiency and reliability of live migrations in cross-host scenarios.