Conversation

@blitz
Member

@blitz blitz commented Oct 28, 2025

Summary

This PR implements VM migration using multiple TCP connections.

The send-migration HTTP command now accepts a connections parameter (defaults to 1) that specifies how many connections to use for migration.

Ticket: https://github.com/cobaltcore-dev/cobaltcore/issues/205

Design

If connections is larger than 1, the sender will attempt to establish additional TCP connections to the same migration destination. The main (initial) connection handles most of the migration protocol. Additional connections handle only Memory commands for transferring chunks of VM memory.

We use a thread-per-connection model. This adds the burden of creating, tracking, and tearing down threads. But given the I/O model of Cloud Hypervisor, this seemed to be the only feasible option.

For each iteration of sending memory, the MemoryTable that describes dirty memory will be split into chunks of fixed size (CHUNK_SIZE). These chunks will then be distributed among the available threads in a round-robin fashion. Threads can have a configurable backlog of outstanding chunks to send (BUFFERED_REQUESTS_PER_THREAD). This is currently 1 to avoid accumulating too much backlog on connections that encountered throughput issues.
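
To make the chunking and round-robin distribution concrete, here is a minimal, self-contained sketch; MemoryRange and the helper names are simplified stand-ins for illustration, not the actual types used in this PR.

// Illustrative sketch: split dirty ranges into chunks of at most `chunk_size`
// bytes and assign chunks to sender threads round-robin.
#[derive(Clone, Copy)]
struct MemoryRange {
    gpa: u64,
    length: u64,
}

fn chunk_ranges(ranges: &[MemoryRange], chunk_size: u64) -> Vec<Vec<MemoryRange>> {
    let mut chunks = Vec::new();
    let mut current = Vec::new();
    let mut current_len = 0;

    for range in ranges {
        let mut gpa = range.gpa;
        let mut remaining = range.length;
        while remaining > 0 {
            // Take as much of this range as still fits into the current chunk.
            let take = remaining.min(chunk_size - current_len);
            current.push(MemoryRange { gpa, length: take });
            current_len += take;
            gpa += take;
            remaining -= take;
            if current_len == chunk_size {
                chunks.push(std::mem::take(&mut current));
                current_len = 0;
            }
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let dirty = [
        MemoryRange { gpa: 0x0, length: 3 * 1024 * 1024 },
        MemoryRange { gpa: 0x40_0000, length: 5 * 1024 * 1024 },
    ];
    let chunks = chunk_ranges(&dirty, 4 * 1024 * 1024);

    // Round-robin distribution over the per-connection sender threads.
    let num_threads = 4;
    for (i, chunk) in chunks.iter().enumerate() {
        println!("chunk {i} ({} ranges) -> thread {}", chunk.len(), i % num_threads);
    }
}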

We still use the original request-response scheme. Since we don't pipeline requests but always wait for the other side to acknowledge them, there is a fundamental limit on the throughput we can reach. The original code only expected one ACK for the whole dirty memory table; we now have one ACK per chunk.

I've come up with this formula for the upper bound of throughput per connection:

effective_throughput = chunk_size / (chunk_size / throughput_per_connection + round_trip_time)

This formula is also in the code. I've played around with this, and with large enough chunks the impact seems negligible, especially since we can scale up the number of connections. Feel free to plug in your favorite numbers.
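
As a quick sanity check, here is a minimal sketch that plugs in made-up example numbers (4 MiB chunks, a 10 Gbit/s link, 0.5 ms round-trip time); the function mirrors the formula above, while the constants are assumptions, not measurements.

/// Upper bound on per-connection throughput for the stop-and-wait scheme.
fn effective_throughput(chunk_size: f64, throughput_per_connection: f64, rtt: f64) -> f64 {
    chunk_size / (chunk_size / throughput_per_connection + rtt)
}

fn main() {
    let chunk_size = 4.0 * 1024.0 * 1024.0; // bytes
    let line_rate = 1.25e9; // bytes/s, roughly 10 Gbit/s
    let rtt = 0.5e-3; // seconds

    let eff = effective_throughput(chunk_size, line_rate, rtt);
    println!(
        "effective: {:.2} GB/s ({:.0}% of line rate)",
        eff / 1e9,
        100.0 * eff / line_rate
    );
    // Prints roughly: effective: 1.09 GB/s (87% of line rate)
}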

In a better world, we would pipeline requests and not wait for responses. The good thing: the sender is in full control of the parameters, so we can make them configurable if necessary. But I would rather keep this complexity away from users.

Noteworthy Issues

New

  • Teardown of threads will hang if the sender stops sending data on a connection. But then the whole VM migration will not succeed anyhow. This is difficult to fix without rewriting some of the I/O code and probably not worth it for now.

Still There

  • The code scans the whole dirty bitmap on a single thread with an unoptimized loop and then stores the result in a large vector (a naive version of this scan is sketched after this list). The dirty bitmap can reach hundreds of MB in size on huge instances, and with lots of non-contiguous writes the resulting MemoryTable can be huge as well. Speed up dirty_log() (a lot) #36
  • The code only starts sending memory once the whole bitmap is scanned.

These issues give the VM time to mutate more memory and thus lengthen the time needed for migration.
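
For context on the first bullet, this is roughly what such a naive single-threaded scan looks like (an illustrative sketch, not the actual dirty_log() code): it walks a page-granular dirty bitmap bit by bit and collects contiguous dirty pages into (gpa, length) ranges, which is where the large vector comes from.

const PAGE_SIZE: u64 = 4096;

fn bitmap_to_ranges(bitmap: &[u64]) -> Vec<(u64, u64)> {
    let mut ranges: Vec<(u64, u64)> = Vec::new();

    for (word_idx, word) in bitmap.iter().enumerate() {
        for bit in 0u64..64 {
            if (word & (1u64 << bit)) == 0 {
                continue;
            }
            let gpa = (word_idx as u64 * 64 + bit) * PAGE_SIZE;
            match ranges.last_mut() {
                // Extend the previous range if this page is contiguous with it.
                Some((start, len)) if *start + *len == gpa => *len += PAGE_SIZE,
                // Otherwise start a new range; with many non-contiguous dirty
                // pages this vector grows very large.
                _ => ranges.push((gpa, PAGE_SIZE)),
            }
        }
    }
    ranges
}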

Noteworthy Changes in the Code

More Consistent Error Handling

Previously, the protocol handling had some issues, such as not sending responses for unknown commands. Some errors resulted in error responses, while others caused the process to die. Now we always send a response, error handling is consistent, and it can be changed in one place.

What's still missing is handling payloads for all migration commands. For now, any payload bytes are ignored in the most inconvenient way: sender and receiver go out of sync.

State Machine for Receiver

Moved to #34.

You may ask where the additional connections come into play: when we enter the Configured state, the sender will establish more connections to send memory and tear them down once we are done sending memory.

Left to be done

  • Address first round of feedback
  • Fail gracefully if someone sends connections: 0
  • Split state machine into separate PR: State Machine for Migration Receive Code #34
  • Merge State Machine for Migration Receive Code #34
  • Fail gracefully if the other end doesn't accept multiple connections
  • Update src/api/openapi/cloud-hypervisor.yaml for connections parameter
  • Add tests to the test suite
    • Test for multiple connections
    • Test for forward/backward compatibility? (maybe optional)
  • Run test suite
    • Regression tests (libvirt-tests, do we break usage that doesn't use multiple connections?)
    • With new tests
  • Test on some big boxes with lots of guest RAM

@blitz blitz changed the base branch from main to gardenlinux October 28, 2025 14:23
Member

@phip1611 phip1611 left a comment


Wow, excellent work, Julian! Given the time you had and the state of the code, I think you've come up with an excellent trade-off between refactoring and adding the new functionality. I absolutely love the state machine you introduced! Further, kudos for the awesome git commit hygiene!

Regarding my review: I mostly left some nits. If the open discussion points are resolved, I'm happy to merge this.

/// destination.
///
/// When this function returns, all memory has been sent and acknowledged.
fn send_memory(
Member


this handles all memory of a certain memory iteration, right?

Member Author


That's how it is currently used. But it doesn't have to be.

@phip1611
Member

Could you split this into three PRs, please?

  • unrelated misc improvements
  • state machine
  • multiple parallel connections?

I think the first two would be merged very easily. I only see potential for further discussions in the last one.

@blitz blitz force-pushed the multiple-tcp branch 7 times, most recently from 92aad24 to 3908982 on November 4, 2025 14:09
@blitz
Member Author

blitz commented Nov 5, 2025

@phip1611 I've updated the todo list above with the current status. Let's chat tomorrow about whether you need anything else for the handover.

@blitz blitz mentioned this pull request Nov 7, 2025
@phip1611
Member

Regarding the open TODOs by Julian: I think we can ignore some of them, for example the OpenAPI stuff; it is sufficient if we do this when we upstream the change. Regarding the remaining testing: I'll try to take care of it as part of our hackathon this week.

Comment on lines +1146 to +1148
let (send, recv) = std::sync::mpsc::sync_channel::<SendMemoryThreadMessage>(
    Self::BUFFERED_REQUESTS_PER_THREAD,
);


Maybe a stupid question, but: Do I understand it correctly, that you create a mpsc channel per sending thread? And if yes: wouldn't it be more efficient to have some sort of a single producer multiple consumer channel? (like async channel). Then you could just put all partitions into that channel and the threads will take them when they are free.

Member Author

@blitz blitz Nov 11, 2025


That's also possible. I didn't know whether we want to add new dependencies. But this would also solve the problem of buffering too many chunks per thread, which is a problem if one connection becomes slow.
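
For reference, a minimal sketch of the single-queue alternative discussed here, assuming we would add a dependency like crossbeam-channel (names and structure are illustrative only): all sender threads pull from one bounded MPMC channel, so a slow connection simply takes fewer chunks instead of accumulating its own backlog.

use crossbeam_channel::bounded;
use std::thread;

fn main() {
    // Pretend each "chunk" is just an index; in the real code it would be a
    // chunk of the memory table.
    let (tx, rx) = bounded::<usize>(1);

    let workers: Vec<_> = (0..4)
        .map(|id| {
            let rx = rx.clone();
            thread::spawn(move || {
                // Each worker pulls the next chunk as soon as it is free.
                while let Ok(chunk) = rx.recv() {
                    println!("thread {id} sends chunk {chunk}");
                }
                // recv() fails once all senders are dropped: clean teardown.
            })
        })
        .collect();

    for chunk in 0..16 {
        tx.send(chunk).expect("a worker is still running");
    }
    drop(tx); // Close the channel so the workers can exit.

    for w in workers {
        w.join().unwrap();
    }
}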



Yeah I think it would have some advantages. @phip1611 what do you think? Maybe it would be worthwhile to change this?

Member


I think this is a follow-up / next step 🤔


I think so too. Please open a ticket.

Member


@amphi could you please open a ticket here?


blitz added 11 commits November 12, 2025 08:40
This is not wired up to anywhere yet. We will use this to establish
multiple connections for live migration.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
This has no functional change, but it is a requirement to remove the
lock that used to obtain the MemoryManager instance.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
... to avoid having to grab a lock when we receive a chunk of memory
over the migration socket. This will come in handy when we have
multiple threads for receiving memory.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
To allow for accepting more connections in the migration receive code
paths, we need to keep track of the listener. This commit adds a thin
abstraction to be able to hold on to it regardless of whether it is a
UNIX domain or TCP socket.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
We keep the listening socket around and accept as many connections as
the sender wants to open.

There are still some problems: We never tear these threads down
again. We will handle this in subsequent commits.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
In anticipation of using multiple threads for sending memory, refactor
the sending code to be in a single place.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
... to be able to re-use it when establishing multiple send
connections. I moved the receive socket creation out for symmetry.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
... to simplify sending memory from multiple connections in future
commits.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
For sending memory over multiple connections, we need a way to split
up the work. With these changes, we can take a memory table and chop
it into same-sized chunks for transmit.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
This does not actually use the additional connections yet, but we are
getting closer!

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
This will stop us from listening for more connections on the TCP socket
when migration has finished.

Tearing down the individual connections will come in a subsequent
commit.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
... after the VM migration finishes.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
We don't die if we don't manage to establish more than the initial
connection to the other side. To limit the weird failure cases, we do
die when the other side only accepts some connections, but not all.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
This restores the old behavior/performance in case we don't use multiple connections.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
@tpressure

While testing this branch, I realized that we see a

Error:vmm/src/lib.rs:1006 -- Failed to wake up other threads: Invalid argument (os error 22)

This happens on every live migration :(

@phip1611 phip1611 marked this pull request as ready for review November 12, 2025 10:36
@phip1611 phip1611 marked this pull request as draft November 12, 2025 10:36
@phip1611
Member

I cannot edit or push to the branch of this PR. Before we waste time fixing this, I'll open a new one.
