State Machine for Migration Receive Code #34
Conversation
On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
This will be useful later when we rebuild the live migration code as a state machine.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
... and nuke some `Option<>` while I was there. Given that `HashMap` has a usable default and we end up passing an empty `HashMap` anyway, just get rid of the `Option`.

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
Previously, state that we accumulated during the migration process in the receiver was kept in `mut Option` variables or HashMaps. The problem is that it is unclear in the code when these variables can be safely used. It is also difficult to add new state, such as the state we need for the upcoming feature of performing the migration using multiple connections.

To solve this, I've refactored the code to use the state machine pattern. Each state carries the data it needs. Any state that hasn't arrived yet (memory_files, memory_manager) cannot be accessed until we are in the proper state.

Some benefits that fall out of this:

- We now respond to all requests, even invalid ones, at least with an error message.
- Any error while handling a request results in an error message being sent.
- There is only a single place where responses are sent, so it is very hard to mess this up in the code.
- The main protocol state machine fits on a screen.

I would argue that especially the error cases are now much more consistent.

There is still a lot to be done. There is still state transfer via `self.vm` and similar. In an ideal world, this would also be carried by the state machine. I also want to see better handling of payloads, which are still handled all over the place, but this change is already too big. :)

On-behalf-of: SAP [email protected]
Signed-off-by: Julian Stecklina <[email protected]>
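A minimal sketch of the pattern described above (type and field names here are illustrative, not the actual vmm types): each state variant owns exactly the data that is valid in that state, so the memory file descriptors cannot be touched before any `MemoryFd` command has arrived.

```rust
use std::collections::HashMap;

// Hypothetical state machine sketch: each variant carries only the data
// that is valid in that state.
#[allow(dead_code)]
enum ReceiveMigrationState {
    Started,
    MemoryFdsReceived { memory_files: HashMap<u32, i32> },
    Configured { memory_files: HashMap<u32, i32> },
    StateReceived,
    Completed,
}

// Handling a MemoryFd command consumes the old state and returns the next
// one; in any other state the command is rejected with an error response.
fn on_memory_fd(
    state: ReceiveMigrationState,
    slot: u32,
    fd: i32,
) -> Result<ReceiveMigrationState, &'static str> {
    match state {
        ReceiveMigrationState::Started => {
            let mut memory_files = HashMap::new();
            memory_files.insert(slot, fd);
            Ok(ReceiveMigrationState::MemoryFdsReceived { memory_files })
        }
        ReceiveMigrationState::MemoryFdsReceived { mut memory_files } => {
            memory_files.insert(slot, fd);
            Ok(ReceiveMigrationState::MemoryFdsReceived { memory_files })
        }
        _ => Err("MemoryFd command is invalid in this state"),
    }
}
```

Because transitions consume the old state by value, the compiler guarantees that stale state cannot be used after a transition.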
Testing Done
```rust
/// Handle a migration command and advance the protocol state machine.
///
/// **Note**: This function is responsible for consuming any payloads! It also must
```
What do you mean by "responsible for consuming any payloads"?
Each `Request` can have a payload. This is indicated by `req.length() != 0`. At the moment, this function must make sure that these payloads are read from the socket. Otherwise, the generic request handling code loses "synchronization" and mixes up request headers and payloads.

This is a deficiency in the design that existed before as well. I considered introducing an abstraction to make this harder to misuse, but it is very intrusive to the existing code (hence rebase hell). The idea would be to wrap the `SocketStream` in a `Payload` struct. Code can then read the payload via the struct, or the struct goes out of scope and its `Drop` can either consume/ignore the payload or panic if the payload was not read.

The simpler design of just reading the payload in one place doesn't work, because for `Memory` requests the payload is very large and we don't want to copy it multiple times.
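A minimal sketch of the `Payload` wrapper idea, assuming a generic `Read` stream in place of the real `SocketStream`; this variant drains an unread payload in `Drop` rather than panicking, so the stream stays in sync with the next request header either way.

```rust
use std::io::{self, Read};

// Hypothetical Payload wrapper: the handler either reads the payload
// through it, or Drop drains whatever is left.
struct Payload<'a, R: Read> {
    stream: &'a mut R,
    remaining: u64,
}

impl<'a, R: Read> Payload<'a, R> {
    fn new(stream: &'a mut R, len: u64) -> Self {
        Payload { stream, remaining: len }
    }

    // Read the whole payload into a buffer (fine for small payloads;
    // Memory payloads would be consumed via the stream directly).
    fn read_all(&mut self) -> io::Result<Vec<u8>> {
        let mut buf = vec![0u8; self.remaining as usize];
        self.stream.read_exact(&mut buf)?;
        self.remaining = 0;
        Ok(buf)
    }
}

impl<R: Read> Drop for Payload<'_, R> {
    fn drop(&mut self) {
        if self.remaining > 0 {
            // Drain unread payload bytes so the next request header is not
            // misinterpreted; a stricter variant could panic here instead.
            let mut rest = (&mut *self.stream).take(self.remaining);
            let _ = io::copy(&mut rest, &mut io::sink());
        }
    }
}
```

Whether `Drop` should silently consume or panic is a policy choice; panicking turns a forgotten payload into a loud bug during testing.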
Nice state machine!
The following is a very minor nit that you can choose to ignore: I noticed that `vm_receive_migration_step` assumes that it never gets called with either of the `Completed` or `Aborted` states. If you want, you can enforce that in the type system by splitting those two states off into a new enum `MigrationTermination` and changing the return type of `vm_receive_migration_step` to:

`Result<ControlFlow<MigrationTermination, ReceiveMigrationState>, Error>`
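A sketch of this suggestion with hypothetical transitions filled in (the real step function dispatches on received commands; `Error` here is a stand-in for the actual vmm error type): `ControlFlow::Break` carries a terminal state the caller cannot feed back into the step function.

```rust
use std::ops::ControlFlow;

// Non-terminal states only: the step function can always be called on these.
enum ReceiveMigrationState {
    Started,
    Configured,
    StateReceived,
}

// Terminal states, split off so the type system rules out stepping them.
#[allow(dead_code)]
enum MigrationTermination {
    Completed,
    Aborted,
}

// Stand-in for the real error type.
struct Error;

fn vm_receive_migration_step(
    state: ReceiveMigrationState,
) -> Result<ControlFlow<MigrationTermination, ReceiveMigrationState>, Error> {
    // Hypothetical straight-line transitions for illustration.
    Ok(match state {
        ReceiveMigrationState::Started => {
            ControlFlow::Continue(ReceiveMigrationState::Configured)
        }
        ReceiveMigrationState::Configured => {
            ControlFlow::Continue(ReceiveMigrationState::StateReceived)
        }
        ReceiveMigrationState::StateReceived => {
            ControlFlow::Break(MigrationTermination::Completed)
        }
    })
}
```

The caller loops on `Continue` and exits on `Break`, so the "don't call me on a terminal state" invariant is enforced by types rather than by convention.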
vmm/src/lib.rs (Outdated)
```rust
if existing_memory_files.is_none() {
    existing_memory_files = Some(HashMap::default())
}
existing_memory_files.push(Self::vm_receive_memory_fd(&mut socket)?);
```
Nit: Why not just insert the result of `Self::vm_receive_memory_fd` into the given `HashMap`?

I see you removed this code in the next commit, so this is probably not relevant anyway ...
That's a good suggestion @olivereanderson! I would leave it as is for now, but feel free to try to implement your idea.
This is split off from #30. See there for the longer discussion.
The receiver is now implemented via an explicit state machine. This restricts the order in which migration commands are accepted, but it makes it much easier to carry state from one step of the migration to the next.
The state machine is this:
```mermaid
stateDiagram-v2
    direction TB
    [*] --> Started: Start
    Started --> MemoryFdsReceived: MemoryFd
    MemoryFdsReceived --> MemoryFdsReceived: MemoryFd
    Started --> Configured: Config
    MemoryFdsReceived --> Configured: Config
    Configured --> Configured: Memory
    Configured --> StateReceived: State
    StateReceived --> Completed: Complete
```

This state machine accepts migrations from previous Cloud Hypervisor versions. Note that previous Cloud Hypervisor versions were much more liberal in the order in which they accepted migration commands.
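The diagram above can be encoded as a pure transition function. State and command names follow the diagram; the function body is a sketch, not the actual implementation.

```rust
// States and commands named after the diagram.
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Started,
    MemoryFdsReceived,
    Configured,
    StateReceived,
    Completed,
}

#[derive(Clone, Copy, Debug)]
enum Command {
    MemoryFd,
    Config,
    Memory,
    State,
    Complete,
}

// Returns the next state, or None when the command is invalid in the
// current state (which the receiver answers with an error response).
fn step(state: State, command: Command) -> Option<State> {
    Some(match (state, command) {
        (State::Started, Command::MemoryFd)
        | (State::MemoryFdsReceived, Command::MemoryFd) => State::MemoryFdsReceived,
        (State::Started, Command::Config)
        | (State::MemoryFdsReceived, Command::Config) => State::Configured,
        (State::Configured, Command::Memory) => State::Configured,
        (State::Configured, Command::State) => State::StateReceived,
        (State::StateReceived, Command::Complete) => State::Completed,
        _ => return None,
    })
}
```

Keeping the whole transition table in one `match` is what lets the main protocol state machine "fit on a screen", and the wildcard arm is the single place where invalid command orderings are rejected.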