-
Notifications
You must be signed in to change notification settings - Fork 565
disks: use advisory locks for disk image files #6974
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bceac2b to
2b03eb1
Compare
|
This should either be opt-in (for backwards compat) or at the very least have a way to opt-out. I'm pretty sure there's a test that migrates on the same host that will fail. |
8560a88 to
17f6654
Compare
|
@iggy I think you're right. I'll look into how QEMU solves this and add a second commit soon. |
1767409 to
0d71e3f
Compare
|
There is an intentional use case here where you
Yes - you will need to add an option to |
|
From what I've seen, there is no code I could reuse for a tristate parsing ( I'd like to suggest that the default behavior from now on is:
What do you think, @rbradford? |
We discussed [0] that the new behaviour should be made configurable with a tristate: off,warn,on. We add it to the next release with "warn" by default and "on" for the release after that. [0] cloud-hypervisor#6974 (comment)
I think I would prefer the enum of three different values as if you know what you're doing and you don't want a warn spamming you then you can choose "off". You can use |
We discussed [0] that the new behaviour should be made configurable with
a tristate: off,warn,on.
off => ignore
warn => Warn when disk-images are already used by other processes and
this VM wants to have exclusive access. Don't acquire any locks.
on => Enforce the acquiring of advisory file-system locks for disk
image files.
"warn" is the default for now and we will soon switch to "on"
[0]
cloud-hypervisor#6974 (comment)
We discussed [0] that the new behaviour should be made configurable with
a tristate: off,warn,on.
off => ignore
warn => Warn when disk-images are already used by other processes and
this VM wants to have exclusive access. Don't acquire any locks.
on => Enforce the acquiring of advisory file-system locks for disk
image files.
"warn" is the default for now and we will soon switch to "on"
[0]
cloud-hypervisor#6974 (comment)
Signed-off-by: Philipp Schuster <[email protected]>
We discussed [0] that the new behaviour should be made configurable with
a tristate: off,warn,on.
off => ignore
warn => Warn when disk-images are already used by other processes and
this VM wants to have exclusive access. Don't acquire any
locks.
on => Enforce the acquiring of advisory file-system locks for disk
image files.
"warn" is the default for now and we will soon switch to "on"
[0]
cloud-hypervisor#6974 (comment)
Signed-off-by: Philipp Schuster <[email protected]>
ba8b35c to
d404fd4
Compare
|
I adjusted the CLI as discussed. I'm happy for a review of that part. I'll look into a solution for the live-migration as last step.
|
We discussed [0] that the new behaviour should be made configurable with
a tristate: off,warn,on.
off => ignore
warn => Warn when disk-images are already used by other processes and
this VM wants to have exclusive access. Don't acquire any
locks.
on => Enforce the acquiring of advisory file-system locks for disk
image files.
"warn" is the default for now and we will soon switch to "on"
[0]
cloud-hypervisor#6974 (comment)
Signed-off-by: Philipp Schuster <[email protected]>
0ec8668 to
1ca0780
Compare
|
@blitz What's the reason this PR is still draft? Can we move ahead with it? |
|
Yikes, took quite a while. The CI integration tests are finally green. Since your last major review, the following changes have been added:
A comprehensive write-up is in the commit message. Excerpt: |
e117b5f to
6ef9f73
Compare
rbradford
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some minor feedback about the last commit.
This simplifies debugging of running VMs. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
This is a prerequisite for the next steps. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
As we can't use BorrowedFd, we should at least create a similar safe alternative. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
This is a prerequisite for the next steps. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
# What This commit introduces file-based advisory locking for the files backing up the block devices by using the fcntl() syscall with OFD locks. The per-open-file-descriptor (OFD) locks are more robust than traditional POSIX locks (F_SETLK) as they are not tied to process IDs and avoid common issues in multithreaded or multi-fd scenarios [1]. Therefore, we don't use `std::fs::File::try_lock()`, which is backed by F_SETLKW. The locking mechanism is aware of the `readonly` property and allows `n` readers or `1` writer (exclusive mode). As the locks are advisory, multiple cloud-hypervisor processes can prevent themselves from writing to the same file. However, this is not a system-wide file-system level locking mechanism preventing to open() a file. The introduced new locking mechanism does not cover vhost-user devices. # Why To prevent misconfiguration and improve safety, it is good practice to protect disk image files with a locking mechanism. Experience and common best practices suggest that advisory locks are preferable over mandatory locks due to better compatibility and fewer pitfalls (in fs space). The introduced functionality is aligned with the approach taken by QEMU [0], and is also recommended in [1]. # Implementation Details We need to ensure that not only normal operation keeps working but also state save/resume and live-migration. Especially for live migration, it is crucial that the sender VMM releases the locks when the VM stops so the receiver VMM can acquire them right after that. Therefore, the locking and releasing happen directly on the block device struct. The device manager knows all block devices and can forward requests to these types. Last but not least, this commit uses on explicit lock acquiring but implicit lock releasing (FD close). It only explicitly releases the locks where this integrates more smoothly into the existing code. # Testing I tested - normal operation - state save/resume, - device hot plugging, - and live-migration with read/shared and write/exclusive locks. One can use the `fcntl-tool` to test if locks are actually acquired or released [2]. # Links [0] https://github.com/qemu/qemu/blob/825b96dbcee23d134b691fc75618b59c5f53da32/util/osdep.c#L266 [1] https://apenwarr.ca/log/20101213 [2] https://crates.io/crates/fcntl-tool Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
The new locking behavior uncovered that unfortunate test situation: Many tests running in parallel access the same disk image with rw permissions. Luckily, none of the tests actually writes to the disk. Therefore, we can set it to readonly=true. In case this changes, the test needs to be moved to the sequential test module. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
|
Great that this is merged! Feel free to ping me in case any bug reports are incoming; I'm happy to help out |
About
Add advisory file locks for disk images, aligning the behavior with QEMU. Please check out the commit messages for more details.
Hints for Reviewers
Please review this commit-by-commit.
Steps to Undraft