incusd/fsmonitor: Read multiple fanotify events#2961
Merged
Conversation
Contributor
Author
|
Seems the GitHub diff viewer doesn't handle the new indent well, |
fanotify(7) states:
> After a successful read(2), the read buffer contains one or more of
> the following structures [events].
...
> For performance reasons, it is recommended to use a large buffer size
> (for example, 4096 bytes), so that multiple events can be retrieved by a
> single read(2).
func getEvents was only processing one event from a buffer which
potentially contained more, leading to dropped events as observed with
a watch on /dev/loop0p{1,2} where frequently only one partition would
make it to the Incus device directory.
Unfortunately, just reading more events did not solve the issue. It
seems the watch callbacks spawned in goroutines would race and again
result in missing partitions in the device directory. I suspect the real
culprit behind issue lxc#2404 was not reading multiple events from the
buffer as is fixed here. To address the concern about kernel-dropped
events, I've added a warning here to diagnose but fanotify(7) seems to
indicate this will be rare (I certainly never saw it trigger):
> FAN_Q_OVERFLOW
> The event queue exceeded the limit of 16384 entries.
In testing I used a small 2-partition img file, two unix-block devices
on loop0p{1,2} with required=false, and following script:
devs=/var/lib/incus/devices/warpstation/unix.hotplug-dev-loop0p
loops() { echo ${devs}${1}.dev-loop0p$1; }
while true; do
losetup -d /dev/loop0
losetup -P /dev/loop0 loopit.img
sleep 1
if [ ! -e $(loops 1) ] || [ ! -e $(loops 2) ]; then
ls -l ${devs}*
break
fi
done
On Incus 6.21 stable, it fails within the first few iterations
(increasing the timeout did not help either). With this patch it has not
once failed over several minutes.
Additionally, a potential fd leak is addressed for the file handle.
Signed-off-by: Gabriel Totev <[email protected]>
Member
Haha, a lot better indeed! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
fanotify(7)states:func getEventswas only processing one event from a buffer which potentially contained more, leading to dropped events as observed with a watch on/dev/loop0p{1,2}where frequently only one partition would make it to the Incus device directory.Unfortunately, just reading more events did not solve the issue. It seems the watch callbacks spawned in goroutines would race and again result in missing partitions in the device directory. I suspect the real culprit behind issue #2404 was not reading multiple events from the buffer as is fixed here. To address the concern about kernel-dropped events, I've added a warning here to diagnose but
fanotify(7)seems to indicate this will be rare (I certainly never saw it trigger):In testing I used a small 2-partition img file, two unix-block devices on
loop0p{1,2}withrequired=false, and following script:On Incus 6.21 stable, it fails within the first few iterations (increasing the timeout did not help either). With this patch it has not once failed over several minutes.
Additionally, a potential fd leak is addressed for the file handle.