-
Notifications
You must be signed in to change notification settings - Fork 225
libimage: fix manifest race during listing #2400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Luap99 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
mtrmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK to the general idea (we don’t hold the storage lock long enough to get a consistent snapshot, and if a manifest list goes away between MultiList and querying the “dangling” property, it’s reasonable to consider the components dangling).
I’m worried about silently returning invalid data on I/O errors, especially when that can lead to removing images, though. Can this be restricted to only silently ignore the two relevant errors?
| if err != nil { | ||
| return nil, err | ||
| } | ||
| // ignore errors, common errors are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ctx is unused as of now, lint test is failing.
I wasn't sure if these would be the only ones it can encounter and I wanted to be on the safer side on |
… ugh, this can theoretically violate causality:
Is that possible? Worth worrying about? Right now we don’t have the infrastructure to do anything else… I guess we would want something like |
if we talk about operations like So overall I don't think we need to worry to about this case to much here. All I care about is to make the commands at least consistent so that they at least don't randomly fail which is far worse IMO. |
|
WFM. I have little sympathy for a user adding a reference to an untagged image, while running |
|
LGTM |
mtrmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, feel free to merge as is.
I saw a flake in parallel podman testing, podman images can fail if the manifest was removed at the right time. In general listing should never be able to fail when another image or manifest is removed in parallel. Change the logic to convert to manifest and only collect the digests in the success case and ignore all other errors to make the listing more robust. I observed the following error from podman images: Error: locating image "xxx" for loading instance list: locating image with ID "xxx": image not known Signed-off-by: Paul Holzinger <[email protected]>
All other errors are returned wrapped with the image ID so do the same when the manifest blobl decoding fails. Signed-off-by: Paul Holzinger <[email protected]>
|
created a podman test PR just to see if this breaks anything containers/podman#25840 |
|
Podman PR looks good, PTAL again |
|
/lgtm |
|
Thanks! |
I saw a flake in parallel podman testing, podman images can fail if the manifest was removed at the right time. In general listing should never be able to fail when another image or manifest is removed in parallel.
Change the logic to convert to manifest and only collect the digests in the success case and ignore all other errors to make the listing more robust.
I observed the following error from podman images: Error: locating image "xxx" for loading instance list: locating image with ID "xxx": image not known