store: new API ApplyStagedLayer #1826
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: giuseppe The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
d190380 to
ba0d7f7
Compare
store.go
Outdated
    StagingDirectory string
    DiffOutput *drivers.DriverWithDifferOutput
    DiffOptions *drivers.ApplyDiffWithDifferOpts
Another opportunity to clean up that ApplyDiffWithDiffer and ApplyDiffFromStagingDirectory should have separate options types, so that callers aren’t tempted to set completely ignored options.
I guess that’s non-blocking…
I agree the API can be improved, and this is a good chance. In more detail, how would you improve this part?
I’m not sure I have the whole picture … All I really wanted here was to have separate ApplyDiffWithDifferOptions and ApplyDiffFromStagingDirectoryOptions, and to drop fields that are not relevant to either operation.
But looking at ApplyDiffOptions below, even that seems fairly invasive to do to the maximum extent.
Looking a bit further… A lot of this might be better as separate PRs, and probably not right now?
- Is there any caller of ApplyDiffWithDiffer writing to a pre-existing layer? If not, maybe that can be just dropped. In the driver, the usingComposefs paths have already diverged, which seems like a good reason to either share more code (not in this PR!), or to remove the redundant code path entirely.
- From a c/image perspective, we now have two “apply” functions which are ~opposites of each other, and it always takes me a bit to tell which one is “stage” and which one is “commit”. So something like s/ApplyDiffWithDiffer/StageChangesWithDiffer/ would be nice, but that also somewhat depends on the above.
- Actually having ApplyDiffOptions inside ApplyDiffWithDifferOptions is not a good semantic match at all: ApplyDiffOptions all revolve around the ApplyDiffOptions.Diff stream, and that is completely ignored in those paths.
- … and using the graphdriver.Differ interface to connect the overlay driver and the zstd deduplicator code seems like an imprecise fit as well; maybe that should be just a c-storage-private interface with no exposed methods, so that c/storage can change the mechanics over time.
- As mentioned elsewhere, it would be convenient for CleanupStagingDirectory to consume DriverWithDifferOutput so that c/image doesn’t need to care about .Target at all.
A much more general, and much more vague, thought is that the “stage”+“commit” model might be interesting for non-chunked layers as well.
Right now, on the PutBlob path, c/image:
- putBlobToPendingFile stores the input stream into a file, fully consuming it. (That must happen to validate digests.) That is fully parallel.
- commitLayer extracts the file into the graph driver’s layer. On some graph drivers (DM, VFS), that is inherently serial; in overlay, that is just a tar extraction which could, in principle, be fully parallel.
- creates the layer record, with parent links, etc. That is inherently serial, or maybe it could be parallel but it is anyway cheap enough not to worry about.
It seems potentially interesting to see whether the “extract tar” part could be parallelized — and whether it would be better. I can imagine that this part is I/O heavy enough that doing two of them would just slow things down; that probably needs building and measuring.
The “store stream” + “extract” + “commit” parts seem very similar to what the chunked path does in the “convert” case. OTOH “true chunked” input doesn’t have a stream in the first place…
But then, worrying about the traditional tar layers is backwards-looking. What would an ideal API for creating a composefs layer look like? I didn’t look into that and I have no idea. Would that be relevant for building the chunked one?
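To make the “stage” + “commit” idea above concrete, here is a minimal Go sketch, not c/image or c/storage code. All names (stagedLayer, stage, stageAll, commitAll) are hypothetical, and the "extraction" is simulated by upper-casing a string. It only illustrates the shape of the split: the expensive, independent staging step runs concurrently, while the cheap record creation with parent links stays serial.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// stagedLayer is a hypothetical stand-in for extracted-but-uncommitted layer contents.
type stagedLayer struct {
	id      string
	content string
}

// stage simulates the expensive, independent step (storing and extracting one blob).
func stage(id, blob string) stagedLayer {
	return stagedLayer{id: id, content: strings.ToUpper(blob)}
}

// stageAll stages every blob concurrently and returns the results in input order.
func stageAll(ids, blobs []string) []stagedLayer {
	staged := make([]stagedLayer, len(blobs))
	var wg sync.WaitGroup
	for i := range blobs {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			staged[i] = stage(ids[i], blobs[i]) // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait()
	return staged
}

// commitAll creates the layer records one at a time, because each record
// links to the previously committed layer as its parent.
func commitAll(staged []stagedLayer) []string {
	records := make([]string, 0, len(staged))
	parent := ""
	for _, s := range staged {
		records = append(records, fmt.Sprintf("%s(parent=%q)", s.id, parent))
		parent = s.id
	}
	return records
}

func main() {
	staged := stageAll([]string{"l1", "l2"}, []string{"base", "app"})
	for _, r := range commitAll(staged) {
		fmt.Println(r)
	}
}
```

Whether parallel extraction is actually faster on real storage is exactly the open question raised above; the sketch says nothing about I/O contention.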
… I think Dan would bite our head off if we worked on changing the traditional-tar API right now :)
Thanks for working on this!
0a76801 to
5e58781
Compare
@mtrmac thanks for the review. I've addressed your comments, except one pending question: #1826 (comment)
mtrmac
left a comment
Changing the API, after looking a bit more, seems to require a bunch of fairly intrusive changes all at once… so I don’t think it is worth blocking this PR on them.
Or maybe you can see some way to make all of that simple.
5e58781 to
cc2d131
Compare
LGTM
cc2d131 to
ec4f82c
Compare
mtrmac
left a comment
On second thought, I’m afraid this isn’t sufficient.
It is fine for a running system: it prevents other processes from observing the WIP layer.
But it doesn’t handle crashes sufficiently. For that, the layer metadata needs to be saved to disk with incompleteFlag (so that if we are recovering from a crash, we delete everything), and after the contents are set up, the flag is removed again.
(Or, hypothetically, we could first write the on-disk contents and only afterwards write the layer metadata?? But that would be a new unproven code path, and we would have to worry about re-creating a layer on top of previously partially-created files. Seems risky, when the other path is well-understood.)
Very roughly speaking, I think this can be done by applying the staged data from inside layerStore.create, around the place where applyDiffWithOptions is called for non-chunked layers.
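As an illustration of the crash-safety argument above, here is a minimal in-memory Go sketch. It is not the layers.go implementation: the layer and store types, the Incomplete field (standing in for incompleteFlag), and recoverFromCrash are all hypothetical, and "persisting" is simulated with a map. The point is only the ordering: record the layer as incomplete first, populate it, and clear the flag last, so that recovery can delete anything still flagged.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated stands in for a crash or failure while populating a layer.
var errSimulated = errors.New("simulated crash")

// layer is a hypothetical stand-in for a layer record; the real structures differ.
type layer struct {
	ID         string
	Incomplete bool // stand-in for the incompleteFlag discussed above
	Populated  bool
}

// store maps layer IDs to records that are assumed to be persisted on disk.
type store struct {
	layers map[string]*layer
}

func newStore() *store { return &store{layers: map[string]*layer{}} }

// createLayer saves the record as incomplete, populates it, then clears the flag.
// If populate fails, the record stays marked incomplete.
func (s *store) createLayer(id string, populate func() error) error {
	l := &layer{ID: id, Incomplete: true}
	s.layers[id] = l // step 1: metadata saved with the flag set
	if err := populate(); err != nil {
		return err // a failure here leaves Incomplete == true
	}
	l.Populated = true
	l.Incomplete = false // step 2: contents are in place, clear the flag
	return nil
}

// recoverFromCrash deletes every layer still marked incomplete and returns the count.
func (s *store) recoverFromCrash() int {
	removed := 0
	for id, l := range s.layers {
		if l.Incomplete {
			delete(s.layers, id)
			removed++
		}
	}
	return removed
}

func main() {
	s := newStore()
	_ = s.createLayer("ok", func() error { return nil })
	_ = s.createLayer("broken", func() error { return errSimulated })
	fmt.Println("removed on recovery:", s.recoverFromCrash())
}
```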
|
The API design LGTM. For the record:
This path in c/image also needs to check for, and succeed with, |
ec4f82c to
41deea0
Compare
you are right, we need to use the incompleteFlag. I've pushed a new version where I set the incompleteFlag, barely tested. I'll work more on it through the day |
41deea0 to
a9c9b49
Compare
@mtrmac what do you think of the last version?
mtrmac
left a comment
I’d rather prefer if the incompleteFlag remained an internal implementation detail of layers.go.
a9c9b49 to
e5163a1
Compare
moved the |
mtrmac
left a comment
This is a half-way step, but the partial option is still basically “leave the layer incomplete” and/or “I promise to call applyDiffFromStagingDirectory later” (with neither effect documented), with store.go being responsible for that.
It seems to me
    if diff != nil {
        if size, err = r.applyDiffWithOptions …
+   } else if staged != nil {
+       if size, err = r.applyDiffFromStagingDirectory …
    } else { …
should be possible and not too invasive a change.
34fbf02 to
4c51bea
Compare
|
pushed a new version, I moved the |
mtrmac
left a comment
ACK.
this is needed by the following commit. Signed-off-by: Giuseppe Scrivano <[email protected]>
enforce that the stagingDirectory must have the same value as the diffOutput.Target variable. It allows to simplify the internal API. Signed-off-by: Giuseppe Scrivano <[email protected]>
Add a race-condition-free alternative to using CreateLayer and ApplyDiffFromStagingDirectory, ensuring the store is locked for the entire duration while the layer is being created and populated. Signed-off-by: Giuseppe Scrivano <[email protected]>
It uses the diff output as input and callers are not expected to know about the Target directory. Signed-off-by: Giuseppe Scrivano <[email protected]>
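The race the commit messages above refer to can be illustrated with a small Go sketch. It is an assumption-laden model, not the c/storage code: layerStore, applyStaged, and get are hypothetical, and a sync.Mutex stands in for whatever locking the real store uses. The sketch shows why creating and populating under a single lock acquisition prevents other callers from observing a created-but-empty layer.

```go
package main

import (
	"fmt"
	"sync"
)

// layerStore is a hypothetical in-memory model; an empty string value means
// "record created but not yet populated".
type layerStore struct {
	mu     sync.Mutex
	layers map[string]string
}

// applyStaged creates the record and populates it while holding the lock once,
// so no other caller can observe the intermediate empty state.
func (s *layerStore) applyStaged(id, contents string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.layers[id]; exists {
		return fmt.Errorf("layer %q already exists", id)
	}
	s.layers[id] = ""       // create the record
	s.layers[id] = contents // populate it, still under the same lock
	return nil
}

// get returns the layer contents and whether the layer exists.
func (s *layerStore) get(id string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.layers[id]
	return c, ok
}

func main() {
	s := &layerStore{layers: map[string]string{}}
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			if c, ok := s.get("l1"); ok && c == "" {
				panic("observed a half-created layer")
			}
		}
	}()
	if err := s.applyStaged("l1", "data"); err != nil {
		panic(err)
	}
	wg.Wait()
	fmt.Println("no reader saw an unpopulated layer")
}
```

If create and populate were two separate calls that each took the lock, a reader could run between them and see the empty record.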
4c51bea to
d36d6c1
Compare
mtrmac
left a comment
/lgtm
Thanks!
@mtrmac the related change in c/image: containers/image#2301
Add a race-condition-free alternative to using CreateLayer and ApplyDiffFromStagingDirectory, ensuring the store is locked for the entire duration while the layer is being created and populated.
Signed-off-by: Giuseppe Scrivano [email protected]
The related patch for c/image is: