builder: Refactor remote context#31984
Conversation
| RUN find / -xdev -name ba* | ||
| RUN find /tmp/` | ||
| } | ||
| testD := `FROM busybox |
There was a problem hiding this comment.
This is the only UI change in this PR. It seems that the test checks an invalid side-effect caused by the previous implementation. I could make a fake context for this case but don't see a practical reason for it, especially after all the work that was needed in #31236 to hide the Dockerfile if no such file actually exists.
Could also add a second case that pulls a tar. Then it will have the same behavior as the previous test.
There was a problem hiding this comment.
If I understand this correctly the old test expected a dockerfile to exist at Dockerfile and no file named baz to exist.
But the test fixture seems to be just a remote server with a single file at /testD. Where do these extra files come from?
There was a problem hiding this comment.
Previous implementation created a fake context tar after pulling the Dockerfile as having a tar was the only way to start the builder(the implementation didn't bother to create .dockerignore as well so the Dockerfile would not show up). File baz does not exist but is mentioned in the URL. I'm not sure how it would leak, but I don't actually remember why I decided to change the Dockerfile itself. I'll try to change it back and put back the baz assertion.
builder/remotecontext/lazycontext.go
Outdated
| return sum, nil | ||
| } | ||
|
|
||
| func (c *lazyContext) normalize(path string) (cleanPath, fullPath string, err error) { |
There was a problem hiding this comment.
Just a reminder from #31257 (comment)
Also I checked, once you refactored normalize, it is fine to remove the lstat. Any action that was using normalize was doing an Open, or an Lstat, or Remove after it anyways.
6a02f6b to
75c6ebb
Compare
simonferquel
left a comment
There was a problem hiding this comment.
I am not sure if I am best suited for such a review. The changes are quite pervasive over the builder code base and I am not familiar enough yet with the builder code to make constructive feedback
dnephin
left a comment
There was a problem hiding this comment.
I like it, just a few minor comments and questions
builder/builder.go
Outdated
There was a problem hiding this comment.
minor: Comment suggests that maybe Source (namespaced it would be builder.Source) might be a more appropriate name?
builder/dockerfile/detectcontext.go
Outdated
There was a problem hiding this comment.
Looks like bm is not used here, so this could be a function instead of an instance method on buildManager.
builder/dockerfile/detectcontext.go
Outdated
There was a problem hiding this comment.
I often find that using a switch/case here makes the intent a little more obvious, without needing a comment:
switch {
case os.IsNotExist(err):
return nil
case err != nil:
return err
builder/dockerfile/detectcontext.go
Outdated
There was a problem hiding this comment.
No error on invalid .dockerignore format?
There was a problem hiding this comment.
comment still relevant?
s/testRemote/stubRemote/
?
| RUN find / -xdev -name ba* | ||
| RUN find /tmp/` | ||
| } | ||
| testD := `FROM busybox |
There was a problem hiding this comment.
If I understand this correctly the old test expected a dockerfile to exist at Dockerfile and no file named baz to exist.
But the test fixture seems to be just a remote server with a single file at /testD. Where do these extra files come from?
builder/tarsum.go
Outdated
cbe6539 to
e415e86
Compare
e415e86 to
ea52a3d
Compare
builder/dockerfile/builder.go
Outdated
There was a problem hiding this comment.
I don't think we need this field, dockerfile can be just an arg.
builder/dockerfile/builder.go
Outdated
There was a problem hiding this comment.
Can we pass dockerfile this to build() instead? That way it doesn't need to be a field in the struct.
builder/dockerfile/builder.go
Outdated
There was a problem hiding this comment.
This was just removed in a recent PR. I think with the two suggestions above it wont be necessary anymore
builder/dockerfile/imagecontext.go
Outdated
There was a problem hiding this comment.
rename the field here as well? s/ctx/source
builder/dockerfile/internals.go
Outdated
There was a problem hiding this comment.
b here is only used to get the Dockerfile name now, so this could be:
func readAndParseDockerfile(src io.ReadCloser, filename string) (*parser.Result, error {
...
}
builder/dockerfile/internals_test.go
Outdated
There was a problem hiding this comment.
With the suggestion above this could just call readAndParseDockerfile() directly I think.
builder/remotecontext/detect.go
Outdated
ea52a3d to
a1662af
Compare
Redefine a better interface for remote context dependency. Separate Dockerfile build instruction from remote context. Signed-off-by: Tonis Tiigi <[email protected]>
a1662af to
d1faf3d
Compare
|
All green. PTAL |
dnephin
left a comment
There was a problem hiding this comment.
I like this, nice cleanup.
Just a question about the test change, otherwise I think this looks good.
| progressOutput := stdoutFormatter.StreamFormatter.NewProgressOutput(stdoutFormatter.Writer, true) | ||
| progressReader := progress.NewProgressReader(resp.Body, progressOutput, resp.ContentLength, "", "Downloading") | ||
| // Download and dump result to tmp file | ||
| // TODO: add filehash directly |
There was a problem hiding this comment.
Not sure what this means, I just want to make sure that it's not necessary for this PR.
There was a problem hiding this comment.
Because this is a single file there isn't really a need for LazyContext and reading file back from the disk. We could just create a single hash object and pipe data through it. But that is an optimization for edge case so not currently implemented.
| return nil, err | ||
| } | ||
|
|
||
| // TODO: remove, handle dirs in Hash() |
There was a problem hiding this comment.
Yes, currently Hash() only returns hash for a single file so to get the correct hash for a dir we still need to recursively walk it. The todo is about moving the walker code inside Hash so the values from that func don't need post-processing.
| var testD string | ||
| if testEnv.DaemonPlatform() == "windows" { | ||
| testD = `FROM busybox | ||
| COPY * /tmp/ |
There was a problem hiding this comment.
Ahh right, I forgot about this. Thanks!
| @@ -0,0 +1,172 @@ | |||
| package remotecontext | |||
There was a problem hiding this comment.
Elsewhere you renamed Context to Source. I like this, do you think we can do the same for these new *context packages?
There was a problem hiding this comment.
I did not rename LazyContext/TarsumContext/modifiableContext inside this pkg for now. Only in the builder where the conflict with context.Context was very clear. More renames could probably be follow up as this is already quite big.
| if allowWildcards && containsWildcards(origPath) { | ||
| var copyInfos []copyInfo | ||
| if err := context.Walk("", func(path string, info builder.FileInfo, err error) error { | ||
| if err := filepath.Walk(source.Root(), func(path string, info os.FileInfo, err error) error { |
There was a problem hiding this comment.
Are we expecting that this will always be some object on disk?
There was a problem hiding this comment.
Yes, the previous implementation expected it as well. The Path() method was just hidden from the interface but the actual copy could never work if it was not implemented.
depends on #31257only new commit: tonistiigi/docker@nested-build...remote-contextThe builder currently seems to depend on
Contextinterface that could be reimplemented to provide different types of sources. In reality, there is only one context: tarsum. That is because the real functionality for context is finding hashes for paths for cache and tarsum is the only context that can do that. Others are just helpers for making tarsum context from other inputs.Context interface hides direct access to the paths with
.Path()method inbuilder.FileInfo. Actual copy only happens through real paths and doesn't have anything to do with context methods.File hash capability is the main reason
Contextexists but that capability is completely missing from the interface. The implementation just recastsbuilder.FileInfotobuilder.Hashedand would panic if they don't match.ContextandDockerfileare mixed together and builder tries to separate them while it is running. That means recasts toModifiableContextandDockerIgnorContext. Every context passed to builder is alwaysDockerIgnoreContextbut there is nothing predicting that from the function's signature. This has aTODObut there is no way to fix that without separating these concerns.There are also actual bugs that can't be fixed like hardlink handling because the implementation is fixed to Tarsum.
This PR changes the dependent interface to something that better matches the reality. Source is a path from a host and a capability to calculate hashes for paths.
It also renames
ContexttoSourceso that it doesn't collide withcontext.Context.This enables following features:
Cache hashing can be separated from the tarsum package. Raw tar headers are not the correct way calculate file hashes. For example hardlinks are completely broken because of that https://gist.github.com/tonistiigi/775cb15d3918958020bdd0165f776005 . This does not mean breaking the compatibility in non-critical places (yet) as we don't want to invalidate people's cache.
Alternative implementations can be used for defining the remotes. Previous implememtation was centric to specific tar behavior.
No
.dockerignoremagic needs to be implemented for the other implementations. For example Proposal: Incremental build context sending #31829 does not need to worry about this at all.Follow up:
Hashes for directories shouldn't be calculated by the builder. They should be provided same way as they are for the files.
Move allDoneRemoteimplementations (and detection) into a separate package.RemoveDoneModifiableContext. This is now only needed on source detection phase andRemove()function can be directly called on the remote's implementation.