Skip to content

builder: Refactor remote context#31984

Merged
cpuguy83 merged 1 commit intomoby:masterfrom
tonistiigi:remote-context
Apr 27, 2017
Merged

builder: Refactor remote context#31984
cpuguy83 merged 1 commit intomoby:masterfrom
tonistiigi:remote-context

Conversation

@tonistiigi
Copy link
Member

@tonistiigi tonistiigi commented Mar 21, 2017

depends on #31257

only new commit: tonistiigi/docker@nested-build...remote-context

The builder currently seems to depend on Context interface that could be reimplemented to provide different types of sources. In reality, there is only one context: tarsum. That is because the real functionality for context is finding hashes for paths for cache and tarsum is the only context that can do that. Others are just helpers for making tarsum context from other inputs.

Context interface hides direct access to the paths with .Path() method in builder.FileInfo. Actual copy only happens through real paths and doesn't have anything to do with context methods.

File hash capability is the main reason Context exists but that capability is completely missing from the interface. The implementation just recasts builder.FileInfo to builder.Hashed and would panic if they don't match.

Context and Dockerfile are mixed together and builder tries to separate them while it is running. That means recasts to ModifiableContext and DockerIgnorContext. Every context passed to builder is always DockerIgnoreContext but there is nothing predicting that from the function's signature. This has a TODO but there is no way to fix that without separating these concerns.

There are also actual bugs that can't be fixed like hardlink handling because the implementation is fixed to Tarsum.

This PR changes the dependent interface to something that better matches the reality. Source is a path from a host and a capability to calculate hashes for paths.

It also renames Context to Source so that it doesn't collide with context.Context.

This enables following features:

Follow up:

  • Hashes for directories shouldn't be calculated by the builder. They should be provided same way as they are for the files.

  • Move all Remote implementations (and detection) into a separate package. Done

  • Remove ModifiableContext. This is now only needed on source detection phase and Remove() function can be directly called on the remote's implementation. Done

@tonistiigi tonistiigi changed the title Remote context Refactor remote context parsing Mar 21, 2017
RUN find / -xdev -name ba*
RUN find /tmp/`
}
testD := `FROM busybox
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only UI change in this PR. It seems that the test checks an invalid side-effect caused by the previous implementation. I could make a fake context for this case but don't see a practical reason for it, especially after all the work that was needed in #31236 to hide the Dockerfile if no such file actually exists.

Could also add a second case that pulls a tar. Then it will have the same behavior as the previous test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly the old test expected a dockerfile to exist at Dockerfile and no file named baz to exist.

But the test fixture seems to be just a remote server with a single file at /testD. Where do these extra files come from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous implementation created a fake context tar after pulling the Dockerfile as having a tar was the only way to start the builder(the implementation didn't bother to create .dockerignore as well so the Dockerfile would not show up). File baz does not exist but is mentioned in the URL. I'm not sure how it would leak, but I don't actually remember why I decided to change the Dockerfile itself. I'll try to change it back and put back the baz assertion.

return sum, nil
}

func (c *lazyContext) normalize(path string) (cleanPath, fullPath string, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a reminder from #31257 (comment)

Also I checked, once you refactored normalize, it is fine to remove the lstat. Any action that was using normalize was doing an Open, or an Lstat, or Remove after it anyways.

@tonistiigi tonistiigi changed the title Refactor remote context parsing builder: Refactor remote context Mar 25, 2017
Copy link
Contributor

@simonferquel simonferquel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I am best suited for such a review. The changes are quite pervasive over the builder code base and I am not familiar enough yet with the builder code to make constructive feedback

@vdemeester vdemeester requested a review from dnephin April 7, 2017 13:37
Copy link
Member

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, just a few minor comments and questions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: Comment suggests that maybe Source (namespaced it would be builder.Source) might be a more appropriate name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like bm is not used here, so this could be a function instead of an instance method on buildManager.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I often find that using a switch/case here makes the intent a little more obvious, without needing a comment:

switch {
case  os.IsNotExist(err):
    return nil
case err != nil:
    return err

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No error on invalid .dockerignore format?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment still relevant?

s/testRemote/stubRemote/

?

RUN find / -xdev -name ba*
RUN find /tmp/`
}
testD := `FROM busybox
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly the old test expected a dockerfile to exist at Dockerfile and no file named baz to exist.

But the test fixture seems to be just a remote server with a single file at /testD. Where do these extra files come from?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New unit test for this?

Copy link
Member

@dnephin dnephin Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this field, dockerfile can be just an arg.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pass dockerfile this to build() instead? That way it doesn't need to be a field in the struct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just removed in a recent PR. I think with the two suggestions above it wont be necessary anymore

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename the field here as well? s/ctx/source

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b here is only used to get the Dockerfile name now, so this could be:

func readAndParseDockerfile(src io.ReadCloser, filename string) (*parser.Result, error {
...
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the suggestion above this could just call readAndParseDockerfile() directly I think.

Copy link
Member

@dnephin dnephin Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I got most of your changes from #27270 into #32580, which is merged.

However this funtion looks pretty good as is. What's the advantage to returning the parser.Result{} here?

Redefine a better interface for remote context dependency.

Separate Dockerfile build instruction from remote context.

Signed-off-by: Tonis Tiigi <[email protected]>
@tonistiigi
Copy link
Member Author

All green. PTAL

Copy link
Member

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this, nice cleanup.

Just a question about the test change, otherwise I think this looks good.

progressOutput := stdoutFormatter.StreamFormatter.NewProgressOutput(stdoutFormatter.Writer, true)
progressReader := progress.NewProgressReader(resp.Body, progressOutput, resp.ContentLength, "", "Downloading")
// Download and dump result to tmp file
// TODO: add filehash directly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this means, I just want to make sure that it's not necessary for this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is a single file there isn't really a need for LazyContext and reading file back from the disk. We could just create a single hash object and pipe data through it. But that is an optimization for edge case so not currently implemented.

return nil, err
}

// TODO: remove, handle dirs in Hash()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ok for now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, currently Hash() only returns hash for a single file so to get the correct hash for a dir we still need to recursively walk it. The todo is about moving the walker code inside Hash so the values from that func don't need post-processing.

var testD string
if testEnv.DaemonPlatform() == "windows" {
testD = `FROM busybox
COPY * /tmp/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this not work anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, see #31984 (review)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh right, I forgot about this. Thanks!

Copy link
Member

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tonistiigi tonistiigi requested a review from cpuguy83 April 26, 2017 23:05
@@ -0,0 +1,172 @@
package remotecontext
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere you renamed Context to Source. I like this, do you think we can do the same for these new *context packages?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not rename LazyContext/TarsumContext/modifiableContext inside this pkg for now. Only in the builder where the conflict with context.Context was very clear. More renames could probably be follow up as this is already quite big.

if allowWildcards && containsWildcards(origPath) {
var copyInfos []copyInfo
if err := context.Walk("", func(path string, info builder.FileInfo, err error) error {
if err := filepath.Walk(source.Root(), func(path string, info os.FileInfo, err error) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we expecting that this will always be some object on disk?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the previous implementation expected it as well. The Path() method was just hidden from the interface but the actual copy could never work if it was not implemented.

Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants