spack containerize: allow copying files into the container by alalazo · Pull Request #14917 · spack/spack

alalazo · 2020-02-12T17:58:12Z

This PR adds the ability to copy arbitrary files or directories:

From the host to the build stage
From the host to the final stage
From the build stage to the final stage

It also adds the ability to inject arbitrary instructions before the Spack environment is concretized and built (i.e. at the beginning of the build stage).


alalazo · 2020-02-12T18:45:59Z

@samcmill @ChristianTackeGSI

alalazo · 2020-02-12T18:48:47Z

Besides unit tests, this PR was test driven with the following spack.yaml:

spack:
  specs:
  - zlib
  config:
    build_jobs: 1
  packages:
    all:
      target: [broadwell]

  container:
    format: docker
      
    base:
      image: "ubuntu:18.04"
      spack: develop

    copy:
      build:
      - source: ./spack-repo-externals/scitasexternal
        destination: /opt/spack/repo/scitasexternal
      - source: ./afile
        destination: /opt/afile
      final:
      - source: 'build:/opt/file_during_setup'
        destination: '/opt/file_during_setup'
      - source: 'build:/opt/file_after_build'
        destination: '/opt/file_after_build'
      - source: /opt/afile
        destination: /opt/afile
    
    strip: true
    extra_instructions:
      setup: |
        RUN echo "Hello world!" > /opt/file_during_setup
      build: |
        RUN echo "Hello world!" > /opt/file_after_build
      final: |
        RUN touch /opt/file_final_stage
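The `copy` section above mixes entries that come from the host with entries that come from the build stage. Below is a minimal sketch of a sanity check for that section. It assumes the YAML has already been parsed into plain dicts and lists, for example by a YAML library. `validate_copy_section` is a hypothetical helper, not part of Spack. It encodes the rule implied by the example: `build:` sources only make sense under `final`.

```python
# Hypothetical validator for the `container: copy:` section shown above.
# Assumes the YAML was already parsed into plain dicts and lists.
BUILD_PREFIX = "build:"


def validate_copy_section(copy_cfg):
    """Return a list of human-readable problems (empty if none)."""
    problems = []
    for stage in ("build", "final"):
        for i, entry in enumerate(copy_cfg.get(stage, [])):
            where = f"copy:{stage}[{i}]"
            missing = {"source", "destination"} - set(entry)
            if missing:
                problems.append(f"{where}: missing {', '.join(sorted(missing))}")
                continue
            # Only the final stage can pull files out of the build stage.
            if stage == "build" and entry["source"].startswith(BUILD_PREFIX):
                problems.append(f"{where}: 'build:' sources are only valid in the final stage")
    # Any key other than build/final is not a known stage.
    for key in sorted(set(copy_cfg) - {"build", "final"}):
        problems.append(f"copy: unknown stage '{key}'")
    return problems
```

Running it on the `copy` section of the test configuration should return an empty list. A `build:` source placed under `build` would be reported instead.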

hartzell

I can see where this could be useful and am approving it and/but:

I've only eyeballed the code, don't have anywhere to run it; and
While I can see where this would be useful, it really seems like you're inch-by-inch implementing yet another container build system.

Rather than add all this code to Spack so that people can run commands and add packages before/after/during the build, why not just let them use whatever tools they're already using to create images (docker, podman, ...) and specify those via build-image and final-image?

The current approach gives the apparent advantage that everything interesting is happening in the Spack configuration and might e.g. be easier to version control, but that's abrogated by giving the user the ability to run arbitrary commands.

We can also tell people "Spack builds containers, you don't have to understand Docker/Singularity!". It's great that that's true for simple use cases via the basic functionality. But, how much code do we (as if I've written much...) want to write/test/document/lug-around to handle cases that aren't simple and have been well addressed with existing tools (see 927)?

It might be useful to have a single configuration/... that would allow people to generate images for any of their container systems. But how many sites run both (it might be common, it might not, I don't know)? How committed is Spack to keeping up with the various systems? And, e.g., how important is it, given that systems like Singularity can ingest Docker images? If we're really going to try to keep up with the market, would a system where the user can pass a template to be rendered be better? (Then Spack wouldn't need to learn to support Bocker.)

In the parent PR you mentioned that you need to ensure compatibility between the build and final image. Between the earlier PR and this one, I suspect that you've given us more than enough rope to tie our feet together and break things. Wouldn't it be simpler to just document the expectations and/or build tools that confirm that the preconditions are met and allow users to specify images?

I don't doubt that you can make Spack into a container build system but why not rather make it something that can easily be remixed into existing workflows?

It's not lost on me that this comment is an echo of my confusion about reimplementing a module system w/in spack (spack load). I may be missing something about the use-cases and/or you (Spack) may just be Eclipse while I'm vi (or vice versa).

Either way, and as always, Spack's a great and useful tool! Thanks!

ChristianTackeGSI · 2020-02-13T12:39:39Z

I agree with @hartzell:

We probably should not implement everything directly in spack.
Having a simple spack containerize giving you a nice "standard" container is great for simple cases.
For the complex cases, we should rather have some ways to hook easily into the stuff.

I see a few options:

1. Let people override the jinja2 template used by spack containerize. Then they can add their apt-get install in the right place, do spack repo add PATH, etc.
2. Put some magic markers in the output that people can use to patch the recipe.
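The second option (magic markers) could look roughly like the sketch below. The generated recipe carries marker comments, and a post-processing step splices user lines in right after them. The marker format `# spack-marker: <name>` and the `patch_recipe` function are hypothetical. spack containerize does not emit any such markers.

```python
import re

# Hypothetical marker format: a plain comment, so an unpatched recipe stays valid.
MARKER = re.compile(r"^#\s*spack-marker:\s*(\S+)\s*$")


def patch_recipe(recipe, snippets):
    """Insert snippets[name] right after each '# spack-marker: name' line."""
    out = []
    for line in recipe.splitlines():
        out.append(line)
        match = MARKER.match(line.strip())
        if match and match.group(1) in snippets:
            out.extend(snippets[match.group(1)].splitlines())
    return "\n".join(out) + "\n"
```

Markers with no matching snippet are left untouched, so the same recipe works whether or not it is patched.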

I think, what would also help:

Make the recipe used to build the Spack-enabled base images part of the output of spack containerize, so that it's much easier to use another base image.

I will look into your patches soon and give feedback anyway.

alalazo · 2020-02-13T13:43:32Z

While I can see where this would be useful, it really seems like you're inch-by-inch implementing yet another container build system.

Where do you see that? Spack will just generate a recipe and stop there.

Rather than add all this code to Spack so that people can run commands and add packages before/after/during the build, why not just let them use whatever tools they're already using to create images (docker, podman, ...) and specify those via build-image and final-image?

Hmmm not sure I follow here. Which tools do you use to create a Dockerfile? What is the workflow you propose as an alternative?

We can also tell people "Spack builds containers, you don't have to understand Docker/Singularity!". It's great that that's true for simple use cases via the basic functionality. But, how much code do we (as if I've written much...) want to write/test/document/lug-around to handle cases that aren't simple and have been well addressed with existing tools (see 927)?

Do you have concrete use cases that you can share? Which tools?

It's not lost on me that this comment is an echo of my confusion about reimplementing a module system w/in spack (spack load). I may be missing something about the use-cases and/or you (Spack) may just be Eclipse while I'm vi (or vice versa)

@becker33 can say more about #14062, but to me that effort was mainly about reducing requirements and startup time without breaking backward compatibility.

alalazo · 2020-02-13T14:00:57Z

@ChristianTackeGSI

For the complex cases, we should rather have some ways to hook easily into the stuff.

That's how I see this PR and #14879. The basic idea behind the containerize command is to help generate recipes based on Spack environments. The additional configuration handles here are meant to be the hooks for those complex cases.

Coming to your suggestions, I personally see 1. as an even bigger hammer (overriding the template), and 2. is not that appealing (modulo errors in the configuration made by the user, I'd like the generated recipe to be a valid one, not something that needs further edits).

Make the recipe used to build the Spack-enabled base images part of the output of spack containerize, so that it's much easier to use another base image.

That's on the roadmap, but it will come later, since it needs some coordination to refactor how those base images are generated.

samcmill · 2020-02-13T15:55:28Z

The copy sections seem slightly redundant with the enhanced extra_instructions. I think you could just do something like:

    extra_instructions:
      setup: |
        COPY ./spack-repo-externals/scitasexternal /opt/spack/repo/scitasexternal

I suppose the benefit of copy is to abstract the docker / singularity syntax differences.

I think the best option may be to allow for custom base images and a "spack_is_already_setup_in_the_base_image" boolean setting (if false, then replicate the spack setup steps in the current base image). That would keep things relatively simple and provide the most flexibility?
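Below is one rough way samcmill's proposed boolean could be handled, as a sketch under stated assumptions. `spack_setup_instructions` is a hypothetical helper, and the option name is samcmill's placeholder. When the flag is false, the helper replicates a minimal setup: it clones Spack's public GitHub repository and puts its `bin/` directory on `PATH`. That is not necessarily what the official base images do. The helper also assumes `git` is already installed in the custom base image.

```python
def spack_setup_instructions(base_image, spack_already_setup, spack_ref="develop"):
    """Return Dockerfile lines that open the build stage (hypothetical)."""
    lines = [f"FROM {base_image} AS builder"]
    if spack_already_setup:
        # Trust the base image: Spack is assumed to be installed and usable.
        return lines
    # Otherwise replicate a minimal Spack setup inside the custom base image.
    lines += [
        f"RUN git clone --depth 1 --branch {spack_ref} https://github.com/spack/spack.git /opt/spack",
        'ENV PATH="/opt/spack/bin:${PATH}"',
    ]
    return lines
```

The rest of the recipe (concretize, install, final stage) would be the same in both branches. That is what keeps the option simple.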

alalazo · 2020-02-13T16:31:52Z

The copy sections seem slightly redundant with the enhanced extra_instructions

That's true for Docker, but not for Singularity definition files, where copies need to be declared in separate sections.

I suppose the benefit of copy is to abstract the docker / singularity syntax differences

Yes, or at least that was my intent.
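Here is an illustration of the syntax difference. In a Dockerfile a copy is just another inline instruction. A Singularity definition file instead lists copies in a `%files` section, using `%files from <stage>` in multi-stage builds. This is a minimal sketch, not Spack's template. It assumes copy entries are plain (source, destination) pairs and that the build stage is named `builder` for Docker and `build` for Singularity. Both helper names are hypothetical.

```python
from collections import namedtuple

CopyItem = namedtuple("CopyItem", ["source", "destination"])


def docker_copies(items, from_stage=None):
    """Render copies as inline Dockerfile COPY instructions."""
    flag = f"--from={from_stage} " if from_stage else ""
    return [f"COPY {flag}{i.source} {i.destination}" for i in items]


def singularity_files_section(items, from_stage=None):
    """Render copies as a separate Singularity %files section."""
    header = f"%files from {from_stage}" if from_stage else "%files"
    return [header] + [f"  {i.source} {i.destination}" for i in items]
```

The same list of items feeds both renderers. That is the abstraction the `copy` key provides over `extra_instructions`.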

ChristianTackeGSI

Looks good to me. Did not yet get to test it.

ChristianTackeGSI · 2020-02-13T16:40:41Z

lib/spack/spack/container/writers/__init__.py

+    def from_build_to_final(self):
+        """Files that need to be copied from the build stage to the final stage."""
+        files = self._files_to_be_copied('final')
+        files = [x for x in files if x.source.startswith('build:')]


Maybe strip the build: prefix in the code?

Suggested change

files = [x for x in files if x.source.startswith('build:')]

files = [x[6:] for x in files if x.source.startswith('build:')]

Taking a second look at it, this won't work, because x is an object with source and destination attributes.
Could you move the CopyItem definition to module level? Then we could do:

Suggested change

files = [x for x in files if x.source.startswith('build:')]

files = [CopyItem(source=x.source[6:], destination=x.destination) for x in files if x.source.startswith('build:')]
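Below is a runnable version of the suggestion, written as a sketch. `CopyItem` is defined at module level. Entries whose source starts with `build:` are rebuilt with the prefix removed, so the templates can use `item.source` directly. Some choices here are not part of the PR. The functions take the list of final-stage entries directly instead of calling `self._files_to_be_copied('final')`. They also slice with `len(BUILD_PREFIX)` rather than the literal `6`.

```python
from collections import namedtuple

# Module-level so that the writer and the code building entries share it.
CopyItem = namedtuple("CopyItem", ["source", "destination"])

BUILD_PREFIX = "build:"


def from_build_to_final(final_items):
    """Files that need to be copied from the build stage to the final stage.

    The 'build:' prefix is stripped, so templates can use item.source as-is.
    """
    return [
        CopyItem(source=x.source[len(BUILD_PREFIX):], destination=x.destination)
        for x in final_items
        if x.source.startswith(BUILD_PREFIX)
    ]


def from_host_to_final(final_items):
    """Files that need to be copied from the host to the final stage."""
    return [x for x in final_items if not x.source.startswith(BUILD_PREFIX)]
```

With the prefix handled once here, the `replace('build:', '')` filters in both templates become unnecessary.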

ChristianTackeGSI · 2020-02-13T16:41:59Z

share/spack/templates/container/Dockerfile

 COPY --from=builder {{ paths.view }} {{ paths.view }}
 COPY --from=builder /etc/profile.d/z10_spack_environment.sh /etc/profile.d/z10_spack_environment.sh
+{% for item in from_build_to_final %}
+COPY --from=builder {{ item.source | replace('build:', '') }} {{ item.destination }}


And not here.

Suggested change

COPY --from=builder {{ item.source | replace('build:', '') }} {{ item.destination }}

COPY --from=builder {{ item.source }} {{ item.destination }}

ChristianTackeGSI · 2020-02-13T16:43:17Z

share/spack/templates/container/singularity.def

  {{ paths.view }} /opt
  {{ paths.environment }}/environment_modifications.sh {{ paths.environment }}/environment_modifications.sh
+{% for item in from_build_to_final %}
+  {{ item.source | replace('build:', '') }} {{ item.destination }}


And also not here.

Suggested change

{{ item.source | replace('build:', '') }} {{ item.destination }}

{{ item.source }} {{ item.destination }}

ChristianTackeGSI · 2020-02-13T16:59:12Z

For the complex cases, we should rather have some ways to hook easily into the stuff.

That's how I see this PR and #14879. The basic idea behind the containerize command is to help generate recipes based on Spack environments. The additional configuration handles here are meant to be the hooks for those complex cases.

Fine!

I think one should be careful about adding new features that users can already implement with extra_instructions.

Coming to your suggestions, I personally see 1. as an even bigger hammer (overriding the template)

Well, it depends. For people used to jinja2 it's really nice.

and 2. is not that appealing (modulo errors in the configuration made by the user, I'd like the generated recipe to be a valid one, not something that needs further edits).

Well, those magic markers would have to be comments in the normally generated recipe.
We can do all that now with extra_instructions.

Make the recipe used to build the Spack-enabled base images part of the output of spack containerize, so that it's much easier to use another base image.

That's on the roadmap, but it will come later, since it needs some coordination to refactor how those base images are generated.

Good!

hartzell · 2020-02-13T23:27:33Z

As I said, this looks like it will add useful functionality given this design and I've approved it (though I can't test it).

Someone once told me "Developers will develop...." They were decrying the loss of open space land around the Bay Area, but I've come to realize that it's also true of software engineers (aka "me" and "you").

In that spirit, there's another architecture for this set of features, that makes a different set of tradeoffs, in which Spack's containerization support is loosely coupled to the code that sets up the build and final images. Spack wouldn't support any before/after steps (less code, fewer tests, ...) and/but users would need/get to use other tools to add variation to the basic happy path.

Maybe it's better, maybe it's not.... Details below.

While I can see where this would be useful, it really seems like you're inch-by-inch implementing yet another container build system.

Where do you see that? Spack will just generate a recipe and stop there.

Imagine the following conversation:

Person 1: I build my Docker images by editing a Dockerfile and running docker build.
Person 2: That's great, we build our Docker images by adding stuff like this to our WORKSPACE file and running Bazel (I've never tried this!)
Person 3: Wow, that's cool, we edit spack.yaml and then run spack containerize > Dockerfile; docker build.

Among those groups, I count some number > 2 and <= 3 of systems for configuring and building Docker images (person 3's system isn't completely general, but it's very flexible and configuration is tracked in yet another location).

Rather than add all this code to Spack so that people can run commands and add packages before/after/during the build, [...]

Hmmm not sure I follow here. Which tools do you use to create a Dockerfile? What is the workflow you propose as an alternative?

Well, in an attempt to keep this conversation smiley and comfortable:

But more seriously...

There are lots and lots of tools for building Docker images (not just Dockerfiles) from specifications. Some of them use Docker and/or Dockerfiles, some are stand-alone. E.g.

[...] to handle cases that aren't simple and have been well addressed with existing tools (see 927)?

Do you have concrete use cases that you can share? Which tools?

The last time I was involved in building production images was two+ years back (e.g. predating multistage builds). We were concerned with reproducibility and particularly worried about tags moving and about images' "content addressable IDs" not being consistently calculated across releases of the Docker registry code or across vendors (e.g. the JFrog registry). We kept our images in our own repositories because we needed to be able to make very strong assurances to our customers by way of our lawyers, etc.

Given that, we built images in stages, hierarchically:

1. Use a simple Dockerfile that created a copy of a particular version of a base OS at a particular release and build. We would tag this in our registry and promise ourselves that we wouldn't move the tag.

This protected us from specifying centos:7 and getting 7.7.1908 today (https://hub.docker.com/_/centos) and having gotten 7.1.1503 back in 2015 (this archive.org link is hit or miss... https://web.archive.org/web/20150905150129/https://hub.docker.com/_/centos/).
2. Then, based on the above image, run a Dockerfile that ran yum update and updated the various packages that were installed and perhaps installed some really-really-necessary additional basic packages.
3. Then, based on the ..., run a Dockerfile that added our generally required packages, authentication bits, certificates, and other localisms ....
4. Then, based on ..., run a bunch of Dockerfiles that built images for our various tools/applications.

If/when we needed to update system packages, we'd branch off at step 2 and create a new tree full of images. If we needed to rebuild an individual app image we'd simply update its Dockerfile and rerun its bit of step 4.
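The hierarchy described above is a tree of images. The rebuild rule amounts to "rebuild the changed image and everything below it". Below is a sketch of that rule. The image names and the `PARENTS` mapping are made up for illustration.

```python
from collections import deque

# Hypothetical image hierarchy: child image -> parent image (its FROM line).
# "os-pinned" (step 1) has no parent entry: it is the root of the tree.
PARENTS = {
    "os-updated": "os-pinned",   # step 2: yum update on the pinned base
    "site-base": "os-updated",   # step 3: site packages, certificates, ...
    "app-a": "site-base",        # step 4: one image per application
    "app-b": "site-base",
}


def images_to_rebuild(changed, parents=PARENTS):
    """Return the changed image plus all its descendants, parents first."""
    children = {}
    for child, parent in parents.items():
        children.setdefault(parent, []).append(child)
    order, queue = [], deque([changed])
    # Breadth-first walk: every image is listed after the image it builds on.
    while queue:
        image = queue.popleft()
        order.append(image)
        queue.extend(sorted(children.get(image, [])))
    return order
```

Branching off at step 2 corresponds to `images_to_rebuild("os-updated")`. Touching a single application corresponds to `images_to_rebuild("app-b")`, which yields only that image.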

At each step, it didn't really matter how the incoming image was created, all that mattered was that it met its "contract".

An alternative loosely coupled architecture for what you're building here would not include the ability to run/add/copy before/after the Spack specific bits and instead say that one needs to supply a pair of images to the containerize command, with the following characteristics:

1. the "build" image must have everything required for Spack to build your particular set of applications; and
2. the "final" image must have every non-Spack thing required to run the things built in the "build" step.

For some people step 1 might include the static libraries that Go needs or a Fortran compiler or .... Others might not need those things. Some people might need to run additional steps after the second step to touch up the image.
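One way to make the "build" image contract checkable is a small preflight script run inside the candidate image. This is a sketch that assumes the contract can be approximated by a list of executables that must be on `PATH`. The list below is illustrative only and is not Spack's official list of prerequisites.

```python
import shutil

# Illustrative contract for a "build" image; adapt it to what your specs need
# (e.g. drop gfortran if nothing in the environment uses Fortran).
BUILD_IMAGE_CONTRACT = ["python3", "git", "make", "gcc", "g++", "gfortran", "tar", "patch", "curl"]


def missing_tools(required=BUILD_IMAGE_CONTRACT):
    """Return the required executables that cannot be found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]
```

A CI job could run this inside the image (for example via `docker run`) and reject the image if the returned list is not empty. How the image was produced would then not matter, only whether it passes.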

It would be great if the Spack project provides a couple of build/final image pairs that meet these requirements for people with simple cases (this is already true).

It would also be great if the Spack project provided the well-commented Dockerfiles that it uses to build the above images so that people that are almost-but-not-quite simple can just touch them up and create their own build/final images. This would also let them manage those images with whatever CI/CD and registry systems that they're using (required to use). The Dockerfiles are available, but there's no way to feed the result into containerize.

And, since users are free to specify any build or final image they please, one might use a locally produced image that's based on NVIDIA's CUDA support, Alpine Linux, or includes all of the bits (sidecars) that a resource manager vendor (perhaps Univa, but I'm out of date) requires but doesn't provide in an installable form.

These two architectures really do seem to mirror the IDE vs. vi(emacs)+formatter+linter+... "discussions" that come up regularly. On the one hand, a user only needs to learn one tool and can then do [only] the things that the tool can do. On the other hand, a user needs to learn multiple tools and is able to do anything that they can imagine by combining them or adding new tools.

That's also how I see this as similar to spack load support. There's already a good solution (lmod) that's loosely coupled to Spack and generally used. I'm well aware that I'm not the only user (and perhaps not very representative) and that other folks have other use cases and etc..., but I've never needed it and would be personally content if that code didn't exist and the resources were spent in other ways (except of course that my co-Spackers might be frustrated and sad).

alalazo · 2020-02-14T08:11:29Z

@hartzell Thanks for the extensive reply

Well, in an attempt to keep this conversation smiley and comfortable: [ ... ] There are lots and lots of tools for building Docker images (not just Dockerfiles) from specifications.

I think that's exactly my point: here we are trying to substitute emacs (nothing will...), but we definitely don't want to cross the line and produce anything more than a Dockerfile. See also this reply #14202 (comment)

An alternative loosely coupled architecture for what you're building here would not include the ability to run/add/copy before/after the Spack specific bits and instead say that one needs to supply a pair of images to the containerize command, with the following characteristics: [ ... ]

I completely hear you on this, and I think it's probably worth comparing the two approaches: more flexibility, less maintenance for us, more effort required from users.

ChristianTackeGSI · 2020-02-14T12:50:06Z

I completely hear you on this, and I think it's probably worth comparing the two approaches: more flexibility, less maintenance for us, more effort required from users.

In that sense, allowing users to replace the jinja2 templates might be the "least maintenance for spack" option. Possibly. And yes, it requires a lot more effort from users.
I'm not saying this is the best way; I just want to point out that it's an option to consider when going the "small tools doing their job well" route.

hartzell · 2020-02-14T17:57:29Z

[...] it's probably worth comparing the two approaches [...]

Sounds good. You folks have a wider viewpoint than I do and can probably balance the costs and benefits.

victorusu · 2020-07-03T09:51:15Z

@alalazo and @hartzell I would vote for this PR to be approved.
The reason being that one cannot use the containerize feature with some of the Spack supported applications, e.g. Amber.
What would be your solution for such cases?

alalazo · 2020-07-03T09:58:30Z

@victorusu I think this should be superseded by #15028. You can always provide a custom base image where the file is copied in the right place, so the functionality is there, right?

victorusu · 2020-07-03T11:09:36Z

@victorusu I think this should be superseded by #15028. You can always provide a custom base image where the file is copied in the right place, so the functionality is there, right?

@alalazo I do not agree. I think that in this case, the solution to create a custom base container and use it is not very neat.
If I have to copy the file into a base container, I would download Spack into that container and run spack myself without using the containerize feature.

alalazo · 2020-07-03T11:22:06Z

@victorusu This can be done in two different ways imo:

1. Create a base image with the files you need in there
2. Have an HTTPS mirror somewhere serving the files and hook it in at container build time

Solution 1. in the simplest case involves building something like:

FROM spack/ubuntu-bionic
COPY src dest

If you think of Amber, you would then use that base image for all your builds of Amber. Which use cases do you foresee for that? I can only think of cases where the software is proprietary and the sources are released under a commercial license.

alalazo · 2020-07-03T11:34:54Z

The main design point is the following. Currently spack containerize produces recipes that do not depend on the build context. #15028 doesn't change that, but it generalizes what can be done by relying on the base images being available and respecting a few prerequisites.

Since the format in #15028 can already handle this use case, going forward I think a better option is to write some other command/sub-command that helps users generate a valid base image, rather than adding more and more options, as attempted here, to tweak the final recipe in various ways.
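For Dockerfiles, the property of not depending on the build context can be checked mechanically. `COPY` without `--from` reads files from the context, and so does `ADD` with a local source. `COPY --from=<stage>` does not. Below is a rough sketch of such a check. It is a hypothetical helper, not part of Spack. It ignores line continuations and heredocs. `ADD` is treated as context-free only when all of its sources are `http(s)://` URLs.

```python
def context_dependent_lines(dockerfile_text):
    """Return Dockerfile instructions that read files from the build context."""
    flagged = []
    for line in dockerfile_text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        instruction = words[0].upper()
        # Positional arguments (sources..., destination), without --flags.
        args = [w for w in words[1:] if not w.startswith("--")]
        if instruction == "COPY" and not any(w.startswith("--from=") for w in words[1:]):
            flagged.append(line.strip())
        elif instruction == "ADD" and not all(a.startswith(("http://", "https://")) for a in args[:-1]):
            flagged.append(line.strip())
    return flagged
```

On a recipe generated without host copies this should return an empty list. The two-line base image above (`FROM spack/ubuntu-bionic` plus `COPY src dest`) would be flagged, which is expected: that is the recipe that needs a build context.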

hartzell · 2020-07-04T16:01:11Z

Currently spack containerize produces recipes that do not depend on the build context.

I think that this is the key bit and it's quite useful. #15028 provides the basis for fancier things.

alalazo · 2020-11-17T19:26:16Z

Superseded by #15028

alalazo added 2 commits February 12, 2020 16:13

Allow extra instructions at the beginning of the build stage

3a313da

This commit allows specifying extra instructions that will be injected before the Spack environment is concretized and built. This can be useful for any setup operation needed for the build itself.

Files can be copied into the container from the host or from previous stages

45ed18d

alalazo added the feature and containers labels Feb 12, 2020

alalazo self-assigned this Feb 12, 2020

alalazo requested review from becker33, hartzell and tgamblin February 12, 2020 18:45

hartzell approved these changes Feb 13, 2020


ChristianTackeGSI approved these changes Feb 13, 2020


alalazo mentioned this pull request Feb 17, 2020

spack containerize: permit to customize the base images #15028

Merged

tgamblin self-assigned this Mar 24, 2020

alalazo closed this Nov 17, 2020

alalazo deleted the features/extra_instructions_on_setup branch November 17, 2020 19:26



7 participants
