spack containerize: allow copying files into the container #14917
alalazo wants to merge 2 commits into spack:develop
Conversation
This commit allows specifying extra instructions that are injected before the Spack environment is concretized and built. This can be useful for any setup operation needed for the build itself.
Besides unit tests, this PR was test driven with the following `spack.yaml`:

```yaml
spack:
  specs:
  - zlib
  config:
    build_jobs: 1
  packages:
    all:
      target: [broadwell]
  container:
    format: docker
    base:
      image: "ubuntu:18.04"
      spack: develop
    copy:
      build:
      - source: ./spack-repo-externals/scitasexternal
        destination: /opt/spack/repo/scitasexternal
      - source: ./afile
        destination: /opt/afile
      final:
      - source: 'build:/opt/file_during_setup'
        destination: '/opt/file_during_setup'
      - source: 'build:/opt/file_after_build'
        destination: '/opt/file_after_build'
      - source: /opt/afile
        destination: /opt/afile
    strip: true
    extra_instructions:
      setup: |
        RUN echo "Hello world!" > /opt/file_during_setup
      build: |
        RUN echo "Hello world!" > /opt/file_after_build
      final: |
        RUN touch /opt/file_final_stage
```
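To make the `copy` semantics concrete: host paths become plain `COPY` instructions, while sources prefixed with `build:` become `COPY --from=builder` instructions in the final stage. The following is a rough illustrative sketch of that mapping, not Spack's actual implementation (the `copy_config` dict simply mirrors the `spack.yaml` above):

```python
# Hypothetical sketch: turn a 'copy' config section into Dockerfile COPY lines.
# Not Spack's real code; keys mirror the spack.yaml shown above.
copy_config = {
    "build": [
        {"source": "./spack-repo-externals/scitasexternal",
         "destination": "/opt/spack/repo/scitasexternal"},
        {"source": "./afile", "destination": "/opt/afile"},
    ],
    "final": [
        {"source": "build:/opt/file_after_build",
         "destination": "/opt/file_after_build"},
        {"source": "/opt/afile", "destination": "/opt/afile"},
    ],
}

def copy_lines(stage):
    """Render COPY instructions for one stage of the generated recipe."""
    lines = []
    for item in copy_config.get(stage, []):
        src, dest = item["source"], item["destination"]
        if src.startswith("build:"):
            # 'build:'-prefixed sources come from the build stage image.
            lines.append("COPY --from=builder {0} {1}".format(
                src[len("build:"):], dest))
        else:
            # Everything else is copied from the host build context.
            lines.append("COPY {0} {1}".format(src, dest))
    return lines

print(copy_lines("final"))
```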
hartzell left a comment:
I can see where this could be useful and am approving it, and/but:
- I've only eyeballed the code and don't have anywhere to run it; and
- While I can see where this would be useful, it really seems like you're inch-by-inch implementing yet another container build system.
Rather than add all this code to Spack so that people can run commands and add packages before/after/during the build, why not just let them use whatever tools they're already using to create images (docker, podman, ...) and specify those via build-image and final-image?
The current approach offers the apparent advantage that everything interesting happens in the Spack configuration and might, e.g., be easier to version control, but that advantage is negated by giving the user the ability to run arbitrary commands.
We can also tell people "Spack builds containers, you don't have to understand Docker/Singularity!". It's great that that's true for simple use cases via the basic functionality. But how much code do we (as if I've written much...) want to write/test/document/lug around to handle cases that aren't simple and have already been well addressed by existing tools (see 927)?
It might be useful to have a single configuration/... that would allow people to generate images for any of their container systems. But how many sites run both (might be common, might not; I don't know); how committed is Spack to keeping up with the various systems; and, e.g., how important is it, given that systems like Singularity can ingest Docker images? If we're really going to try to keep up with the market, would a system where the user can pass a template to be rendered be better? (Then Spack doesn't need to learn to support Bocker.)
In the parent PR you mentioned that you need to ensure compatibility between the build and final image. Between the earlier PR and this one, I suspect that you've given us more than enough rope to tie our feet together and break things. Wouldn't it be simpler to just document the expectations and/or build tools that confirm that the preconditions are met, and allow users to specify images?
I don't doubt that you can make Spack into a container build system but why not rather make it something that can easily be remixed into existing workflows?
It's not lost on me that this comment is an echo of my confusion about reimplementing a module system w/in spack (spack load). I may be missing something about the use cases, and/or you (Spack) may just be Eclipse while I'm vi (or vice versa). Either way, and as always, Spack's a great and useful tool! Thanks!
I agree with @hartzell: We probably should not implement everything directly in spack. I see a few options:
I think, what would also help:
I will look into your patches soon and give feedback anyway.
Where do you see that? Spack will just generate a recipe and stop there.
Hmmm not sure I follow here. Which tools do you use to create a
Do you have concrete use cases that you can share? Which tools?
@becker33 can say more about #14062, but that effort to me was mainly to reduce requirements and startup time while not breaking backward compatibility.
That's how I see this PR and #14879. Coming to your suggestions: I personally see 1. as an even bigger hammer (overriding the template), and 2. is not that appealing (modulo errors in the configuration made by the user, I'd like the generated recipe to be valid, not something that needs further edits).
That's in the roadmap, but will come later, since it needs some coordination to refactor how those base images are generated.
I think the best option may be to allow for custom base images and a "spack_is_already_setup_in_the_base_image" boolean setting (if false, then replicate the Spack setup steps in the current base image). That would keep things relatively simple and provide the most flexibility?
That's true for Docker, but not for Singularity definition files, where copies need to be declared in separate sections.
Yes, or at least that was my intent.
ChristianTackeGSI left a comment:
Looks good to me. Did not yet get to test it.
```python
def from_build_to_final(self):
    """Files that need to be copied from the build stage to the final stage."""
    files = self._files_to_be_copied('final')
    files = [x for x in files if x.source.startswith('build:')]
```
Maybe strip the `build:` prefix in the code?
Suggested change:
```diff
- files = [x for x in files if x.source.startswith('build:')]
+ files = [x[6:] for x in files if x.source.startswith('build:')]
```
Taking a second look at it, this won't work, because x is an object with src and dest in it.
Could you move the CopyItem definition to module level? Then we could do:
```diff
- files = [x for x in files if x.source.startswith('build:')]
+ files = [CopyItem(source=x.source[6:], destination=x.destination) for x in files if x.source.startswith('build:')]
```
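The objection above can be reproduced directly: slicing a namedtuple slices the tuple itself, so the `build:` prefix has to be stripped from the `source` field, not from the item. A minimal illustrative sketch (the `CopyItem` name follows the thread; the sample data is made up):

```python
import collections

# CopyItem mirrors the object discussed in the review: a source/destination pair.
CopyItem = collections.namedtuple('CopyItem', ['source', 'destination'])

files = [
    CopyItem('build:/opt/file_after_build', '/opt/file_after_build'),
    CopyItem('/opt/afile', '/opt/afile'),
]

# Slicing the namedtuple itself yields a plain tuple, not a shortened string:
# files[0][6:] == ()  (both fields fall inside the first six positions)

# The fix: strip the marker from the .source attribute and rebuild the item.
stripped = [CopyItem(source=x.source[len('build:'):], destination=x.destination)
            for x in files if x.source.startswith('build:')]
print(stripped)
```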
```
COPY --from=builder {{ paths.view }} {{ paths.view }}
COPY --from=builder /etc/profile.d/z10_spack_environment.sh /etc/profile.d/z10_spack_environment.sh
{% for item in from_build_to_final %}
COPY --from=builder {{ item.source | replace('build:', '') }} {{ item.destination }}
```
And not here.
```diff
- COPY --from=builder {{ item.source | replace('build:', '') }} {{ item.destination }}
+ COPY --from=builder {{ item.source }} {{ item.destination }}
```
```
{{ paths.view }} /opt
{{ paths.environment }}/environment_modifications.sh {{ paths.environment }}/environment_modifications.sh
{% for item in from_build_to_final %}
{{ item.source | replace('build:', '') }} {{ item.destination }}
```
And also not here.
```diff
- {{ item.source | replace('build:', '') }} {{ item.destination }}
+ {{ item.source }} {{ item.destination }}
```
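For context, the Jinja `replace('build:', '')` filter used in the templates is just a string replacement performed at render time; if the prefix were stripped in `from_build_to_final` beforehand, the templates could interpolate `item.source` unchanged. A pure-Python sketch of that equivalence (illustrative only, not the actual template code):

```python
# Illustrative data mirroring a from_build_to_final item.
items = [
    {"source": "build:/opt/file_after_build",
     "destination": "/opt/file_after_build"},
]

# What the template does today: strip the 'build:' marker at render time,
# mirroring Jinja's {{ item.source | replace('build:', '') }} filter.
rendered_in_template = [
    "COPY --from=builder {0} {1}".format(
        item["source"].replace("build:", ""), item["destination"])
    for item in items
]

# The reviewer's suggestion: strip the marker in Python beforehand, so the
# template can use {{ item.source }} unchanged.
pre_stripped = [dict(item, source=item["source"].replace("build:", "", 1))
                for item in items]
rendered_after_stripping = [
    "COPY --from=builder {0} {1}".format(item["source"], item["destination"])
    for item in pre_stripped
]

assert rendered_in_template == rendered_after_stripping
print(rendered_in_template)
```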
Fine! I think one should be careful to add new features that can be implemented by the user using
Well, depends. For people used to jinja2 it's really nice.
Well, those magic markers would have to be comments in the normally generated recipe.
Good!
As I said, this looks like it will add useful functionality given this design and I've approved it (though I can't test it).

Someone once told me "Developers will develop...." They were decrying the loss of open space land around the Bay Area, but I've come to realize that it's also true of software engineers (aka "me" and "you").

In that spirit, there's another architecture for this set of features, one that makes a different set of tradeoffs, in which Spack's containerization support is loosely coupled to the code that sets up the build and final images. Spack wouldn't support any before/after steps (less code, fewer tests, ...) and/but users would need/get to use other tools to add variation to the basic happy path. Maybe it's better, maybe it's not.... Details below.
Imagine the following conversation:
Among those groups, I count some number > 2 and <= 3 of systems for configuring and building Docker images (person 3's system isn't completely general, but it's very flexible and configuration is tracked in yet another location).
Well, in an attempt to keep this conversation smiley and comfortable... But more seriously: there are lots and lots of tools for building Docker images (not just Dockerfiles) from specifications. Some of them use Docker and/or Dockerfiles; some are stand-alone. E.g.
The last time I was involved in building production images was two+ years back (i.e., predating multistage builds). We were concerned with reproducibility, and particularly worried about tags moving and about images' "content addressable IDs" not being consistently calculated across releases of the Docker registry code or across vendors (e.g. the JFrog registry). We kept our images in our own repositories because we needed to be able to make very strong assurances to our customers by way of our lawyers etc.... Given that, we built images in stages, hierarchically:
If/when we needed to update system packages, we'd branch off at step 2 and create a new tree full of images. If we needed to rebuild an individual app image, we'd simply update its Dockerfile and rerun its bit of step 4. At each step it didn't really matter how the incoming image was created; all that mattered was that it met its "contract".

An alternative, loosely coupled architecture for what you're building here would not include the ability to run/add/copy before/after the Spack-specific bits, and would instead say that one needs to supply a pair of images to the containerize command, with the following characteristics:
For some people step 1 might include the static libraries that Go needs, or a Fortran compiler, or .... Others might not need those things. Some people might need to run additional steps after the second step to touch up the image.

It would be great if the Spack project provided a couple of build/final image pairs that meet these requirements for people with simple cases (this is already true). It would also be great if the Spack project provided the well-commented Dockerfiles that it uses to build the above images, so that people who are almost-but-not-quite simple can just touch them up and create their own build/final images. This would also let them manage those images with whatever CI/CD and registry systems they're using (or required to use). The Dockerfiles are available, but there's no way to feed the result into containerize.

And, since users are free to specify any build or final image they please, one might use a locally produced image that's based on nVidia's CUDA support or Alpine Linux, or that includes all of the bits (sidecars) that a resource manager vendor (perhaps Univa, but I'm out of date) requires but doesn't provide in an installable form.

These two architectures really do seem to mirror the IDE vs. vi(emacs)+formatter+linter+... "discussions" that come up regularly. On the one hand, a user only needs to learn one tool and can then do [only] the things that the tool can do. On the other hand, a user needs to learn multiple tools and is able to do anything they can imagine by combining them or adding new tools. That's also how I see this as similar to
@hartzell Thanks for the extensive reply.
I think that's exactly my point: here we are trying to substitute emacs (nothing will...), but we definitely don't want to cross the line and produce anything more than a
I completely hear you on this and I think it's probably worth comparing the two approaches: more flexibility, less maintenance for us, more effort required from users.
In that sense, allowing users to replace the jinja2 templates might be the "least maintenance for Spack" option. Possibly. And yes, it requires a lot more effort by users.
Sounds good. You folks have a wider viewpoint than I do and can probably balance the costs and benefits.
@victorusu I think this should be superseded by #15028. You can always provide a custom base image where the file is copied in the right place, so the functionality is there, right?
@alalazo I do not agree. I think that in this case, the solution of creating a custom base container and using it is not very neat.
@victorusu This can be done in two different ways imo:
Solution 1. in the simplest case involves building something like:

```dockerfile
FROM spack/ubuntu-bionic
COPY src dest
```

If you think of Amber, you would then use that base image for all your builds of Amber. Which use cases do you foresee for that? I can only think of cases where the software is proprietary and sources are released under a commercial license.
The main design point is the following: since the format in #15028 can already handle this use case, going forward I see a better option in writing some other command/sub-command that helps users generate a valid base image, rather than adding more and more options, like those tried here, to tweak the final recipe in various ways.
I think that this is the key bit and it's quite useful. #15028 provides the basis for fancier things.
Superseded by #15028
Refers to #14802.
This PR adds the ability to copy arbitrary files or directories:
It also adds the ability to inject arbitrary instructions before the Spack environment is concretized and built (i.e. at the beginning of the build stage).