WIP: add command to mirror path to AWS s3#8664
@tgamblin if you have the opportunity, I wanted to see what your thoughts on this were prior to tomorrow's meeting. Thanks!
tgamblin
left a comment
@bryonbean: ok I looked at this.
As far as how to upload, I think the boto3 stuff is good. It should fail gracefully if boto3 isn’t available, so that we don’t have to vendor it. Devs interested in making mirrors can install it themselves.
As far as the API, I think this needs some changes.
The usage model for spack mirror has so far been that users call spack mirror create <dir> <specs> to make a mirror and populate it with tarballs for <specs>, then they call spack mirror add <name> <url> to add the mirror to repos.yaml and make spack fetch from it. The name is a mnemonic used by other commands (like spack mirror rm). Think of it like the name you give a git remote.
So ideally, the user really only deals with repo names and specs. spack mirror create is a bit odd because it takes a directory, but that is because it is for creating a new mirror. I think this command should work like spack mirror create, except it should take a mirror name (not a directory) and add tarballs for specs to the mirror corresponding to the name. The mirror could be a local directory (as with spack mirror create), or it could be an s3 bucket — the user shouldn’t have to care and spack should figure it out by url.
So here is what I’m proposing:
spack mirror put <name> <specs>
That should look at the registered mirror called <name> and upload tarballs, resources, etc. to the right places in that mirror. This is essentially what spack mirror create does, except the mirror would already be registered. I think you can generalize parts of spack mirror create and use them for both create and put.
The nice thing about this is that the user doesn’t have to know much to make a mirror. They just:
spack mirror add production <s3 url>
spack mirror put production [email protected]
# all files and resources needed to build hdf5 v1.4.5
# are added to the prod mirror in the right place.
This way we don’t have to have a directory or a key argument. And the user doesn’t necessarily need to have the tarballs on hand. spack mirror create fetches things as it needs them to upload to a mirror, and this should do the same. If the tarballs it needs are already in var/spack/cache, then it can skip the fetch and upload what it has on hand (provided the checksum matches).
Does that make sense?
import os
from datetime import datetime

import boto3
I think boto3 should be optional -- I'd rather not vendor the whole thing (there are lots of dependencies) but spack mirror should still work if boto3 isn't present, and just warn the user that they need to install boto. We do something similar with flake8 in spack flake8.
    default=spack.cmd.default_list_scope(),
    help="configuration scope to read from")

# Binary
it's not a binary, is it? this is for source tarballs, right?
This particular PR is for the binary mirror. While there is also a story for a source mirror, I mistakenly moved that source mirror card into the In Progress section, then referenced that card here in the PR message. Sorry about that. I presume this may have some consequence for the notes you've left me here?
I think binary caches are mostly similar, but they're handled by a different command. Look at spack buildcache create, which takes specs of packages to create binary caches for, then puts them in a mirror directory structure. It's the analog for spack mirror create.
I think for binaries, there should just be:
spack buildcache put <mirror> <spec>
And that should create the buildcache (if necessary) and upload it to the specified mirror. The only piece of logic missing with the buildcache stuff is where to create the cache by default. spack buildcache create currently assumes that . is a mirror if there is no -d, but I think it should probably put buildcaches in var/spack/cache (which is itself a mirror), similar to how tarballs are already put there. Then this workflow would work:
spack mirror add production <s3 url>
spack buildcache put production [email protected]
And that should register an S3 bucket, then create and upload the buildcache to the mirror.
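The default-location piece of this could be as simple as the sketch below (the helper name and `spack_root` parameter are hypothetical; the var/spack/cache path is the one named above):

```python
import os

# Illustrative default-location logic per the suggestion above: prefer an
# explicit -d directory, otherwise fall back to var/spack/cache (which is
# itself a mirror), rather than assuming "." is one.
def default_buildcache_dir(spack_root, directory=None):
    if directory:
        return directory
    return os.path.join(spack_root, "var", "spack", "cache")
```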
Seem reasonable? The basic theme here is that the user shouldn't have to know too much about directory structure -- they just need to say what they want to do and name the points in configuration space (specs) that they want to deal with.
# Binary
upload_parser = sp.add_parser('binary', help=mirror_binary.__doc__)
upload_parser.add_argument('-d', '--directory', default=None,
I think this needs to be a bit more abstract. Right now the user can call this with any directory, but it really only makes sense if you call it with a local mirror (e.g. the one we build in var/spack/cache as tarballs are downloaded). I think we need this operation (upload a directory) but it’s lower level — don’t allow users to upload just any directory.
upload_parser = sp.add_parser('binary', help=mirror_binary.__doc__)
upload_parser.add_argument('-d', '--directory', default=None,
                           help="directory containing binaries for mirroring")
upload_parser.add_argument('-f', '--file', default=None,
Same with this one — don’t allow the user to upload any old file. They should upload the archive for a spec, but they shouldn’t have to have it on hand, as spack already knows what files are needed to build a spec.
                           help="directory containing binaries for mirroring")
upload_parser.add_argument('-f', '--file', default=None,
                           help="a file to upload to the mirror")
upload_parser.add_argument('-k', '--key', default=None,
This should be implicit from the spec being uploaded to the mirror, so the user shouldn’t have to specify a key.
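For illustration, deriving the key from the spec could look like this, assuming a hypothetical mirror layout of one directory per package with the archive named `<package>-<version>.<ext>` (the `mirror_key_for` helper is illustrative, not existing Spack code):

```python
import posixpath


def mirror_key_for(package, archive_name):
    # Hypothetical mirror layout: one directory per package, archive named
    # <package>-<version>.<ext>. Under that assumption the S3 key follows
    # directly from the spec, so no -k/--key flag is needed.
    return posixpath.join(package, archive_name)


# e.g. uploading [email protected] whose archive is hdf5-1.4.5.tar.gz:
key = mirror_key_for("hdf5", "hdf5-1.4.5.tar.gz")  # "hdf5/hdf5-1.4.5.tar.gz"
```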
                           help="a file to upload to the mirror")
upload_parser.add_argument('-k', '--key', default=None,
                           help="the s3 key")
upload_parser.add_argument('-b', '--bucket', default=None,
The other mirror commands operate on mirror names (supplied via spack mirror add), not on raw urls. So I think the user should first spack mirror add <name> <s3 url>, then run spack mirror put <specs> to upload specs to that mirror. Ideally that would work with local mirrors in the file system and with s3 buckets.
                           help="the s3 key")
upload_parser.add_argument('-b', '--bucket', default=None,
                           help="name of s3 bucket")
upload_parser.add_argument('-p', '--profile', default=None,
This looks good to me — though it should also be possible to set this in the environment (though I think that is already possible just from using boto3)
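One way to surface the environment default explicitly, using only the standard library (the flag wiring here is a sketch; `AWS_PROFILE` is the variable boto3 itself honors, so this only makes the default visible in `--help`):

```python
import argparse
import os

# Sketch: default --profile from AWS_PROFILE so the flag can be omitted;
# boto3 reads the same variable on its own, so leaving it unset also works.
parser = argparse.ArgumentParser()
parser.add_argument('-p', '--profile',
                    default=os.environ.get('AWS_PROFILE'),
                    help="AWS credential profile (defaults to $AWS_PROFILE)")
args = parser.parse_args([])
```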
colify(s.cformat("$_$@") for s in error)


def mirror_binary(args):
This looks good mechanically, in that I think this is how files will need to be uploaded. Command line API-wise we need something higher level so that users don’t have to know how to structure a mirror.
|
@tgamblin, working on it:
bryons-mbp:~ bryonbean$ spack mirror list
localmirror file:///Users/bryonbean/spack-mirror-2018-08-22
production https://s3-us-east-2.amazonaws.com/spack-mirror-binary
bryons-mbp:~ bryonbean$ spack mirror remove localmirror
==> Removed mirror localmirror with url file:///Users/bryonbean/spack-mirror-2018-08-22
bryons-mbp:~ bryonbean$ spack mirror create bzip gnupg
==> Error: Package bzip not found.
bryons-mbp:~ bryonbean$ spack mirror create cmake
==> Adding package [email protected] to mirror
==> Fetching https://cmake.org/files/v3.12/cmake-3.12.1.tar.gz
######################################################################## 100.0%
==> [email protected] : checksum passed
==> [email protected] : added
==> Successfully created mirror in spack-mirror-2018-08-28
Archive stats:
0 already present
1 added
0 failed to fetch.
bryons-mbp:~ bryonbean$ spack mirror add local /Users/bryonbean/spack-mirror-2018-08-28
bryons-mbp:~ bryonbean$ spack mirror list
local file:///Users/bryonbean/spack-mirror-2018-08-28
production https://s3-us-east-2.amazonaws.com/spack-mirror-binary
bryons-mbp:~ bryonbean$ spack buildcache list
==> Finding buildcaches in /Users/bryonbean/spack-mirror-2018-08-28/build_cache
==> Error: [Errno 2] No such file or directory: '/Users/bryonbean/spack-mirror-2018-08-28/build_cache'
bryons-mbp:~ bryonbean$ cd spack-mirror-2018-08-28/
bryons-mbp:spack-mirror-2018-08-28 bryonbean$ ls -la
total 0
drwxr-xr-x 3 bryonbean staff 96 Aug 28 08:34 .
drwxr-xr-x+ 97 bryonbean staff 3104 Aug 28 08:34 ..
drwxr-xr-x 3 bryonbean staff 96 Aug 28 08:34 cmake
bryons-mbp:spack-mirror-2018-08-28 bryonbean$ mkdir build_cache
bryons-mbp:spack-mirror-2018-08-28 bryonbean$ cd ..
bryons-mbp:~ bryonbean$ spack buildcache list
==> Finding buildcaches in /Users/bryonbean/spack-mirror-2018-08-28/build_cache
==> Finding buildcaches on https://s3-us-east-2.amazonaws.com/spack-mirror-binary
==> buildcache specs and commands to install them
bryons-mbp:~ bryonbean$ spack buildcache create cmake libelf
==> adding matching spec [email protected]%[email protected] arch=darwin-highsierra-x86_64
==> recursing dependencies
==> adding dependency [email protected]%[email protected] arch=darwin-highsierra-x86_64
==> adding matching spec [email protected]%[email protected]~doc+ncurses+openssl+ownlibs patches=dd3a40d4d92f6b2158b87d6fb354c277947c776424aa03f6dc8096cf3135f5d0 ~qt arch=darwin-highsierra-x86_64
==> recursing dependencies
==> adding dependency [email protected]%[email protected]~symlinks~termlib arch=darwin-highsierra-x86_64
==> adding dependency [email protected]%[email protected]+optimize+pic+shared arch=darwin-highsierra-x86_64
==> adding dependency [email protected]%[email protected]+systemcerts arch=darwin-highsierra-x86_64
==> adding dependency [email protected]%[email protected]~doc+ncurses+openssl+ownlibs patches=dd3a40d4d92f6b2158b87d6fb354c277947c776424aa03f6dc8096cf3135f5d0 ~qt arch=darwin-highsierra-x86_64
==> writing tarballs to ./build_cache
==> creating binary cache file for package [email protected]%[email protected]~symlinks~termlib arch=darwin-highsierra-x86_64
==> Error:
/var/folders/tq/b216ws_s64x2g2dgm3_gf4340000gn/T/tmp21_988pr/ncurses-6.1-y43rifzc4mdllqy3ujfm4iwkbcqafskq/lib/libncurses.6.dylib
contains string
/Users/bryonbean/Projects/spack/opt/spack
after replacing it in rpaths.
Package should not be relocated.
Use -a to override.
bryons-mbp:~ bryonbean$ cd spack-mirror-2018-08-28/build_cache/
bryons-mbp:build_cache bryonbean$ ls -la
total 0
drwxr-xr-x 2 bryonbean staff 64 Aug 28 08:39 .
drwxr-xr-x 4 bryonbean staff 128 Aug 28 08:39 ..
bryons-mbp:build_cache bryonbean$ cd ../../
bryons-mbp:~ bryonbean$ spack buildcache create -a cmake libelf
==> adding matching spec [email protected]%[email protected]~doc+ncurses+openssl+ownlibs patches=dd3a40d4d92f6b2158b87d6fb354c277947c776424aa03f6dc8096cf3135f5d0 ~qt arch=darwin-highsierra-x86_64
==> recursing dependencies
==> adding dependency [email protected]%[email protected]~symlinks~termlib arch=darwin-highsierra-x86_64
==> adding dependency [email protected]%[email protected]+optimize+pic+shared arch=darwin-highsierra-x86_64
==> adding dependency [email protected]%[email protected]+systemcerts arch=darwin-highsierra-x86_64
==> adding dependency [email protected]%[email protected]~doc+ncurses+openssl+ownlibs patches=dd3a40d4d92f6b2158b87d6fb354c277947c776424aa03f6dc8096cf3135f5d0 ~qt arch=darwin-highsierra-x86_64
==> adding matching spec [email protected]%[email protected] arch=darwin-highsierra-x86_64
==> recursing dependencies
==> adding dependency [email protected]%[email protected] arch=darwin-highsierra-x86_64
==> writing tarballs to ./build_cache
==> creating binary cache file for package [email protected]%[email protected]~symlinks~termlib arch=darwin-highsierra-x86_64
==> Error: gpg2 is not available in $PATH .
Use spack install gnupg and spack load gnupg.
bryons-mbp:~ bryonbean$ spack install gnupg
==> gnupg is already installed in /Users/bryonbean/Projects/spack/opt/spack/darwin-highsierra-x86_64/clang-9.1.0-apple/gnupg-2.2.3-otme5jgl2by5psv77wpbdalqjfbdao2g
bryons-mbp:~ bryonbean$ spack load gnupg
==> This command requires spack's shell integration.
To initialize spack's shell commands, you must run one of
the commands below. Choose the right command for your shell.
For bash and zsh:
. /Users/bryonbean/Projects/spack/share/spack/setup-env.sh
For csh and tcsh:
setenv SPACK_ROOT /Users/bryonbean/Projects/spack
source /Users/bryonbean/Projects/spack/share/spack/setup-env.csh
This exposes a 'spack' shell function, which you can use like
$ spack load package-foo
Running the Spack executable directly (for example, invoking
./bin/spack) will bypass the shell function and print this
placeholder message, even if you have sourced one of the above
shell integration scripts.
bryons-mbp:~ bryonbean$ . /Users/bryonbean/Projects/spack/share/spack/setup-env.sh
bryons-mbp:~ bryonbean$ spack load gnupg
-bash: module: command not found

Superseded by #11117.
A script for creating a mirror on an S3 bucket.
Given a local path, a bucket name, and optionally an AWS profile, S3Mirror recursively scans a directory structure, uploading its contents to an S3 bucket.
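A sketch of that recursive scan, yielding `(local_path, s3_key)` pairs that could each then be passed to boto3's `upload_file`; the `iter_uploads` helper name is illustrative, not the PR's actual code:

```python
import os


def iter_uploads(local_root):
    """Yield (local_path, s3_key) pairs for every file under local_root.

    S3 keys always use '/' separators regardless of the local OS, so the
    path relative to the mirror root is normalized before yielding. Each
    pair could then be uploaded with s3.upload_file(path, bucket, key).
    """
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, local_root)
            yield path, rel.replace(os.sep, "/")
```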
Addresses #7123