Conversation
Hi @douglasjacobsen! I noticed that the following package(s) don't yet have maintainers:
Are you interested in adopting any of these package(s)? If so, simply add the following to the package class:

    maintainers = ['douglasjacobsen']

If not, could you contact the developers of this package and see if they are interested? You can quickly see who has worked on a package with:

    $ spack blame lua

Thank you for your help! Please don't add maintainers without their consent. You don't have to be a Spack expert or package developer in order to be a "maintainer"; it just gives us a list of users willing to review PRs or debug issues relating to this package. A package can have multiple maintainers; just add a list of GitHub handles of anyone who wants to volunteer.
This PR supersedes #24422
scottwittenburg left a comment:
This looks pretty good to me, but since it changes some things with respect to how S3 is handled, it's probably a good idea for @opadron to take a look.
scottwittenburg left a comment:
I just noticed some cleanup jobs failing in gitlab pipelines, to do with the recursive argument to remove_url(). Here's a link to a failing job, though you can find it from the gitlab check on your PR as well.
Why do we need this? Isn't GCS S3-compatible?
This allows us to use google-cloud-storage's Python API without a dependency on boto3. However, users could still choose the s3 backend and configure it to point at GCS resources by changing the environment variables instead of using this native GCS backend.
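For reference, the environment-variable route described above might look like the sketch below. The exact variable names are assumptions based on common S3-interoperability conventions, not necessarily the ones Spack reads; GCS's interoperability endpoint accepts HMAC-style credentials.

```python
import os

# Hypothetical sketch: pointing an s3:// backend at GCS's S3-interoperability
# endpoint via environment variables. Variable names and values here are
# illustrative placeholders, not Spack's authoritative configuration.
os.environ["S3_ENDPOINT_URL"] = "https://storage.googleapis.com"
os.environ["AWS_ACCESS_KEY_ID"] = "GOOG1EXAMPLEACCESSID"   # GCS HMAC access ID (placeholder)
os.environ["AWS_SECRET_ACCESS_KEY"] = "example-hmac-secret"  # GCS HMAC secret (placeholder)
```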
Looks like it's adding a new URL scheme, so I assumed GCS could be addressed with both URL protocols.
Ok, but doing that requires a dependency on the google-cloud-storage Python API, so I'm not yet convinced this is worth adding on the basis of reducing dependencies, because it doesn't reduce them. Is there some kind of feature or capability that GCS provides that S3 does not? And are we currently using, or planning to take advantage of, those capabilities? Otherwise, I'm having a really hard time seeing why these changes are needed. Maybe my reading of the above quote is off. Are you perhaps saying that using boto3 with GCS would also require the google-cloud-storage Python API?
I think the cleanup job uses […]. But @douglasjacobsen, I'm curious if you saw some documentation indicating that we no longer need the special handling for recursive removal of a URL? I would be happy to be enlightened if you could share a link. Thanks!
Yes, using the […]. These changes mostly allow users who are already using the Google Cloud SDK to access GCS buckets with their SDK credentials and the default endpoint, similar to how the default endpoint for […]
@scottwittenburg: Sorry, this is a bug. I'm going to work on fixing it ASAP. I think the version we started working on didn't have the recursive destruction of these URLs, and when I rebased, these changes didn't translate properly. For the GCS backend, we could use […]
We should probably resolve the discussion about whether or not this is useful before I put more time into fixing the PR, though. :)
opadron left a comment:
I'm quite confused by these proposed changes. They concern me because I worry that they might be unnecessarily adding new code paths to access GCS, which, as far as I understand, is already accessible via an s3:// URL and boto.
Besides the above concern, I notice that a number of the proposed changes modify how s3:// URLs are handled, as well as other minor details of how Spack operates, without a clear reason. The PR author themselves notes that one of these changes is unrelated and should be removed. Perhaps this is supposed to be a WIP and not yet ready for review?
@douglasjacobsen if I am in error, please accept my apologies. Help us to understand your intent and what we can do to help.
@douglasjacobsen so, I just noticed your comments from the last 15 minutes or so. Thanks for helping to clear up some of my questions. I think a big part of what you're going for is a good developer experience for users of GCS tooling; is that right? I can see how it would be more convenient to just use existing tooling, but I think we should balance that desire against the maintenance cost of maintaining […]
@opadron Thanks for the discussion. And I should probably apologize for the irrelevant changes. This should not change the s3 backend, and it should not modify how Spack works (i.e., the recursive removal is an accident). This is a branch that was started a long time ago (#24422), and I've been working with the original author to fix some of the issues we had with it. The unrelated changes in the PR are simply an oversight on my part from when I rebased the branch to fix some of the merge conflicts. Regarding user experience, yes, that's exactly what I'm going for. The steps a user needs to go through to access a GCS bucket with the existing s3 backend are something like the following (which honestly are not very well documented, and it takes a lot for a naive user to figure out): […]
Whereas with this PR, the steps are similar to the user experience with an […]
Then you can read from […]
I agree that a better user experience could be a compelling argument in favor of changes like these. However, the best trade-off between this interest and maintainability is not clear. It'd be great if we could have other Spack devs & users weigh in, especially if they're also GCS users. @tgamblin @adamjstewart @alalazo @gartung @scheibelp: would love to hear your opinions. Also, @eugeneswalker, I believe you are a user of GCS, is that right? What do you think about having code in Spack for handling […]
@opadron: we really want Spack to be usable by the different cloud providers using their own tooling, and this is a pretty small PR as far as they go. I don't think it's a huge maintenance burden to support this, and it's supported in a very similar way to the way we currently support […]. If merging stuff like this enables folks at GCP to deliver Spack efficiently to their users, it ultimately helps the project and gets us a broader user base, which is what we want. So I'd say please evaluate this considering the use of the GCP native libs as a bonus (the same way we consider […]). If there is a good way to support all of the object store interfaces smoothly without the native libraries, we could talk about that, but I think at this point we want buildcaches to be just as usable on whatever cloud you happen to be on.
I have no comment since we don't use this feature.
Thanks, I think @tgamblin made some good points for why we want native support for GCS. @douglasjacobsen, I think we're in agreement that this PR is useful and worth putting more time into. Let us know if you have any more questions.
Force-pushed 99d04d7 to 146bbda
@tgamblin, @opadron, and @scottwittenburg: OK, it seems like I've fixed most (if not all) of the CI checks. There's one still pending, but let me know if you see any other PR changes that you'd like me to make. I think I resolved all of the previous issues already.
This commit updates the GCS support to be more in line with how the S3 implementation works.
Force-pushed 146bbda to 50073bd
opadron left a comment:
This is looking really good! Additional feedback below.
Force-pushed 711f9a4 to 45f6e6b
Force-pushed 45f6e6b to b9370ba
@opadron I think I fixed all of the issues you spotted; let me know if you see any new ones, though.
@scottwittenburg I think I've fixed all of the issues you spotted as well. Feel free to let me know if there are any more modifications you'd like to see, though.
scottwittenburg left a comment:
Looks good to me @douglasjacobsen, thanks!
Just wanted to throw in my vote of support for this feature: I've been working with @douglasjacobsen's contributions on my own fork on Google Cloud and have been able to host binary caches on GCS that considerably reduce build times for Google Compute Engine VM images.
lib/spack/spack/test/gcs_fetch.py (outdated)

    @pytest.mark.parametrize('_fetch_method', ['curl', 'urllib'])
    def test_gcsfetchstrategy_sans_url(_fetch_method):
Maybe:

    - def test_gcsfetchstrategy_sans_url(_fetch_method):
    + def test_gcsfetchstrategy_without_url(_fetch_method):

? But in any case 🎉 🇫🇷 🎉
    import spack.util.web as web_util
    ...
    def gcs_open(req, *args, **kwargs):
Where are the *args and **kwargs used? I am wondering why it is necessary to have them in the function signature.
It might also be good to add a brief docstring here.
I only added *args and **kwargs to keep the signature the same as the other functions that are used as function pointers here: https://github.com/douglasjacobsen/spack/blob/b9370ba0d7e54f7de37ef4d5f680479aeb741d11/lib/spack/spack/util/web.py#L552
Would you like me to change it so I don't have them in the signature? I'll work on the docstring in the background.
1119d1d
@alalazo: Let me know if these changes look good. Thanks for the review!
Force-pushed 1119d1d to af92e64
Force-pushed af92e64 to 60a3e73
alalazo left a comment:
This LGTM. There's only a minor comment on a thing I overlooked on first review.
Addressing review comments. Co-authored-by: Massimiliano Culpo <[email protected]>
No problem. Thanks, @alalazo. For some reason some style errors came up in the last CI check, but I don't see them on my local machine. We'll see if they crop up again after it runs this time.
Thanks!
This pull request contains changes to support Google Cloud Storage buckets as mirrors, meant for hosting Spack build-caches. This feature is beneficial for folks that are running infrastructure on Google Cloud Platform. On public cloud systems, resources are ephemeral and in many cases, installing compilers, MPI flavors, and user packages from scratch takes up considerable time.
Giving users the ability to host a Spack mirror that can store build caches in GCS buckets offers a clean solution for reducing application rebuilds for Google Cloud infrastructure.
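To make the gs:// mirror idea concrete: a gs:// URL decomposes into a bucket name and an object prefix, much like s3://. The helper below is an illustrative sketch only; Spack's real handling lives in its web utilities and may differ.

```python
from urllib.parse import urlparse

def parse_gs_url(url):
    """Split a gs:// mirror URL into (bucket, object_prefix).

    Illustrative helper, not Spack's actual implementation.
    """
    parts = urlparse(url)
    if parts.scheme != "gs":
        raise ValueError("not a gs:// url: %s" % url)
    return parts.netloc, parts.path.lstrip("/")
```

For example, parse_gs_url("gs://my-mirror/build_cache") yields ("my-mirror", "build_cache"), which a GCS backend could pass to the storage client as the bucket name and blob prefix.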