Skip to content

s3: cache client instance#34372

Merged
haampie merged 4 commits intospack:developfrom
haampie:fix/cache-s3-connection
Dec 9, 2022
Merged

s3: cache client instance#34372
haampie merged 4 commits intospack:developfrom
haampie:fix/cache-s3-connection

Conversation

@haampie
Copy link
Copy Markdown
Member

@haampie haampie commented Dec 7, 2022

Use a cache of the form

(mirror name, fetch/push) -> Client(...)

for S3 connections, to avoid the overhead of creating it on every request.

With this PR:

In [1]: import spack.util.web

In [2]: %timeit spack.util.web.url_exists("s3://spack-binaries/does/not/exist")
105 ms ± 7.54 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Before:

In [1]: import spack.util.web

In [2]: %timeit spack.util.web.url_exists("s3://spack-binaries/does/not/exist")
2.2 s ± 238 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@spackbot-app spackbot-app bot added core PR affects Spack core functionality fetching tests General test capability(ies) utilities labels Dec 7, 2022
@haampie haampie force-pushed the fix/cache-s3-connection branch from 964f348 to 59bef1d Compare December 7, 2022 15:11
@haampie haampie force-pushed the fix/cache-s3-connection branch from b4e554c to 6a60b3e Compare December 8, 2022 10:04
Copy link
Copy Markdown
Contributor

@scottwittenburg scottwittenburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @haampie! Caching s3 clients is a great idea, and this looks to be a fairly straightforward (let's put that in air quotes though) adaptation of what we had before to support that. I wasn't really familiar with this bit of the code before, so I added some questions to check my understanding. Feel free to ignore them unless you have time to answer.

Otherwise, LGTM. I'm not sure how well the unit tests mock s3 read/write, so maybe I should keep an eye on the pipelines after this is merged 😜

}
mirror = spack.mirror.Mirror.from_dict(
{
"fetch": {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I'm confused about is why you changed from an arbitrary dictionary of connection data (mock_connection_data) to this thing that is actually a Mirror. For one thing, my understanding was that the endpoint_url was optional (and only used/needed for S3 buckets that don't live in AWS S3, but rather minio or similar). If this were a real S3 mirror config, would it have another url? I.e. the s3:\\ url?

Also, I'm surprised it's valid to specify all three auth methods simultaneously, I would have thought you'd want to specify only one of access_token, profile, or access_pair. Maybe specifying all three just helps with testing though?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many different urls correspond to one (or no) mirror, so I thought it was an easier key to use


def _s3_open(url):
parsed = url_util.parse(url)
s3 = s3_util.create_s3_session(parsed, connection=s3_util.get_mirror_connection(parsed))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it a bug that this didn't pass url_type="fetch" previously?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so yes

if url_result.scheme == "s3":
# Check for URL-specific connection information
s3 = s3_util.create_s3_session(
url_result, connection=s3_util.get_mirror_connection(url_result)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this another case where url_type="fetch" should have been passed before?

import spack.config
import spack.util.url as url_util

#: Map (mirror name, method) tuples to s3 client instances.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on our conversation in slack, I was expecting a mapping from (url, method) to s3 client instance, rather than (name, method) to client instance.

mirrors = [
(name, mirror)
for name, mirror in all_mirrors.items()
if url_str.startswith(get_mirror_url(mirror))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I get it. This is where you've converted from the provide url to a mirror name, which you've possibly done in support of the future goal of using mirror names (rather than mirror urls) everywhere?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the old code was already like this, but maybe a comment got lost

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Dec 8, 2022

@spackbot run pipeline

@spackbot-app
Copy link
Copy Markdown

spackbot-app bot commented Dec 8, 2022

I'm sorry, gitlab does not have your latest revision yet, I can't run that pipeline for you right now.

One likely possibility is that your PR pipeline has been temporarily deferred, in which case, it is awaiting a develop pipeline, and will be run when that finishes.

Please check the gitlab commit status message to see if more information is available.

Details
Unexpected response from gitlab: {'message': '404 Commit Not Found'}

@haampie haampie merged commit 7e054cb into spack:develop Dec 9, 2022
@haampie haampie deleted the fix/cache-s3-connection branch December 9, 2022 07:50
amd-toolchain-support pushed a commit to amd-toolchain-support/spack that referenced this pull request Feb 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core PR affects Spack core functionality fetching tests General test capability(ies) utilities

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants