Skip to content

Add extension='tar.gz' option to version()#1758

Merged
tgamblin merged 1 commit intospack:developfrom
scheibelp:features/more-url-extract
Oct 6, 2016
Merged

Add extension='tar.gz' option to version()#1758
tgamblin merged 1 commit intospack:developfrom
scheibelp:features/more-url-extract

Conversation

@scheibelp
Copy link
Copy Markdown
Member

@scheibelp scheibelp commented Sep 13, 2016

Example:

version('2.0.5', '84029e969b5b37e1ba791d0572895133', extension='tar.gz')

Note: this removes Package.validate_package_url, which would have to be updated in this PR but is not used so I decided to remove it instead.

This allows the user to set the extension type on a per-version basis. I was reluctant to set a Package-level extension field.

Commit message:

This closes #1757 which provides an example of a url scheme where the
version appears after the extension. Instead of extending the parsing
logic to handle this case, this commit allows the user to specify
their extension type. This helps Spack choose the appropriate
decompressor and mirror archive filename.

@scheibelp scheibelp mentioned this pull request Sep 13, 2016
@scheibelp scheibelp force-pushed the features/more-url-extract branch from f6870c2 to c12d685 Compare September 14, 2016 20:30
This closes spack#1757 which provides an example of a url scheme where the
version appears after the extension. Instead of extending the parsing
logic to handle this case, this commit allows the user to specify
their extension type. This helps Spack choose the appropriate
decompressor and mirror archive filename.
@scheibelp scheibelp force-pushed the features/more-url-extract branch from c12d685 to 10a3a98 Compare September 15, 2016 00:12
@tgamblin tgamblin added the WIP label Sep 21, 2016
@tgamblin
Copy link
Copy Markdown
Member

@scheibelp: Spack relies on tar for most decompression... specifically tar -xf. In early versions, I tried to avoid parsing file extensions at all, and I really don't want to force the user to specify a file extension in the package file (which is kind of awkward, and something the tool could figure out).

I think there's probably a better way to do this if we change expected_archive_files to fall back to the old mechanism -- looking at what files actually exist in the stage directory after fetch. Then we could just let tar -xf do its thing and not worry about the extension...

The only reason Spack currently cares much about extensions is that it tars up repositories that need to be mirrored. I'd like to avoid having it know too much about URL extensions if possible. Do you think this is possible?

@adamjstewart
Copy link
Copy Markdown
Member

If we want to identify the compression scheme, what about something like http://stackoverflow.com/a/19120764/5828163?

Decompression shouldn't be hard. We can assume zip if the file ends with .zip, otherwise assume tar and use tar -xf. But the mirror renaming thing is definitely a hurdle.

@scheibelp
Copy link
Copy Markdown
Member Author

I think there's probably a better way to do this if we change expected_archive_files to fall back to the old mechanism -- looking at what files actually exist in the stage directory after fetch. Then we could just let tar -xf do its thing and not worry about the extension..

I don't see how this is an issue with expected_archive_files - that function is finding the file OK, and it doesn't place any constraints on it. The issue that was coming up is that later there is an attempt to determine the file extension. I think the core issue is with url.downloaded_file_extension: what Adam suggests would make it more robust (although linux-specific). The strategy I offered here was to argue there will be cases where the extension cannot be determined so allow the user to set it up if the logic is getting in the way.

Assuming tar.gz would work in this case but not if a maintainer chose to name their packages like foo.zip.1.0.3

On the side though, all the fetch strategy/resource interaction logic is hard to understand and I really want to refactor it (which is why I left this a WIP - because it makes things more complex in an already-overly-complex section)

@tgamblin tgamblin mentioned this pull request Oct 6, 2016
@tgamblin
Copy link
Copy Markdown
Member

tgamblin commented Oct 6, 2016

@scheibelp This came up in #1202 as well as here. I think this is a good solution for now but I really like the idea of refactoring the fetch logic. Do you have ideas about how that could be done cleanly?

Packages, stages, and fetch strategies all started out life as part of Package, but they need to be decoupled. There are a bunch of requirements that have grown together that I tried to outline to @goxberry in #1202, but there's got to be a cleaner way to do it.

@tgamblin tgamblin removed the WIP label Oct 6, 2016
@tgamblin
Copy link
Copy Markdown
Member

tgamblin commented Oct 6, 2016

@scheibelp @robertdfrench: note that in the case of #1757 I don't think that looking for .tgz. somewhere in the file is such a bad solution -- I think that would probably work reasonably well. For MFEM as in #1202 it's more complicated as the extension isn't even in the package file... it's only available through a redirect URL and we can't rely on that for mirrors.

@adamjstewart's suggestion of using file might be a decent solution and would allow us to remove eventually remove extension. It would defer the need to know the compression scheme until after the file is fetched. It would mean that for things like MFEM, we just omit the suffix in the mirror for things we can't parse, e.g. mfem-3.2.

The disadvantage is that mirrors would be invalidated when the packager changes the package URL from something non-parseable to something parseable. So in the case of MFEM, if @tzanio decided to get rid of the goo.gl links and to go with github ones, Spack would start expecting mfem-3.2.tar.gz in mirrors instead of mfem-3.2. I kind of like the explicitness of @scheibelp's solution because it avoids this.

@adamjstewart
Copy link
Copy Markdown
Member

If you're going to refactor fetching, can you take a stab at #282 while you're at it? A majority of the Xorg/X11 packages I added ran into this same problem. It's not a big deal when running spack checksum, but when running spack create the *.tar.gz.sig files come first and so the build system isn't detected properly.

@scheibelp
Copy link
Copy Markdown
Member Author

Do you have ideas about how [a refactor] could be done cleanly?

I'm still thinking on it. Currently tabled for other work but I'd like to get back to it by the middle of next week.

@tgamblin tgamblin changed the title [WIP] Handle packages with unparseable extensions Add extension='tar.gz' option to version() Oct 6, 2016
@tgamblin tgamblin merged commit 508d79c into spack:develop Oct 6, 2016
@alalazo
Copy link
Copy Markdown
Member

alalazo commented Oct 6, 2016

On the side though, all the fetch strategy/resource interaction logic is hard to understand and I really want to refactor it (which is why I left this a WIP - because it makes things more complex in an already-overly-complex section)

+1 on refactoring fetch / stage logic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wild and Crazy URLs

4 participants