Skip to content

Tarball flake improvements#8477

Merged
edolstra merged 4 commits intoNixOS:masterfrom
edolstra:tarball-flake-redirects
Jun 16, 2023
Merged

Tarball flake improvements#8477
edolstra merged 4 commits intoNixOS:masterfrom
edolstra:tarball-flake-redirects

Conversation

@edolstra
Copy link
Member

@edolstra edolstra commented Jun 8, 2023

Motivation

This PR adds the following improvements to tarball flakes:

  • Tarball flake URLs can now use the Link: <url>; rel="immutable" HTTP header to return an "immutable" URL that will be recorded in lock files. Previously, the locked URL was the original URL, making redirects such as http://.../patchelf-latest.tar.gz pretty useless.
    The Link header can be returned on any HTTP request in the chain of redirects. The "immutable" URL can include the flake attributes rev and revCount (as HTTP query parameters), which makes it possible to mirror Git/GitHub flakes while retaining rev/revCount. The URL can also include the narHash attribute for verifying the content.
    Instead of Link, you can also use x-amz-meta-link, since S3 buckets currently don't allow setting a custom Link header.
  • It adds a VM test for tarball flakes.

Example:

$ nix flake metadata http://flake-test.s3-website-eu-west-1.amazonaws.com/patchelf-latest.tar.gz  
Resolved URL:  http://flake-test.s3-website-eu-west-1.amazonaws.com/patchelf-latest.tar.gz
Locked URL:    http://flake-test.s3-website-eu-west-1.amazonaws.com/patchelf-442793d9ec0584f6a6e82fa253850c8085bb150a.tar.gz?narHash=sha256-GUm8Uh/U74zFCwkvt9Mri4DSM%2BmHj3tYhXUkYpiv31M%3D
Description:   A tool for modifying ELF executables and libraries
Path:          /nix/store/f2hi54921vbbxydpd5vbbndb593zpl0g-source
Revision:      442793d9ec0584f6a6e82fa253850c8085bb150a
Revisions:     835
Inputs:
└───nixpkgs: github:NixOS/nixpkgs/b139e44d78c36c69bcbb825b20dbfa51e7738347

Context

Checklist for maintainers

Maintainers: tick if completed or explain if not relevant

  • agreed on idea
  • agreed on implementation strategy
  • tests, as appropriate
    • functional tests - tests/**.sh
    • unit tests - src/*/tests
    • integration tests - tests/nixos/*
  • documentation in the manual
  • documentation in the internal API docs
  • code and comments are self-explanatory
  • commit message explains why the change was made
  • new feature or incompatible change: updated release notes

Priorities

Add 👍 to pull requests you find important.

@edolstra edolstra requested a review from thufschmitt as a code owner June 8, 2023 14:59
@github-actions github-actions bot added fetching Networking with the outside (non-Nix) world, input locking with-tests Issues related to testing. PRs with tests have some priority labels Jun 8, 2023
encoding = trim(line.substr(i + 1));
else if (name == "accept-ranges" && toLower(trim(line.substr(i + 1))) == "bytes")
acceptRanges = true;
else if (name == "x-amz-meta-nix-is-immutable" || name == "x-nix-is-immutable") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if it might be more accurate to be named or x-nix-redirect-is-stable or something to make it more obvious that this is describing the fact that the redirect target isn't expected to change, not that its contents aren't expected to change (a slight difference).

But maybe this is probably bikeshedding, so feel free to disregard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the header x-nix-is-immutable indicates the URL that was fetched is immutable. So:

GET http://flake-test.s3-website-eu-west-1.amazonaws.com/patchelf-latest.tar.gz

Location: http://flake-test.s3-website-eu-west-1.amazonaws.com/patchelf-442793d9ec0584f6a6e82fa253850c8085bb150a.tar.gz

then:

GET http://flake-test.s3-website-eu-west-1.amazonaws.com/patchelf-442793d9ec0584f6a6e82fa253850c8085bb150a.tar.gz

x-nix-is-immutable: 1

causes Nix to store that GET's URL in the lock.

If the first request had it, like this:

GET http://flake-test.s3-website-eu-west-1.amazonaws.com/patchelf-latest.tar.gz

x-nix-is-immutable: 1
Location: http://flake-test.s3-website-eu-west-1.amazonaws.com/patchelf-442793d9ec0584f6a6e82fa253850c8085bb150a.tar.gz

Nix would store http://flake-test.s3-website-eu-west-1.amazonaws.com/patchelf-latest.tar.gz in the lock.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a practical issue, using a header in the redirect doesn't work with S3 website buckets, since they don't allow custom headers on redirects.

Copy link
Member

@lheckemann lheckemann Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given https://www.rfc-editor.org/rfc/rfc6648, I'd recommend not having an X- prefix on the header (where we have full control over the header name, at least).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roberth
Copy link
Member

roberth commented Jun 9, 2023

Instead of a redirect and custom header you could use a standard header, a Link with rel: latest-version. (not the best writing, but the example makes it clear that this is a reference to a specific unchanging version)
This basically sidesteps the whole "immutable isn't the same thing as permanent" business. We just want to get a fixed version that's currently latest, as expressed by the latest-version relation.
rev and revCount can be added to the Link as well, as link parameters are open for extension, or they could be taken from headers.

By putting a full link in the header, you could let any response in the chain of redirects provide the permanent location:

  • All could be done in a single response; no redirects required -> lowest latency.
  • You can put a dumb redirect in front of any redirect chain (or response) that supports the link. This could abstract away the infrastructure of organizations that care to set up such a redirect for their projects, because Nix will just follow links until it has found the header.

x-nix-is-immutable: 1 behavior can still be achieved by setting latest-version URL to be a self reference.

So this is more standard and it allows more flexibility in service implementation.

@edolstra
Copy link
Member Author

edolstra commented Jun 9, 2023

@roberth Using Link sounds good, but I don't get latest-version. We already know the latest-version URL (namely http://.../patchelf-latest.tar.gz), and we want to get the stable, locked URL that it currently points to.

Wouldn't rel=canonical be more appropriate? I.e. when you fetch http://.../patchelf-latest.tar.gz, you get the current tarball, and a header

Link: <http://.../patchelf-<rev>.tar.gz?rev=<rev>&revCount=1234>; rel="canonical"

which is the URL that Nix would record in the lock file.

@roberth
Copy link
Member

roberth commented Jun 9, 2023

We already know the latest-version URL (namely http://.../patchelf-latest.tar.gz),

The name is a bit confusing, as you really have to read the linked paragraph thoroughly to conclude that it refers to a resource like x/y/1.0 and not x/y/latest.

Wouldn't rel=canonical be more appropriate?

I did consider it, but its use is for crawling/indexing, which might perhaps conflict, specifically when the resource considers itself to be more about the ref than the rev. Indeed you could consider the branch tip to be the canonical version for some set of revisions, at least for the purpose of indexing.

latest-version otoh is from a standard explicitly about "Version Navigation between Web Resources".

bookmark was another contender, but as with indexing, the browsing use case for a resource might be subtly conflicting.

EDIT: memento is also intriguing but comes with a lot of complexity and an obligatory datetime. And that's only if it even makes sense to use by itself without implementing that standard, which is for web archiving rather than inherently versioned resources.

@roberth
Copy link
Member

roberth commented Jun 9, 2023

We already know the latest-version URL (namely http://.../patchelf-latest.tar.gz),

Maybe I read too much into the "latest version by timestamp" phrase. It could refer to the content of the final response instead of the URL.
I'll assume that I've misinterpreted that text, and at best it's ambiguous.

As an alternative, we could bring to life rel=permalink from this draft.

Designates an immutable "permanent link" for the link's context.

It seems to have the least potential for ambiguity, despite not being a registered rel.
It also lines up with GitHub's terminology. Their "permalink" links into files rely on commit ids, so that's nice.

@roberth
Copy link
Member

roberth commented Jun 9, 2023

Or if we're ok with using a draft, immutable from that same document!

immutable
Refers to an immutable version of the link's context that is not subject or susceptible to change or variation. This may refer to a read-only resource, an archived copy of an otherwise dynamic resource, a specific version in a version control system, etc. This is in contrast to links which return the latest version of the resource at any given time.

@grahamc

This comment was marked as outdated.

@roberth
Copy link
Member

roberth commented Jun 9, 2023

The purpose of immutable seems to be freezing the contents, whereas for permalink it's more about freezing the URL, ie providing an alias that never changes, even if for example a repository is renamed or moved. (Unless you consider the version to be the real resource, and a path like foo/latest to be just a pointer.)
We generally don't have alternate identifiers for things like repositories and getting your dependency through an opaque or potentially misleading identifier doesn't seem like a great idea, so let's forget about creating such alias URLs as permalink suggests in the section before the one I linked originally:

Designates an immutable URL which is not subject or susceptible to change or variation even if the resource itself is updated. A portmanteau of "permanent" and "link", the relation allows clients to save and share a URL with the guarantee that it will resolve to that resource so long as it still exists. Typically the more information that is included in a link the more volatile it is likely to be. For example a link to a blog post that includes the title may change whenever the title is updated. Where "permalink" is specified such changes must not cause the designated URL to stop resolving or to resolve to another resource.

There's no mention of the immutability of contents, so the Link to look for should be immutable.

@roberth
Copy link
Member

roberth commented Jun 9, 2023

An example wouldn't hurt to illustrate the Link idea; it's not just a syntax change.
I'll omit rev and revCount, because it's not clear to me where is best to put them, in their own header, or as Link parameters at the same level as rel.

Here's the shortest possible exchange:

GET http://server/patchelf-latest.tar.gz

200 OK
Link: <http://server/patchelf-442793d9ec0584f6a6e82fa253850c8085bb150a.tar.gz>; rel="immutable"

<tarball data>

As you can see it contains both the data and the pinned URL. If we want to test the immutable URL as part of locking, we could use a HEAD request instead.

Here's a more elaborate exchange:

First a dumb redirect. If we don't learn anything from the redirect, we just keep going. This lets users set up an indirection that makes them independent of their service choices.

GET http://get-patchelf.io/letsgonix

302 Found
Location: http://server/patchelf-latest.tar.gz

server is a web scale microservice thing. They like redirects too, but they're all about software so we do get the metadata we need early on. Alternatively, they could do another dumb redirect first, but that'd be boring.

GET http://server/patchelf-latest.tar.gz

302 Found
Link: <http://server/patchelf-442793d9ec0584f6a6e82fa253850c8085bb150a.tar.gz>; rel="immutable"
Location: http://cdn.server/c0584f6a6e82fa25?q=idk

At this point we have both the pinned URL for the lock file and a URL to start the actual download.

GET http://cdn.server/c0584f6a6e82fa25?q=idk

200 OK

<tarball data>

We didn't save a roundtrip because we got a 302, but that's ok. At least we do support cutting out the roundtrip as shown in the prior case. Maybe server will upgrade their web servers to support the single request flow later; at least with our Link architecture, they can.

edolstra added 2 commits June 13, 2023 14:13
Previously, for tarball flakes, we recorded the original URL of the
tarball flake, rather than the URL to which it ultimately
redirects. Thus, a flake URL like
http://example.org/patchelf-latest.tar that redirects to
http://example.org/patchelf-<revision>.tar was not really usable. We
couldn't record the redirected URL, because sites like GitHub redirect
to CDN URLs that we can't rely on to be stable.

So now we use the redirected URL only if the server returns the
`x-nix-is-immutable` or `x-amz-meta-nix-is-immutable` headers in its
response.
@edolstra edolstra force-pushed the tarball-flake-redirects branch from c98c775 to 1ad3328 Compare June 13, 2023 12:17
@edolstra
Copy link
Member Author

Updated the PR to use Link: <url>; rel="immutable". The x-nix-rev and x-nix-revcount headers are gone since that info can now be included in the immutable URL as a regular flake attribute encoded as a HTTP query parameter, e.g.

Link: <http://flake-test.s3-website-eu-west-1.amazonaws.com/patchelf-442793d9ec0584f6a6e82fa253850c8085bb150a.tar.gz?rev=442793d9ec0584f6a6e82fa253850c8085bb150a&revCount=835>; rel="immutable"

@fricklerhandwerk
Copy link
Contributor

Triaged in Nix maintainers team meeting 2023-06-09: (for reference)

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-06-09-nix-team-meeting-minutes-61/29163/1

@fricklerhandwerk
Copy link
Contributor

Discussed in Nix team meeting:

  • @Ericson2314: why can't we use redirects?
    • @edolstra: because they themselves are usually not stable.
  • @fricklerhandwerk: what about the documentation?
    • @edolstra: could add a "Protocols" page to the manual, also suitable for the .narinfo format and others

@edolstra edolstra enabled auto-merge June 16, 2023 14:00
@edolstra
Copy link
Member Author

I've added a "Protocols" chapter to the manual that documents how to serve tarball flakes.

?rev=442793d9ec0584f6a6e82fa253850c8085bb150a
&revCount=835
&narHash=sha256-GUm8Uh/U74zFCwkvt9Mri4DSM%2BmHj3tYhXUkYpiv31M%3D>; rel="immutable"
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know whether this or Link: <url>; rel="immutable"; rev="442..." is better. It gives the server implementer the option to not pass those parameters to the CDN. OTOH it could lead to some duplication, but is that a problem? Both seem like minor problems so I really don't know.

Maybe the folks at GitHub have a preference?

@edolstra edolstra merged commit e503ead into NixOS:master Jun 16, 2023
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-06-16-nix-team-meeting-minutes-63/29316/1

@edolstra edolstra deleted the tarball-flake-redirects branch June 21, 2023 13:50
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/future-of-channels-and-channels-nixos-org-in-a-flakes-world/11563/15

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fetching Networking with the outside (non-Nix) world, input locking with-tests Issues related to testing. PRs with tests have some priority

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

7 participants