Retrieve version metadata from astral-sh/versions ndjson instead of the GitHub API#737
Retrieve version metadata from astral-sh/versions ndjson instead of the GitHub API#737
astral-sh/versions ndjson instead of the GitHub API#737Conversation
2dbc8bd to
98291be
Compare
98291be to
0a4c510
Compare
zsol
left a comment
There was a problem hiding this comment.
I think this change is OK, but maybe not enough. downloadVersionFromManifest still depends on the manifest shipped with the action, which will get used over the ndjson for versions it contains. I'd much prefer it if the version and the download url would come from the same location:
- if we're using ndjson for discovering versions, we should use the URLs in it for downloading too
- if we're using the manifest for discovering versions (I think this can only occur if the manifest file is specified), we should continue using the manifest for the download location
I also think we should version the ndjson format, and added a few other minor comments inline
| if (checkSum !== undefined && checkSum !== "") { | ||
| checksumUsed = checkSum; | ||
| core.debug("Using user-provided checksum."); | ||
| isValid = await validateFileCheckSum(downloadPath, checkSum); | ||
| } else { | ||
| core.debug("Checksum not provided. Checking known checksums."); | ||
| const key = `${arch}-${platform}-${version}`; | ||
| if (key in KNOWN_CHECKSUMS) { | ||
| const knownChecksum = KNOWN_CHECKSUMS[`${arch}-${platform}-${version}`]; | ||
| core.debug(`Checking checksum for ${arch}-${platform}-${version}.`); | ||
| isValid = await validateFileCheckSum(downloadPath, knownChecksum); | ||
| checksumUsed = KNOWN_CHECKSUMS[key]; | ||
| core.debug(`Using known checksum for ${key}.`); | ||
| isValid = await validateFileCheckSum(downloadPath, checksumUsed); | ||
| } else if (ndjsonChecksum !== undefined && ndjsonChecksum !== "") { | ||
| checksumUsed = ndjsonChecksum; | ||
| core.debug("Using checksum from NDJSON version data."); | ||
| isValid = await validateFileCheckSum(downloadPath, ndjsonChecksum); | ||
| } else { | ||
| core.debug(`No known checksum found for ${key}.`); | ||
| core.debug(`No checksum found for ${key}.`); | ||
| } | ||
| } |
There was a problem hiding this comment.
nit: I would've done a single validateFileCheckSum(downloadPath, checksumUsed) call conditional on checksumUsed !== undefined
src/download/versions-client.ts
Outdated
| const decoder = new TextDecoder(); | ||
| let buffer = ""; | ||
|
|
||
| for await (const chunk of response.body) { | ||
| buffer += decoder.decode(chunk, { stream: true }); | ||
|
|
||
| // Process complete lines | ||
| const lines = buffer.split("\n"); | ||
| // Keep the last potentially incomplete line in buffer | ||
| buffer = lines.pop() ?? ""; | ||
|
|
||
| for (const line of lines) { | ||
| const trimmed = line.trim(); | ||
| if (trimmed === "") { | ||
| continue; | ||
| } | ||
| try { | ||
| const version = JSON.parse(trimmed) as NdjsonVersion; | ||
| versions.push(version); | ||
| } catch { | ||
| core.debug(`Failed to parse NDJSON line: ${trimmed}`); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // Process any remaining content in buffer | ||
| const remaining = buffer.trim(); | ||
| if (remaining !== "") { | ||
| try { | ||
| const version = JSON.parse(remaining) as NdjsonVersion; | ||
| versions.push(version); | ||
| } catch { | ||
| core.debug(`Failed to parse NDJSON line: ${remaining}`); | ||
| } | ||
| } |
There was a problem hiding this comment.
this seems overly complicated if we're going to hold all versions in memory anyway. just read the whole body and then split by newlines to parse each json object
|
|
||
| export async function getLatestVersion(): Promise<string> { | ||
| const versions = await fetchVersionData(); | ||
| // The NDJSON file lists versions in order, newest first |
There was a problem hiding this comment.
is this always going to be true? What if we want to "yank" the latest release without breaking existing workflows?
There was a problem hiding this comment.
I think ordering should always be true, yes, otherwise you need to walk the whole file which defeats the purpose of ndjson. I have not considered a yanking mechanism, I sort of considered that out of scope, but I could consider it 🤔 it doesn't really make sense unless we cannot release a fix quickly.
There was a problem hiding this comment.
The only thing I'm worried about is GitHub reliability impeding our ability to quickly push out a fix. It's sad that this is a realistic thing to worry about.
We could maybe work around it by manually crafting an updated ndjson file to effectively republish an older release under a newer version - if we really need to. I haven't really thought about if a first-class yank concept is worth the complexity, up to you :)
| export interface NdjsonArtifact { | ||
| platform: string; | ||
| variant: string; | ||
| url: string; | ||
| archive_format: string; | ||
| sha256: string; | ||
| } | ||
|
|
||
| export interface NdjsonVersion { | ||
| version: string; | ||
| artifacts: NdjsonArtifact[]; | ||
| } |
There was a problem hiding this comment.
Would it be worth versioning this format? Maybe the first json object in the file could describe the version
There was a problem hiding this comment.
It's versioned at the URL-level instead
| "test": "jest", | ||
| "act": "act pull_request -W .github/workflows/test.yml --container-architecture linux/amd64 -s GITHUB_TOKEN=\"$(gh auth token)\"", | ||
| "update-known-versions": "RUNNER_TEMP=known_versions node dist/update-known-versions/index.js src/download/checksum/known-versions.ts \"$(gh auth token)\"", | ||
| "update-known-versions": "RUNNER_TEMP=known_versions node dist/update-known-versions/index.js src/download/checksum/known-checksums.ts version-manifest.json", |
There was a problem hiding this comment.
I think this change needs to be made in the corresponding workflow as well - maybe it could just invoke this command (also maybe it should be renamed update-known-checksums)
src/update-known-versions.ts
Outdated
| filePath: string, | ||
| entries: ArtifactEntry[], | ||
| ): Promise<void> { | ||
| const { promises: fs } = await import("node:fs"); |
There was a problem hiding this comment.
why import this here? Lift it to the top and rewrite as
import { promises as fs } from "node:fs";|
I am currently on vacation without my laptop. Will look over this on Sunday |
eifinger
left a comment
There was a problem hiding this comment.
I highly appreciate something like astral-sh/versions. I have been waiting for it for some time and a lot of code in this repo is just here to work around the fact that it is/was missing.
Some history
I included checksum validation from the very start because just about that time the first big supply chain attacks via GitHub actions started. With the checksums we could make sure that if a user pinned astral/setup-uv to a commit we would be sure that this action itself is valid and that it checks the binary it downloads "from outside" against the list of known checksums. The user provided checksum mechanism is a fallback should I ever fail to update checksums quickly enough.
I added the manifest to make it possible to add custom download locations and or test releases of uv. It also contains the checksums because I planned (but until now never followed up on) to use the manifest for checksums and drop the inline constants in known-checksums.ts.
This PR
adds ndjson via astral-sh/versions. I propose to delete version-manifest.json from this repo and always call out to astral-sh/versions. I would still keep the input for a manifest-url to support custom downloads/releases as stated above.
For the start I would add support for the ndjson format as well and deprecate the existing version-manifest.json format until a breaking release.
I would keep the KNOWN_CHECKSUMS in known-checksums.ts to still give users the security guarantee that even if astral-sh/versions gets hacked, this action would detect a malicious uv binary being downloaded (for known checksums/versions)
…ormat` (#17651) I'm picking up some pretty old work here prompted by astral-sh/setup-uv#737 and a desire to be able to fetch newer `python-build-standalone` versions. Previously, we only supported a static version which means we can construct a known GitHub asset URL trivially. However, to support the "latest" version or version constraints, we need a registry with metadata. The GitHub API is notoriously rate limited, so we don't want to use that. It'd be great to use PyPI (and more broadly, the resolver), but I don't want to introduce it in this code path yet. Instead, this hits https://github.com/astral-sh/versions in order to determine the available versions. We stream the NDJSON line by line to avoid downloading the whole file in order to read one version. Loosely requires #17648 to reach production and be ported to `ruff`, though it's not a blocker.
Prototyping this to avoid hitting GitHub API rate limits when fetching the latest version.
Note this patch was nearly entirely generated by Claude.
I stood up a test repository over at https://github.com/zanieb/test-setup-uv which seems happy https://github.com/zanieb/test-setup-uv/actions/runs/21229282593/job/61084034381
Requires astral-sh/uv#17648 to ship at some point.