Skip to content

Retrieve version metadata from astral-sh/versions ndjson instead of the GitHub API#737

Draft
zanieb wants to merge 2 commits intomainfrom
zb/versions-ndjson
Draft

Retrieve version metadata from astral-sh/versions ndjson instead of the GitHub API#737
zanieb wants to merge 2 commits intomainfrom
zb/versions-ndjson

Conversation

@zanieb
Copy link
Member

@zanieb zanieb commented Jan 21, 2026

Prototyping this to avoid hitting GitHub API rate limits when fetching the latest version.

Note this patch was nearly entirely generated by Claude.

I stood up a test repository over at https://github.com/zanieb/test-setup-uv which seems happy https://github.com/zanieb/test-setup-uv/actions/runs/21229282593/job/61084034381

Requires astral-sh/uv#17648 to ship at some point.

@zanieb zanieb force-pushed the zb/versions-ndjson branch from 98291be to 0a4c510 Compare January 22, 2026 00:09
Copy link
Member

@zsol zsol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is OK, but maybe not enough. downloadVersionFromManifest still depends on the manifest shipped with the action, which will get used over the ndjson for versions it contains. I'd much prefer it if the version and the download url would come from the same location:

  • if we're using ndjson for discovering versions, we should use the URLs in it for downloading too
  • if we're using the manifest for discovering versions (I think this can only occur if the manifest file is specified), we should continue using the manifest for the download location

I also think we should version the ndjson format, and added a few other minor comments inline

Comment on lines 20 to 37
if (checkSum !== undefined && checkSum !== "") {
checksumUsed = checkSum;
core.debug("Using user-provided checksum.");
isValid = await validateFileCheckSum(downloadPath, checkSum);
} else {
core.debug("Checksum not provided. Checking known checksums.");
const key = `${arch}-${platform}-${version}`;
if (key in KNOWN_CHECKSUMS) {
const knownChecksum = KNOWN_CHECKSUMS[`${arch}-${platform}-${version}`];
core.debug(`Checking checksum for ${arch}-${platform}-${version}.`);
isValid = await validateFileCheckSum(downloadPath, knownChecksum);
checksumUsed = KNOWN_CHECKSUMS[key];
core.debug(`Using known checksum for ${key}.`);
isValid = await validateFileCheckSum(downloadPath, checksumUsed);
} else if (ndjsonChecksum !== undefined && ndjsonChecksum !== "") {
checksumUsed = ndjsonChecksum;
core.debug("Using checksum from NDJSON version data.");
isValid = await validateFileCheckSum(downloadPath, ndjsonChecksum);
} else {
core.debug(`No known checksum found for ${key}.`);
core.debug(`No checksum found for ${key}.`);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would've done a single validateFileCheckSum(downloadPath, checksumUsed) call conditional on checksumUsed !== undefined

Comment on lines 41 to 75
const decoder = new TextDecoder();
let buffer = "";

for await (const chunk of response.body) {
buffer += decoder.decode(chunk, { stream: true });

// Process complete lines
const lines = buffer.split("\n");
// Keep the last potentially incomplete line in buffer
buffer = lines.pop() ?? "";

for (const line of lines) {
const trimmed = line.trim();
if (trimmed === "") {
continue;
}
try {
const version = JSON.parse(trimmed) as NdjsonVersion;
versions.push(version);
} catch {
core.debug(`Failed to parse NDJSON line: ${trimmed}`);
}
}
}

// Process any remaining content in buffer
const remaining = buffer.trim();
if (remaining !== "") {
try {
const version = JSON.parse(remaining) as NdjsonVersion;
versions.push(version);
} catch {
core.debug(`Failed to parse NDJSON line: ${remaining}`);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems overly complicated if we're going to hold all versions in memory anyway. just read the whole body and then split by newlines to parse each json object


export async function getLatestVersion(): Promise<string> {
const versions = await fetchVersionData();
// The NDJSON file lists versions in order, newest first
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this always going to be true? What if we want to "yank" the latest release without breaking existing workflows?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ordering should always be true, yes, otherwise you need to walk the whole file which defeats the purpose of ndjson. I have not considered a yanking mechanism, I sort of considered that out of scope, but I could consider it 🤔 it doesn't really make sense unless we cannot release a fix quickly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I'm worried about is GitHub reliability impeding our ability to quickly push out a fix. It's sad that this is a realistic thing to worry about.

We could maybe work around it by manually crafting an updated ndjson file to effectively republish an older release under a newer version - if we really need to. I haven't really thought about if a first-class yank concept is worth the complexity, up to you :)

Comment on lines +5 to +16
export interface NdjsonArtifact {
platform: string;
variant: string;
url: string;
archive_format: string;
sha256: string;
}

export interface NdjsonVersion {
version: string;
artifacts: NdjsonArtifact[];
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth versioning this format? Maybe the first json object in the file could describe the version

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's versioned at the URL-level instead

"test": "jest",
"act": "act pull_request -W .github/workflows/test.yml --container-architecture linux/amd64 -s GITHUB_TOKEN=\"$(gh auth token)\"",
"update-known-versions": "RUNNER_TEMP=known_versions node dist/update-known-versions/index.js src/download/checksum/known-versions.ts \"$(gh auth token)\"",
"update-known-versions": "RUNNER_TEMP=known_versions node dist/update-known-versions/index.js src/download/checksum/known-checksums.ts version-manifest.json",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change needs to be made in the corresponding workflow as well - maybe it could just invoke this command (also maybe it should be renamed update-known-checksums)

filePath: string,
entries: ArtifactEntry[],
): Promise<void> {
const { promises: fs } = await import("node:fs");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why import this here? Lift it to the top and rewrite as

import { promises as fs } from "node:fs";

@eifinger
Copy link
Collaborator

I am currently on vacation without my laptop. Will look over this on Sunday

Copy link
Collaborator

@eifinger eifinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I highly appreciate something like astral-sh/versions. I have been waiting for it for some time and a lot of code in this repo is just here to work around the fact that it is/was missing.

Some history

I included checksum validation from the very start because just about that time the first big supply chain attacks via GitHub actions started. With the checksums we could make sure that if a user pinned astral/setup-uv to a commit we would be sure that this action itself is valid and that it checks the binary it downloads "from outside" against the list of known checksums. The user provided checksum mechanism is a fallback should I ever fail to update checksums quickly enough.

I added the manifest to make it possible to add custom download locations and or test releases of uv. It also contains the checksums because I planned (but until now never followed up on) to use the manifest for checksums and drop the inline constants in known-checksums.ts.

This PR

adds ndjson via astral-sh/versions. I propose to delete version-manifest.json from this repo and always call out to astral-sh/versions. I would still keep the input for a manifest-url to support custom downloads/releases as stated above.

For the start I would add support for the ndjson format as well and deprecate the existing version-manifest.json format until a breaking release.

I would keep the KNOWN_CHECKSUMS in known-checksums.ts to still give users the security guarantee that even if astral-sh/versions gets hacked, this action would detect a malicious uv binary being downloaded (for known checksums/versions)

zanieb added a commit to astral-sh/uv that referenced this pull request Feb 11, 2026
…ormat` (#17651)

I'm picking up some pretty old work here prompted by
astral-sh/setup-uv#737 and a desire to be able
to fetch newer `python-build-standalone` versions.

Previously, we only supported a static version which means we can
construct a known GitHub asset URL trivially. However, to support the
"latest" version or version constraints, we need a registry with
metadata. The GitHub API is notoriously rate limited, so we don't want
to use that. It'd be great to use PyPI (and more broadly, the resolver),
but I don't want to introduce it in this code path yet. Instead, this
hits https://github.com/astral-sh/versions in order to determine the
available versions. We stream the NDJSON line by line to avoid
downloading the whole file in order to read one version.

Loosely requires #17648 to reach
production and be ported to `ruff`, though it's not a blocker.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants