Mark uv pip invocations in linehaul data #16702

Closed
zsol wants to merge 9 commits into main from zsol/jj-yspoqwnxoqkt

zsol (Member) commented Nov 12, 2025

Summary

This PR makes uv set the installer.name field in the linehaul metadata to uv-pip for all registry requests made from uv pip commands.
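A minimal sketch of the idea, assuming a pared-down, hypothetical `Installer`/`LineHaul` pair modeled loosely on the diff excerpt in this PR (the real metadata carries many more fields and is not serialized by hand like this): the installer name falls back to `uv` unless a caller such as a `uv pip` command passes an override like `uv-pip`. The version string `0.0.0` is a placeholder.

```rust
/// Hypothetical, simplified stand-in for the installer part of linehaul metadata.
#[derive(Debug, Clone, PartialEq)]
struct Installer {
    name: Option<String>,
    version: Option<String>,
}

/// Hypothetical, trimmed-down linehaul payload (illustration only).
#[derive(Debug, Clone, PartialEq)]
struct LineHaul {
    installer: Option<Installer>,
}

impl LineHaul {
    /// Uses "uv" as the installer name unless an override is supplied.
    fn new(installer_name: Option<&str>, version: &str) -> Self {
        Self {
            installer: Option::from(Installer {
                name: Some(installer_name.unwrap_or("uv").to_string()),
                version: Some(version.to_string()),
            }),
        }
    }

    /// Hand-rolled JSON for just the installer object, to keep the sketch dependency-free.
    fn installer_json(&self) -> String {
        match &self.installer {
            Some(Installer { name: Some(name), version: Some(version) }) => {
                format!(r#"{{"installer":{{"name":"{name}","version":"{version}"}}}}"#)
            }
            _ => "{}".to_string(),
        }
    }
}

fn main() {
    // Default path (e.g. project commands): name stays "uv".
    println!("{}", LineHaul::new(None, "0.0.0").installer_json());
    // Override path (e.g. `uv pip` commands): name becomes "uv-pip".
    println!("{}", LineHaul::new(Some("uv-pip"), "0.0.0").installer_json());
}
```

Running it prints `{"installer":{"name":"uv","version":"0.0.0"}}` followed by `{"installer":{"name":"uv-pip","version":"0.0.0"}}`. Keeping the override as an `Option` means every existing caller keeps reporting `uv` without changes.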

Test Plan

  • Added a test case verifying that setting installer_name on a ClientBuilder changes the User-Agent header accordingly.
  • Ran a few manual tests and observed linehaul data being sent and parsed:
UV_DEFAULT_INDEX=https://api.pyx.dev/simple/public/pypi cargo run -- pip install httpx torch

zsol temporarily deployed to uv-test-registries November 12, 2025 11:15 with GitHub Actions
zsol force-pushed the zsol/jj-yspoqwnxoqkt branch from 7c7022c to 8d7ad79 November 12, 2025 11:26
zsol temporarily deployed to uv-test-registries November 12, 2025 11:29 with GitHub Actions
zsol changed the title Mark uv pip invocations in linehaul data Mark uv pip invocations in linehaul data Nov 12, 2025
zsol added the registry (Related to package indexes and registries) and uv pip (Related to the uv pip interface) labels Nov 12, 2025
zsol marked this pull request as ready for review November 12, 2025 12:05
 Self {
     installer: Option::from(Installer {
-        name: Some("uv".to_string()),
+        name: Some(installer_name.unwrap_or("uv").to_string()),
Member


Slightly surprised, but linehaul doesn't seem to check this field at all, which is good for us.

Collaborator

samypr100 commented Nov 13, 2025


As part of Google BigQuery? I feel like I've queried based on installer name in the past 😅, e.g. details.installer.name.

Ah, you meant the cloud function doesn't validate the field contents. Ignore my initial reaction.

Collaborator


I'd expect changing details.installer.name to cause some consequences or churn in how statistics are reported for uv, in cases where someone using BigQuery doesn't group by details.installer.name and instead filters on details.installer.name = 'uv' directly for popular clients. I'd expect this to be a common scenario, done to avoid the associated costs.

Member


Ideally we'd have an additional field, so that querying for uv can still be done with a single condition instead of splitting it across two names.
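The trade-off discussed here can be made concrete with a small in-memory sketch. It assumes hypothetical `Download` records standing in for BigQuery rows; `installer_subcommand` is an invented placeholder for the "additional field" idea, not an existing linehaul column, and the helper names are illustrative only. It compares an exact `installer_name == "uv"` filter under the split-name scheme from this PR and under an extra-field scheme.

```rust
/// Hypothetical download row; only the fields needed for the comparison.
struct Download {
    installer_name: &'static str,
    /// Hypothetical extra field (not part of linehaul today).
    installer_subcommand: Option<&'static str>,
}

/// Rows as they would look if `uv pip` reported a separate installer name.
fn split_name_rows() -> Vec<Download> {
    vec![
        Download { installer_name: "uv", installer_subcommand: None },
        Download { installer_name: "uv-pip", installer_subcommand: None },
        Download { installer_name: "uv-pip", installer_subcommand: None },
    ]
}

/// The same traffic if the name stayed "uv" and an extra field marked pip usage.
fn extra_field_rows() -> Vec<Download> {
    vec![
        Download { installer_name: "uv", installer_subcommand: None },
        Download { installer_name: "uv", installer_subcommand: Some("pip") },
        Download { installer_name: "uv", installer_subcommand: Some("pip") },
    ]
}

/// Mirrors a query that filters on installer name = 'uv' only.
fn count_exact_uv(rows: &[Download]) -> usize {
    rows.iter().filter(|r| r.installer_name == "uv").count()
}

/// What such a query must become once the name is split across two values.
fn count_split_names(rows: &[Download]) -> usize {
    rows.iter()
        .filter(|r| r.installer_name == "uv" || r.installer_name == "uv-pip")
        .count()
}

/// With an extra field, the pip subset is an optional second condition.
fn count_pip_subcommand(rows: &[Download]) -> usize {
    rows.iter()
        .filter(|r| r.installer_name == "uv" && r.installer_subcommand == Some("pip"))
        .count()
}

fn main() {
    let split = split_name_rows();
    println!("split names, exact 'uv': {} of {}", count_exact_uv(&split), split.len());
    println!("split names, either name: {}", count_split_names(&split));

    let extra = extra_field_rows();
    println!("extra field, exact 'uv': {} of {}", count_exact_uv(&extra), extra.len());
    println!("extra field, pip subset: {}", count_pip_subcommand(&extra));
}
```

Under the split scheme the unchanged exact-match filter sees 1 of 3 uv rows, so existing queries silently undercount; with the extra field the same filter still sees all 3.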

Collaborator


Maybe @pradyunsg, @di, or @ewdurbin have better ideas on how to avoid splitting details.installer.name between uv and uv-pip, or on this use case in general?

It would be great if adding a new field were a supported option, to avoid the confusion of having two possible installer names, but I'm not sure what coordination or process would be needed on linehaul's side first. There may be better options I'm missing.

Member


I think we won't merge this as-is.


Seems entirely reasonable to add additional values under the details key, though that can take some time, since we no longer get to run our own schema migrations now that our dataset is part of the Google public datasets. File an issue at https://github.com/pypi/linehaul-cloud-function/issues and tag me.


Base automatically changed from zsol/jj-otzultwkyovu to main November 12, 2025 13:55
Comment on lines +23 to +25
/// Spawns a dummy HTTP server that echoes back the User-Agent header.
/// Returns the server URL and the server task handle.
async fn spawn_user_agent_echo_server() -> Result<(DisplaySafeUrl, JoinHandle<()>)> {
Collaborator


Note: a new abstraction for spawning this server will be added in #16473.
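The helper under review spawns a dummy HTTP server that echoes back the User-Agent header, which lets a test assert on the header a configured client actually sends. The sketch below is not that code: the reviewed helper is `async` and returns a `DisplaySafeUrl` plus a task handle, whereas this version, as an assumption made to stay dependency-free, uses a blocking `std::net::TcpListener` on a background thread, returns a plain `String` URL, and serves exactly one request. `get_with_user_agent` is a hypothetical raw-HTTP client added only so the round trip can be exercised, and `test-agent/1.0` is an arbitrary sample value.

```rust
use std::io::{BufRead, BufReader, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread::{self, JoinHandle};

/// Hypothetical std-only echo server: accepts one connection, reads the request
/// headers, and returns the User-Agent value as the response body.
fn spawn_user_agent_echo_server() -> std::io::Result<(String, JoinHandle<()>)> {
    // Port 0 lets the OS pick a free port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let url = format!("http://{}", listener.local_addr()?);
    let handle = thread::spawn(move || {
        let (stream, _) = match listener.accept() {
            Ok(conn) => conn,
            Err(_) => return,
        };
        let mut reader = BufReader::new(stream);
        let mut user_agent = String::new();
        // Read lines until the blank line that terminates the header block.
        loop {
            let mut line = String::new();
            match reader.read_line(&mut line) {
                Ok(0) | Err(_) => break,
                Ok(_) => {}
            }
            let line = line.trim_end();
            if line.is_empty() {
                break;
            }
            // Header names are case-insensitive in HTTP.
            if let Some((name, value)) = line.split_once(':') {
                if name.eq_ignore_ascii_case("user-agent") {
                    user_agent = value.trim().to_string();
                }
            }
        }
        let response = format!(
            "HTTP/1.1 200 OK\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
            user_agent.len(),
            user_agent
        );
        let mut stream = reader.into_inner();
        let _ = stream.write_all(response.as_bytes());
        // The stream is dropped here, closing the connection.
    });
    Ok((url, handle))
}

/// Hypothetical minimal HTTP/1.1 GET that sends a User-Agent and returns the body.
fn get_with_user_agent(url: &str, user_agent: &str) -> std::io::Result<String> {
    let addr = url.trim_start_matches("http://");
    let mut stream = TcpStream::connect(addr)?;
    write!(
        stream,
        "GET / HTTP/1.1\r\nHost: {addr}\r\nUser-Agent: {user_agent}\r\nConnection: close\r\n\r\n"
    )?;
    let mut raw = String::new();
    stream.read_to_string(&mut raw)?;
    // The body follows the first blank line.
    Ok(raw
        .split_once("\r\n\r\n")
        .map(|(_, body)| body.to_string())
        .unwrap_or_default())
}

fn main() -> std::io::Result<()> {
    let (url, handle) = spawn_user_agent_echo_server()?;
    let body = get_with_user_agent(&url, "test-agent/1.0")?;
    handle.join().expect("server thread panicked");
    println!("{body}");
    Ok(())
}
```

In a real test the raw client would be replaced by the client under test (for example one built with an installer name override), and the assertion would check the echoed body.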

zanieb marked this pull request as draft November 13, 2025 18:35
zsol closed this Nov 13, 2025
zsol deleted the zsol/jj-yspoqwnxoqkt branch December 5, 2025 10:54


5 participants