Skip to content

Comments

Lock to avoid concurrent refresh of pyx tokens#17479

Merged
zanieb merged 14 commits intoastral-sh:mainfrom
zaniebot:claude/credential-helper-token-refresh-oq5GB
Jan 15, 2026
Merged

Lock to avoid concurrent refresh of pyx tokens#17479
zanieb merged 14 commits intoastral-sh:mainfrom
zaniebot:claude/credential-helper-token-refresh-oq5GB

Conversation

@zanieb
Copy link
Member

@zanieb zanieb commented Jan 14, 2026

Prevent concurrent processes from all refreshing the token simultaneously by using a file lock. When a refresh is needed:

  1. Acquire an exclusive file lock (which will block if another process is refreshing)
  2. Re-read the tokens from disk and check if they are fresh enough, instead of making another request
  3. Otherwise, perform the refresh and update the cached tokens

This prevents thundering herd issues when multiple concurrent processes are invoking uv auth token or uv auth helper at the same time.

Also, set the tolerance for uv auth helper to the same as uv auth token (5 minutes).

claude added 11 commits January 14, 2026 23:35
Prevent concurrent processes from all refreshing the token simultaneously
by using a file lock and timestamp-based debounce. When a refresh is needed:

1. Acquire an exclusive lock on refresh.lock
2. Check if another process recently refreshed (within 5 seconds)
3. If so, re-read the tokens from disk instead of making another request
4. Otherwise, perform the refresh and update the timestamp

This prevents thundering herd issues when multiple concurrent processes
invoke `uv auth credential-helper` at the same time.
The lock file is now specific to the token type:
- OAuth tokens: tokens.lock
- API key tokens: {api_key_digest}.lock

This ensures that different API keys don't unnecessarily block each other
when refreshing concurrently.
When another process recently refreshed, we re-read the token from disk.
Now we also verify the token's expiration is beyond the tolerance before
returning it. If the re-read token is stale (e.g., the other process failed
to write a fresh token), we fall through to perform the refresh ourselves.
Consolidate the token freshness check into a single `is_fresh` method
and use it in both places: the initial check and the debounce check.
This reduces duplication and simplifies the code.
The is_fresh helper now includes all the debug logging, so we can use
it in both the initial freshness check and the debounce check without
duplicating the checking logic.
…eason

Move the freshness checking logic to a method on PyxTokens that returns
Result<(), ExpiredTokenReason>. The ExpiredTokenReason enum implements
Display, allowing callers to log "Refreshing token due to {reason}" with
descriptive messages like "missing expiration", "zero tolerance",
"token expired", or "token expiring soon".
ExpiringSoon now carries the expiration timestamp and displays it as
"token will expire within tolerance (`{exp}`)" for more informative logs.
When we read tokens from disk during debounce but they're still not fresh,
log the reason so it's clear why we're falling through to refresh.
… comments

- Expired variant now includes the timestamp like ExpiringSoon
- Removed trailing periods from inline comments
check_fresh now returns Ok(expiration) instead of Ok(()), allowing callers
to log the expiration timestamp when the token is fresh, matching the
original "Access token is up-to-date (`{exp}`)" log message.
@zanieb
Copy link
Member Author

zanieb commented Jan 15, 2026

cc @charliermarsh I think this is probably sound even if it's not causing the Bazel issue @zsol is going to look into?

@zanieb zanieb added the bug Something isn't working label Jan 15, 2026
Comment on lines 401 to 414
/// Read the last refresh timestamp from the lock file.
async fn read_last_refresh(&self, lock_path: &Path) -> Option<u64> {
let data = fs_err::tokio::read_to_string(lock_path).await.ok()?;
data.trim().parse().ok()
}

/// Write the current timestamp to the lock file.
async fn write_last_refresh(&self, lock_path: &Path) -> Result<(), io::Error> {
let now = SystemTime::now()
.duration_since(UNIX_EPOCH)
.expect("system time before Unix epoch")
.as_secs();
fs_err::tokio::write(lock_path, now.to_string()).await
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would version the contents of this lockfile, just in case we need to change its contents in future releases. It could be as simple as the first 2 bytes containing a version marker, followed by content

@zsol
Copy link
Member

zsol commented Jan 15, 2026

This seems to resolve the issue I'm seeing, verified with the following simple script:

#!/usr/bin/env python3
"""Request pyx auth tokens from uv in concurrent attempts."""

import argparse
import asyncio


async def get_auth_token(
    index: int, uv_binary: str
) -> tuple[int, str | None, str | None]:
    """Request an auth token from uv."""
    proc = await asyncio.create_subprocess_exec(
        uv_binary,
        "auth",
        "helper",
        "--preview-features",
        "auth-helper",
        "--protocol",
        "bazel",
        "get",
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    input_json = b'{"uri": "https://api.pyx.dev/"}'
    stdout, stderr = await proc.communicate(input=input_json)

    if proc.returncode == 0:
        return (index, stdout.decode().strip(), None)
    else:
        return (index, None, stderr.decode().strip())


async def main(invocations: int, uv_binary: str) -> None:
    """Run n concurrent auth token requests."""
    print(f"Starting {invocations} concurrent auth token requests...")

    tasks = [get_auth_token(i, uv_binary) for i in range(invocations)]
    results = await asyncio.gather(*tasks)

    successes = 0
    failures = 0

    for index, token, error in results:
        if token:
            successes += 1
            # Only print first few characters of token for privacy.
            print(f"[{index:3d}] Success: {token[:5]}...")
        else:
            failures += 1
            print(f"[{index:3d}] Failed: {error}")

    print(f"\nResults: {successes} successes, {failures} failures")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Request PyX auth tokens from uv in concurrent attempts"
    )
    parser.add_argument(
        "-n",
        "--invocations",
        type=int,
        default=100,
        help="Number of concurrent requests (default: 100)",
    )
    parser.add_argument(
        "--uv",
        default="uv",
        help="Path to uv binary (default: uv)",
    )
    args = parser.parse_args()
    asyncio.run(main(args.invocations, args.uv))

On my machine, calling this with uv run $script -n 300 would consistently fail on uv 0.9.25, but never with this PR applied.

However, with -n 1000 I'm now running into the default UV_LOCK_TIMEOUT of five minutes

@zsol zsol changed the title Add debounce to pyx token refresh Fix race condition while refreshing pyx tokens Jan 15, 2026
@zsol zsol marked this pull request as ready for review January 15, 2026 15:41
@zanieb zanieb changed the title Fix race condition while refreshing pyx tokens Lock to avoid concurrent refresh of pyx tokens Jan 15, 2026
@zanieb zanieb merged commit 0920a0e into astral-sh:main Jan 15, 2026
69 checks passed
@zanieb
Copy link
Member Author

zanieb commented Jan 15, 2026

We opted not to include the complexity of a debounce for now, and instead just changed the tolerance for the credential-helper fetch.

tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Jan 17, 2026
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.9.24` → `0.9.26` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>astral-sh/uv (astral-sh/uv)</summary>

### [`v0.9.26`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0926)

[Compare Source](astral-sh/uv@0.9.25...0.9.26)

Released on 2026-01-15.

##### Python

- Add CPython 3.15.0a5

##### Enhancements

- Add a hint to update uv when a managed Python download is not found ([#&#8203;17461](astral-sh/uv#17461))
- Improve cache initialization failure error message ([#&#8203;17469](astral-sh/uv#17469))
- Improve error message for abi3 wheels on free-threaded Python ([#&#8203;17442](astral-sh/uv#17442))
- Add support for `--no-sources-package` ([#&#8203;14910](astral-sh/uv#14910))

##### Preview features

- Add `METADATA.json` and `WHEEL.json` in uv build backend ([#&#8203;15510](astral-sh/uv#15510))
- Add support for GCS request signing ([#&#8203;17474](astral-sh/uv#17474))
- Adjust the process ulimit to the maximum allowed on startup ([#&#8203;17464](astral-sh/uv#17464))

##### Bug fixes

- Lock to avoid concurrent refresh of pyx tokens ([#&#8203;17479](astral-sh/uv#17479))

##### Documentation

- Add linting and formatting instructions to the CONTRIBUTING guide ([#&#8203;17470](astral-sh/uv#17470))
- Avoid rendering `pyproject.toml` examples for more system-level settings ([#&#8203;17462](astral-sh/uv#17462))

### [`v0.9.25`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0925)

[Compare Source](astral-sh/uv@0.9.24...0.9.25)

Released on 2026-01-13.

##### Python

- Add CPython 3.15.0a4
- Upgrade Tcl/Tk used by CPython to 9.0

##### Enhancements

- Add `--compile-bytecode` to `uv python install` and `uv python upgrade` to compile the standard library ([#&#8203;17088](astral-sh/uv#17088))
- Allow disabling `exclude-newer` per package ([#&#8203;16854](astral-sh/uv#16854))
- Broadcast `WM_SETTINGCHANGE` on `uv tool update-shell` ([#&#8203;17404](astral-sh/uv#17404))

##### Preview features

- Detect workspace from `uv run` target ([#&#8203;17423](astral-sh/uv#17423))

##### Bug fixes

- Avoid unwrapping size for file responses ([#&#8203;17434](astral-sh/uv#17434))
- Use keyring authentication when retrieving `tool@latest` version ([#&#8203;17448](astral-sh/uv#17448))
- Use latest Pyodide version for each python version ([#&#8203;17372](astral-sh/uv#17372))
- Improve trampoline file handle closing ([#&#8203;17374](astral-sh/uv#17374))
- Fix error message when installing musl python on armv7 ([#&#8203;17213](astral-sh/uv#17213))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0Mi44MC4xIiwidXBkYXRlZEluVmVyIjoiNDIuODEuOCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiUmVub3ZhdGUgQm90IiwiYXV0b21hdGlvbjpib3QtYXV0aG9yZWQiLCJkZXBlbmRlbmN5LXR5cGU6OnBhdGNoIl19-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants