Skip to content

_clear_anthropic_env_values mid-stream race window (PR #1727 follow-up) #1736

@nesquena-hermes

Description

@nesquena-hermes

Background

Opus advisor on stage-302 (v0.51.7 release pass) reviewed PR #1727 (Claude Code OAuth onboarding) and flagged a small thread-safety concern in _clear_anthropic_env_values() (api/oauth.py:65-77). Documenting it as a follow-up rather than blocking the v0.51.7 release.

Concern

_clear_anthropic_env_values() mutates os.environ directly:

os.environ.pop("ANTHROPIC_TOKEN", None)
os.environ.pop("ANTHROPIC_API_KEY", None)

CPython's dict.pop is GIL-protected, so concurrent os.environ.get() calls can't crash or read torn data. No memory-safety issue.

But there's a semantic race: if a sibling chat thread has already read os.environ.get('ANTHROPIC_API_KEY') into a local string and is mid-flight building an Anthropic outbound request, it will continue using the now-revoked value. The user's outbound call may 401.

There's also a transient asymmetry: a chat thread that reads env vars between the two pop calls sees ANTHROPIC_TOKEN cleared but ANTHROPIC_API_KEY still present (or vice versa). Since both are cleared together for the same intent, this asymmetry is harmless in practice, but it's a code smell.

Practical impact

Low frequency, recoverable:

  • The race only manifests if a user re-runs Claude Code OAuth onboarding while another browser tab is mid-stream.
  • Onboarding linkage typically happens at first setup with no concurrent chat.
  • 401 is recoverable (user retries).

Suggested fix (not a hard requirement)

Two options:

  1. Onboarding lock: gate mutation of os.environ for credential-related vars behind a single lightweight threading.RLock so credential changes are atomic with respect to readers that grab the lock. Requires consumers to also take the lock, so this is more invasive than appears.
  2. Re-read per request: document that chat threads should call a resolve_anthropic_token() helper per outbound request rather than caching at thread start. This is what agent/anthropic_adapter.py:resolve_anthropic_token already does — the issue is purely on the WebUI-side onboarding mutation.

Option 2 is probably cleaner since it doesn't introduce new locking primitives.

Source

Opus stage-302 advisor verdict (May 6 2026):

Q3 — #1727 _clear_anthropic_env_values thread safety — SHOULD-FIX (small) ⚠
... Recommendation: ship as-is, but consider gating mutation of os.environ behind a single onboarding lock and document that chat threads should re-read tokens per outbound request rather than caching at thread start. Not a blocker.

Acceptance

  • Either add a documented re-read-per-request invariant in _clear_anthropic_env_values callers, OR introduce a lightweight onboarding lock.
  • If choosing the lock route: any reader of ANTHROPIC_TOKEN/ANTHROPIC_API_KEY in a chat-stream context needs to take the same lock briefly when reading.
  • Add a regression test that simulates concurrent onboarding+chat-read to confirm the chosen fix removes the race.

Filed from v0.51.7 stage-302 release (PRs #1725-1730, #1732). Not blocking — _clear_anthropic_env_values shipped as-is.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions