_clear_anthropic_env_values mid-stream race window (PR #1727 follow-up)

## Background

Opus advisor on stage-302 (v0.51.7 release pass) reviewed PR #1727 (Claude Code OAuth onboarding) and flagged a small thread-safety concern in `_clear_anthropic_env_values()` (`api/oauth.py:65-77`). Documenting it as a follow-up rather than blocking the v0.51.7 release.

## Concern

`_clear_anthropic_env_values()` mutates `os.environ` directly:

```python
os.environ.pop("ANTHROPIC_TOKEN", None)
os.environ.pop("ANTHROPIC_API_KEY", None)
```

CPython's `dict.pop` is GIL-protected, so concurrent `os.environ.get()` calls can't crash or read torn data. **No memory-safety issue.**

But there's a **semantic race**: if a sibling chat thread has already read `os.environ.get('ANTHROPIC_API_KEY')` into a local string and is mid-flight building an Anthropic outbound request, it will continue using the now-revoked value. The user's outbound call may 401.

There's also a transient asymmetry: a chat thread that reads env vars *between* the two `pop` calls sees `ANTHROPIC_TOKEN` cleared but `ANTHROPIC_API_KEY` still present (or vice versa). Since both are cleared together for the same intent, this asymmetry is harmless in practice, but it's a code smell.

## Practical impact

Low frequency, recoverable:
- The race only manifests if a user re-runs Claude Code OAuth onboarding while another browser tab is mid-stream.
- Onboarding linkage typically happens at first setup with no concurrent chat.
- 401 is recoverable (user retries).

## Suggested fix (not a hard requirement)

Two options:
1. **Onboarding lock**: gate mutation of `os.environ` for credential-related vars behind a single lightweight `threading.RLock` so credential changes are atomic with respect to readers that grab the lock. Requires consumers to also take the lock, so this is more invasive than appears.
2. **Re-read per request**: document that chat threads should call a `resolve_anthropic_token()` helper per outbound request rather than caching at thread start. This is what `agent/anthropic_adapter.py:resolve_anthropic_token` already does — the issue is purely on the WebUI-side onboarding mutation.

Option 2 is probably cleaner since it doesn't introduce new locking primitives.

## Source

Opus stage-302 advisor verdict (May 6 2026):
> Q3 — `#1727 _clear_anthropic_env_values` thread safety — **SHOULD-FIX (small) ⚠**
> ... Recommendation: ship as-is, but consider gating mutation of `os.environ` behind a single onboarding lock and document that chat threads should re-read tokens per outbound request rather than caching at thread start. Not a blocker.

## Acceptance

- [ ] Either add a documented re-read-per-request invariant in `_clear_anthropic_env_values` callers, OR introduce a lightweight onboarding lock.
- [ ] If choosing the lock route: any reader of `ANTHROPIC_TOKEN`/`ANTHROPIC_API_KEY` in a chat-stream context needs to take the same lock briefly when reading.
- [ ] Add a regression test that simulates concurrent onboarding+chat-read to confirm the chosen fix removes the race.

Filed from v0.51.7 stage-302 release (PRs #1725-1730, #1732). Not blocking — `_clear_anthropic_env_values` shipped as-is.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

_clear_anthropic_env_values mid-stream race window (PR #1727 follow-up) #1736

Background

Concern

Practical impact

Suggested fix (not a hard requirement)

Source

Acceptance

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

_clear_anthropic_env_values mid-stream race window (PR #1727 follow-up) #1736

Description

Background

Concern

Practical impact

Suggested fix (not a hard requirement)

Source

Acceptance

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions