Skip to content

fix(llm-cli): harden codex and claude-code auth probe paths#1295

Merged
muddlebee merged 11 commits intoTracer-Cloud:mainfrom
muddlebee:fix/claude-cli-flow-audit-1260
May 5, 2026
Merged

fix(llm-cli): harden codex and claude-code auth probe paths#1295
muddlebee merged 11 commits intoTracer-Cloud:mainfrom
muddlebee:fix/claude-cli-flow-audit-1260

Conversation

@muddlebee
Copy link
Copy Markdown
Collaborator

Fixes #1260

Describe the changes you have made in this PR -

  • handle claude auth probe bootstrap failures safely by catching runner import failures and returning an unclear-auth probe result instead of crashing
  • normalize user-facing Claude auth hints to remove duplicate spacing and keep messaging consistent across adapter paths
  • improve interactive shell response extraction to support structured LLM content blocks (list / dict / .text objects)
  • add regression tests for probe import failures, auth hint formatting, and structured interactive-shell response rendering

Demo/Screenshot for feature changes and bug fixes -

  • uv run pytest tests/integrations/llm_cli/test_claude_code_adapter.py tests/cli/interactive_shell/test_cli_agent.py
  • Result: 60 passed

Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • No, I wrote all the code myself
  • Yes, I used AI assistance (continue below)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

  • The main failure mode fixed here is an avoidable crash during Claude auth probing when the helper import path is unavailable. The adapter now treats that condition as an unknown auth state, which preserves diagnostic flow and surfaces an actionable message instead of throwing at import time.
  • I kept auth messaging centralized by introducing one shared hint constant and using it in all unauthenticated/unclear branches. This avoids drift between probe/classify/runtime errors and removes formatting inconsistencies from user-facing output.
  • For interactive shell robustness, response parsing now handles structured content payloads from providers that return block objects instead of plain strings. This preserves intended text output and avoids leaking Python repr-style payloads in terminal responses.

Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

Note: Please check Allow edits from maintainers if you would like us to assist in the PR.

muddlebee added 2 commits May 5, 2026 11:34
- align Claude auth probe env with runtime subprocess env filtering
- classify non-JSON "not logged in" auth output as unauthenticated
- standardize Claude auth hint/messages to `claude auth login`
- enrich runtime failure messaging when auth probe status is unclear
- include `claude-code` in interactive-shell action-planner provider list
- add doctor checks for CLI provider install/auth readiness (codex/claude-code)
- add regression tests for probe env handling, auth parsing, runner errors, and doctor checks
- handle ImportError when preparing claude auth probe env and return a safe unclear-auth result
- normalize claude auth hint strings to remove duplicate spaces in user-facing messages
- support structured LLM content blocks in interactive shell responses
- add regression tests for probe import failures and structured response rendering
@muddlebee muddlebee marked this pull request as ready for review May 5, 2026 06:17
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 5, 2026

Greptile Summary

  • Extracts subprocess env filtering into a dedicated subprocess_env.py module with a stable public API (build_cli_subprocess_env), resolving the previous fragile importlib/getattr access pattern; runner.py and claude_code.py both migrate to the new import, with a back-compat alias left on runner for any unconverted consumers.
  • Centralises auth messaging behind _AUTH_HINT and _anthropic_env_overrides(), adds negative-marker text parsing for older Claude CLI versions that return plain-text instead of JSON, and wires the registry-driven CLI probe into opensre doctor.
  • Improves interactive-shell response extraction via _response_text() to handle structured content payloads (list of block objects/dicts) that previously leaked as Python repr strings into the terminal.

Confidence Score: 5/5

Safe to merge — no P0 or P1 issues found; all changes are well-tested and backwards-compatible.

All affected code paths are covered by regression tests (60 passing). The refactoring is non-breaking (back-compat alias retained), auth env filtering is verified not to leak ANTHROPIC_API_KEY to other adapters, and the structured-response parser handles every documented content shape. Only minor P2 style findings remain.

No files require special attention.

Important Files Changed

Filename Overview
app/integrations/llm_cli/subprocess_env.py New dedicated module extracting subprocess env filtering from runner.py; adds CLAUDE_ prefix to allowlist and exposes a public build_cli_subprocess_env API.
app/integrations/llm_cli/claude_code.py Centralises auth hint into _AUTH_HINT constant, extracts _anthropic_env_overrides(), switches auth probe to filtered env via build_cli_subprocess_env, adds negative-marker fallback for older CLI versions.
app/integrations/llm_cli/runner.py Removes duplicated env-filter code (now in subprocess_env.py), adds auth_probe_unclear flag, improves error message composition to avoid double punctuation on unclear-auth failures.
app/cli/interactive_shell/cli_agent.py Adds _response_text() to handle structured LLM content payloads (list/dict/object with .text), fixes provider list to include claude-code.
app/cli/commands/doctor.py Integrates registry-driven CLI provider probe into the doctor check, short-circuiting before API-key env-var checks for CLI-backed providers.
tests/integrations/llm_cli/test_claude_code_adapter.py Adds tests for negative-marker auth detection, filtered probe env, auth hint formatting, and unclear-auth error message composition.
tests/integrations/llm_cli/test_codex_adapter.py Adds regression tests for unclear-auth suffix appended to RuntimeErrors and double-period prevention.
tests/cli/test_doctor.py New test file covering doctor _check_llm_provider for hosted-key, CLI-ready, CLI-auth-unclear, and registry-driven probe paths.
tests/cli/interactive_shell/test_cli_agent.py Extends existing test suite with structured-content-block rendering assertion and claude-code provider string presence check.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[CLIBackedLLMClient.invoke] --> B[_probe adapter.detect]
    B --> C{probe.installed?}
    C -- No --> D[raise RuntimeError: CLI not found]
    C -- Yes --> E{probe.logged_in?}
    E -- False --> F[raise RuntimeError: not authenticated]
    E -- None --> G[auth_probe_unclear = True]
    E -- True --> H[auth_probe_unclear = False]
    G --> I[adapter.build invocation]
    H --> I
    I --> J[build_cli_subprocess_env + overrides]
    J --> K[subprocess.run CLI]
    K --> L{returncode != 0?}
    L -- Yes, unclear auth --> M[explain_failure + auth hint suffix]
    L -- Yes, auth known --> N[explain_failure only]
    L -- No --> O[adapter.parse stdout]
    O --> P[LLMResponse content]
    subgraph subprocess_env.py
        J
    end
    subgraph claude_code.py _probe_cli_auth
        Q[claude auth status subprocess] --> R{JSON output?}
        R -- Yes --> S{loggedIn?}
        S -- No --> T[False _AUTH_HINT]
        S -- Yes --> U[True source detail]
        R -- No --> V{negative markers in plain text?}
        V -- Yes --> T
        V -- No --> W[True CLI fallback]
    end
Loading

Reviews (2): Last reviewed commit: "fix(cli): derive doctor CLI providers fr..." | Re-trigger Greptile

Comment thread app/integrations/llm_cli/claude_code.py Outdated
Comment thread app/integrations/llm_cli/runner.py Outdated
muddlebee added 3 commits May 5, 2026 11:50
- satisfy CI format-check for files touched by the Claude Code CLI flow
- add subprocess_env.build_cli_subprocess_env for filtered CLI subprocess env
- use it from runner (alias _build_subprocess_env) and claude_code auth probe
- compose unclear-auth failure messages without double periods after explain_failure
- update tests: import public helper; add regression for trailing-period explain_failure
- probe CLI auth/install via get_cli_provider_registration instead of hardcoded IDs
- document registry/doctor coupling and subprocess_env allowlist in AGENTS.md
- add test asserting CLI branch works for any registry provider id
@muddlebee
Copy link
Copy Markdown
Collaborator Author

@greptileai review

@muddlebee
Copy link
Copy Markdown
Collaborator Author

@greptileai review new changes

- wizard quickstart selects claude-code, runs CLI onboarding, syncs CLAUDE_CODE_MODEL
- credential summary line matches ClaudeCodeAdapter.auth_hint
- LLMSettings.from_env accepts claude-code without hosted API keys
- _create_llm_client wires CLIBackedLLMClient + ClaudeCodeAdapter and CLAUDE_CODE_MODEL
Copy link
Copy Markdown
Collaborator

@yashksaini-coder yashksaini-coder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. I ran a multi-role review (senior eng, QA, greptile verification) on HEAD d24987aa and the PR is in good shape.

The refactor is clean. runner.py shed 56 lines, subprocess_env.py picked up 64, and the diff between them is line-for-line a move with no logic loss. Public API of runner.py is preserved, and you kept _build_subprocess_env = build_cli_subprocess_env as a back-compat alias which is the right call. Cross-OS allowlist in subprocess_env.py covers POSIX (USER, HOME, TMPDIR) and Windows (USERPROFILE, APPDATA, LOCALAPPDATA, PATHEXT, SYSTEMROOT, WINDIR, COMSPEC) which lines up with issue #1260's cross-OS parity ask.

Both of greptile's P2 flags are stale at this HEAD. I checked each one:

  • The getattr(runner_module, "_build_subprocess_env", None) pattern greptile cited doesn't exist anywhere in the repo. claude_code.py:37 is a clean direct import. The bot quoted code that isn't in the file. Nothing to do.
  • The double-period concern is already guarded at runner.py:140-144 (the endswith((".", "!", "?")) branch) and you have an explicit regression test at tests/integrations/llm_cli/test_codex_adapter.py:247-285 named test_cli_backed_client_unclear_auth_no_double_period_when_explain_failure_trailing_period that asserts ".." not in msg. That's exactly the right shape.

The AGENTS.md updates match the new code structure (new subprocess_env.py row at line 12, CLAUDE_ prefix added, doctor-discovery note at line 11), so the local guidance and the implementation agree.

A few non-blocking follow-ups I'd suggest, none needed for this merge:

  1. _response_text in cli_agent.py:158-185 is good defensive coding but it's the kind of normalization that probably belongs once on LLMResponse or in app/services/llm_client.py so every caller benefits. Worth a follow-up issue, not a change here.
  2. runner.py:140 has # Avoid \"..\" with escaped quotes from source-editing. Reads awkwardly. Tiny.
  3. Test coverage gap in tests/cli/interactive_shell/test_cli_agent.py:184-196: the structured-content test mixes a .text-attribute object and a dict block, but doesn't cover (a) a bare str element inside the list, or (b) the getattr(content, "text", None) non-list/non-dict branch at cli_agent.py:182-185. Inline comment with specifics below.
  4. Doctor coverage: tests/cli/test_doctor.py covers unset / hosted-missing-key / CLI-ready / CLI-auth-unclear / registry paths, but the installed=False and logged_in=False branches in doctor.py:66-69 aren't exercised. Reasonable scope for the new file, but worth noting.

Lint and format clean (10 files), and the linked greptile suppressions are commented in code (commit 48ebb47c "stable subprocess env API for Greptile nits"). Nice work.

text_value = getattr(content, "text", None)
if isinstance(text_value, str):
return text_value
return str(content)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking follow-up. This _response_text is doing real work, but it's normalizing LLM response shapes that every other caller of get_llm_for_reasoning() will eventually need to handle too. Worth lifting into app/services/llm_client.py as something like extract_text(response) so future consumers don't reimplement the same str | dict | list | object-with-.text ladder. Fine to keep here for this PR and split that out separately.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense

assert session.cli_agent_messages[-1] == ("assistant", "First line\nSecond line")

def test_llm_failure_prints_red_error_and_does_not_record(self, monkeypatch: Any) -> None:
class _Boom:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small coverage gaps in _response_text that this test doesn't hit yet:

  1. List with a bare str element (the if isinstance(item, str): blocks.append(item) branch at cli_agent.py:170-172). Realistic for older Anthropic SDK responses that mix string and block items in content.
  2. The non-list, non-dict object with .text attribute (cli_agent.py:182-185) — separate from the in-list .text case you already cover.

Not blocking. Worth adding both since you've already got the fixture pattern in place.

Copy link
Copy Markdown
Collaborator Author

@muddlebee muddlebee May 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added more coverage in subsequent commits

muddlebee added 2 commits May 5, 2026 12:10
- Add write_fake_runnable_cli_bin helper (.exe on win32, chmod elsewhere)
- Use helper in codex/claude adapter and wizard path-override tests
- Reorder imports in test_codex_adapter for ruff isort
- add Windows fallback binary lookup for Volta and pnpm install paths
- expand `~` in Claude workspace cwd to keep path handling consistent
- add Windows auth classification coverage for no-credential fallback
muddlebee added 3 commits May 5, 2026 12:28
- Refactor error message construction in CLIBackedLLMClient to avoid double periods.
- Update tests to ensure clarity in error messages related to authentication status.
- Enhance test assertions for clearer validation of error message content.
Pass OpenAI Platform auth/config keys via CLIInvocation.env so filtered
invoke env merges them; add adapter and runner merge tests.
- treat Codex auth as valid when login status is negative/unclear but OPENAI_API_KEY is present
- reuse Claude auth env utility to derive fallback auth source for API key or auth token
- add adapter tests for subscription/API fallback matrix and unclear auth probe scenarios
@muddlebee muddlebee changed the title fix(llm-cli): harden claude-code auth probe paths fix(llm-cli): harden codex and claude-code auth probe paths May 5, 2026
@muddlebee muddlebee force-pushed the fix/claude-cli-flow-audit-1260 branch from a6b8d0e to bc2219a Compare May 5, 2026 09:25
@muddlebee muddlebee merged commit b412445 into Tracer-Cloud:main May 5, 2026
17 of 19 checks passed
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

🍵 @muddlebee made tea, opened a PR, and merged before it cooled. No notes. ☕


👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

muddlebee pushed a commit that referenced this pull request May 5, 2026
…thod (#1260) (#1302)

* fix(claude-code): parse auth status JSON before exit code; use authMethod (#1260)

Follow-up to #1295. Two cases of `claude auth status` were misclassified:

1. No auth + no env key → CLI exits 1 with valid JSON
   `{"loggedIn": false, "authMethod": "none"}`. The probe short-circuited
   on `returncode != 0` before parsing JSON and returned `None`, so the
   wizard surfaced "could not verify Claude Code CLI login" instead of the
   correct "Claude Code CLI requires login" — same logged_in=None surface
   #1199 was about, just without the env-key fallback to rescue it.

2. Subscription + ANTHROPIC_API_KEY in env → CLI emits both
   `authMethod: "claude.ai"` and `apiKeySource: "ANTHROPIC_API_KEY"`. The
   prior code branched on `apiKeySource`-presence and reported
   "Authenticated via ANTHROPIC_API_KEY" even though the active method was
   the subscription. `authMethod` (`claude.ai` / `api_key` / `none`) is
   the authoritative discriminator; `apiKeySource` is set whenever the env
   contributes a key, regardless of which method is in use.

Implementation:
- Extract `_auth_status_from_json_payload` and `_try_parse_auth_status_stdout`
  so the JSON path is testable in isolation.
- `_probe_cli_auth` now parses stdout first; the exit-code error path is a
  fallback only when JSON is absent or unparseable.
- Detail-string branching uses `authMethod` first; falls back to
  `apiKeySource` / `email` for older CLI versions that omit `authMethod`.

Live verification on claude 2.1.123 (all four states):
- subscription only           → "Authenticated via Claude subscription (...)"
- api key only                → "Authenticated via ANTHROPIC_API_KEY."
- subscription + env API key  → "Authenticated via Claude subscription (...)"  (was: "via ANTHROPIC_API_KEY")
- no auth (fresh HOME)        → (False, "Not authenticated. ...")             (was: (None, "claude auth status failed: ..."))

Adds two regression tests:
- test_probe_cli_auth_not_logged_in_exits_1 — exit 1 + valid JSON loggedIn:false
- test_probe_cli_auth_subscription_with_env_api_key_reports_subscription
- test_probe_cli_auth_api_key_only_uses_authmethod (covers the api_key branch)

make lint / format-check / typecheck clean. make test-cov: 4967 passed.

* fix(claude-code): guard unrecognized authMethod from legacy fallback

Future authMethod values (e.g. oauth/sso) emitted alongside a populated
apiKeySource were skipping both explicit branches and landing on the
legacy heuristic, mis-reporting the env var as the active source — the
same kind of mis-attribution this PR fixes for claude.ai. Surface the
authMethod verbatim instead. Adds the missing stderr-only-JSON contract
test so a future stderr-parsing refactor cannot silently change
_probe_cli_auth's return value.
yashksaini-coder pushed a commit to gitsofaryan/opensre that referenced this pull request May 5, 2026
Resolved conflict in app/integrations/llm_cli/runner.py.

PR Tracer-Cloud#1295 extracted the subprocess env allowlist into a new
app/integrations/llm_cli/subprocess_env.py module. This branch had
modified the inline list in runner.py to add the KIMI_ prefix.
Resolution:
- runner.py: take origin/main version (imports build_cli_subprocess_env
  from subprocess_env.py, keeps _build_subprocess_env back-compat alias).
- subprocess_env.py: add "KIMI_" to _SAFE_SUBPROCESS_ENV_PREFIXES so
  Kimi env vars (KIMI_API_KEY, KIMI_BASE_URL, etc.) are forwarded to
  the subprocess.

This preserves both PR Tracer-Cloud#1295's refactor and PR Tracer-Cloud#1139's Kimi support.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Audit Claude Code CLI: subscription login vs API key and cross-OS parity

3 participants