feat: add CLI framework base and codex CLI integration #781
feat: add CLI framework base and codex CLI integration #781Devesh36 merged 18 commits intoTracer-Cloud:mainfrom
Conversation
- Introduced support for OpenAI Codex as a CLI provider, allowing non-interactive execution via subprocess. - Updated `.env.example` to include optional Codex configuration parameters. - Enhanced `AGENTS.md` with details on the new Codex integration and its usage. - Modified `app/config.py` to include Codex in the list of valid LLM providers. - Updated CLI wizard to handle Codex onboarding, including environment variable synchronization. - Added tests for Codex integration and CLI onboarding process. - Refactored related components to support the new CLI-based LLM interactions.
Greptile SummaryThis PR adds a subprocess-backed LLM CLI framework ( Key changes:
Confidence Score: 4/5Safe to merge after fixing the _LLMClientType | Any annotation; all prior P0/P1 bugs from earlier rounds are resolved All three previously flagged issues (auth substring collision, per-invoke probe latency, unreachable exception handler) are addressed. The framework is well-tested with 9 wizard state-machine tests and thorough CodexAdapter unit tests. The one remaining P1 is the _LLMClientType | Any annotation that silently disables type safety for the entire client union — fixable in a single line with TYPE_CHECKING. Two P2 style nits do not affect correctness. app/services/llm_client.py (line 440 — _LLMClientType | Any type annotation) Important Files Changed
Sequence DiagramsequenceDiagram
participant W as Wizard (flow.py)
participant OB as _run_cli_llm_onboarding
participant CA as CodexAdapter
participant SP as subprocess
participant C as CLIBackedLLMClient
participant GR as GuardrailEngine
W->>OB: provider (credential_kind=cli)
loop up to 10 attempts
OB->>CA: detect()
CA->>SP: codex --version
SP-->>CA: version / error
CA->>SP: codex login status
SP-->>CA: auth output
CA-->>OB: CLIProbe(installed, logged_in, detail)
alt installed and logged_in=True
OB-->>W: ok
else installed and logged_in not True
OB->>W: show login menu retry or repick
else not installed
OB->>W: show install menu retry or path or repick
end
end
OB-->>W: abort (max retries)
note over C,GR: At invocation time
C->>GR: engine.apply(flat_prompt)
C->>C: _probe() TTL cache 45s
C->>CA: build(prompt, model, workspace)
CA->>SP: codex exec --ephemeral -s read-only -C ws -
SP-->>CA: stdout / stderr / returncode
CA-->>C: parse() content
C-->>C: LLMResponse(content)
Reviews (3): Last reviewed commit: "fix(codeql): revert TYPE_CHECKING guard,..." | Re-trigger Greptile |
Made-with: Cursor
- Modified `.env.example` to clarify that an empty `CODEX_MODEL` uses the Codex CLI's configured default model. - Removed `CODEX_LLM_CONFIG` from `app/config.py` as it is no longer needed. - Updated `CODEX_MODELS` in `app/cli/wizard/config.py` to reflect the new model handling. - Adjusted `CodexAdapter` in `app/integrations/llm_cli/codex.py` to accommodate the empty model value. - Refactored `llm_client.py` to use `None` for the model when `CODEX_MODEL` is empty. - Updated tests to ensure the new behavior is correctly implemented and validated.
- Normalize repo map table formatting - Use checklist checkboxes in integration section - Drop redundant llm_cli bullet (matches main layout) Made-with: Cursor
- Updated the onboarding process to handle authentication status more effectively, allowing users to retry or repick providers based on their login state. - Introduced clearer prompts for users when authentication is required or unclear. - Adjusted tests to validate the new onboarding behavior, ensuring correct handling of login states and user choices during the onboarding process.
- Introduced a new method to identify candidate binary names for Codex, improving compatibility across platforms. - Refactored fallback path logic to include environment variables and npm prefix directories, ensuring more robust detection of the Codex binary. - Updated tests to validate the new fallback mechanism and ensure correct binary detection behavior.
- Satisfy CodeQL: avoid mixed import and import-from for the same module. - Import CodexAdapter, _fallback_codex_paths, _npm_prefix_bin_dirs from one from-block; use pathlib.Path for home paths. Made-with: Cursor
- Match main: do not version the uv lockfile. - Remove uv.lock from the index; file may remain locally. Made-with: Cursor
- Fix CI ruff format --check on app/ and tests/. Made-with: Cursor
- Updated the `argv` attribute in `CLIInvocation` to use a tuple instead of a list for better immutability. - Adjusted the instantiation of `CLIInvocation` in `CodexAdapter` to reflect this change. - Modified the `subprocess.run` call in `CLIBackedLLMClient` to convert `argv` back to a list for compatibility. - Updated type hints in `llm_client.py` to use `Union` for forward references, avoiding import cycles.
|
|
||
| _LLMClientType = LLMClient | OpenAILLMClient | BedrockLLMClient | ||
| # CLIBackedLLMClient is included but omitted from the union to avoid import cycles. | ||
| _LLMClientType = LLMClient | OpenAILLMClient | BedrockLLMClient | Any |
There was a problem hiding this comment.
| Any collapses the union to Any, defeating type safety
In Python's typing system, X | Any evaluates to Any — the entire union annotation is silently erased by mypy. The comment says "CLIBackedLLMClient is included" but it isn't really: the annotation carries no structural information about the CLI client.
The project enables warn_return_any in mypy, which means call-sites that receive this type will get untracked Any returns rather than proper type narrowing.
The idiomatic fix for a circular-import situation is TYPE_CHECKING:
from typing import TYPE_CHECKING, Union
if TYPE_CHECKING:
from app.integrations.llm_cli.runner import CLIBackedLLMClient
_LLMClientType = Union[LLMClient, OpenAILLMClient, BedrockLLMClient, "CLIBackedLLMClient"]This import is never executed at runtime (only during static analysis), so the circular import is avoided while keeping real type information.
Context Used: Code style and conventions (source)
There was a problem hiding this comment.
I am sticking to "Any" type as add proper types like "CLIBackedLLMClient" is giving "Module-level cyclic import" lint errors as tried in this commit
There was a problem hiding this comment.
If any suggestions, feel free to push commits into this branch :) glad for any help or suggestions
|
Steps to test locally Negative flow
Positive flow
|
|
|
||
| invocation = self._adapter.build(prompt=flat, model=self._model, workspace="") | ||
| merged_env = os.environ.copy() | ||
| if invocation.env: |
There was a problem hiding this comment.
Every env var from the parent process — ANTHROPIC_API_KEY, OPENAI_API_KEY, database URLs, etc. — is handed to the Codex CLI subprocess. If the CLI logs env vars on error or spawns its own children, credentials leak.
|
|
||
| def _probe(self) -> CLIProbe: | ||
| now = time.monotonic() | ||
| if self._cached_probe is not None and (now - self._probe_cached_at) < _PROBE_CACHE_TTL_SEC: |
There was a problem hiding this comment.
_cached_probe and _probe_cached_at are written without a lock. In a multi-threaded LangGraph pipeline where the same CLIBackedLLMClient instance is shared, two threads could simultaneously find the cache stale and both call detect(), with one overwriting the other mid-write. Add a threading.Lock acquired around the stale-check-and-set block.
| ) -> None: | ||
| """Wraps any LLM client with `.invoke` (API or CLI subprocess) for Pydantic JSON parsing.""" | ||
|
|
||
| def __init__(self, base: Any, model: type[BaseModel]) -> None: |
There was a problem hiding this comment.
Replacing the union with Any suppresses mypy errors on all callers, not just the new CLI-backed one. A Protocol with a single .invoke(...) method is the correct fix and would have caught this at the type level without widening.
| def _resolve_models(provider: str) -> tuple[str, str]: | ||
| """Resolve tool and reasoning model names for the active provider.""" | ||
| if provider == "codex": | ||
| raise ValueError( |
There was a problem hiding this comment.
ValueError raised inside a LangGraph node will propagate as an unhandled exception unless the graph has a catch-all error handler. A user-facing message or a typed UnsupportedProviderError that the graph runner knows to catch and format would be safer.
yashksaini-coder
left a comment
There was a problem hiding this comment.
Hey @muddlebee, really nice work on the architecture here — the Protocol-based adapter design, probe caching, and onboarding recovery loop are all solid. A few issues I caught that need fixing before this lands:
… environment handling - Reduced the default execution timeout in CodexAdapter from 900 seconds to 120 seconds for improved responsiveness. - Introduced a new function to build a safe subprocess environment, ensuring only necessary environment variables are passed to subprocesses. - Added thread safety to the probe caching mechanism in CLIBackedLLMClient to prevent race conditions. - Updated tests to verify the new timeout and environment handling behavior.
- Reformat llm_cli runner probe cache conditional - Wrap long assert in chat node test - Normalize sync_release_version error message string Made-with: Cursor
- Patch shutil.which so CI without codex on PATH passes Made-with: Cursor
yashksaini-coder
left a comment
There was a problem hiding this comment.
Good progress on addressing the review feedback — env var allowlist, thread-safe probe cache, Protocol-based typing, and the 900s → 120s timeout fix all look solid. Approving. The one thing still worth a follow-up PR is the LangChain message handling in text.py (non-dict messages stringify as garbage), but that can be a quick patch after merge. Nice work overall @muddlebee 🚀
|
@yashksaini-coder thank your for your review. anyway we are planning migration away from LangGraph, check #650 So adding the reference to this PR, to cater such cases. noted! |
* fix(llm-cli): address post-review issues in CLI integration - fix: set os.environ[env_key] after sync_env_values so in-process retry loop sees the user-provided binary path immediately (#781 follow-up) - fix: validate user-supplied binary path (isfile + executable) before persisting to .env to prevent silently storing broken paths - fix: bump npm_prefix_bin_dirs subprocess timeout from 0.3s to 2.0s to avoid false-empty cache on slow nvm/volta shim startup - refactor: use None sentinels for resolve_cli_binary defaults so shutil.which and is_runnable_binary are looked up at call time, making the function patchable in tests without explicit overrides - refactor: drop redundant shutil import and explicit which_resolver / runnable_check kwargs from CodexAdapter._resolve_binary - refactor: consolidate four separate binary_resolver import blocks into one - fix: _ver_tuple now uses re.findall to correctly parse versions with non-digit segments (e.g. "1.2a.3", "1.2.3-beta.4") - feat: log CLI stderr at DEBUG level after successful parse so warnings from the CLI are not silently discarded - test: update patch targets from codex.shutil.which to binary_resolver.shutil.which after resolution moved to shared module * docs(llm-cli): document auth probe pattern and env allowlist in AGENTS.md - Add three-state logged_in contract table (True/False/None) with wizard behaviour - Document _classify_*_auth pattern and negative-phrases-first rule - Warn about _SAFE_SUBPROCESS_ENV_PREFIXES coupling in runner.py for new CLIs - Update provider checklist with auth probe and env forwarding steps * fix(llm-cli): use is_runnable_binary for path validation on all platforms Replace os.path.isfile + os.access(X_OK) with shared is_runnable_binary from binary_resolver, which already handles Windows extension checks (.cmd, .exe, .ps1, .bat). os.access(X_OK) returns False for valid Windows executables that lack Unix-style execute bits. * fix(llm-cli): diagnose broken symlinks in binary path resolution - add diagnose_binary_path() to binary_resolver: distinguishes broken symlinks (with target path in message) from missing files and non-executable files, returning an actionable string or None - resolve_cli_binary now logs a WARNING with the diagnostic when an explicit *_BIN env var is set but unusable, instead of silently falling through to PATH lookup - wizard flow uses diagnose_binary_path so users see e.g. "'~/bin/codex' is a broken symlink (points to '/usr/local/bin/codex'). Remove or fix it." instead of the generic "not a valid executable" message - tests: missing file, broken symlink, valid exe, non-executable, and resolver warning on broken symlink * docs(llm-cli): add module docstring with path-state table to binary_resolver - document resolve_cli_binary, diagnose_binary_path, and is_runnable_binary at the top of binary_resolver.py with path state -> message reference table and platform notes (Windows extension check, lru_cache, readlink fallback) - move inline `import sys` and `import logging` to top-level in test file - guard symlink-creation tests against Windows hosts without elevation: catch OSError/NotImplementedError and skip rather than fail, preserving full coverage on Linux/macOS CI and graceful no-op on restricted runners * fix(tests): tighten chmod masks for CodeQL - set executable test fixture permissions to 0700 - set non-executable test fixture permissions to 0600 * fix(llm-cli): align windows diagnostics and onboarding test - add Windows executable extension validation in diagnose_binary_path - expand binary_resolver docstring path-state table for Windows mismatch case - harden onboarding path override test with real executable and env cleanup - assert in-process CODEX_BIN propagation without leaking env across tests * style(llm-cli): format binary_resolver for CI - reformat Windows executable diagnostic branch to satisfy ruff format-check * feat(llm-cli): add CLI registry and generic wizard summary - add registry mapping LLM_PROVIDER to adapter factory and model env key - wire _create_llm_client through registry instead of codex-only branch - derive saved-summary credential line from adapter auth_hint - drop unused prompt_delivery from LLMCLIAdapter; document prompt via build() - update AGENTS.md wiring checklist; add registry and summary tests * fix(llm-cli): lazy-load CLI registry to break import cycle - defer get_cli_provider_registration import until _create_llm_client runs - avoid loading llm_cli package __init__ (runner) during llm_client import
…loud#1097) * fix(llm-cli): address post-review issues in CLI integration - fix: set os.environ[env_key] after sync_env_values so in-process retry loop sees the user-provided binary path immediately (Tracer-Cloud#781 follow-up) - fix: validate user-supplied binary path (isfile + executable) before persisting to .env to prevent silently storing broken paths - fix: bump npm_prefix_bin_dirs subprocess timeout from 0.3s to 2.0s to avoid false-empty cache on slow nvm/volta shim startup - refactor: use None sentinels for resolve_cli_binary defaults so shutil.which and is_runnable_binary are looked up at call time, making the function patchable in tests without explicit overrides - refactor: drop redundant shutil import and explicit which_resolver / runnable_check kwargs from CodexAdapter._resolve_binary - refactor: consolidate four separate binary_resolver import blocks into one - fix: _ver_tuple now uses re.findall to correctly parse versions with non-digit segments (e.g. "1.2a.3", "1.2.3-beta.4") - feat: log CLI stderr at DEBUG level after successful parse so warnings from the CLI are not silently discarded - test: update patch targets from codex.shutil.which to binary_resolver.shutil.which after resolution moved to shared module * docs(llm-cli): document auth probe pattern and env allowlist in AGENTS.md - Add three-state logged_in contract table (True/False/None) with wizard behaviour - Document _classify_*_auth pattern and negative-phrases-first rule - Warn about _SAFE_SUBPROCESS_ENV_PREFIXES coupling in runner.py for new CLIs - Update provider checklist with auth probe and env forwarding steps * fix(llm-cli): use is_runnable_binary for path validation on all platforms Replace os.path.isfile + os.access(X_OK) with shared is_runnable_binary from binary_resolver, which already handles Windows extension checks (.cmd, .exe, .ps1, .bat). os.access(X_OK) returns False for valid Windows executables that lack Unix-style execute bits. * fix(llm-cli): diagnose broken symlinks in binary path resolution - add diagnose_binary_path() to binary_resolver: distinguishes broken symlinks (with target path in message) from missing files and non-executable files, returning an actionable string or None - resolve_cli_binary now logs a WARNING with the diagnostic when an explicit *_BIN env var is set but unusable, instead of silently falling through to PATH lookup - wizard flow uses diagnose_binary_path so users see e.g. "'~/bin/codex' is a broken symlink (points to '/usr/local/bin/codex'). Remove or fix it." instead of the generic "not a valid executable" message - tests: missing file, broken symlink, valid exe, non-executable, and resolver warning on broken symlink * docs(llm-cli): add module docstring with path-state table to binary_resolver - document resolve_cli_binary, diagnose_binary_path, and is_runnable_binary at the top of binary_resolver.py with path state -> message reference table and platform notes (Windows extension check, lru_cache, readlink fallback) - move inline `import sys` and `import logging` to top-level in test file - guard symlink-creation tests against Windows hosts without elevation: catch OSError/NotImplementedError and skip rather than fail, preserving full coverage on Linux/macOS CI and graceful no-op on restricted runners * fix(tests): tighten chmod masks for CodeQL - set executable test fixture permissions to 0700 - set non-executable test fixture permissions to 0600 * fix(llm-cli): align windows diagnostics and onboarding test - add Windows executable extension validation in diagnose_binary_path - expand binary_resolver docstring path-state table for Windows mismatch case - harden onboarding path override test with real executable and env cleanup - assert in-process CODEX_BIN propagation without leaking env across tests * style(llm-cli): format binary_resolver for CI - reformat Windows executable diagnostic branch to satisfy ruff format-check * feat(llm-cli): add CLI registry and generic wizard summary - add registry mapping LLM_PROVIDER to adapter factory and model env key - wire _create_llm_client through registry instead of codex-only branch - derive saved-summary credential line from adapter auth_hint - drop unused prompt_delivery from LLMCLIAdapter; document prompt via build() - update AGENTS.md wiring checklist; add registry and summary tests * fix(llm-cli): lazy-load CLI registry to break import cycle - defer get_cli_provider_registration import until _create_llm_client runs - avoid loading llm_cli package __init__ (runner) during llm_client import
Fixes #642
Describe the changes you have made in this PR -
Code specs/architecture as per #642 (comment)
Screenshots of the UI changes (If any) -
Success scenario - when codex binary present and logged in
Screencast.from.23-04-26.02.03.43.PM.IST.webm
Final Output 👇

Failure scenario - where codex binary path is not detected in default paths
Screencast.from.23-04-26.02.42.59.PM.IST.webm
Code Understanding and AI Usage
Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?
If you used AI assistance:
Explain your implementation approach:
Checklist before requesting a review
Note: Please check Allow edits from maintainers if you would like us to assist in the PR.