fix: v0.34.3 — tool-aware stall threshold, edit failure recovery, diagnostics #106
Nathan Schram (nathanschram) merged 6 commits into `master`
…gnostics (#103, #104, #105, #89)

- Tool-aware stall threshold: 10 min when a tool is actively running (#105)
- Progress edit failure fallback: log + send when initial edit fails (#103)
- Approval keyboard: `wait=True` for keyboard transitions, failure logging (#104)
- `/usage` 429: downgrade from error to warning level (#89)
- Session cleanup structured reporting, spawn args logging, no-events warning
- 300+ lines of new tests, integration testing playbook

Co-Authored-By: Claude Opus 4.6
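A minimal sketch of how a tool-aware stall threshold can be chosen. Only the 10-minute figure comes from this PR; `stall_threshold_s()` and `is_stalled()` are hypothetical names, and the normal threshold is left as a caller-supplied parameter because the PR does not state its value.

```python
# From the PR description: 10 min while a tool is actively running.
TOOL_RUNNING_STALL_THRESHOLD_S = 10 * 60


def stall_threshold_s(tool_running: bool, normal_threshold_s: float) -> float:
    """Seconds a run may go without new events before it counts as stalled (hypothetical helper)."""
    if tool_running:
        # Long-running tools (builds, test suites) can legitimately stay silent,
        # so never use a shorter window than the normal one while a tool runs.
        return max(TOOL_RUNNING_STALL_THRESHOLD_S, normal_threshold_s)
    return normal_threshold_s


def is_stalled(seconds_since_last_event: float, tool_running: bool, normal_threshold_s: float) -> bool:
    return seconds_since_last_event >= stall_threshold_s(tool_running, normal_threshold_s)
```

Taking `max()` is a design choice in this sketch: it guarantees the tool-running window can only lengthen the normal threshold, never shorten it.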
…, and guides

Integration tests are now automated via Telegram MCP tools (send_message, get_history, list_inline_buttons, press_inline_button, reply_to_message). Updated all relevant docs to reflect this workflow.

Co-Authored-By: Claude Opus 4.6
- T1 (voice): send_voice MCP tool with OGG/Opus file
- T5 (media groups): send_file MCP tool for rapid file sends
- B4 (SIGTERM): Bash tool kill -TERM
- B5 (log inspection): Bash tool journalctl + FD/zombie checks
- Add post-test log inspection and GitHub issue creation instructions
- Add structured test result tracking (pass/fail/error with reason)
- Distinguish Untether bugs from upstream engine API errors

Co-Authored-By: Claude Opus 4.6
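The structured result tracking described above (pass/fail/error with a reason, and Untether bugs kept apart from upstream engine API errors) could be modelled roughly as follows. This is a hypothetical sketch: `IntegrationResult`, `Outcome`, `Source` and `summarize()` are illustrative names, not the playbook's actual format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    ERROR = "error"


class Source(Enum):
    UNTETHER = "untether"         # bug in Untether itself
    UPSTREAM_ENGINE = "upstream"  # error returned by the engine's own API


@dataclass(frozen=True)
class IntegrationResult:
    test_id: str                  # e.g. "T1", "B4"
    outcome: Outcome
    reason: Optional[str] = None
    source: Optional[Source] = None

    def __post_init__(self) -> None:
        # Non-passing results must say why, so they can be triaged into issues.
        if self.outcome is not Outcome.PASS and not self.reason:
            raise ValueError(f"{self.test_id}: {self.outcome.value} result needs a reason")


def summarize(results: list[IntegrationResult]) -> dict[str, int]:
    """Count results per outcome."""
    counts = {o.value: 0 for o in Outcome}
    for r in results:
        counts[r.outcome.value] += 1
    return counts
```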
All tiers are fully automatable; no manual steps remain.

Co-Authored-By: Claude Opus 4.6
Watch for phantom responses (output from empty input), session cross-contamination, wrong engine, and disproportionate cost. Discovered when Amp produced substantive DNS content from an empty voice transcription.

Co-Authored-By: Claude Opus 4.6
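Three of the red flags above can be checked mechanically per test run. A minimal sketch, assuming the tester records the input, the output, the expected and actual engine, and a per-test cost budget; `review_run()` and the budget parameter are hypothetical, and session cross-contamination is not covered because it needs context from other sessions.

```python
def review_run(
    user_input: str,
    engine_output: str,
    expected_engine: str,
    actual_engine: str,
    cost_usd: float,
    max_expected_cost_usd: float,
) -> list[str]:
    """Return which red flags apply to one integration-test run (hypothetical helper)."""
    issues: list[str] = []
    if not user_input.strip() and engine_output.strip():
        # e.g. an empty voice transcription still produced a substantive answer
        issues.append("phantom response")
    if actual_engine != expected_engine:
        issues.append("wrong engine")
    if cost_usd > max_expected_cost_usd:
        issues.append("disproportionate cost")
    return issues
```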
- CLAUDE.md: update test counts (1472→1548), add test_build_args.py and test_loop_coverage.py, add send_voice/send_file to MCP list
- AGENTS.md: Gemini CLI and Amp are shipped, not "coming soon"
- CONTRIBUTING.md: add gemini/amp to architecture diagram and prereqs
- hooks.json: add MCP automation note to version bump checklist
- testing-conventions.md: add new test files to key test table
- integration-testing.md: add T5 send_file limitation note

Co-Authored-By: Claude Opus 4.6
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5f9fe13259
```python
resume=resume.value if resume else None,
prompt=prompt[:100] + "…" if len(prompt) > 100 else prompt,
prompt_len=len(prompt),
args=cmd[1:],
```
Remove full CLI args from runner start logs
The new `runner.start` log field `args=cmd[1:]` records the complete subprocess argument list, which includes raw user prompt text for engines that send prompts via CLI args (for example Claude’s `-p <prompt>` flow), so normal info-level logs now persist potentially sensitive chat/code content that was previously only logged in truncated form. This is a production privacy regression because every run can leak full prompt data into log sinks.
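One way to address this is to redact the argument list before it reaches the log call. A minimal sketch, assuming the prompt is passed as its own argument (as in the `-p <prompt>` case); `redact_args()` is a hypothetical helper, and the 100-character cap mirrors the existing truncated `prompt=` field rather than any stated policy.

```python
def redact_args(args: list[str], prompt: str, max_len: int = 100) -> list[str]:
    """Return a copy of CLI args that is safer to log at info level (hypothetical helper).

    An argument equal to the prompt is replaced by a length placeholder;
    any other over-long argument is truncated like the existing prompt= field.
    """
    redacted: list[str] = []
    for arg in args:
        if prompt and arg == prompt:
            redacted.append(f"<prompt: {len(prompt)} chars>")
        elif len(arg) > max_len:
            redacted.append(arg[:max_len] + "…")
        else:
            redacted.append(arg)
    return redacted
```

The log call would then use `args=redact_args(cmd[1:], prompt)`. If an engine embeds the prompt inside a longer argument (for example `--prompt=...`), this sketch still leaks its first 100 characters, the same exposure as the existing truncated field.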
```python
if not action_state.completed:
    return True
break  # only check the most recent
```
Check all running actions before applying tool stall threshold
`_has_running_tool()` is documented as checking whether any action is still running, but the unconditional `break` means it only inspects the newest inserted action. When actions overlap (e.g. Claude can emit multiple tool starts before all tool results arrive), a newer action can complete while an older tool is still running; this function then returns `False`, causing stall logic to drop back to the shorter normal threshold and emit incorrect stuck warnings (and eventually auto-cancel) during legitimate long-running tool work.
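A minimal sketch of the fix this review suggests, assuming actions are kept in a mapping of objects with the `completed` flag seen in the diff. `ActionState` here is a hypothetical stand-in, and Untether's real container and types may differ.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass
class ActionState:
    # Hypothetical stand-in: only the `completed` flag is taken from the diff.
    completed: bool = False


def has_running_tool(actions: Mapping[str, ActionState]) -> bool:
    """Return True if *any* tracked action is still running.

    Scanning every action (instead of breaking after the newest one) covers
    the overlap case: a newer tool finishes while an older one still runs.
    """
    return any(not state.completed for state in actions.values())
```

With an older action `t1` still running and a newer `t2` already completed, this returns `True`, whereas the version that breaks after the newest action returns `False`.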
Summary
- `wait=True` for keyboard transitions, info/warning-level logging for diagnostics #104
- `/usage` 429 handling: downgrade rate limit errors from error to warning level #89

Fixes #89, #103, #104, #105
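The 429 downgrade amounts to choosing the log level from the HTTP status. A minimal sketch using the standard-library `logging` module for self-containment; Untether's own logging setup may differ, and `usage_error_level()`, `log_usage_failure()`, the logger name and the event text are all hypothetical.

```python
import logging

logger = logging.getLogger("untether.usage")  # hypothetical logger name


def usage_error_level(status: int) -> int:
    """Pick the log level for a failed /usage request."""
    # 429 Too Many Requests is rate limiting: expected and usually transient,
    # so it is logged as a warning instead of an error.
    return logging.WARNING if status == 429 else logging.ERROR


def log_usage_failure(status: int, detail: str) -> None:
    logger.log(usage_error_level(status), "usage fetch failed: status=%s detail=%s", status, detail)
```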
Test plan
- `uv run pytest`: 1547 passed, 80.90% coverage
- `uv run ruff check src/ tests/`: all checks passed
- `uv run ruff format --check src/ tests/`: 234 files formatted
- `uv lock --check`: lockfile in sync
- `@untether_dev_bot` (post-merge)

🤖 Generated with Claude Code