bug(skills): os-automation skill over-triggers for generic shell commands, suppresses native bash tool usage #2501
Description
Summary
When a user asks something like "Run the shell command: echo hello-world", the skill disambiguator selects os-automation with confidence ~0.80. With the skill injected into context, gpt-4o-mini consistently refuses to use the native bash tool and responds with "I cannot execute shell commands directly."
Reproduction
Config: .local/config/testing.toml (gpt-4o-mini, os-automation skill in .zeph/skills/)
Prompt:
Run the shell command: echo hello-world
Expected: bash tool invoked with echo hello-world
Actual: "I cannot execute shell commands directly. However, you can run..."
The bash tool IS present in the tool schema (confirmed via debug dump). The issue is that os-automation skill injection with high confidence causes the LLM to believe it should only perform OS-level automation tasks (desktop notifications, clipboard, screenshots) rather than run arbitrary shell commands via the native bash tool.
By contrast, coding-context prompts ("What is the current git branch? Run git status.") successfully invoke bash because a different skill (or no skill) is injected.
Root Cause Hypothesis
The os-automation skill description lists specific use cases: notifications, clipboard, screenshots, open URLs, launch apps, etc. Generic echo/shell commands don't match these use cases but get selected due to embedding proximity to "OS automation" concepts. When injected, the skill context overrides the LLM's awareness of the native bash tool.
Impact
Users asking for simple shell commands in a non-coding context (no git/cargo/file references) may get unhelpful responses despite bash being available.
Suggested Fix
- Tighten the os-automation skill's embedding descriptor to explicitly exclude generic shell execution
- Or: add a should_not_use_when/exclusion_patterns field to SKILL.md that prevents triggering on bare shell command requests
- Or: when a skill is injected at <0.85 confidence and the user explicitly requests a tool by name ("Run the shell command"), prefer native tool over skill context
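To make the second and third options concrete, here is a rough Python sketch of what such a gate might look like. It is not taken from the codebase: SkillMatch, should_inject_skill, the regex for "explicit tool request", and the shape of exclusion_patterns are all hypothetical names chosen for illustration. Only the 0.85 threshold and the example prompt come from this issue.

```python
import re
from dataclasses import dataclass, field

# Threshold from the third suggestion above; the constant name is hypothetical.
NATIVE_TOOL_PREFERENCE_THRESHOLD = 0.85

# Hypothetical heuristic for "the user asked for the shell tool by name".
EXPLICIT_TOOL_REQUEST = re.compile(
    r"\b(run|execute)\s+(the\s+|this\s+)?(shell|bash)\s+command\b",
    re.IGNORECASE,
)


@dataclass
class SkillMatch:
    """Hypothetical result of the skill disambiguator for one skill."""
    name: str
    confidence: float
    # Hypothetical mirror of the proposed exclusion_patterns field in SKILL.md.
    exclusion_patterns: list[str] = field(default_factory=list)


def should_inject_skill(prompt: str, match: SkillMatch) -> bool:
    """Return False when the native tool should win over the skill context."""
    # Option 2: the skill opts out of prompts matching its exclusion patterns,
    # regardless of how confident the disambiguator is.
    for pattern in match.exclusion_patterns:
        if re.search(pattern, prompt, re.IGNORECASE):
            return False
    # Option 3: below the threshold, an explicit tool request beats the skill.
    if (
        match.confidence < NATIVE_TOOL_PREFERENCE_THRESHOLD
        and EXPLICIT_TOOL_REQUEST.search(prompt)
    ):
        return False
    return True


prompt = "Run the shell command: echo hello-world"
print(should_inject_skill(prompt, SkillMatch("os-automation", 0.80)))
```

With the prompt from the reproduction and a confidence of 0.80, the gate declines to inject the skill, so the model would see only the native bash tool. The two checks are independent: the exclusion patterns apply at any confidence, while the explicit-request check only applies below the threshold, so a high-confidence match with no exclusions is still injected.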
Session
CI-344 (2026-03-31). Provider: gpt-4o-mini. Log: .local/testing/debug/ci344.log.