bug(skills): os-automation skill over-triggers for generic shell commands, suppresses native bash tool usage #2501

@bug-ops

Description

Summary

When a user asks something like "Run the shell command: echo hello-world", the skill disambiguator selects os-automation with confidence ~0.80. With the skill injected into context, gpt-4o-mini consistently refuses to use the native bash tool and responds with "I cannot execute shell commands directly."

Reproduction

Config: .local/config/testing.toml (gpt-4o-mini, os-automation skill in .zeph/skills/)

Prompt:

Run the shell command: echo hello-world

Expected: bash tool invoked with echo hello-world
Actual: "I cannot execute shell commands directly. However, you can run..."

The bash tool IS present in the tool schema (confirmed via debug dump). Injecting the os-automation skill at ~0.80 confidence appears to make the LLM believe it should only perform OS-level automation tasks (desktop notifications, clipboard, screenshots) rather than run arbitrary shell commands via the native bash tool.

By contrast, coding-context prompts ("What is the current git branch? Run git status.") successfully invoke bash because a different skill (or no skill) is injected.

Root Cause Hypothesis

The os-automation skill description lists specific use cases: notifications, clipboard, screenshots, open URLs, launch apps, etc. Generic echo/shell commands don't match these use cases but get selected due to embedding proximity to "OS automation" concepts. When injected, the skill context overrides the LLM's awareness of the native bash tool.

Impact

Users who ask for a simple shell command in a non-coding context (no git/cargo/file references) may get a refusal instead of a tool call, even though the bash tool is available.

Suggested Fix

  1. Tighten the os-automation skill's embedding descriptor to explicitly exclude generic shell execution
  2. Or: add a should_not_use_when / exclusion_patterns field to SKILL.md that prevents triggering on bare shell command requests
  3. Or: when a skill is injected at <0.85 confidence and the user explicitly requests a tool by name ("Run the shell command"), prefer native tool over skill context
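A minimal sketch of how options 2 and 3 could combine into a single gate that runs before skill injection. Everything here is hypothetical and not taken from the zeph-skills crate: the `SkillCandidate` struct, the `decide` function, the `EXPLICIT_TOOL_PHRASES` list, and plain substring matching as a stand-in for whatever pattern syntax `exclusion_patterns` would actually use. Only the 0.85 threshold and the "Run the shell command" phrasing come from this issue.

```rust
/// Outcome of the gate: inject the skill body, or leave the native tools alone.
#[derive(Debug, PartialEq)]
pub enum SkillDecision {
    Inject,
    PreferNativeTool,
}

/// Hypothetical candidate produced by the skill disambiguator.
pub struct SkillCandidate<'a> {
    pub name: &'a str,
    pub confidence: f32,
    /// Hypothetical `exclusion_patterns` field from SKILL.md,
    /// modelled here as lowercase substrings.
    pub exclusion_patterns: &'a [&'a str],
}

/// Phrases treated as an explicit request for the native bash tool.
/// This list is an assumption made for illustration.
const EXPLICIT_TOOL_PHRASES: &[&str] = &[
    "run the shell command",
    "execute the shell command",
];

/// Threshold named in suggested fix 3.
const CONFIDENCE_FLOOR: f32 = 0.85;

pub fn decide(candidate: &SkillCandidate<'_>, prompt: &str) -> SkillDecision {
    let prompt = prompt.to_lowercase();

    // Option 2: the skill itself declares prompts it must not trigger on.
    if candidate
        .exclusion_patterns
        .iter()
        .any(|p| prompt.contains(*p))
    {
        return SkillDecision::PreferNativeTool;
    }

    // Option 3: an explicit tool request wins over a below-threshold match.
    let explicit_request = EXPLICIT_TOOL_PHRASES
        .iter()
        .any(|p| prompt.contains(*p));
    if explicit_request && candidate.confidence < CONFIDENCE_FLOOR {
        return SkillDecision::PreferNativeTool;
    }

    SkillDecision::Inject
}
```

With the reproduction prompt and a 0.80 match, `decide` returns `PreferNativeTool`. A clipboard request at the same confidence still returns `Inject`, because the explicit-request check only fires on the listed phrases. In a real implementation the phrase list would probably need to be configurable or derived from the registered tool names.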

Session

CI-344 (2026-03-31). Provider: gpt-4o-mini. Log: .local/testing/debug/ci344.log.

Metadata

Labels

P3 (Research, medium-high complexity), bug (Something isn't working), skills (zeph-skills crate)
