security: malicious skill trust tier enforcement (community skill security empirical study) #1853

@bug-ops

Source

Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study — arXiv 2602.06547, February 2026
Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward

Findings

Empirical study of 98,380 community SKILL.md files found:

  • 157 confirmed malicious skills (about 0.16% of the 98,380 files)
  • Two archetypes: Data Thieves (exfiltrate memory/files) and Agent Hijackers (override goals/instructions)
  • 26.1% of community skills contain at least one security vulnerability
  • Proposed four-tier gate: Unverified → Community-Reviewed → Signed → Audited

Applicability to Zeph: HIGH (security path)

Zeph's skill loader already has a trust_level field (trusted / quarantined), but enforcement is currently limited to execution confirmation. The study's findings motivate:

  1. Capability restrictions per trust tier: untrusted skills must not access shell, memory writes, or network tools.
  2. Static skill content scanning: scan SKILL.md body for injection patterns before activation (similar to ContentSanitizer but applied to skill definitions themselves).
  3. Provenance metadata: hash-pinning plus an optional GPG signature for locally trusted skills (the vault already follows this pattern).
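
A minimal sketch of item 1, assuming the four tier names proposed in this issue and a hypothetical Capability enum. Neither the enum nor the allowed function exists in Zeph today, and the rule that Trusted is the lowest tier allowed to use sensitive capabilities is an assumption taken from the wording above:

```rust
/// Trust tiers ordered from least to most trusted.
/// The derived `Ord` follows declaration order, so tiers can be compared.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TrustLevel {
    Quarantined,
    Community,
    Trusted,
    Signed,
}

/// Hypothetical capability classes a tool call may require.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Capability {
    Shell,
    MemoryWrite,
    Network,
    ReadOnly,
}

/// Returns true if a skill at `level` may use `cap`.
/// Assumption: sensitive capabilities require at least `Trusted`.
pub fn allowed(level: TrustLevel, cap: Capability) -> bool {
    match cap {
        Capability::Shell | Capability::MemoryWrite | Capability::Network => {
            level >= TrustLevel::Trusted
        }
        Capability::ReadOnly => true,
    }
}
```

Comparing tiers through the derived ordering keeps the policy in one match expression, so a new tier only needs to be placed at the right position in the enum.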

Implementation Sketch

  1. Extend the TrustLevel enum: Quarantined → Community → Trusted → Signed.
  2. In CompositeExecutor, check the skill's trust tier before tool dispatch and block shell, memory_save, and web_scrape for the Quarantined and Community levels.
  3. Add a SkillContentScanner that applies the existing SecurityPatterns regexes to the parsed skill body at load time.
  4. Emit a WARN log entry ("skill content scan: N potential injection patterns found") for untrusted skills.
  5. Add a [skills.trust] scan_on_load = true config flag.
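
Steps 3 to 5 could look roughly like the sketch below. It is illustrative only: the INJECTION_MARKERS list is a hypothetical stand-in for the real SecurityPatterns regexes (plain substrings are used so the example needs no external crates), and the method names and the untrusted flag are assumptions rather than existing Zeph APIs.

```rust
/// Hypothetical stand-in for the `SecurityPatterns` regexes:
/// lowercase substrings matched case-insensitively.
const INJECTION_MARKERS: &[&str] = &[
    "ignore previous instructions",
    "disregard all prior",
    "reveal your system prompt",
];

/// Sketch of a load-time scanner; `scan_on_load` mirrors the proposed
/// `[skills.trust] scan_on_load` config flag.
pub struct SkillContentScanner {
    scan_on_load: bool,
}

impl SkillContentScanner {
    pub fn new(scan_on_load: bool) -> Self {
        Self { scan_on_load }
    }

    /// Counts how many distinct markers occur at least once in `body`.
    pub fn count_matches(&self, body: &str) -> usize {
        let lowered = body.to_lowercase();
        INJECTION_MARKERS
            .iter()
            .filter(|marker| lowered.contains(**marker))
            .count()
    }

    /// Returns the warning text to log at WARN level, or `None` when
    /// scanning is disabled, the skill is trusted, or nothing matched.
    pub fn scan_at_load(&self, untrusted: bool, body: &str) -> Option<String> {
        if !self.scan_on_load || !untrusted {
            return None;
        }
        match self.count_matches(body) {
            0 => None,
            n => Some(format!(
                "skill content scan: {n} potential injection patterns found"
            )),
        }
    }
}
```

Returning the message instead of logging it directly keeps the scanner independent of any particular logging crate; the caller decides how to emit the WARN entry.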

Implementation Complexity

LOW-MEDIUM — trust tier enforcement is an extension of existing TrustLevel checks; content scanner reuses SecurityPatterns.

Labels: enhancement (new feature or request), security (security-related issue), skills (zeph-skills crate)
