security: malicious skill trust tier enforcement (community skill security empirical study) #1853
Closed
Labels
enhancement (New feature or request), security (Security-related issue), skills (zeph-skills crate)
Description
Source
Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study — arXiv 2602.06547, February 2026
Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward
Findings
Empirical study of 98,380 community SKILL.md files found:
- 157 confirmed malicious skills (about 0.16% prevalence, i.e. 157 of 98,380)
- Two archetypes: Data Thieves (exfiltrate memory/files) and Agent Hijackers (override goals/instructions)
- 26.1% of community skills contain at least one security vulnerability
- Proposed four-tier gate: Unverified → Community-Reviewed → Signed → Audited
Applicability to Zeph: HIGH (security path)
Zeph's skill loader already has a trust_level field (trusted / quarantined) but enforcement is limited to execution confirmation. The study's findings motivate:
- Capability restrictions per trust tier: untrusted skills must not access shell, memory writes, or network tools.
- Static skill content scanning: scan SKILL.md body for injection patterns before activation (similar to `ContentSanitizer` but applied to skill definitions themselves).
- Provenance metadata: hash-pinning + optional GPG signature for locally trusted skills (the vault already follows this pattern).
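The capability restriction in the first bullet can be sketched as a small gate function. This is a minimal illustration, not Zeph's actual code: it assumes the four-tier ordering proposed in the implementation sketch, takes the tool names `shell`, `memory_save`, and `web_scrape` from that sketch, and introduces `check_tool_access`, `ToolDenied`, and `RESTRICTED_TOOLS` as hypothetical names.

```rust
use std::fmt;

/// Trust tiers in ascending order of trust (assumed ordering).
/// The derived `Ord` follows declaration order, so
/// `Quarantined < Community < Trusted < Signed`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TrustLevel {
    Quarantined,
    Community,
    Trusted,
    Signed,
}

/// Tools withheld from low-trust skills (hypothetical constant;
/// the tool names come from the issue text).
const RESTRICTED_TOOLS: [&str; 3] = ["shell", "memory_save", "web_scrape"];

/// Error returned when a skill's tier does not permit a tool (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct ToolDenied {
    pub tool: String,
    pub level: TrustLevel,
}

impl fmt::Display for ToolDenied {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "tool `{}` is not allowed at trust level {:?}",
            self.tool, self.level
        )
    }
}

impl std::error::Error for ToolDenied {}

/// Deny restricted tools to any skill below `Trusted`; allow everything else.
pub fn check_tool_access(level: TrustLevel, tool: &str) -> Result<(), ToolDenied> {
    let restricted = RESTRICTED_TOOLS.contains(&tool);
    if restricted && level < TrustLevel::Trusted {
        return Err(ToolDenied {
            tool: tool.to_string(),
            level,
        });
    }
    Ok(())
}
```

Under these assumptions, a call such as `check_tool_access(TrustLevel::Community, "shell")` returns an error, while the same call at `TrustLevel::Trusted` succeeds. Deriving `Ord` keeps the check to a single comparison, at the cost of making variant declaration order significant.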
Implementation Sketch
- Extend `TrustLevel` enum: `Quarantined → Community → Trusted → Signed`.
- In `CompositeExecutor`, check skill's trust tier before tool dispatch: block `shell`, `memory_save`, `web_scrape` for `Quarantined`/`Community` levels.
- Add `SkillContentScanner` using existing `SecurityPatterns` regexes applied to parsed skill body at load time.
- Emit WARN `skill content scan: N potential injection patterns found` for untrusted skills.
- Add `[skills.trust] scan_on_load = true` config flag.
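The load-time scan and its warning line could look roughly like the sketch below. It is only an illustration under stated assumptions: the real `SecurityPatterns` regexes are not shown in this issue, so the code swaps in plain case-insensitive substring matching over a few made-up phrases, and `INJECTION_PHRASES`, `ScanReport`, `scan_skill_body`, and `warning_line` are hypothetical names.

```rust
/// Hypothetical stand-ins for the `SecurityPatterns` regexes.
/// These phrases are illustrative only and are not taken from Zeph.
const INJECTION_PHRASES: [&str; 4] = [
    "ignore previous instructions",
    "disregard all prior",
    "reveal your system prompt",
    "send the contents of",
];

/// Result of scanning one skill body (hypothetical type).
#[derive(Debug)]
pub struct ScanReport {
    /// The phrases that matched, in the order they are listed above.
    pub hits: Vec<&'static str>,
}

/// Case-insensitive substring scan of a parsed SKILL.md body.
/// A real scanner would apply regexes; substring matching is used here
/// so the sketch depends only on the standard library.
pub fn scan_skill_body(body: &str) -> ScanReport {
    let lowered = body.to_lowercase();
    let hits = INJECTION_PHRASES
        .iter()
        .copied()
        .filter(|phrase| lowered.contains(*phrase))
        .collect();
    ScanReport { hits }
}

/// Build the warning text proposed in the issue, or `None` for a clean scan.
pub fn warning_line(report: &ScanReport) -> Option<String> {
    if report.hits.is_empty() {
        None
    } else {
        Some(format!(
            "skill content scan: {} potential injection patterns found",
            report.hits.len()
        ))
    }
}
```

A loader honoring `scan_on_load` would call `scan_skill_body` once per untrusted skill and pass any `warning_line` result to its logger at WARN level. Whether a non-empty report should also demote or block the skill is left open here, since the issue only specifies the warning.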
Implementation Complexity
LOW-MEDIUM — trust tier enforcement is an extension of existing TrustLevel checks; content scanner reuses SecurityPatterns.
See Also
- Existing `[skills.trust]` config: `default_level`, `local_level`, `hash_mismatch_level`
- `ExfiltrationGuard` (applies to tool output; analogous pattern for skill definitions)
- #1695 research(security): declarative policy compiler for tool call authorization (Policy Compiler pattern)