security: malicious skill trust tier enforcement (community skill security empirical study)

## Source

- [Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study](https://arxiv.org/abs/2602.06547), arXiv 2602.06547, February 2026
- [Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward](https://arxiv.org/abs/2602.12430), arXiv 2602.12430

## Findings

Empirical study of 98,380 community SKILL.md files found:
- 157 confirmed malicious skills (about 0.16% prevalence: 157 / 98,380)
- Two archetypes: **Data Thieves** (exfiltrate memory/files) and **Agent Hijackers** (override goals/instructions)
- 26.1% of community skills contain at least one security vulnerability
- Proposed four-tier gate: Unverified → Community-Reviewed → Signed → Audited

## Applicability to Zeph: HIGH (security path)

Zeph's skill loader already has a `trust_level` field (`trusted` / `quarantined`), but enforcement is limited to execution confirmation. The study's findings motivate three changes:

1. **Capability restrictions per trust tier**: untrusted skills must not access shell, memory writes, or network tools.
2. **Static skill content scanning**: scan SKILL.md body for injection patterns before activation (similar to `ContentSanitizer` but applied to skill definitions themselves).
3. **Provenance metadata**: hash pinning plus an optional GPG signature for locally trusted skills (the vault already follows this pattern).
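
A minimal sketch of the static scan in item 2, written under loud assumptions: the real `SecurityPatterns` regexes are not shown in this issue, so the marker list below is a hypothetical stand-in that uses plain case-insensitive substring matching, and `ScanHit`, `scan_skill_body`, and `scan_warning` are illustrative names rather than existing Zeph APIs.

```rust
/// Hypothetical stand-ins for injection patterns. The real `SecurityPatterns`
/// regexes are not part of this issue, so plain substrings are used instead.
const INJECTION_MARKERS: [&str; 4] = [
    "ignore previous instructions",
    "disregard your system prompt",
    "do not tell the user",
    "send the contents of",
];

/// One suspicious phrase found in a skill body.
#[derive(Debug, PartialEq)]
pub struct ScanHit {
    /// 1-based line number inside the SKILL.md body.
    pub line: usize,
    /// The marker that matched on that line.
    pub marker: &'static str,
}

/// Scan a SKILL.md body line by line.
/// Records at most one hit per marker per line, case-insensitively.
pub fn scan_skill_body(body: &str) -> Vec<ScanHit> {
    let mut hits = Vec::new();
    for (idx, line) in body.lines().enumerate() {
        let lowered = line.to_lowercase();
        for marker in INJECTION_MARKERS {
            if lowered.contains(marker) {
                hits.push(ScanHit { line: idx + 1, marker });
            }
        }
    }
    hits
}

/// Build the WARN text proposed under Implementation Sketch,
/// or return `None` when no marker matched.
pub fn scan_warning(body: &str) -> Option<String> {
    let n = scan_skill_body(body).len();
    (n > 0).then(|| format!("skill content scan: {n} potential injection patterns found"))
}
```

A loader could call `scan_warning` on each parsed skill body and log the returned string for untrusted skills. Substring matching is easy to evade, so a real scanner would presumably rely on the existing regex set.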

## Implementation Sketch

1. Extend `TrustLevel` enum: `Quarantined → Community → Trusted → Signed`.
2. In `CompositeExecutor`, check the skill's trust tier before tool dispatch and block `shell`, `memory_save`, and `web_scrape` for the `Quarantined` and `Community` levels.
3. Add `SkillContentScanner` using existing `SecurityPatterns` regexes applied to parsed skill body at load time.
4. Emit WARN `skill content scan: N potential injection patterns found` for untrusted skills.
5. Add `[skills.trust] scan_on_load = true` config flag.
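
Steps 1 and 2 could look roughly like the following. This is a sketch, not Zeph's actual code: the variant order follows the enum proposed in step 1, the tool names come from step 2, and `ToolDenied` and `check_tool_access` are hypothetical names for a check that `CompositeExecutor` might run before dispatch.

```rust
/// Trust tiers in the order proposed in step 1, lowest first.
/// Deriving `PartialOrd`/`Ord` makes declaration order the comparison order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TrustLevel {
    Quarantined,
    Community,
    Trusted,
    Signed,
}

/// Tools named in step 2 as off-limits for low-trust skills.
const RESTRICTED_TOOLS: [&str; 3] = ["shell", "memory_save", "web_scrape"];

/// Returned when a skill's tier is too low for the requested tool.
#[derive(Debug, PartialEq)]
pub struct ToolDenied {
    pub tool: String,
    pub level: TrustLevel,
}

/// Hypothetical pre-dispatch check: restricted tools require at least
/// `Trusted`; every other tool is allowed at any tier.
pub fn check_tool_access(level: TrustLevel, tool: &str) -> Result<(), ToolDenied> {
    let restricted = RESTRICTED_TOOLS.iter().any(|t| *t == tool);
    if restricted && level < TrustLevel::Trusted {
        return Err(ToolDenied { tool: tool.to_string(), level });
    }
    Ok(())
}
```

Comparing against `TrustLevel::Trusted` instead of listing the blocked tiers means that any tier later inserted below `Trusted` is restricted by default, which seems the safer failure mode for a security gate.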

## Implementation Complexity

LOW-MEDIUM: trust tier enforcement extends the existing `TrustLevel` checks, and the content scanner reuses `SecurityPatterns`.

## See Also

- Existing `[skills.trust]` config: `default_level`, `local_level`, `hash_mismatch_level`
- `ExfiltrationGuard` (applies to tool output; analogous pattern for skill definitions)
- #1695 (declarative policy compiler for tool call authorization)

security: malicious skill trust tier enforcement (community skill security empirical study) #1853

Source

Findings

Applicability to Zeph: HIGH (security path)

Implementation Sketch

Implementation Complexity

See Also

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

security: malicious skill trust tier enforcement (community skill security empirical study) #1853

Description

Source

Findings

Applicability to Zeph: HIGH (security path)

Implementation Sketch

Implementation Complexity

See Also

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions