Proposal: Safety Guardrails for smolagents

## Proposal: Safety Guardrails for smolagents

### Problem

smolagents is a beautifully minimal agent framework, but that simplicity means it ships with few built-in safety controls for production deployments. When agents execute code or call tools autonomously, teams need:

- **Blocked pattern detection** - Prevent dangerous code patterns (regex/glob-aware, not just substring)
- **Resource limits** - Cap token usage, tool calls, and execution time
- **Semantic intent classification** - Classify agent actions into threat categories before execution
- **Governance event hooks** - React to policy violations in real-time
- **Audit trails** - Know exactly what the agent did and why
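To make the resource-limit requirement concrete, here is a minimal sketch of a per-run budget that caps tool calls, tokens, and wall-clock time. `ResourceBudget`, `BudgetExceeded`, and every field name are hypothetical and chosen for illustration; they are not part of smolagents or Agent-OS.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a run crosses one of its configured limits."""


@dataclass
class ResourceBudget:
    # Hypothetical limits; not a smolagents or Agent-OS API.
    max_tool_calls: int = 10
    max_tokens: int = 4096
    max_seconds: float = 30.0
    tool_calls: int = 0
    tokens: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, tokens: int = 0, tool_calls: int = 0) -> None:
        """Record usage, then raise as soon as any cap is exceeded."""
        self.tokens += tokens
        self.tool_calls += tool_calls
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded(
                f"tool calls {self.tool_calls} > {self.max_tool_calls}"
            )
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"tokens {self.tokens} > {self.max_tokens}")
        elapsed = time.monotonic() - self.started_at
        if elapsed > self.max_seconds:
            raise BudgetExceeded(f"elapsed {elapsed:.1f}s > {self.max_seconds}s")


# Usage: charge the budget before each tool call; the third call is refused.
budget = ResourceBudget(max_tool_calls=2)
budget.charge(tool_calls=1)
budget.charge(tool_calls=1)
try:
    budget.charge(tool_calls=1)
except BudgetExceeded as exc:
    print(f"blocked: {exc}")
```

Charging before the call, rather than after, means an over-budget action is never executed.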

### What we've built (Apache-2.0)

[Agent-OS](https://github.com/imran-siddique/agent-os) includes the following governance components:

1. **GovernancePolicy** - YAML-based declarative policies with import/export, diff, comparison
2. **PatternType** - Blocked patterns with substring, regex, and glob matching (pre-compiled for performance)
3. **Semantic intent classifier** - 9 threat categories, deterministic (no LLM), fast
4. **Event hooks** - POLICY_CHECK, POLICY_VIOLATION, TOOL_CALL_BLOCKED, CHECKPOINT_CREATED
5. **Policy diff** - Compare policies, check if one is strictly more restrictive
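As an illustration of item 2, the sketch below shows one way substring, regex, and glob patterns could share a single matcher, with the regex and glob forms compiled once at construction time. Only the name `PatternType` comes from the list above; its members, the `BlockedPattern` class, and the `first_violation` helper are assumptions made for this example and may differ from the actual Agent-OS API.

```python
import fnmatch
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class PatternType(Enum):
    # Member names are assumed for illustration.
    SUBSTRING = "substring"
    REGEX = "regex"
    GLOB = "glob"


@dataclass
class BlockedPattern:
    pattern: str
    kind: PatternType = PatternType.SUBSTRING
    _compiled: Optional[re.Pattern] = field(init=False, default=None, repr=False)

    def __post_init__(self) -> None:
        # Compile once so repeated checks do not re-parse the pattern.
        if self.kind is PatternType.REGEX:
            self._compiled = re.compile(self.pattern)
        elif self.kind is PatternType.GLOB:
            # fnmatch.translate converts a shell-style glob to a regex string.
            self._compiled = re.compile(fnmatch.translate(self.pattern))

    def matches(self, text: str) -> bool:
        if self.kind is PatternType.SUBSTRING:
            return self.pattern in text
        if self.kind is PatternType.REGEX:
            return self._compiled.search(text) is not None
        # A glob has to match the whole string, not just part of it.
        return self._compiled.match(text) is not None


def first_violation(
    text: str, blocked: List[BlockedPattern]
) -> Optional[BlockedPattern]:
    """Return the first blocked pattern that matches, or None."""
    return next((p for p in blocked if p.matches(text)), None)


blocked = [
    BlockedPattern("os.system"),
    BlockedPattern(r"\brm\s+-rf\b", PatternType.REGEX),
    BlockedPattern("*.pem", PatternType.GLOB),
]

for candidate in ["os.system('ls')", "rm  -rf /tmp/x", "keys/server.pem", "print(1)"]:
    hit = first_violation(candidate, blocked)
    print(candidate, "->", hit.kind.value if hit else "allowed")
```

The regex entry catches `rm  -rf` with extra whitespace, which a plain substring check for `rm -rf` would miss.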

### Proposed integration

A safety wrapper that hooks into smolagents' tool execution:

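```python
# Proposed API: `smolagents_safety` is the package this issue proposes.
from smolagents_safety import GovernancePolicy, SafeAgent

# `my_tool`, `model`, and `log_alert` are assumed to be defined by the caller,
# just as they would be for a plain CodeAgent.
policy = GovernancePolicy.load("policy.yaml")
agent = SafeAgent(
    tools=[my_tool],
    model=model,
    policy=policy,
)
# Run the handler whenever a policy check fails.
agent.on("policy_violation", log_alert)
result = agent.run("Do the task")
# All tool calls are policy-checked; dangerous patterns are blocked.
```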
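For discussion, here is a framework-independent sketch of what such a hook could do around a single tool call: emit a `policy_check` event, scan the rendered arguments, and either emit `policy_violation` and refuse, or pass the call through. `GuardedTool`, `PolicyViolation`, and the substring-only check are simplifying assumptions; the sketch deliberately avoids importing smolagents so it makes no claims about its internals.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List


class PolicyViolation(Exception):
    """Raised when a tool call is refused by the policy check."""


class GuardedTool:
    """Wraps a plain callable; a hypothetical stand-in for a SafeAgent hook."""

    def __init__(self, fn: Callable[..., Any], blocked_substrings: Iterable[str]):
        self.fn = fn
        self.blocked = list(blocked_substrings)
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        """Register a handler for an event name such as 'policy_violation'."""
        self._handlers[event].append(handler)

    def _emit(self, event: str, payload: dict) -> None:
        for handler in self._handlers[event]:
            handler(payload)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Render all arguments to text so one check covers every input.
        rendered = " ".join(str(a) for a in (*args, *kwargs.values()))
        self._emit("policy_check", {"tool": self.fn.__name__, "input": rendered})
        for needle in self.blocked:
            if needle in rendered:
                payload = {
                    "tool": self.fn.__name__,
                    "input": rendered,
                    "pattern": needle,
                }
                self._emit("policy_violation", payload)
                raise PolicyViolation(f"{self.fn.__name__} blocked by {needle!r}")
        return self.fn(*args, **kwargs)


def run_shell(command: str) -> str:
    # Stub for the example: it only echoes the command and executes nothing.
    return f"ran: {command}"


alerts: List[dict] = []
guarded = GuardedTool(run_shell, blocked_substrings=["rm -rf"])
guarded.on("policy_violation", alerts.append)

print(guarded("ls -la"))
try:
    guarded("rm -rf /")
except PolicyViolation as exc:
    print(f"refused: {exc}")
```

Raising after the event is emitted lets handlers log or alert first, while still guaranteeing the wrapped function never runs on a violating input.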

### Why this fits smolagents

- **Minimal footprint** - Our policy engine is pure Python, no heavy deps, matches smolagents' philosophy
- **Deterministic** - No LLM-in-the-loop for safety; fast and predictable
- **YAML-native** - Policies are simple YAML files, easy to version and review
- **700+ tests** backing the governance engine

### Ask

Would maintainers be interested in any of the following?

1. A standalone `smolagents-safety` package
2. A PR adding optional policy enforcement hooks to the tool execution pipeline
3. A cookbook/example demonstrating the pattern

Happy to start with whichever approach fits best.


Proposal: Safety Guardrails for smolagents #1989