Proposal: Safety Guardrails for smolagents
Problem
smolagents is a beautifully minimal agent framework, but its simplicity means there are limited built-in safety controls for production deployments. When agents execute code or call tools autonomously, teams need:
- Blocked pattern detection - Prevent dangerous code patterns (regex/glob-aware, not just substring)
- Resource limits - Cap token usage, tool calls, and execution time
- Semantic intent classification - Classify agent actions into threat categories before execution
- Governance event hooks - React to policy violations in real-time
- Audit trails - Know exactly what the agent did and why
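To illustrate why the first item asks for more than substring matching, here is a minimal sketch of a matcher that supports all three modes. The names `MatchKind`, `BlockedPattern`, and `first_violation` are hypothetical and are not the Agent-OS API; the sketch assumes glob patterns are matched against a whole string (such as a path), while substring and regex patterns are searched anywhere in the text.

```python
import fnmatch
import re
from dataclasses import dataclass, field
from enum import Enum


class MatchKind(Enum):  # hypothetical name, not the Agent-OS PatternType
    SUBSTRING = "substring"
    REGEX = "regex"
    GLOB = "glob"


@dataclass
class BlockedPattern:  # hypothetical illustration
    pattern: str
    kind: MatchKind
    _compiled: re.Pattern = field(init=False, repr=False)

    def __post_init__(self):
        # Compile once so repeated checks do not re-parse the pattern.
        if self.kind is MatchKind.REGEX:
            source = self.pattern
        elif self.kind is MatchKind.GLOB:
            # fnmatch.translate converts a shell-style glob into a regex
            # that must match the entire string.
            source = fnmatch.translate(self.pattern)
        else:
            source = re.escape(self.pattern)
        self._compiled = re.compile(source)

    def matches(self, text: str) -> bool:
        if self.kind is MatchKind.GLOB:
            return self._compiled.match(text) is not None
        return self._compiled.search(text) is not None


def first_violation(text, patterns):
    """Return the first pattern that matches text, or None."""
    for candidate in patterns:
        if candidate.matches(text):
            return candidate
    return None


patterns = [
    BlockedPattern("rm -rf", MatchKind.SUBSTRING),
    BlockedPattern(r"\bos\.system\s*\(", MatchKind.REGEX),
    BlockedPattern("/etc/*", MatchKind.GLOB),
]
```

With these patterns, a call written as `os.system ('ls')` (note the extra space) is caught by the regex entry, whereas a plain substring rule for `os.system(` would miss it.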
What we've built (Apache-2.0)
Agent-OS includes production-grade governance:
- GovernancePolicy - YAML-based declarative policies with import/export, diff, comparison
- PatternType - Blocked patterns with substring, regex, and glob matching (pre-compiled for performance)
- Semantic intent classifier - 9 threat categories, deterministic (no LLM), fast
- Event hooks - POLICY_CHECK, POLICY_VIOLATION, TOOL_CALL_BLOCKED, CHECKPOINT_CREATED
- Policy diff - Compare policies, check if one is strictly more restrictive
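As a rough sketch of what policy diff and the "strictly more restrictive" check could look like, the example below compares two simplified policies. `PolicySketch`, `diff`, and `is_strictly_more_restrictive` are hypothetical stand-ins rather than the actual `GovernancePolicy` interface, and the comparison assumes blocked patterns can be compared as plain sets, which ignores cases where one regex subsumes another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicySketch:  # hypothetical stand-in, not GovernancePolicy
    blocked_patterns: frozenset
    max_tool_calls: int
    max_tokens: int


def diff(old, new):
    """Describe what changed between two policies."""
    changes = {}
    added = new.blocked_patterns - old.blocked_patterns
    removed = old.blocked_patterns - new.blocked_patterns
    if added:
        changes["patterns_added"] = sorted(added)
    if removed:
        changes["patterns_removed"] = sorted(removed)
    for name in ("max_tool_calls", "max_tokens"):
        before, after = getattr(old, name), getattr(new, name)
        if before != after:
            changes[name] = (before, after)
    return changes


def is_strictly_more_restrictive(candidate, baseline):
    """True if candidate blocks everything baseline blocks, allows no
    higher limit, and differs from baseline in at least one respect."""
    at_least_as_strict = (
        candidate.blocked_patterns >= baseline.blocked_patterns
        and candidate.max_tool_calls <= baseline.max_tool_calls
        and candidate.max_tokens <= baseline.max_tokens
    )
    return at_least_as_strict and candidate != baseline


base = PolicySketch(frozenset({"rm -rf"}), max_tool_calls=20, max_tokens=8000)
strict = PolicySketch(
    frozenset({"rm -rf", "os.system"}), max_tool_calls=10, max_tokens=8000
)
```

A check like this could gate policy changes in review: a change that only tightens the policy passes automatically, while anything that loosens it needs explicit sign-off.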
Proposed integration
A safety wrapper that hooks into smolagents' tool execution:
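```python
from smolagents import CodeAgent, tool
# smolagents_safety is the proposed package name (see Ask), not a published library.
from smolagents_safety import GovernancePolicy, SafeAgent

# my_tool, model, and log_alert are placeholders assumed to be defined elsewhere.
policy = GovernancePolicy.load("policy.yaml")

agent = SafeAgent(
    tools=[my_tool],
    model=model,
    policy=policy,
)

# React to policy violations as they happen.
agent.on("policy_violation", lambda e: log_alert(e))

result = agent.run("Do the task")
# All tool calls are policy-checked; dangerous patterns blocked
```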
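To make the wrapper idea concrete without depending on smolagents internals, the following sketch guards a plain Python callable. `GuardedTool` and `PolicyViolation` are hypothetical names, the event strings mirror the hook names listed earlier, and the checks (a call cap plus substring matching on arguments) are deliberately simplified assumptions rather than the proposed implementation.

```python
from collections import defaultdict


class PolicyViolation(Exception):
    """Raised when a guarded call is blocked."""


class GuardedTool:  # hypothetical illustration, not part of smolagents
    def __init__(self, func, blocked_substrings, max_calls):
        self.func = func
        self.blocked_substrings = list(blocked_substrings)
        self.max_calls = max_calls
        self.calls = 0
        self._handlers = defaultdict(list)

    def on(self, event, handler):
        """Register a handler for an event name."""
        self._handlers[event].append(handler)

    def _emit(self, event, payload):
        for handler in self._handlers[event]:
            handler(payload)

    def _block(self, reason, args):
        payload = {"tool": self.func.__name__, "reason": reason, "args": args}
        self._emit("policy_violation", payload)
        self._emit("tool_call_blocked", payload)
        raise PolicyViolation(reason)

    def __call__(self, *args):
        self._emit("policy_check", {"tool": self.func.__name__, "args": args})
        # Resource limit: cap the number of successful calls.
        if self.calls >= self.max_calls:
            self._block("tool call limit reached", args)
        # Pattern check: refuse arguments containing a blocked substring.
        for arg in args:
            for needle in self.blocked_substrings:
                if needle in str(arg):
                    self._block(f"blocked pattern: {needle}", args)
        self.calls += 1
        return self.func(*args)


def echo(text):
    return text


events = []
guarded = GuardedTool(echo, ["rm -rf"], max_calls=1)
guarded.on("policy_violation", events.append)
```

Because the check runs before the wrapped function, a blocked call never reaches the tool, and the handlers receive enough context (tool name, reason, arguments) to write an audit record.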
Why this fits smolagents
- Minimal footprint - The policy engine is pure Python with no heavy dependencies, in keeping with smolagents' philosophy
- Deterministic - No LLM-in-the-loop for safety; fast and predictable
- YAML-native - Policies are simple YAML files, easy to version and review
- Well tested - 700+ tests back the governance engine
Ask
Would maintainers be interested in:
- A standalone smolagents-safety package
- A PR adding optional policy enforcement hooks to the tool execution pipeline
- A cookbook/example demonstrating the pattern
Happy to start with whichever approach fits best.