
Proposal: Safety Guardrails for smolagents #1989

@imran-siddique

Proposal: Safety Guardrails for smolagents

Problem

smolagents is a beautifully minimal agent framework, but its simplicity means there are limited built-in safety controls for production deployments. When agents execute code or call tools autonomously, teams need:

  • Blocked pattern detection - Prevent dangerous code patterns (regex/glob-aware, not just substring)
  • Resource limits - Cap token usage, tool calls, and execution time
  • Semantic intent classification - Classify agent actions into threat categories before execution
  • Governance event hooks - React to policy violations in real-time
  • Audit trails - Know exactly what the agent did and why
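The resource-limit requirement above could be met with a small per-run budget object that is charged on every model response and tool call. A minimal sketch, assuming the caller reports token counts itself; `ResourceBudget`, `BudgetExceeded`, and the cap names are hypothetical and not part of smolagents or Agent-OS.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when an agent run exceeds one of its resource caps."""


@dataclass
class ResourceBudget:
    # Hypothetical caps for one agent run (illustrative names only).
    max_tokens: int = 10_000
    max_tool_calls: int = 20
    max_seconds: float = 60.0
    tokens_used: int = 0
    tool_calls: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge_tokens(self, n: int) -> None:
        # The caller reports how many tokens a model step consumed.
        self.tokens_used += n
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(f"token cap {self.max_tokens} exceeded")

    def charge_tool_call(self) -> None:
        # Count the call first, then enforce both the call cap and wall-clock cap.
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded(f"tool-call cap {self.max_tool_calls} exceeded")
        self.check_time()

    def check_time(self) -> None:
        # Monotonic clock, so system clock changes cannot extend a run.
        elapsed = time.monotonic() - self.started_at
        if elapsed > self.max_seconds:
            raise BudgetExceeded(f"time cap {self.max_seconds}s exceeded")


if __name__ == "__main__":
    budget = ResourceBudget(max_tool_calls=2)
    budget.charge_tool_call()
    budget.charge_tool_call()
    try:
        budget.charge_tool_call()  # third call exceeds the cap
    except BudgetExceeded as err:
        print(err)
```

Raising an exception (rather than returning a flag) means a wrapper around the agent loop cannot silently ignore a breached cap.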

What we've built (Apache-2.0)

Agent-OS includes production-grade governance components:

  1. GovernancePolicy - YAML-based declarative policies with import/export, diffing, and comparison
  2. PatternType - Blocked patterns with substring, regex, and glob matching (pre-compiled for performance)
  3. Semantic intent classifier - 9 threat categories, deterministic (no LLM), fast
  4. Event hooks - POLICY_CHECK, POLICY_VIOLATION, TOOL_CALL_BLOCKED, CHECKPOINT_CREATED
  5. Policy diff - Compare two policies and check whether one is strictly more restrictive than the other
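The substring, regex, and glob matching described for PatternType can be sketched with the standard library alone. This is an illustrative reimplementation, not Agent-OS's code: `PatternMatcher` and `BlockedPattern` are hypothetical names, and in this sketch glob patterns must match the whole input (via `fnmatch.translate`), which suits file paths and tool names rather than free-form code.

```python
import fnmatch
import re
from dataclasses import dataclass
from enum import Enum


class PatternType(Enum):
    SUBSTRING = "substring"
    REGEX = "regex"
    GLOB = "glob"


@dataclass(frozen=True)
class BlockedPattern:
    pattern: str
    kind: PatternType = PatternType.SUBSTRING


class PatternMatcher:
    """Compile every blocked pattern once, then test inputs against all of them."""

    def __init__(self, patterns):
        self._compiled = []
        for p in patterns:
            if p.kind is PatternType.REGEX:
                rx = re.compile(p.pattern)
            elif p.kind is PatternType.GLOB:
                # fnmatch.translate turns a shell-style glob into an end-anchored regex.
                rx = re.compile(fnmatch.translate(p.pattern))
            else:
                # Escape so a literal substring is never interpreted as regex syntax.
                rx = re.compile(re.escape(p.pattern))
            self._compiled.append((p, rx))

    def first_match(self, text):
        """Return the first blocked pattern that matches `text`, or None."""
        for p, rx in self._compiled:
            # Globs match from the start of the input; the others match anywhere.
            hit = rx.match(text) if p.kind is PatternType.GLOB else rx.search(text)
            if hit:
                return p
        return None


if __name__ == "__main__":
    matcher = PatternMatcher([
        BlockedPattern("rm -rf"),
        BlockedPattern(r"\bos\.system\s*\(", PatternType.REGEX),
        BlockedPattern("/etc/*", PatternType.GLOB),
    ])
    print(matcher.first_match("os.system('ls')"))
    print(matcher.first_match("print('hi')"))
```

Compiling in `__init__` keeps the per-check cost to regex evaluation only, which is the point of pre-compiling patterns.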

Proposed integration

A safety wrapper that hooks into smolagents' tool execution:

```python
from smolagents import CodeAgent, tool
from smolagents_safety import GovernancePolicy, SafeAgent  # proposed package

# `my_tool`, `model`, and `log_alert` are assumed to be defined elsewhere
policy = GovernancePolicy.load("policy.yaml")
agent = SafeAgent(
    tools=[my_tool],
    model=model,
    policy=policy,
)
agent.on("policy_violation", lambda e: log_alert(e))

# All tool calls are policy-checked; dangerous patterns are blocked
result = agent.run("Do the task")
```
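One way a safety wrapper could intercept tool execution is to wrap each tool callable before it reaches the agent. The sketch below is framework-agnostic and does not use smolagents' actual tool classes; `guard_tool`, `EventBus`, `ToolCallBlocked`, `run_shell`, and the lowercase event names (modeled on the `agent.on("policy_violation", ...)` call above) are all assumptions.

```python
import functools
import re
from collections import defaultdict
from typing import Any, Callable


class ToolCallBlocked(Exception):
    """Raised instead of running a tool whose arguments violate the policy."""


class EventBus:
    """Minimal stand-in for the proposed `agent.on(...)` hook registry (hypothetical)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: dict) -> None:
        for handler in self._handlers[event]:
            handler(payload)


def guard_tool(fn: Callable[..., Any], blocked: list, bus: EventBus) -> Callable[..., Any]:
    """Wrap a tool so every call is checked against blocked regexes before it runs."""
    compiled = [re.compile(p) for p in blocked]

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Flatten positional and keyword arguments into one string for inspection.
        parts = [str(a) for a in args] + [f"{k}={v}" for k, v in kwargs.items()]
        text = " ".join(parts)
        bus.emit("policy_check", {"tool": fn.__name__, "input": text})
        for rx in compiled:
            if rx.search(text):
                event = {"tool": fn.__name__, "pattern": rx.pattern, "input": text}
                bus.emit("policy_violation", event)
                bus.emit("tool_call_blocked", event)
                raise ToolCallBlocked(f"{fn.__name__}: matched {rx.pattern!r}")
        return fn(*args, **kwargs)

    return wrapper


def run_shell(command: str) -> str:
    """Stand-in tool for the example; it only echoes the command."""
    return f"ran: {command}"


if __name__ == "__main__":
    bus = EventBus()
    bus.on("policy_violation", lambda e: print("ALERT:", e["pattern"]))
    safe_shell = guard_tool(run_shell, [r"rm\s+-rf"], bus)
    print(safe_shell("ls -la"))  # ran: ls -la
    try:
        safe_shell("rm -rf /")
    except ToolCallBlocked as err:
        print("blocked:", err)
```

Raising before `fn` is called guarantees a blocked tool never executes, while handlers registered on the bus still see every violation.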

Why this fits smolagents

  • Minimal footprint - Our policy engine is pure Python with no heavy dependencies, matching smolagents' philosophy
  • Deterministic - No LLM-in-the-loop for safety; fast and predictable
  • YAML-native - Policies are simple YAML files, easy to version and review
  • 700+ tests backing the governance engine
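The "deterministic, no LLM" property means the same action string always yields the same classification. A minimal rule-based sketch of that idea follows; the four categories and their regexes are invented for illustration and are not the nine categories Agent-OS ships.

```python
import re

# Hypothetical categories and rules for illustration only.
RULES = {
    "destructive_fs": [r"\brm\s+-rf\b", r"\bshutil\.rmtree\s*\("],
    "network_exfiltration": [r"\brequests\.(post|put)\s*\("],
    "privilege_escalation": [r"\bsudo\b", r"\bchmod\s+777\b"],
    "code_injection": [r"\beval\s*\(", r"\bexec\s*\("],
}

# Compile once at import time; classification is then pure regex evaluation.
COMPILED = {cat: [re.compile(p) for p in pats] for cat, pats in RULES.items()}


def classify(action: str) -> list:
    """Return every matching category in a fixed (dict insertion) order."""
    return [
        cat
        for cat, regexes in COMPILED.items()
        if any(rx.search(action) for rx in regexes)
    ]


if __name__ == "__main__":
    print(classify("sudo rm -rf /tmp/cache"))  # ['destructive_fs', 'privilege_escalation']
    print(classify("print('hello')"))  # []
```

Because there is no sampling or model call, the result is reproducible and cheap enough to run before every tool call.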

Ask

Would maintainers be interested in:

  1. A standalone smolagents-safety package
  2. A PR adding optional policy enforcement hooks to the tool execution pipeline
  3. A cookbook/example demonstrating the pattern

Happy to start with whichever approach fits best.
