Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 

README.md

Agent-OS Integration for Agent-Lightning

Kernel-level safety during AI agent training.

Overview

Agent-OS provides deterministic governance for AI agents. This integration enables:

  • 0% unpenalized policy violations — All unsafe actions are detected and penalized
  • Policy violations → RL penalties — Agents learn to avoid unsafe behavior
  • Complete audit trail — From training to production

Installation

pip install agentlightning agent-os

Quick Start

from agentlightning import Trainer
from agentlightning.contrib.runner.agentos import AgentOSRunner
from agentlightning.contrib.reward.agentos import PolicyReward
from agent_os import KernelSpace
from agent_os.policies import SQLPolicy

# Create governed kernel
kernel = KernelSpace(policy=SQLPolicy(
    deny=["DROP", "DELETE"]
))

# Wrap in Agent-OS runner
runner = AgentOSRunner(kernel)

# Train with policy-aware rewards
trainer = Trainer(
    runner=runner,
    reward_fn=PolicyReward(kernel),
    algorithm="GRPO"
)

trainer.train()

Components

AgentOSRunner

Wraps agent execution with kernel-level policy enforcement:

from agentlightning.contrib.runner.agentos import AgentOSRunner

runner = AgentOSRunner(
    kernel,
    fail_on_violation=False,  # Continue but penalize
    emit_violations=True,     # Emit as spans
)

PolicyReward

Converts policy violations to negative RL rewards:

from agentlightning.contrib.reward.agentos import PolicyReward

reward_fn = PolicyReward(
    kernel,
    base_reward_fn=accuracy_reward,
    critical_penalty=-100.0,
    clean_bonus=5.0,
)

FlightRecorderAdapter

Imports Agent-OS audit logs to LightningStore:

from agentlightning.contrib.adapter.agentos import FlightRecorderAdapter

adapter = FlightRecorderAdapter(flight_recorder)
adapter.import_to_store(lightning_store)

Benchmarks

Metric Without Agent-OS With Agent-OS
Undetected Policy Violations 12.3% 0.0%
Task Accuracy 76.4% 79.2%

Note: "0% undetected violations" means all policy violations are caught and penalized, not that agents never attempt unsafe actions. Over training, agents learn to minimize violation attempts.

Documentation

License

MIT