
research: SAGE skill library RL — principled reward signal for skill evolution (arXiv:2512.17102) #2232

@bug-ops

Description


Source

arXiv:2512.17102 — "Reinforcement Learning for Self-Improving Agent with Skill Library (SAGE)"

Summary

SAGE combines GRPO reinforcement learning with a growing skill library. Its sequential rollout runs the agent across a sequence of similar tasks, so skills produced on earlier tasks accumulate and become available for later ones. A skill-integrated reward signal replaces heuristic scoring. On AppWorld, SAGE reports 8.9% higher goal completion and 59% fewer tokens than baselines.

Applicability to Zeph

HIGH: zeph-skills self-learning, hot-reload, auto-promote/demote.

Zeph already has a skill evolution loop ([skills.learning]) driven by heuristic scoring (improve_threshold, rollback_threshold). SAGE's more principled approach maps onto it as follows:

  • Sequential rollout pattern maps onto Zeph's multi-session skill evaluation
  • Skill-integrated reward is a direct upgrade to min_evaluations + improve_threshold heuristics
  • No model weight updates required in Zeph: unlike SAGE's GRPO training, the reward would update skill metadata only (non-parametric)
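A minimal sketch of the non-parametric update, assuming each skill keeps a running mean of its reward in metadata. `SkillStats` and `needs_improvement` are hypothetical names, and the semantics given to `min_evaluations` and `improve_threshold` (decide only after that many scored uses, then flag skills whose mean reward falls below the threshold) are assumptions that may not match Zeph's actual `[skills.learning]` implementation.

```rust
/// HYPOTHETICAL per-skill metadata updated from rewards; no model weights
/// are touched. Field names do not mirror Zeph's real types.
#[derive(Debug, Default, Clone)]
struct SkillStats {
    evaluations: u32,
    mean_reward: f64,
}

impl SkillStats {
    /// Incremental (running) mean, so the full reward history need not be stored.
    fn record(&mut self, reward: f64) {
        self.evaluations += 1;
        self.mean_reward += (reward - self.mean_reward) / self.evaluations as f64;
    }
}

/// Reward-based replacement for the heuristic gate. ASSUMED semantics:
/// no decision before `min_evaluations` scored uses; after that, a skill
/// whose mean reward is below `improve_threshold` is flagged for improvement.
fn needs_improvement(stats: &SkillStats, min_evaluations: u32, improve_threshold: f64) -> bool {
    stats.evaluations >= min_evaluations && stats.mean_reward < improve_threshold
}
```

The running mean weighs every use equally; an exponential moving average would be a reasonable alternative if recent uses should count more after a skill is edited.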

Implementation Direction

  • Replace or augment skill evaluation score with a reward function based on task completion signals
  • Add cross-session skill rollout tracking (skills tested across multiple sessions before promotion)
  • Connect to existing auto_promote_threshold and auto_demote_threshold config fields
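One possible shape for cross-session rollout tracking, sketched under assumptions: a skill records the distinct sessions it was used in, and promotion or demotion against `auto_promote_threshold` / `auto_demote_threshold` is only considered once it has been exercised in at least `min_sessions` sessions. `CrossSessionRecord`, `decide`, and `min_sessions` are hypothetical names, not existing Zeph APIs.

```rust
use std::collections::HashSet;

/// HYPOTHETICAL cross-session record: which sessions used the skill and
/// the running mean of the rewards it earned there.
#[derive(Debug, Default)]
struct CrossSessionRecord {
    sessions: HashSet<String>,
    evaluations: u32,
    mean_reward: f64,
}

impl CrossSessionRecord {
    fn record(&mut self, session_id: &str, reward: f64) {
        // Repeated uses in one session add rewards but not extra sessions.
        self.sessions.insert(session_id.to_string());
        self.evaluations += 1;
        self.mean_reward += (reward - self.mean_reward) / self.evaluations as f64;
    }
}

#[derive(Debug, PartialEq)]
enum Decision {
    Promote,
    Demote,
    Hold,
}

/// Gate promotion/demotion on distinct-session coverage, then compare the
/// mean reward with the existing threshold config values.
fn decide(
    rec: &CrossSessionRecord,
    min_sessions: usize,
    auto_promote_threshold: f64,
    auto_demote_threshold: f64,
) -> Decision {
    if rec.sessions.len() < min_sessions {
        return Decision::Hold;
    }
    if rec.mean_reward >= auto_promote_threshold {
        Decision::Promote
    } else if rec.mean_reward <= auto_demote_threshold {
        Decision::Demote
    } else {
        Decision::Hold
    }
}
```

Holding both promotion and demotion until the session quota is met follows the spirit of SAGE's sequential rollout, where a skill is judged across several related tasks rather than one; a stricter variant could still demote early after a run of failures.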

Priority: P2
Discovered: CI-211 research scan (2026-03-27)

Metadata


Labels

P2 (High value, medium complexity), research (Research-driven improvement), skills (zeph-skills crate)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
