research: SAGE skill library RL — principled reward signal for skill evolution (arXiv:2512.17102) #2232
Closed
Labels
P2 (High value, medium complexity), research (Research-driven improvement), skills (zeph-skills crate)
Description
Source
arXiv:2512.17102 — "Reinforcement Learning for Self-Improving Agent with Skill Library (SAGE)"
Summary
SAGE combines GRPO reinforcement learning with a growing skill library. Rolling out sequentially across similar tasks lets skills learned on earlier tasks accumulate for later ones. A skill-integrated reward signal replaces heuristic scoring. On AppWorld, SAGE achieves 8.9% higher goal completion and uses 59% fewer tokens than baselines.
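To make the sequential rollout idea concrete, here is a minimal sketch of a task chain in which skills written during earlier tasks become visible to later ones. It is not taken from the paper: `TaskOutcome`, `sequential_rollout`, and the rule that only completed tasks contribute skills are assumptions made for illustration.

```rust
use std::collections::BTreeSet;

/// Result of attempting one task (hypothetical shape, not from the paper).
#[derive(Debug, Clone, PartialEq)]
pub struct TaskOutcome {
    pub completed: bool,
    /// Names of skills the agent wrote while solving this task.
    pub new_skills: Vec<String>,
}

/// Runs tasks in order. Each call to `solve` sees the library built so far,
/// so skills from earlier tasks are available to later ones.
pub fn sequential_rollout<F>(
    tasks: &[&str],
    mut solve: F,
) -> (BTreeSet<String>, Vec<TaskOutcome>)
where
    F: FnMut(&str, &BTreeSet<String>) -> TaskOutcome,
{
    let mut library: BTreeSet<String> = BTreeSet::new();
    let mut outcomes = Vec::with_capacity(tasks.len());
    for &task in tasks {
        let outcome = solve(task, &library);
        // Assumption: only skills from completed tasks enter the library.
        if outcome.completed {
            library.extend(outcome.new_skills.iter().cloned());
        }
        outcomes.push(outcome);
    }
    (library, outcomes)
}
```

The `solve` closure stands in for the agent; in a real system it would run an episode and report which skills it produced.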
Applicability to Zeph
HIGH — zeph-skills self-learning, hot-reload, auto-promote/demote.
Zeph already has a skill evolution loop ([skills.learning]) with heuristic scoring (improve_threshold, rollback_threshold). SAGE's principled approach:
- Sequential rollout pattern maps onto Zeph's multi-session skill evaluation
- Skill-integrated reward is a direct upgrade to min_evaluations + improve_threshold heuristics
- No weight updates required: reward updates skill metadata only (non-parametric)
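The non-parametric update described in the bullets above could look roughly like the following sketch: a reward derived from task completion is folded into a running mean stored as skill metadata, and the score is only exposed once enough evaluations exist. `TaskSignal`, `skill_reward`, and `SkillStats` are hypothetical names, and the binary 0/1 reward is an assumption rather than SAGE's actual reward definition. Only `min_evaluations` comes from the existing config.

```rust
/// Task-level signals observed after a run (hypothetical fields).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TaskSignal {
    pub completed: bool,
    pub skill_invoked: bool,
}

/// Assumed reward: 1.0 if the task completed while using the skill, 0.0 if it
/// failed while using it, and no evidence at all if the skill was not invoked.
pub fn skill_reward(signal: TaskSignal) -> Option<f64> {
    if !signal.skill_invoked {
        return None;
    }
    Some(if signal.completed { 1.0 } else { 0.0 })
}

/// Per-skill metadata; no model weights are touched.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SkillStats {
    pub evaluations: u32,
    pub mean_reward: f64,
}

impl SkillStats {
    /// Incremental mean update: mean += (reward - mean) / n.
    pub fn record(&mut self, reward: f64) {
        self.evaluations += 1;
        self.mean_reward += (reward - self.mean_reward) / f64::from(self.evaluations);
    }

    /// Returns the mean reward only after `min_evaluations` observations.
    pub fn score(&self, min_evaluations: u32) -> Option<f64> {
        if self.evaluations >= min_evaluations {
            Some(self.mean_reward)
        } else {
            None
        }
    }
}
```

Returning `None` for runs where the skill was never invoked keeps unrelated task outcomes from inflating or deflating a skill's score.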
Implementation Direction
- Replace or augment skill evaluation score with a reward function based on task completion signals
- Add cross-session skill rollout tracking (skills tested across multiple sessions before promotion)
- Connect to existing auto_promote_threshold and auto_demote_threshold config fields
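As a rough illustration of the direction above, the sketch below tracks rewards per session for a single skill and only makes a promotion decision after the skill has been exercised in enough distinct sessions. `RolloutTracker`, `PromotionConfig`, `Decision`, and `min_sessions` are hypothetical. Treating `auto_promote_threshold` and `auto_demote_threshold` as bounds on a mean reward in [0, 1] is also an assumption about how those fields would be interpreted.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Promote,
    Demote,
    Hold,
}

pub struct PromotionConfig {
    pub auto_promote_threshold: f64,
    pub auto_demote_threshold: f64,
    /// Hypothetical: distinct sessions required before any decision.
    pub min_sessions: usize,
}

/// Cross-session reward history for one skill.
#[derive(Debug, Default)]
pub struct RolloutTracker {
    /// session id -> (reward sum, observation count)
    per_session: BTreeMap<String, (f64, u32)>,
}

impl RolloutTracker {
    pub fn record(&mut self, session_id: &str, reward: f64) {
        let entry = self
            .per_session
            .entry(session_id.to_string())
            .or_insert((0.0, 0));
        entry.0 += reward;
        entry.1 += 1;
    }

    /// Mean of per-session means, so one long session cannot dominate.
    pub fn cross_session_mean(&self) -> Option<f64> {
        if self.per_session.is_empty() {
            return None;
        }
        let total: f64 = self
            .per_session
            .values()
            .map(|(sum, n)| sum / f64::from(*n))
            .sum();
        Some(total / self.per_session.len() as f64)
    }

    pub fn decide(&self, cfg: &PromotionConfig) -> Decision {
        if self.per_session.len() < cfg.min_sessions {
            return Decision::Hold;
        }
        match self.cross_session_mean() {
            Some(m) if m >= cfg.auto_promote_threshold => Decision::Promote,
            Some(m) if m <= cfg.auto_demote_threshold => Decision::Demote,
            _ => Decision::Hold,
        }
    }
}
```

Averaging per session first is one possible design choice; a flat mean over all observations would weight busy sessions more heavily.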
Priority: P2
Discovered: CI-211 research scan (2026-03-27)