research: SAGE skill library RL — principled reward signal for skill evolution (arXiv:2512.17102) #2232

## Source

arXiv:2512.17102 — "Reinforcement Learning for Self-Improving Agent with Skill Library (SAGE)"

## Summary

SAGE combines GRPO reinforcement learning with a growing skill library. Rollouts run sequentially across chains of similar tasks, so skills created on earlier tasks accumulate and are available to later ones. A skill-integrated reward signal replaces heuristic scoring. On AppWorld, SAGE achieves 8.9% higher goal completion and uses 59% fewer tokens than the baselines.
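The sequential rollout idea can be illustrated with a small Rust sketch. This is not the paper's reward definition: the `Attempt` type, the `sequential_rewards` function, and the additive `reuse_bonus` are hypothetical names and an assumed reward shape, chosen only to show how skills from earlier tasks can feed the reward of later ones.

```rust
use std::collections::BTreeSet;

/// Outcome of one task attempt in a chain of similar tasks (hypothetical type).
pub struct Attempt {
    pub completed: bool,
    /// Skills the agent wrote during this attempt.
    pub skills_created: Vec<String>,
    /// Skills the agent called during this attempt.
    pub skills_used: Vec<String>,
}

/// Walk a chain of attempts in order, carrying the skill library forward,
/// and return one reward per attempt.
///
/// Assumed reward shape: 1.0 for completion, plus `reuse_bonus` when a
/// completed task reused a skill contributed by an earlier task in the chain.
pub fn sequential_rewards(chain: &[Attempt], reuse_bonus: f64) -> Vec<f64> {
    let mut library: BTreeSet<String> = BTreeSet::new();
    let mut rewards = Vec::with_capacity(chain.len());

    for attempt in chain {
        let base = if attempt.completed { 1.0 } else { 0.0 };

        // Only skills already in the library (from earlier tasks) count as reuse.
        let reused = attempt.skills_used.iter().any(|s| library.contains(s));
        let bonus = if attempt.completed && reused { reuse_bonus } else { 0.0 };
        rewards.push(base + bonus);

        // Skills created now become available to later tasks in the chain.
        library.extend(attempt.skills_created.iter().cloned());
    }
    rewards
}
```

With this shape, a skill earns credit only when a later task both reuses it and succeeds, which is the accumulation effect the summary describes.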

## Applicability to Zeph

**HIGH** — `zeph-skills` self-learning, hot-reload, auto-promote/demote.

Zeph already has a skill evolution loop (`[skills.learning]`) that relies on heuristic scoring (`improve_threshold`, `rollback_threshold`). SAGE's more principled approach maps onto it as follows:
- The sequential rollout pattern maps onto Zeph's multi-session skill evaluation.
- The skill-integrated reward is a direct upgrade to the `min_evaluations` + `improve_threshold` heuristics.
- No weight updates are required on the Zeph side: the reward updates skill metadata only (non-parametric).
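To make the non-parametric point concrete, here is a minimal sketch of a reward updating skill metadata with no model weights involved. `SkillStats` and its fields are hypothetical and do not reflect the actual `zeph-skills` data model; the update is a standard incremental mean.

```rust
/// Per-skill metadata updated from rewards (hypothetical, not Zeph's real type).
#[derive(Debug, Clone, Default)]
pub struct SkillStats {
    /// Number of rewards recorded so far.
    pub evaluations: u32,
    /// Running mean of all recorded rewards.
    pub mean_reward: f64,
}

impl SkillStats {
    /// Fold one reward into the running mean without storing the history.
    pub fn record(&mut self, reward: f64) {
        self.evaluations += 1;
        // Incremental mean: new_mean = old_mean + (x - old_mean) / n.
        self.mean_reward += (reward - self.mean_reward) / f64::from(self.evaluations);
    }
}
```

Only `evaluations` and `mean_reward` change per observation, so the update could be persisted alongside the skill's other metadata.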

## Implementation Direction

- Replace or augment the skill evaluation score with a reward function based on task completion signals
- Add cross-session skill rollout tracking (skills are tested across multiple sessions before promotion)
- Connect the reward to the existing `auto_promote_threshold` and `auto_demote_threshold` config fields
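One possible shape for the steps above, sketched in Rust under stated assumptions. The field names `min_evaluations`, `auto_promote_threshold`, and `auto_demote_threshold` come from this issue, but their types and exact semantics here are guesses; `min_sessions`, `RolloutRecord`, and `Decision` are hypothetical additions for illustration.

```rust
use std::collections::HashSet;

/// Thresholds mirroring the config fields named above (types are assumed).
pub struct Thresholds {
    pub min_evaluations: usize,
    /// Hypothetical: distinct sessions required before promotion.
    pub min_sessions: usize,
    pub auto_promote_threshold: f64,
    pub auto_demote_threshold: f64,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Promote,
    Demote,
    Hold,
}

/// Rewards observed for one skill, with the sessions they came from.
#[derive(Default)]
pub struct RolloutRecord {
    pub sessions: HashSet<String>,
    pub rewards: Vec<f64>,
}

impl RolloutRecord {
    /// Record one task-completion reward observed in the given session.
    pub fn observe(&mut self, session_id: &str, reward: f64) {
        self.sessions.insert(session_id.to_string());
        self.rewards.push(reward);
    }

    /// Decide from the mean reward; promotion also needs enough distinct sessions.
    pub fn decide(&self, t: &Thresholds) -> Decision {
        let n = self.rewards.len();
        if n == 0 || n < t.min_evaluations {
            return Decision::Hold;
        }
        let mean = self.rewards.iter().sum::<f64>() / n as f64;
        if mean <= t.auto_demote_threshold {
            Decision::Demote
        } else if mean >= t.auto_promote_threshold && self.sessions.len() >= t.min_sessions {
            Decision::Promote
        } else {
            Decision::Hold
        }
    }
}
```

In this sketch the session gate applies to promotion only, so a consistently failing skill can still be demoted from a single session. Whether demotion should also wait for multiple sessions is a design choice the issue leaves open.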

## Metadata

**Priority**: P2  
**Discovered**: CI-211 research scan (2026-03-27)
