research(skills): ARISE hierarchical RL skill evolution — policy-driven selection outperforms embedding similarity across 7 benchmarks (arXiv:2603.16060) #2398

## Paper

**arXiv:2603.16060** — *ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical RL* (March 2026)

## Key Finding

Two-level hierarchical RL system:
- **Manager**: maintains a tiered skill library by summarizing successful solution traces; selects skills via policy (not pure embedding similarity)
- **Worker**: executes selected skill

ARISE consistently outperforms GRPO-family algorithms across 7 benchmarks, with the largest gains on out-of-distribution tasks (where embedding similarity fails because no skill description matches the novel task).

Key insight: **skill libraries that improve in quality over time**. Skills are not static documents; they evolve as high-quality solution traces are distilled into updated skill descriptions.
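To make the Manager/Worker split concrete, here is a minimal sketch of one episode. It is not the paper's algorithm: the learned selection policy is replaced by a smoothed success rate, the tiered library is flattened into a single dict, and `Skill`, `Manager`, `run_episode`, `worker`, and `summarize` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Skill:
    name: str
    description: str
    successes: int = 0
    failures: int = 0

    def success_rate(self) -> float:
        # Laplace smoothing: a skill with no history scores 0.5.
        return (self.successes + 1) / (self.successes + self.failures + 2)


@dataclass
class Manager:
    # Assumption: a flat dict stands in for the tiered skill library.
    library: dict[str, Skill] = field(default_factory=dict)

    def select(self, candidates: list[str]) -> Skill:
        # Stand-in for a learned policy: best smoothed success rate wins.
        return max((self.library[n] for n in candidates), key=Skill.success_rate)

    def update(self, skill: Skill, trace: list[str], success: bool,
               summarize: Callable[[str, list[str]], str]) -> None:
        if success:
            skill.successes += 1
            # Distill the successful trace into the skill description.
            skill.description = summarize(skill.description, trace)
        else:
            skill.failures += 1


def run_episode(manager: Manager, task: str, candidates: list[str],
                worker: Callable[[Skill, str], tuple[list[str], bool]],
                summarize: Callable[[str, list[str]], str]) -> tuple[str, bool]:
    skill = manager.select(candidates)          # Manager picks a skill
    trace, success = worker(skill, task)        # Worker executes it
    manager.update(skill, trace, success, summarize)
    return skill.name, success
```

In a real system `summarize` would be an LLM call and `select` a trained policy; both are injected here so the control flow can be exercised without either.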

## Applicability to Zeph

Directly relevant to `zeph-skills` self-learning evolution (SAGE RL). Current SAGE: records outcomes → adjusts trust weights via Beta distribution. What ARISE adds:

1. **Trace-based skill improvement**: when a successful multi-step solution is recorded, summarize the trace into a skill description update — not just a numerical reward but semantic distillation
2. **Policy-driven selection beyond embedding**: use success history to bias selection away from skills that matched semantically but failed previously (complements the confusability fix in #2268)
3. **Out-of-distribution generalization**: skill library grows to cover novel task patterns, not just cached exact-match scenarios
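Point 2 can be sketched on top of the Beta-distribution trust weights SAGE already keeps. The snippet below is an assumption-laden illustration, not Zeph's actual code: `Trust`, `rerank`, and the linear blend controlled by `weight` are hypothetical, and the uniform Beta(1, 1) prior is chosen only for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Trust:
    alpha: float = 1.0  # prior pseudo-count of successes (assumed Beta(1, 1) prior)
    beta: float = 1.0   # prior pseudo-count of failures

    def record(self, success: bool) -> None:
        # Conjugate update: each outcome increments one pseudo-count.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean of the Beta distribution.
        return self.alpha / (self.alpha + self.beta)


def rerank(candidates: list[tuple[str, float]], trust: dict[str, Trust],
           weight: float = 0.5) -> list[tuple[str, float]]:
    """Blend embedding similarity with historical success (hypothetical scoring rule)."""
    scored = []
    for name, similarity in candidates:
        t = trust.get(name, Trust())  # unseen skills fall back to the prior
        scored.append((name, (1 - weight) * similarity + weight * t.mean))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With `weight=0.5`, a skill that matched at 0.9 similarity but failed four times in a row drops below a 0.8-similarity skill that succeeded four times, which is the "matched semantically but failed previously" case described above.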

## Integration

- Short term: use `skill_outcomes` traces (already recorded in SAGE) to periodically update SKILL.md descriptions via LLM summarization
- Medium term: Manager-level meta-policy that re-ranks embedding candidates using historical success rates
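A rough sketch of the short-term step follows. The record layout (`skill`, `success`, `steps` keys), the prompt wording, and the `llm` callable are all assumptions made for illustration; the real `skill_outcomes` schema and the SKILL.md write path are not described in this issue.

```python
from collections import defaultdict
from typing import Callable, Iterable


def group_successful_traces(outcomes: Iterable[dict]) -> dict[str, list[list[str]]]:
    # Keep only successful outcomes, grouped by skill name.
    grouped: dict[str, list[list[str]]] = defaultdict(list)
    for row in outcomes:
        if row["success"]:
            grouped[row["skill"]].append(row["steps"])
    return dict(grouped)


def build_prompt(skill: str, description: str, traces: list[list[str]],
                 max_traces: int = 3) -> str:
    lines = [
        f"Skill: {skill}",
        f"Current description: {description}",
        "Successful traces:",
    ]
    # Only the most recent traces are included, to bound prompt size.
    for i, steps in enumerate(traces[-max_traces:], start=1):
        lines.append(f"{i}. " + " -> ".join(steps))
    lines.append("Rewrite the description so it covers these traces. "
                 "Reply with the description only.")
    return "\n".join(lines)


def refresh_descriptions(descriptions: dict[str, str], outcomes: Iterable[dict],
                         llm: Callable[[str], str],
                         min_traces: int = 2) -> dict[str, str]:
    """Return a new mapping; skills with too few successful traces are left unchanged."""
    updated = dict(descriptions)
    for skill, traces in group_successful_traces(outcomes).items():
        if skill in descriptions and len(traces) >= min_traces:
            prompt = build_prompt(skill, descriptions[skill], traces)
            updated[skill] = llm(prompt).strip()
    return updated
```

The `min_traces` threshold is one possible guard against rewriting a description from a single lucky run; the right value would need to be tuned against real outcome data.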

## Related

- #2268 (skill confusability) — policy-driven ranking as complement to hierarchical category routing
- #1889 (automated skill acquisition) — ARISE provides the quality-gating mechanism for acquired skills
