research(skills): ARISE hierarchical RL skill evolution — policy-driven selection outperforms embedding similarity across 7 benchmarks (arXiv:2603.16060) #2398

@bug-ops

Paper

arXiv:2603.16060, "ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical RL" (March 2026)

Key Finding

Two-level hierarchical RL system:

  • Manager: maintains a tiered skill library by summarizing successful solution traces; selects skills via policy (not pure embedding similarity)
  • Worker: executes selected skill
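The Manager/Worker split above can be sketched structurally as follows. This is a minimal illustration based only on the description in this issue, not the paper's implementation: the names `Skill`, `Trace`, `Manager`, and `step` are hypothetical, the learned selection policy is stubbed as a per-skill score, and trace summarization is stubbed as joining the steps into a string.

```rust
/// A skill in the library (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Skill {
    pub name: String,
    pub description: String,
    /// Stand-in for a learned policy's preference for this skill.
    pub score: f64,
}

/// What a worker reports back: the steps it took and whether it succeeded.
pub struct Trace {
    pub steps: Vec<String>,
    pub success: bool,
}

/// Manager: owns the skill library, picks skills, absorbs successful traces.
pub struct Manager {
    pub library: Vec<Skill>,
}

impl Manager {
    /// Select the skill with the highest policy score (placeholder policy).
    pub fn select(&self) -> Option<usize> {
        self.library
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.score.total_cmp(&b.1.score))
            .map(|(i, _)| i)
    }

    /// Fold a successful trace back into the skill description.
    /// A real system would summarize with an LLM; here the steps are joined.
    pub fn absorb(&mut self, idx: usize, trace: &Trace) {
        if trace.success {
            self.library[idx].description = trace.steps.join("; ");
        }
    }
}

/// One manager/worker round: select, execute, absorb.
/// Returns `None` when the library is empty, else whether the worker succeeded.
pub fn step<W>(manager: &mut Manager, worker: W, task: &str) -> Option<bool>
where
    W: Fn(&Skill, &str) -> Trace,
{
    let idx = manager.select()?;
    let trace = worker(&manager.library[idx], task);
    manager.absorb(idx, &trace);
    Some(trace.success)
}
```

The worker is passed in as a closure so the sketch stays independent of how a skill is actually executed.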

ARISE consistently outperforms GRPO-family algorithms across 7 benchmarks, with the largest gains on out-of-distribution tasks (where embedding similarity fails because no skill description matches the novel task).

Key insight: the skill library improves in quality over time. Skills are not static documents; they evolve as high-quality solution traces are distilled into updated skill descriptions.

Applicability to Zeph

Directly relevant to zeph-skills self-learning evolution (SAGE RL). SAGE currently records outcomes and adjusts trust weights via a Beta distribution. What ARISE adds:

  1. Trace-based skill improvement: when a successful multi-step solution is recorded, summarize the trace into a skill description update, so the skill receives a semantic distillation rather than only a numerical reward
  2. Policy-driven selection beyond embedding: use success history to bias selection away from skills that matched semantically but failed previously (complements the confusability fix in #2268, "research(skills): single-agent skill library phase transition", arXiv:2601.04748)
  3. Out-of-distribution generalization: the skill library grows to cover novel task patterns, not just cached exact-match scenarios
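As a rough illustration of point 2, the sketch below combines a Beta-distributed trust weight with embedding similarity to re-rank candidates. It is an assumption-laden sketch, not SAGE's actual code: the `Trust` and `Candidate` types, the uniform Beta(1, 1) prior, and the `similarity * mean` scoring rule are all hypothetical choices made for the example.

```rust
/// Beta-distributed trust in a skill. Assumes a uniform Beta(1, 1) prior.
#[derive(Debug, Clone, Copy)]
pub struct Trust {
    pub alpha: f64, // 1 + number of recorded successes
    pub beta: f64,  // 1 + number of recorded failures
}

impl Default for Trust {
    fn default() -> Self {
        Trust { alpha: 1.0, beta: 1.0 }
    }
}

impl Trust {
    /// Update the posterior with one observed outcome.
    pub fn record(&mut self, success: bool) {
        if success {
            self.alpha += 1.0;
        } else {
            self.beta += 1.0;
        }
    }

    /// Posterior mean success probability: alpha / (alpha + beta).
    pub fn mean(&self) -> f64 {
        self.alpha / (self.alpha + self.beta)
    }
}

/// An embedding-matched candidate skill (hypothetical shape).
#[derive(Debug, Clone)]
pub struct Candidate {
    pub skill: String,
    pub similarity: f64,
    pub trust: Trust,
}

/// Re-rank candidates by similarity weighted by trust, best first.
/// The product rule is an illustrative choice, not taken from the paper.
pub fn rerank(mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    candidates.sort_by(|a, b| {
        let sa = a.similarity * a.trust.mean();
        let sb = b.similarity * b.trust.mean();
        sb.total_cmp(&sa)
    });
    candidates
}
```

With this scoring, a skill that matches closely (similarity 0.9) but has failed four times in a row scores 0.9 × 1/6 = 0.15, and is ranked below a looser match (similarity 0.6) with four successes, which scores 0.6 × 5/6 = 0.5.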

Integration

  • Short term: use skill_outcomes traces (already recorded in SAGE) to periodically update SKILL.md descriptions via LLM summarization
  • Medium term: Manager-level meta-policy that re-ranks embedding candidates using historical success rates
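The short-term step could start from something like the sketch below, which groups successful traces per skill and builds one summarization prompt for each skill that has accumulated enough of them. Everything here is hypothetical: the `SkillOutcome` record shape, the `min_traces` threshold, and the prompt wording are assumptions, and the actual LLM call and the SKILL.md write-back are left out.

```rust
use std::collections::BTreeMap;

/// One recorded outcome (hypothetical shape; the real skill_outcomes schema may differ).
pub struct SkillOutcome {
    pub skill: String,
    pub success: bool,
    pub trace: String,
}

/// Build one summarization prompt per skill that has at least `min_traces`
/// successful traces. Failed outcomes are ignored.
pub fn refresh_prompts(
    outcomes: &[SkillOutcome],
    min_traces: usize,
) -> BTreeMap<String, String> {
    // Group successful traces by skill name.
    let mut by_skill: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for o in outcomes.iter().filter(|o| o.success) {
        by_skill
            .entry(o.skill.as_str())
            .or_default()
            .push(o.trace.as_str());
    }

    // Keep only skills with enough evidence and render a prompt for each.
    by_skill
        .into_iter()
        .filter(|(_, traces)| traces.len() >= min_traces)
        .map(|(skill, traces)| {
            let prompt = format!(
                "Update the SKILL.md description for `{}` from these successful traces:\n- {}",
                skill,
                traces.join("\n- ")
            );
            (skill.to_string(), prompt)
        })
        .collect()
}
```

Requiring a minimum number of traces before refreshing a description is one way to avoid rewriting a skill from a single lucky run; the right threshold would need to be tuned.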

Labels: P3 (Research, medium-high complexity), research (Research-driven improvement), skills (zeph-skills crate)