research(orchestration): AgentRM OS-inspired scheduler — MLFQ lanes, zombie reaping, context hibernation, 86% P95 latency reduction (arXiv:2603.13110) #2334
Closed
Labels
P2 (High value, medium complexity), research (Research-driven improvement)
Description
Source
arXiv:2603.13110 — AgentRM: An OS-Inspired Resource Manager for LLM Agent Systems (March 2026)
Technique
Applies classic OS scheduling theory to LLM agent systems in two components:
1. MLFQ Task Scheduler
- Multi-level feedback queue with priority lanes (interactive / batch / background)
- Rate-limit-aware admission control: tasks arriving at a high rate are demoted to avoid API throttle cascades
- Zombie reaping: detects and reaps stalled sub-agents to free scheduler slots
- Reduces P95 latency by 86% in multi-agent benchmarks
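
To make the lane and admission ideas concrete, here is a minimal sketch and not the paper's implementation. `Mlfq`, `Lane`, `Task`, `admit_limit`, and `reset_window` are hypothetical names, and the admission rule (demote one lane once a per-window budget is spent) is an assumption about how rate-limit-aware demotion could work. Time-slice-based demotion and zombie reaping are left out.

```rust
use std::collections::VecDeque;

/// Priority lanes, highest priority first (names taken from the issue text).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lane {
    Interactive = 0,
    Batch = 1,
    Background = 2,
}

impl Lane {
    /// Next lower-priority lane; Background is the floor.
    fn demoted(self) -> Lane {
        match self {
            Lane::Interactive => Lane::Batch,
            Lane::Batch | Lane::Background => Lane::Background,
        }
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Task {
    pub id: u64,
    pub lane: Lane,
}

/// Hypothetical multi-level queue with a per-window admission budget.
pub struct Mlfq {
    queues: [VecDeque<Task>; 3],
    /// Assumed config: tasks admitted at their requested lane per window.
    admit_limit: usize,
    admitted_in_window: usize,
}

impl Mlfq {
    pub fn new(admit_limit: usize) -> Self {
        Self {
            queues: [VecDeque::new(), VecDeque::new(), VecDeque::new()],
            admit_limit,
            admitted_in_window: 0,
        }
    }

    /// Enqueue a task; once the window budget is spent, demote it one lane.
    /// Returns the lane the task actually landed in.
    pub fn admit(&mut self, mut task: Task) -> Lane {
        if self.admitted_in_window >= self.admit_limit {
            task.lane = task.lane.demoted();
        }
        self.admitted_in_window += 1;
        let lane = task.lane;
        self.queues[lane as usize].push_back(task);
        lane
    }

    /// Start a new rate window (for example, once per scheduler tick).
    pub fn reset_window(&mut self) {
        self.admitted_in_window = 0;
    }

    /// Pop the next task from the highest-priority non-empty lane.
    pub fn pop_next(&mut self) -> Option<Task> {
        self.queues.iter_mut().find_map(|q| q.pop_front())
    }
}
```

With `Mlfq::new(1)`, the first interactive task in a window keeps its lane and the second is pushed to batch, so a burst cannot monopolize the interactive lane.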
2. Three-Tier Context Lifecycle Manager (CLM)
- Tier 0 (Hot): active working memory — full fidelity, fast access
- Tier 1 (Warm): recent context — adaptive compaction applied
- Tier 2 (Cold): hibernated context — serialized to storage, recalled on demand
- Achieves 100% key-information retention vs. 65% for sliding-window baselines
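
The three tiers can be sketched as a small state machine driven by idle age. This is an illustration under assumptions, not the paper's CLM: `ContextStore`, `sweep`, `recall`, `warm_after`, and `cold_after` are hypothetical names, whitespace collapsing stands in for real adaptive compaction, and an in-memory map stands in for serialized storage.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Hot,
    Warm,
    Cold,
}

pub struct Entry {
    pub text: String,
    pub tier: Tier,
    pub last_used: u64, // logical turn counter, not wall-clock time
}

pub struct ContextStore {
    live: HashMap<u64, Entry>,  // Tier 0 and Tier 1, kept in memory
    cold: HashMap<u64, String>, // Tier 2; stand-in for serialized storage
    warm_after: u64,            // assumed threshold: idle turns before compaction
    cold_after: u64,            // assumed threshold: idle turns before hibernation
}

/// Placeholder compaction: collapse runs of whitespace.
fn compact(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

impl ContextStore {
    pub fn new(warm_after: u64, cold_after: u64) -> Self {
        Self { live: HashMap::new(), cold: HashMap::new(), warm_after, cold_after }
    }

    pub fn insert(&mut self, id: u64, text: &str, now: u64) {
        let entry = Entry { text: text.to_string(), tier: Tier::Hot, last_used: now };
        self.live.insert(id, entry);
    }

    /// Demote entries by idle age: Hot -> Warm (compacted) -> Cold (hibernated).
    pub fn sweep(&mut self, now: u64) {
        let mut to_cold = Vec::new();
        for (id, e) in self.live.iter_mut() {
            let idle = now.saturating_sub(e.last_used);
            if idle >= self.cold_after {
                to_cold.push(*id);
            } else if idle >= self.warm_after && e.tier == Tier::Hot {
                e.text = compact(&e.text);
                e.tier = Tier::Warm;
            }
        }
        for id in to_cold {
            if let Some(e) = self.live.remove(&id) {
                self.cold.insert(id, e.text);
            }
        }
    }

    pub fn tier_of(&self, id: u64) -> Option<Tier> {
        if let Some(e) = self.live.get(&id) {
            Some(e.tier)
        } else if self.cold.contains_key(&id) {
            Some(Tier::Cold)
        } else {
            None
        }
    }

    /// Recall on demand: any access promotes the entry back to Hot.
    pub fn recall(&mut self, id: u64, now: u64) -> Option<&str> {
        if let Some(text) = self.cold.remove(&id) {
            self.live.insert(id, Entry { text, tier: Tier::Hot, last_used: now });
        }
        let e = self.live.get_mut(&id)?;
        e.tier = Tier::Hot;
        e.last_used = now;
        Some(e.text.as_str())
    }
}
```

Because hibernated entries are moved rather than dropped, nothing is lost when context ages out, which is the property the retention comparison against sliding windows depends on.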
Applicability to Zeph
High. Two independent components, both directly mappable:
- `DagScheduler` + `zeph-scheduler` would benefit from MLFQ priority lanes: `/plan` tasks as interactive, background compaction as batch, sweep timers as background. Zombie reaping maps to `task_timeout_secs` + explicit task cancellation.
- Zeph's context builder already has soft/hard compaction tiers; adding a hibernation tier (Tier 2) for very old sessions aligns with the CLM model. The `deferred_apply_threshold` mechanism is the admission gate for Tier 1→0 promotion.
Differs from #2320 (AdaptOrch), which focuses on topology selection (parallel/sequential/hierarchical DAG shapes); AgentRM focuses on resource management within a fixed topology.
Implementation sketch
- Add priority field to `TaskKind` or `PlanTask` in `zeph-orchestration`
- Implement admission control in `DagScheduler::tick()` based on current rate vs. configured limit
- Add zombie detection: tasks exceeding `task_timeout_secs` get canceled and logged
- Context hibernation: add `hibernated` tier below `cold` in memory tier config with SQLite serialization gate
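
The zombie-detection step could look roughly like the sketch below. Only `task_timeout_secs` comes from the issue text; `RunningTask`, `TaskState`, and `reap_zombies` are hypothetical and do not reflect Zeph's actual types. Logging goes to stderr here as a stand-in for whatever logging facility the scheduler uses.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskState {
    Running,
    Canceled,
}

/// Hypothetical view of a task as seen by the scheduler.
pub struct RunningTask {
    pub id: u64,
    pub started_at: Instant,
    pub state: TaskState,
}

/// Cancel every running task older than `task_timeout_secs`.
/// Returns the ids of the tasks that were reaped.
pub fn reap_zombies(
    tasks: &mut [RunningTask],
    now: Instant,
    task_timeout_secs: u64,
) -> Vec<u64> {
    let timeout = Duration::from_secs(task_timeout_secs);
    let mut reaped = Vec::new();
    for t in tasks.iter_mut() {
        // saturating_duration_since yields zero if started_at is after now.
        let age = now.saturating_duration_since(t.started_at);
        if t.state == TaskState::Running && age > timeout {
            t.state = TaskState::Canceled;
            eprintln!("reaped task {} after exceeding {}s", t.id, task_timeout_secs);
            reaped.push(t.id);
        }
    }
    reaped
}
```

Passing `now` in as a parameter instead of calling `Instant::now()` inside keeps the function deterministic to test, and a scheduler tick can reuse one timestamp for admission control and reaping.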
Priority: P2 (research). Evaluate MLFQ fit with the current DAG scheduler before implementation.