-
Notifications
You must be signed in to change notification settings - Fork 2
research(llm): utility-guided orchestration — multi-signal scoring for tool/routing decisions (arXiv:2603.19896) #2424
Description
Source
arXiv:2603.19896 — Utility-Guided Agent Orchestration for Efficient LLM Tool Use (2026-03-20)
Technique
An orchestration policy that scores each candidate action (respond, retrieve, tool call, verify, stop) using a utility function weighting:
- Estimated gain — expected quality improvement from the action
- Step cost — token/latency cost
- Uncertainty — model confidence in the current state
- Redundancy — overlap with already-retrieved context
The policy selects the action with the highest utility rather than letting the LLM freely choose.
Applicability to Zeph
Bandit routing layer (#2415 BaRP): The utility function formulation is a natural extension of the BaRP cost-weight dial. Instead of a single cost_weight scalar, the bandit reward signal could incorporate all four components: gain (quality), cost, uncertainty, and redundancy. This would make the LinUCB reward more semantically rich.
ToolExecutor step sequencing: The summarize_output flag and the tool overflow threshold are ad hoc; utility scoring could replace them with a principled "is this tool call worth running?" gate.
context_strategy = "adaptive": The adaptive context strategy already attempts a similar multi-signal tradeoff — this paper provides formal grounding.
Implementation sketch
- Add utility scoring as an optional gate in
zeph-tools/src/composite.rsbefore dispatching a tool - Feed utility signal as an additional feature to the LinUCB bandit reward model
- Config:
[tools] utility_scoring = true,utility_gain_weight,utility_cost_weight
Related
- research(llm): BaRP preference-conditioned bandit routing — runtime cost/quality trade-off dial (arXiv:2510.07429) #2415 — BaRP cost-weight bandit routing (direct extension)
- research(performance): PASTE application-level speculative tool execution — 48.5% latency reduction, 1.8x throughput (arXiv:2603.18897) #2409 — PASTE speculative tool execution (latency angle)