research(llm): utility-guided orchestration — multi-signal scoring for tool/routing decisions (arXiv:2603.19896) #2424

@bug-ops

Source

arXiv:2603.19896 — Utility-Guided Agent Orchestration for Efficient LLM Tool Use (2026-03-20)

Technique

An orchestration policy that scores each candidate action (respond, retrieve, tool call, verify, stop) with a utility function that weighs four signals:

  • Estimated gain — expected quality improvement from the action
  • Step cost — token/latency cost
  • Uncertainty — model confidence in the current state
  • Redundancy — overlap with already-retrieved context

The policy selects the action with the highest utility rather than letting the LLM freely choose.
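As a rough illustration of the selection rule above, here is a minimal Rust sketch. It assumes a linear utility with signed weights and signals normalized to [0, 1]; the paper's exact formula is not reproduced in this issue, so the linear form and the names `Signals`, `Weights`, `utility`, and `select_action` are all hypothetical.

```rust
/// Candidate actions listed in the technique summary.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Respond,
    Retrieve,
    ToolCall,
    Verify,
    Stop,
}

/// Per-action signal estimates. Assumption: each is normalized to [0, 1].
#[derive(Debug, Clone, Copy)]
pub struct Signals {
    pub gain: f64,
    pub cost: f64,
    pub uncertainty: f64,
    pub redundancy: f64,
}

/// Signed weights (hypothetical). A negative weight turns a signal into a
/// penalty, so the caller decides how each signal pushes the score.
#[derive(Debug, Clone, Copy)]
pub struct Weights {
    pub gain: f64,
    pub cost: f64,
    pub uncertainty: f64,
    pub redundancy: f64,
}

/// Linear utility: the weighted sum of the four signals.
/// The linear form is an assumption, not the paper's stated formula.
pub fn utility(s: &Signals, w: &Weights) -> f64 {
    w.gain * s.gain
        + w.cost * s.cost
        + w.uncertainty * s.uncertainty
        + w.redundancy * s.redundancy
}

/// Returns the candidate with the highest utility, or `None` when there
/// are no candidates.
pub fn select_action(candidates: &[(Action, Signals)], w: &Weights) -> Option<Action> {
    candidates
        .iter()
        .map(|(action, signals)| (*action, utility(signals, w)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(action, _)| action)
}
```

Keeping the signs in the weights rather than in the formula leaves open whether high uncertainty should favor or penalize a given action, which this issue does not specify.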

Applicability to Zeph

Bandit routing layer (#2415 BaRP): The utility function formulation is a natural extension of the BaRP cost-weight dial. Instead of a single cost_weight scalar, the bandit reward signal could incorporate all four components: gain (quality), cost, uncertainty, and redundancy. The LinUCB reward would then reflect answer quality, uncertainty, and redundant work, not only cost.
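One way the four-component reward could look is sketched below in Rust. This is not the BaRP implementation: it assumes the current reward has roughly the shape `quality - cost_weight * cost`, treats uncertainty and redundancy as penalties, and assumes all inputs are normalized to [0, 1]. The names `Outcome`, `RewardWeights`, and `shaped_reward` are hypothetical.

```rust
/// Observed outcome of one routed request.
/// Assumption: every field is normalized to [0, 1].
#[derive(Debug, Clone, Copy)]
pub struct Outcome {
    pub quality: f64,
    pub cost: f64,
    pub uncertainty: f64,
    pub redundancy: f64,
}

/// Penalty weights (hypothetical). Setting `uncertainty` and `redundancy`
/// to 0.0 leaves only the cost penalty, i.e. the assumed single-scalar form.
#[derive(Debug, Clone, Copy)]
pub struct RewardWeights {
    pub cost: f64,
    pub uncertainty: f64,
    pub redundancy: f64,
}

/// Scalar reward for the bandit update, clamped to [0, 1] so the reward
/// stays bounded regardless of the chosen weights.
pub fn shaped_reward(o: &Outcome, w: &RewardWeights) -> f64 {
    let raw = o.quality
        - w.cost * o.cost
        - w.uncertainty * o.uncertainty
        - w.redundancy * o.redundancy;
    raw.clamp(0.0, 1.0)
}
```

With both extra weights at zero the function falls back to a cost-only reward, so the richer signal could be rolled out without changing existing behavior.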

ToolExecutor step sequencing: The summarize_output flag and the tool overflow threshold are ad hoc; utility scoring could replace them with a principled "is this tool call worth running?" gate.

context_strategy = "adaptive": The adaptive context strategy already attempts a similar multi-signal tradeoff — this paper provides formal grounding.

Implementation sketch

  • Add utility scoring as an optional gate in zeph-tools/src/composite.rs, evaluated before a tool is dispatched
  • Feed the utility signal as an additional feature to the LinUCB bandit reward model
  • Config: under [tools], add utility_scoring = true, utility_gain_weight, and utility_cost_weight
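The gate and config keys from the sketch above might fit together as follows. This is a minimal Rust sketch under stated assumptions: the struct name `UtilityConfig`, the function `should_dispatch`, the default values, and the zero threshold are all hypothetical, and nothing here reflects the actual contents of zeph-tools/src/composite.rs.

```rust
/// Mirrors the proposed `[tools]` keys. Default values are placeholders
/// chosen for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct UtilityConfig {
    pub utility_scoring: bool,
    pub utility_gain_weight: f64,
    pub utility_cost_weight: f64,
}

impl Default for UtilityConfig {
    fn default() -> Self {
        Self {
            // Off by default so existing dispatch behavior is unchanged.
            utility_scoring: false,
            utility_gain_weight: 1.0,
            utility_cost_weight: 0.5,
        }
    }
}

/// Decides whether a tool call is worth running.
/// `est_gain` and `est_cost` are assumed to be normalized to [0, 1];
/// how they would be estimated is outside this sketch.
pub fn should_dispatch(cfg: &UtilityConfig, est_gain: f64, est_cost: f64) -> bool {
    if !cfg.utility_scoring {
        // Gate disabled: always dispatch.
        return true;
    }
    let utility = cfg.utility_gain_weight * est_gain - cfg.utility_cost_weight * est_cost;
    utility > 0.0
}
```

Making the gate opt-in keeps it optional, as the first bullet proposes; uncertainty and redundancy weights could be added to the same struct later.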

Labels

  • P2: High value, medium complexity
  • llm: zeph-llm crate (Ollama, Claude)
  • research: Research-driven improvement