research(tools): Think-Augmented Function Calling for improved parameter accuracy (TAFC) #1861
Description
Summary
Add an optional think parameter to tool definitions, allowing the model to reason about parameter values before committing them. Zero architectural changes — works via schema augmentation and a client-side filter.
Source: arXiv 2601.18282 — "Think-Augmented Function Calling: Improving LLM Parameter Accuracy Through Embedded Reasoning"
Technique
TAFC adds one optional think: string field to every ToolDefinition schema. The model generates reasoning first, then generates parameter values conditioned on that reasoning — a causal chain:
P(params, think | x, context) = P(think | x, context) · P(params | x, context, think)
For complex parameters (complexity > τ = 0.6), each parameter is further decomposed into {think_i: reasoning, value_i: actual_value}. A filter strips all think and think_i keys before dispatching to the actual tool — full backward compatibility, original tool never modified.
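The client-side filter can be sketched in a few lines. This is illustrative Python, not the paper's or Zeph's actual code; the function name `strip_think_fields` is hypothetical:

```python
def strip_think_fields(params: dict) -> dict:
    """Drop TAFC reasoning keys before dispatching to the real tool.

    Removes the top-level `think` key and any `think_<param>` keys, so the
    underlying tool receives exactly the parameters it originally declared.
    """
    return {
        key: value
        for key, value in params.items()
        if key != "think" and not key.startswith("think_")
    }
```

Because the filter runs on the parsed tool call, the tool itself never needs to know TAFC exists.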
Results (ToolBench benchmark, Win Rate vs. standard function calling)
Average TAFC win rate across 7 models: 69.6% vs 18.2% (remainder ties).
Pass rate improvement (average across I1/I2/I3):
- GPT-4o: +2.9 pp → 59.2%
- Claude Sonnet: +3.0 pp → 60.3%
- Qwen2.5-72B: +4.4 pp → 49.0%
- Smaller models (7B–8B): +2.0–2.5 pp; largest relative gain at small model tier
Applicability to Zeph
HIGH. Low implementation complexity — no architectural changes, full API compatibility.
Implementation sketch (Zeph-specific)
- **ToolDef schema augmentation** (`zeph-tools/src/registry.rs`): Add a `think` field to the JSON schema emitted for each tool. Tag complex tools (`shell`, `web_scrape`, `memory_save`) for parameter-level decomposition.
- **Filter on dispatch** (`zeph-core/src/agent/context/` or tool executor): Strip `think` and any `think_{param}` fields from the parsed `ToolCall.params` before executing.
- **Config flag** (`[agent] tafc_enabled = false` initially): Gate behind a config flag, default off until evaluated in live sessions.
- **No changes to LLM backends**: The `ToolDefinition` schema is what the LLM sees; augmenting it is sufficient, so the approach is provider-agnostic.
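The schema-augmentation step amounts to adding one optional property to each tool's JSON schema. A minimal sketch in illustrative Python (the real implementation would live in `zeph-tools/src/registry.rs`; the function name and description text are assumptions):

```python
def augment_tool_schema(schema: dict) -> dict:
    """Add an optional `think` string property to a tool's JSON schema."""
    augmented = dict(schema)
    props = dict(augmented.get("properties", {}))
    props["think"] = {
        "type": "string",
        "description": "Reason step by step about the parameter values "
                       "before filling them in.",
    }
    augmented["properties"] = props
    # `think` stays optional: it is deliberately NOT added to `required`,
    # so models that ignore it still produce valid calls.
    return augmented
```

Since the augmentation happens at schema-emission time, the registered tool definition itself is never modified.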
Complexity threshold τ and per-parameter decomposition can be implemented in a second phase.
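For that second phase, the dispatch filter would additionally collapse decomposed parameter values back to plain ones. An illustrative Python sketch, assuming (per the paper's description) that complex parameters arrive as objects pairing reasoning with the actual value:

```python
def collapse_decomposed(params: dict) -> dict:
    """Replace {"think": reasoning, "value": v} objects with plain v."""
    out = {}
    for key, val in params.items():
        if isinstance(val, dict) and set(val.keys()) == {"think", "value"}:
            out[key] = val["value"]  # keep only the actual value
        else:
            out[key] = val
    return out
```

The exact key names for the decomposed form are an assumption here; they should match whatever the augmented schema advertises to the model.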