research(tools): Think-Augmented Function Calling for improved parameter accuracy (TAFC) #1861

@bug-ops

Summary

Add an optional think parameter to tool definitions, allowing the model to reason about parameter values before committing them. Zero architectural changes — works via schema augmentation and a client-side filter.

Source: arXiv 2601.18282 — "Think-Augmented Function Calling: Improving LLM Parameter Accuracy Through Embedded Reasoning"

Technique

TAFC adds one optional think: string field to every ToolDefinition schema. The model generates reasoning first, then generates parameter values conditioned on that reasoning — a causal chain:

P(params, think | x, context) = P(think | x, context) · P(params | x, context, think)

For complex parameters (complexity > τ = 0.6), each parameter is further decomposed into {think_i: reasoning, value_i: actual_value}. A filter strips all think and think_i keys before dispatching to the actual tool — full backward compatibility, original tool never modified.
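A minimal sketch of the client-side filter, assuming a flat key layout in which the model emits `think` and `think_<param>` next to the real parameters. The names `is_think_key` and `strip_think` and the `declared` guard are hypothetical; they come neither from the paper nor from Zeph. The value type is left generic because the concrete parameter representation is not specified here.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// True for TAFC reasoning keys: `think` itself or any `think_<param>`.
fn is_think_key(key: &str) -> bool {
    key == "think" || key.starts_with("think_")
}

/// Removes reasoning keys from parsed tool-call parameters before dispatch.
///
/// `declared` holds the parameter names of the original, unaugmented tool.
/// Keys listed there are always kept, so a tool that genuinely has a
/// parameter named e.g. `think_time` is not broken by the filter.
/// (This guard is an assumption of the sketch, not part of the paper.)
fn strip_think<V>(
    params: BTreeMap<String, V>,
    declared: &BTreeSet<&str>,
) -> BTreeMap<String, V> {
    params
        .into_iter()
        .filter(|(key, _)| declared.contains(key.as_str()) || !is_think_key(key))
        .collect()
}
```

Given `{think, think_command, command}` and a tool that declares only `command`, the filter returns a map containing just `command`, so the original tool sees exactly the arguments it was written for.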

Results (ToolBench benchmark, Win Rate vs. standard function calling)

Averaged across 7 models, TAFC wins 69.6% of head-to-head comparisons and standard function calling wins 18.2%; the remaining 12.2% are ties.

Pass rate improvement (average across I1/I2/I3):

  • GPT-4o: +2.9 pp → 59.2%
  • Claude Sonnet: +3.0 pp → 60.3%
  • Qwen2.5-72B: +4.4 pp → 49.0%
  • Smaller models (7B–8B): +2.0–2.5 pp; largest relative gain at small model tier

Applicability to Zeph

HIGH. Low implementation complexity — no architectural changes, full API compatibility.

Implementation sketch (Zeph-specific)

  1. ToolDef schema augmentation (zeph-tools/src/registry.rs):
    Add think field to the JSON schema emitted for each tool. Tag complex tools (shell, web_scrape, memory_save) for parameter-level decomposition.

  2. Filter on dispatch (zeph-core/src/agent/context/ or tool executor):
    Strip think and any think_{param} fields from the parsed ToolCall.params before executing.

  3. Config flag ([agent] tafc_enabled = false initially):
    Gate behind a config flag, default off until evaluated in live sessions.

  4. No changes to LLM backends: ToolDefinition schema is what the LLM sees; augmenting it is sufficient — provider-agnostic.

Complexity threshold τ and per-parameter decomposition can be implemented in a second phase.
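Steps 1 and 3 could look roughly like the sketch below. Everything in it is an assumption made for illustration: `ToolSketch`, `ParamSketch`, `TafcConfig`, and `augment` are hypothetical stand-ins, not Zeph's actual `ToolDef` or config types, and the per-tool `decompose` list is just one possible way to tag complex tools.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for one parameter in a tool's JSON schema.
#[derive(Debug, Clone, PartialEq)]
struct ParamSketch {
    ty: String,
    description: String,
    required: bool,
}

/// Simplified stand-in for a tool definition (not Zeph's real type).
#[derive(Debug, Clone)]
struct ToolSketch {
    name: String,
    params: BTreeMap<String, ParamSketch>,
}

/// Hypothetical config: `enabled` mirrors `tafc_enabled`; `decompose`
/// lists tools tagged for parameter-level reasoning fields.
struct TafcConfig {
    enabled: bool,
    decompose: Vec<String>,
}

/// Builds an optional string field that carries the model's reasoning.
fn think_field(description: String) -> ParamSketch {
    ParamSketch { ty: "string".to_string(), description, required: false }
}

/// Returns the schema shown to the LLM. The registered tool is untouched.
fn augment(tool: &ToolSketch, cfg: &TafcConfig) -> ToolSketch {
    let mut out = tool.clone();
    if !cfg.enabled {
        return out; // flag off: schema is passed through unchanged
    }
    // `entry().or_insert` avoids overwriting a real parameter named `think`.
    out.params
        .entry("think".to_string())
        .or_insert(think_field("Reasoning about the call".to_string()));
    if cfg.decompose.iter().any(|t| t == &tool.name) {
        for name in tool.params.keys() {
            out.params
                .entry(format!("think_{}", name))
                .or_insert(think_field(format!("Reasoning about `{}`", name)));
        }
    }
    out
}
```

Because the augmented copy is produced at schema-emission time, turning the flag off restores the original schema without touching the registry, and the dispatch-side filter from step 2 stays a no-op.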

Metadata

Assignees

No one assigned

    Labels

    P1 (High ROI, low complexity; do next sprint)
    research (Research-driven improvement)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
