research(tools): Think-Augmented Function Calling for improved parameter accuracy (TAFC)

## Summary

Add an optional `think` parameter to tool definitions, allowing the model to reason about parameter values before committing them. Zero architectural changes — works via schema augmentation and a client-side filter.

**Source**: arXiv 2601.18282 — "Think-Augmented Function Calling: Improving LLM Parameter Accuracy Through Embedded Reasoning"

## Technique

TAFC adds one optional `think: string` field to every `ToolDefinition` schema. The model generates reasoning first, then generates parameter values conditioned on that reasoning — a causal chain:

```
P(params, think | x, context) = P(think | x, context) · P(params | x, context, think)
```

For complex parameters (complexity > τ = 0.6), each parameter is further decomposed into `{think_i: reasoning, value_i: actual_value}`. A filter strips all `think` and `think_i` keys before dispatching to the actual tool — full backward compatibility, original tool never modified.

## Results (ToolBench benchmark, Win Rate vs. standard function calling)

Average TAFC win rate across 7 models: **69.6% vs 18.2%** (remainder ties).

Pass rate improvement (average across I1/I2/I3):
- GPT-4o: +2.9 pp → 59.2%
- Claude Sonnet: +3.0 pp → 60.3%  
- Qwen2.5-72B: +4.4 pp → 49.0%
- Smaller models (7B–8B): +2.0–2.5 pp; **largest relative gain at small model tier**

## Applicability to Zeph

HIGH. Low implementation complexity — no architectural changes, full API compatibility.

## Implementation sketch (Zeph-specific)

1. **`ToolDef` schema augmentation** (`zeph-tools/src/registry.rs`):
   Add `think` field to the JSON schema emitted for each tool. Tag complex tools (shell, web_scrape, memory_save) for parameter-level decomposition.

2. **Filter on dispatch** (`zeph-core/src/agent/context/` or tool executor):
   Strip `think` and any `think_{param}` fields from the parsed `ToolCall.params` before executing.

3. **Config flag** (`[agent] tafc_enabled = false` initially):
   Gate behind a config flag, default off until evaluated in live sessions.

4. **No changes to LLM backends**: `ToolDefinition` schema is what the LLM sees; augmenting it is sufficient — provider-agnostic.

Complexity threshold τ and per-parameter decomposition can be implemented in a second phase.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

research(tools): Think-Augmented Function Calling for improved parameter accuracy (TAFC) #1861

Summary

Technique

Results (ToolBench benchmark, Win Rate vs. standard function calling)

Applicability to Zeph

Implementation sketch (Zeph-specific)

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

research(tools): Think-Augmented Function Calling for improved parameter accuracy (TAFC) #1861

Description

Summary

Technique

Results (ToolBench benchmark, Win Rate vs. standard function calling)

Applicability to Zeph

Implementation sketch (Zeph-specific)

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions