Description
Add Process Reward Mechanism
Motivation
Current frameworks for training LLM-based tool-calling agents rely on final-outcome rewards (e.g., task success/failure or the accuracy of queries). While effective for simple tasks, this approach may face critical limitations in complex, multi-turn tool-calling scenarios:
Sparse Feedback: Agents receive no guidance during multi-turn reasoning (e.g., API calls, data retrieval, tool chaining), leading to inefficient exploration.
We therefore propose introducing process rewards (step-level or milestone-based rewards), which opens up the following possibilities:
- User-Customizable Reward Modes: Users can decide whether to enable process rewards and how to define their own process reward (see the sketch after this list).
- Faster Convergence: By providing immediate feedback on tool selection, parameter validity, and reasoning coherence, process rewards may accelerate convergence of the training process.
- Tool-Calling Proficiency: We want to investigate whether process rewards help LLMs learn to use tools more proficiently.
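To make the "user-customizable" point concrete, here is a minimal sketch of how a user-supplied process reward could be plugged in and combined with the existing outcome reward. All class, field, and function names below are illustrative assumptions, not part of the current framework API.

```python
# Hypothetical sketch of a user-configurable process reward hook.
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToolCallStep:
    """One tool-calling step in a multi-turn trajectory (illustrative schema)."""
    tool_name: str
    arguments: dict
    result: Optional[str]
    succeeded: bool


# A process reward function maps a single step (plus the trajectory so far)
# to a scalar reward; users supply their own implementation.
ProcessRewardFn = Callable[[ToolCallStep, List[ToolCallStep]], float]


@dataclass
class RewardConfig:
    """Lets users opt in to process rewards and weight them against the outcome reward."""
    use_process_reward: bool = False
    process_reward_fn: Optional[ProcessRewardFn] = None
    process_weight: float = 0.1  # relative weight of step-level rewards (assumed default)


def total_reward(outcome_reward: float,
                 steps: List[ToolCallStep],
                 cfg: RewardConfig) -> float:
    """Combine the final-outcome reward with optional step-level process rewards."""
    if not cfg.use_process_reward or cfg.process_reward_fn is None:
        return outcome_reward
    process_sum = sum(cfg.process_reward_fn(step, steps[:i])
                      for i, step in enumerate(steps))
    return outcome_reward + cfg.process_weight * process_sum
```

With this shape, disabling process rewards reduces exactly to the current final-outcome setup, so existing training runs would be unaffected.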
Key points
1. How to obtain the process reward: rule-based scoring, a reward model, ...
2. Process reward injection points: stepwise correctness, toolchain efficiency, ...
3. Timing of the process reward: e.g., each time a tool call completes, ... (a rule-based sketch covering these points follows this list)
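Below is a minimal rule-based sketch of points 1-3: a step-level reward intended to fire each time a tool call completes, scoring stepwise correctness (call succeeded, arguments well formed) and toolchain efficiency (penalizing redundant repeated calls). It assumes step objects with the illustrative `tool_name` / `arguments` / `succeeded` fields from the sketch above; all weights are placeholder values, not tuned settings.

```python
from typing import List


def rule_based_step_reward(step, history: List) -> float:
    """Score one completed tool call; meant to run whenever a call finishes."""
    reward = 0.0

    # Stepwise correctness: did the call return without error?
    reward += 1.0 if step.succeeded else -1.0

    # Parameter validity: reward well-formed, non-empty arguments (assumed convention).
    if isinstance(step.arguments, dict) and step.arguments:
        reward += 0.5

    # Toolchain efficiency: penalize repeating the same tool with identical arguments.
    repeated = any(prev.tool_name == step.tool_name and prev.arguments == step.arguments
                   for prev in history)
    if repeated:
        reward -= 0.5

    return reward
```

A reward-model-based variant would keep the same signature but replace the hand-written rules with a scoring call to a learned model, so the two modes stay interchangeable from the user's point of view.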