Veri-R1 is designed to enhance large language models’ comprehensive verification capabilities—including planning, searching, reasoning, and judgment—through online reinforcement learning.
To achieve this, we propose a fine-grained reward system that ensures the faithfulness, correctness, and precision of verification trajectories, building upon the frameworks of Search-R1 and veRL.
conda create -n verir1 python=3.9
conda activate verir1
# vllm
pip3 install vllm==0.6.3 # it will install torch and other necessary packages
# verl
pip install -e .
# flash attention 2
pip3 install flash-attn --no-build-isolation
pip install wandb
We encountered a runtime error (`batch size must be positive`) caused by a mismatch between the verl framework and Transformers.
To fix this issue, we refined the Transformers implementation to ensure stability and prevent unexpected random interruptions; install the patched copy as follows:
unzip transformers.zip
cd transformers
pip install -e .
conda create -n retriever python=3.10
conda activate retriever
# install pytorch
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers datasets pyserini
## install faiss
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
## install api function
pip install uvicorn fastapi
The corpus must follow the structure {"id": "", "title": "", "text": ""}.
We introduce [SEP] to enable sentence-level segmentation.
{"id": "Kansas State Wildcats football<sent_id=57_59>", "title": "Kansas State Wildcats football", "text": "\"Kansas State Wildcats football\"\nIn seven seasons at Kansas State, Coach Doug Weaver compiled an 8–60–1 record. [SEP] His final two seasons went without a win. [SEP] His 1961 and 1962 teams posted a losing streak of 18 games—tied for the 20th-longest streak in college football history."}
To improve retrieval precision and avoid overly long contexts, we split each article into individual sentences and then group every k consecutive sentences into one corpus entry. The special token [SEP] is inserted as a separator between sentences, enabling sentence-level segmentation.
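As a reference, the sketch below shows one way to convert raw articles into this corpus format. It is a minimal example, not the exact preprocessing script: the group size k, the sentence splitting, the output file name, and the ID scheme (modeled on the example entry above) are all assumptions you may need to adapt.

```python
import json

SEP = " [SEP] "

def build_corpus_entries(title, sentences, k=3):
    """Group every k consecutive sentences of an article into one corpus entry.

    Each entry follows {"id": ..., "title": ..., "text": ...}; the sentence range is
    encoded in the id (e.g. <sent_id=0_2>) and [SEP] separates adjacent sentences.
    """
    entries = []
    for start in range(0, len(sentences), k):
        chunk = sentences[start:start + k]
        end = start + len(chunk) - 1
        entries.append({
            "id": f"{title}<sent_id={start}_{end}>",
            "title": title,
            # The quoted title prefix mirrors the example entry shown above.
            "text": f"\"{title}\"\n" + SEP.join(chunk),
        })
    return entries

if __name__ == "__main__":
    sentences = [
        "In seven seasons at Kansas State, Coach Doug Weaver compiled an 8-60-1 record.",
        "His final two seasons went without a win.",
    ]
    with open("corpus.jsonl", "w") as f:  # hypothetical output path
        for entry in build_corpus_entries("Kansas State Wildcats football", sentences, k=3):
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```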
Because evidence IDs are tracked at the sentence level, the retrieved sentences are merged into the context together with the number of remaining search turns. The merged context is wrapped in <information></information> tags, following this format:
<information>
[[Berkheya_sentence_0]]: Berkheya is a genus of flowering plants in the aster family, Asteraceae, and the subfamily Carduoideae, the thistles.
[[Berkheya_sentence_1]]: It is distributed in tropical Africa, especially in southern regions.
[[Berkheya_sentence_2]]: Of about 75 species, 71 can be found in South Africa.
**2 search turns left.**
</information>
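As a reference, the helper below sketches how such a block could be assembled from retrieved (sentence_id, text) pairs and a remaining-turn counter. The function name and input structure are illustrative, not the exact implementation used in training.

```python
def format_information_block(retrieved, turns_left):
    """Wrap retrieved sentences and the remaining search budget in <information> tags.

    `retrieved` is a list of (sentence_id, sentence_text) pairs, e.g.
    ("Berkheya_sentence_0", "Berkheya is a genus of flowering plants ...").
    """
    lines = [f"[[{sent_id}]]: {text}" for sent_id, text in retrieved]
    lines.append(f"**{turns_left} search turns left.**")
    return "<information>\n" + "\n".join(lines) + "\n</information>"

print(format_information_block(
    [("Berkheya_sentence_0",
      "Berkheya is a genus of flowering plants in the aster family, Asteraceae.")],
    turns_left=2,
))
```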
cd seach_engine/search
# change path to your own corpus
sh build_index.sh
# change the path to your own corpus and index
sh retrieval_launch.sh
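Once the server is up, you can sanity-check it with a direct request. This is a minimal sketch that assumes a Search-R1-style FastAPI server exposing a POST /retrieve endpoint with `queries` and `topk` fields; check retrieval_launch.sh and the server script for the exact host, port, and payload.

```python
import requests  # assumes the `requests` package is installed

# Hypothetical endpoint and payload for the retrieval server launched above;
# adjust the URL and field names to match your retrieval_launch.sh configuration.
resp = requests.post(
    "http://127.0.0.1:8000/retrieve",
    json={"queries": ["Berkheya genus distribution"], "topk": 3},
)
print(resp.json())
```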
# online GRPO training
sh online_grpo.sh
# offline GRPO training
sh offline_grpo.sh
We use LLaMA Factory for the fine-tuning process.
The supervised fine-tuning (SFT) data is located at: example/train/train_sft.json
We support both batch evaluation and single model evaluation.
- Batch Model Evaluation
  sh eva_batch.sh
  Edit the model list in the script to include your own models.
- Single Model Evaluation
  sh eva_single.sh
To evaluate closed-source models, we provide an API-driven evaluation pipeline.
Run:
sh run_api_eval.sh
You can configure the target model, number of workers, and batch size.
We also support direct evaluation of already generated answers.
Run:
sh run_result_eval.sh
Each entry in the JSONL file should include the following fields:
{"id": ..., "claim": ..., "ground_truth": ..., "response": ..., "data_source": ...}
Example
{
"id": "117656",
"claim": "Carlos Santana was born in the 20th century.",
"ground_truth": {"evidence": ["Carlos_Santana_sentence_0"], "label": "SUPPORTS"},
"response": "\n<plan>\nI will first search ... <answer>\nLabel: SUPPORT\nEvidence: [[Carlos_Santana_sentence_0]]\n</answer>",
"data_source": "fever"
}
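A minimal sketch for assembling such a JSONL file, assuming the generated responses are already in memory (the helper name and output path are illustrative):

```python
import json

REQUIRED_FIELDS = ("id", "claim", "ground_truth", "response", "data_source")

def write_result_jsonl(records, path="results.jsonl"):
    """Write generated answers to a JSONL file with the fields listed above."""
    with open(path, "w") as f:
        for rec in records:
            missing = [k for k in REQUIRED_FIELDS if k not in rec]
            if missing:
                raise ValueError(f"record {rec.get('id')} is missing fields: {missing}")
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```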
(1) Launch a local retrieval server.
conda activate retriever
sh retrieval_launch.sh
(2) Inference.
conda activate verir1
python infer.py
You can modify the claim on line 7 to something you're interested in.
@article{he2025veri,
  title={Veri-R1: Toward Precise and Faithful Claim Verification via Online Reinforcement Learning},
  author={He, Qi and Qian, Cheng and Chen, Xiusi and He, Bingxiang and Ji, Heng and others},
  journal={arXiv preprint arXiv:2510.01932},
  year={2025}
}