TARS: Training Adaptive Reasoners for Safety

📄 Paper | 🤗 TARS Model | 🤗 Lightweight SFT Model | 📝 Blog Post

Training repository for "Reasoning as an Adaptive Defense for Safety"

🎯 Overview

This repository contains the training code and datasets for TARS (Training Adaptive Reasoners for Safety), an online RL training approach that uses reasoning as an adaptive defense for LLM safety. The training code is built on a modified version of verl, which was adapted from an earlier version of rLLM.

Getting Started

This repository includes:

Datasets: train_lambda_0.1/0.3/0.5/0.7/0.9.parquet
Training Script: Online RL safety training for reasoning using GRPO
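Before training, it can help to confirm that the dataset files are where you expect them. The snippet below is a minimal sketch that assumes the shorthand above expands to five files named train_lambda_<value>.parquet inside the data/ folder; both the folder and the exact naming are assumptions, and dataset_path is a hypothetical helper, so adjust them to match your checkout.

```shell
# Build the expected path for one lambda value.
# Assumption: files live under data/ and are named train_lambda_<value>.parquet.
dataset_path() {
  echo "data/train_lambda_$1.parquet"
}

# Report which of the five expected dataset files are present.
for lam in 0.1 0.3 0.5 0.7 0.9; do
  f="$(dataset_path "$lam")"
  if [ -f "$f" ]; then
    echo "found   $f"
  else
    echo "missing $f"
  fi
done
```

Run it from the repository root so that the relative data/ path resolves.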

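First, create the conda environment, which installs the Python packages. Afterwards, activate it with conda activate, using the environment name defined in environment.yml.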

conda env create --file environment.yml

Second, install the modified version of verl and additional packages.

pip install -e ./verl
pip install git+https://github.com/dsbowen/strong_reject.git@main
pip install flash-attn
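After the installs finish, a quick import check can catch a failed build early. The following is a rough sketch, not part of the repository: check_module is a hypothetical helper, and the module names verl, strong_reject, and flash_attn are assumed to be the import names of the packages installed above.

```shell
# Print whether a Python module can be imported in the active environment.
check_module() {
  if python -c "import $1" >/dev/null 2>&1; then
    echo "ok      $1"
  else
    echo "missing $1"
  fi
}

# Assumed import names for the packages installed in the previous step.
for mod in verl strong_reject flash_attn; do
  check_module "$mod"
done
```

Run it inside the activated conda environment; any line starting with "missing" points at an install step worth repeating.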

Training

Run online RL training, starting from the lightweight SFT model that serves as the base for TARS.

bash scripts/train/run_train.sh

Citation

If you find this work useful, please cite our paper:

@article{kim2025reasoning,
  title={Reasoning as an Adaptive Defense for Safety},
  author={Kim, Taeyoun and Tajwar, Fahim and Raghunathan, Aditi and Kumar, Aviral},
  journal={arXiv preprint arXiv:2507.00971},
  year={2025}
}

