
danielkty/tars


TARS: Training Adaptive Reasoners for Safety

📄 Paper | 🤗 TARS Model | 🤗 Lightweight SFT Model | 📝 Blog Post

Training repository for "Reasoning as an Adaptive Defense for Safety"


🎯 Overview

This repository contains the training code and datasets for TARS (Training Adaptive Reasoners for Safety), an online RL training approach that uses reasoning as an adaptive defense for LLM safety. The training code uses a modified version of verl, which is adapted from a previous version of rLLM.



Getting Started

This repository includes:

  • Datasets: train_lambda_0.1.parquet, train_lambda_0.3.parquet, train_lambda_0.5.parquet, train_lambda_0.7.parquet, train_lambda_0.9.parquet
  • Training Script: Online RL safety training for reasoning using GRPO
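As a quick sanity check before training, the sketch below lists the dataset files named above and reports which ones are present. The `data_dir` argument and the `dataset_paths` helper are hypothetical, since this README does not state where the Parquet files live in the repository.

```python
from pathlib import Path

# Lambda values taken from the dataset file names listed in this README.
LAMBDAS = ("0.1", "0.3", "0.5", "0.7", "0.9")


def dataset_paths(data_dir="."):
    """Map each lambda value to its expected Parquet path.

    `data_dir` is an assumption; point it at wherever the files are stored.
    """
    return {lam: Path(data_dir) / f"train_lambda_{lam}.parquet" for lam in LAMBDAS}


if __name__ == "__main__":
    for lam, path in dataset_paths().items():
        status = "found" if path.exists() else "missing"
        print(f"lambda={lam}: {path} ({status})")
```

Once a file is located, it can be inspected with `pandas.read_parquet(path)`, which needs a Parquet engine such as `pyarrow` installed.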

First, create the conda environment from environment.yml, which installs the Python packages.

conda env create --file environment.yml

Second, install the modified version of verl and additional packages.

pip install -e ./verl
pip install git+https://github.com/dsbowen/strong_reject.git@main
pip install flash-attn
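To confirm the installs succeeded, a small check like the following can be run inside the environment. The import names `verl`, `strong_reject`, and `flash_attn` are assumptions based on the package names above, and `missing_modules` is a hypothetical helper, not part of this repository.

```python
import importlib.util

# Assumed top-level import names for the packages installed above.
REQUIRED_MODULES = ("verl", "strong_reject", "flash_attn")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that the modules can be located; it does not import them, so it will not catch a build of flash-attn that is incompatible with the installed CUDA or PyTorch version.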

Training

Run online RL training, starting from the lightweight SFT model that TARS uses as its base.

bash scripts/train/run_train.sh
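The training script uses GRPO, which scores each sampled response relative to the other responses drawn for the same prompt. The sketch below illustrates only that group-relative normalization step in plain Python. It is not taken from this repository's verl code: the function name, the `eps` value, and the use of the population standard deviation are assumptions, and real implementations differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled responses.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    that beat the group average get a positive advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four responses to the same prompt.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When every response in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal for the step.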

Citation

If you find this work useful, please cite our paper:

@article{kim2025reasoning,
  title={Reasoning as an Adaptive Defense for Safety},
  author={Kim, Taeyoun and Tajwar, Fahim and Raghunathan, Aditi and Kumar, Aviral},
  journal={arXiv preprint arXiv:2507.00971},
  year={2025}
}
