This repo scaffolds data preparation + SFT + DPO training for a multi-turn Motivational Interviewing (MI)-style coach model using Hugging Face Transformers + TRL.
- `data/` — schema + a tiny synthetic example dataset (JSONL) to verify the pipeline.
- `scripts/data_prep.py` — normalize MI datasets (MI-TAGS / AnnoMI / MI-Dataset) into a common JSONL format.
- `scripts/sft_train.py` — supervised fine-tuning with TRL `SFTTrainer`. Supports LoRA + 4-bit.
- `scripts/dpo_train.py` — preference optimization with TRL `DPOTrainer`.
- `scripts/infer_demo.py` — run inference with memory + MI persona prompt.
- `scripts/metrics_mi.py` — simple MI-style behavioral metrics (coverage of open-question / reflection / affirmation, etc.).
- `configs/*.yaml` — example hyperparameters.
⚠️ You must provide your own dataset paths for MI-TAGS / AnnoMI etc. The included `example_mi_dialogs.jsonl` is just for smoke tests (not for real training).
Environment setup:

```bash
cd /mnt
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p /mnt/miniconda3
echo 'export PATH="/mnt/miniconda3/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
conda create -n gptcoach python=3.10
conda activate gptcoach
pip install -r /mnt/gptcoaching_mi_training/requirements.txt
pip install -U transformers datasets accelerate trl peft bitsandbytes torch torchvision torchaudio
# If CUDA is not available, install CPU wheels for torch instead.
```
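A quick sanity check (not part of the repo, just a throwaway snippet) confirms the GPU build of torch is active:

```python
import torch

# Prints the installed torch version and whether a CUDA device is visible.
print(torch.__version__, torch.cuda.is_available())
```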
Each line of the dataset JSONL is a dict:

```
{
  "dialog_id": "string",
  "turn_id": 7,
  "user_utt": "string",
  "coach_utt": "string",
  "mi_tags": ["open_question", "reflection_simple", "affirm"],
  "state_before": {},  # optional structured state (goal, barriers, wearable stats, etc.)
  "state_after": {}    # optional updated state
}
```

You can include additional fields; unknown keys are ignored by the loader.
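For reference, here is a minimal loader sketch that tolerates extra keys (illustrative only; the repo's actual loading code lives in the scripts and may differ):

```python
import json

# Keys defined by the schema above; anything else on a line is dropped.
KNOWN_KEYS = {"dialog_id", "turn_id", "user_utt", "coach_utt",
              "mi_tags", "state_before", "state_after"}

def load_mi_jsonl(path):
    """Yield one dict per non-empty JSONL line, restricted to known schema keys."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                record = json.loads(line)
                yield {k: v for k, v in record.items() if k in KNOWN_KEYS}

# Smoke test against the bundled example data:
for rec in load_mi_jsonl("data/example_mi_dialogs.jsonl"):
    print(rec["dialog_id"], rec["turn_id"], rec["coach_utt"][:60])
```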
Set your dataset file paths and run:
```bash
python scripts/data_prep.py --annomi_csv ./data/AnnoMI-full.csv --out_jsonl data/mi_unified_from_annomi_full.jsonl
python scripts/data_prep.py --annomi_csv ./data/AnnoMI-simple.csv --out_jsonl data/mi_unified_from_annomi_simple.jsonl
```
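Conceptually, normalization pairs each client utterance with the therapist reply that follows it. A rough sketch of that pairing, assuming AnnoMI's published column names (`transcript_id`, `interlocutor`, `utterance_text`); the real `scripts/data_prep.py` additionally maps AnnoMI behaviour labels onto `mi_tags`:

```python
import csv
import json

def annomi_to_unified(csv_path, out_path):
    """Pair each client utterance with the therapist reply that follows it."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    with open(out_path, "w", encoding="utf-8") as out:
        dialog_id, turn_id = None, 0
        for prev, cur in zip(rows, rows[1:]):
            if prev["transcript_id"] != dialog_id:
                dialog_id, turn_id = prev["transcript_id"], 0  # new dialog: reset turn counter
            if (cur["transcript_id"] == dialog_id
                    and prev["interlocutor"] == "client"
                    and cur["interlocutor"] == "therapist"):
                out.write(json.dumps({
                    "dialog_id": dialog_id,
                    "turn_id": turn_id,
                    "user_utt": prev["utterance_text"],
                    "coach_utt": cur["utterance_text"],
                    "mi_tags": [],  # the real script derives these from behaviour labels
                }) + "\n")
                turn_id += 1
```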
Then run SFT:

```bash
python scripts/sft_train.py \
  --model_name_or_path Qwen/Qwen2.5-3B-Instruct \
  --train_file data/mi_unified_from_annomi_full.jsonl \
  --eval_file data/mi_unified_from_annomi_simple.jsonl \
  --output_dir outputs/qwen2p5-3b-mi-sft \
  --num_train_epochs 3 \
  --per_device_train_batch_size 20 \
  --lr 2e-5 \
  --eval_steps 200 \
  --save_steps 200 \
  --wandb --wandb_project mi-coach-sft --wandb_run_name qwen2p5_3b_sft \
  --bnb_4bit --lora
```
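`--lora --bnb_4bit` corresponds to a QLoRA-style setup: the base model is loaded in 4-bit and only LoRA adapters are trained. A minimal sketch of that pattern with TRL (illustrative; the actual `scripts/sft_train.py` may use a different prompt template and hyperparameters):

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# Load the 3B base model with 4-bit quantization (bitsandbytes) so it fits on a small GPU.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
)

# Flatten each (user_utt, coach_utt) turn into one training string.
# The repo's actual prompt template (persona, history, state) may differ.
ds = load_dataset("json", data_files="data/mi_unified_from_annomi_full.jsonl", split="train")
ds = ds.map(lambda ex: {"text": f"User: {ex['user_utt']}\nCoach: {ex['coach_utt']}"})

trainer = SFTTrainer(
    model=model,
    train_dataset=ds,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),  # train adapters only
    args=SFTConfig(
        output_dir="outputs/qwen2p5-3b-mi-sft",
        num_train_epochs=3,
        learning_rate=2e-5,
    ),
)
trainer.train()
```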
This step generates both positive and negative samples (MI-style coaching responses) for preference training:

```bash
python scripts/make_dpo_prefs_v2.py \
  --sft_file data/mi_unified_from_annomi_full.jsonl \
  --out_file data/mi_prefs.jsonl \
  --seed 123 \
  --max_samples 5000
```
This prepares a JSONL with pairs of responses (chosen vs. rejected) per context/turn.
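For illustration, one line of `data/mi_prefs.jsonl` might look like the following (field names are an assumption based on TRL's standard `prompt`/`chosen`/`rejected` DPO format; check the script's actual output):

```json
{"prompt": "I'm too busy this week.", "chosen": "Sounds like a packed week. What usually gets squeezed out first when that happens?", "rejected": "You should just make time for exercise."}
```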
Then run DPO on the preference file:

```bash
python scripts/dpo_train.py \
  --model_name_or_path outputs/qwen2p5-3b-mi-sft/checkpoint-510 \
  --pref_file data/mi_prefs.jsonl \
  --output_dir runs/qwen2p5-3b-mi-dpo \
  --num_train_epochs 3 \
  --per_device_train_batch_size 1 \
  --lr 2e-5 \
  --logging_steps 10 \
  --save_steps 200 \
  --lora --bnb_4bit \
  --wandb --wandb_project mi-coach-dpo --wandb_run_name qwen2p5_3b_dpo
```
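The demo below expects a merged model directory (`runs/qwen2p5-3b-mi-dpo-merged`). If your DPO run saved LoRA adapters rather than merged weights, one way to produce it is the standard PEFT merge (a sketch, assuming the checkpoint follows PEFT's adapter layout):

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_dir = "runs/qwen2p5-3b-mi-dpo"    # adapter checkpoint from the DPO run
merged_dir = "runs/qwen2p5-3b-mi-dpo-merged"

# Load base model + adapters, fold the adapters into the base weights,
# then save a plain Transformers checkpoint the demo server can load.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_dir)
model = model.merge_and_unload()
model.save_pretrained(merged_dir)
AutoTokenizer.from_pretrained(adapter_dir).save_pretrained(merged_dir)
```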
To serve the demo API:

```bash
export HF_HOME=/mnt/.cache/huggingface
export TRANSFORMERS_CACHE=/mnt/.cache/huggingface/transformers
# merged model dir from your DPO (or SFT) run
export MODEL_PATH=/mnt/gptcoaching_mi_training/runs/qwen2p5-3b-mi-dpo-merged
uvicorn scripts.app_demo:app --host 0.0.0.0 --port 8000 --reload
```
The server exposes a `/chat` endpoint. `POST http://localhost:8000/chat` with a JSON body like:

```json
{
  "history": [{"user": "I want to be more active.", "coach": "What matters most about being active for you?"}],
  "user_msg": "I'm too busy this week."
}
```
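A quick way to exercise it from Python (assuming the request schema above and the `requests` package):

```python
import requests

# Assumes the uvicorn server above is running locally on port 8000.
resp = requests.post(
    "http://localhost:8000/chat",
    json={"history": [], "user_msg": "I want to be more active."},
)
print(resp.json())
```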