D^2PO

This repository contains the PyTorch code for our paper:

Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective

Quick Start

The configuration templates are in examples/exp. Setting gamma to a value smaller than 1.0 runs the experiment in D^2PO mode. For example, to train with the provided Llama 3 template:

llamafactory-cli train examples/exp/llama3_full_d2po.yaml
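To illustrate the role of gamma, here is a minimal plain-Python sketch of how a temporal-decay weighting could enter the DPO objective. It assumes that the per-token log-ratio log pi(y_t) - log pi_ref(y_t) at response position t (counting from 0 at the first response token) is scaled by gamma**t, so earlier tokens contribute more to the sequence-level reward. The function names `decayed_reward` and `d2po_loss` and the default values of beta and gamma are hypothetical. This is not the repository's implementation. With gamma = 1.0 every token has weight 1 and the margin reduces to the standard DPO margin, which is consistent with D^2PO mode being active only for gamma < 1.0.

```python
import math
from typing import Sequence


def decayed_reward(policy_logps: Sequence[float],
                   ref_logps: Sequence[float],
                   gamma: float) -> float:
    """Sum of per-token log-ratios, token t weighted by gamma**t (assumed scheme)."""
    if len(policy_logps) != len(ref_logps):
        raise ValueError("policy and reference sequences must have the same length")
    return sum((gamma ** t) * (p - r)
               for t, (p, r) in enumerate(zip(policy_logps, ref_logps)))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def d2po_loss(chosen_policy: Sequence[float], chosen_ref: Sequence[float],
              rejected_policy: Sequence[float], rejected_ref: Sequence[float],
              beta: float = 0.1, gamma: float = 0.98) -> float:
    """DPO-style loss on the gamma-decayed reward margin (hypothetical defaults)."""
    margin = (decayed_reward(chosen_policy, chosen_ref, gamma)
              - decayed_reward(rejected_policy, rejected_ref, gamma))
    return -log_sigmoid(beta * margin)
```

In a real training loop the per-token log-probabilities would come from the policy and the frozen reference model. In PyTorch the same sum becomes a weighted, padding-masked reduction over the sequence dimension.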
