Multi-Value Alignment for LLMs via Value Decorrelation and Extrapolation (MVA)

This project contains partial code and dataset files for MVA (Multi-Value Alignment), a parameter-based method for multi-objective alignment. MVA builds multi-value models by eliminating interference between value vectors and then performing composite extrapolation, as illustrated in the figure (mva.png).

The full version of the code is coming soon...

Data

HH data is available at: HH Data

Beavertails data is available at: Beavertails Data

Environment Requirements

Python 3.10.16

  torch==2.1.2
  torchaudio==2.1.2
  torchvision==0.16.2
  tqdm==4.67.1
  transformers==4.45.0
  triton==2.1.0
  trl==0.9.6
  safetensors==0.5.3
  scikit-learn==1.7.0
  scipy==1.15.3
  seaborn==0.13.2
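
The pinned versions can be compared against the active environment before running anything. The script below is a minimal sketch and not part of this repository: the names PINNED and check_environment are hypothetical, and it assumes only the standard-library importlib.metadata module.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the Environment Requirements list.
PINNED = {
    "torch": "2.1.2",
    "torchaudio": "2.1.2",
    "torchvision": "0.16.2",
    "tqdm": "4.67.1",
    "transformers": "4.45.0",
    "triton": "2.1.0",
    "trl": "0.9.6",
    "safetensors": "0.5.3",
    "scikit-learn": "1.7.0",
    "scipy": "1.15.3",
    "seaborn": "0.13.2",
}


def check_environment(pinned=PINNED):
    """Return {package: (expected, found)} for every mismatch.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        # A local build tag such as "2.1.2+cu121" still counts as a match.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_environment().items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package is installed at the listed version. The Python interpreter version itself is not checked here.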

How It Runs

After preparing the necessary files, follow these steps:

Step 1: Run dpo.py to obtain the helpful vector:

  python dpo.py

Step 2: Run mva.py to obtain the harmless value vector, and configure the search space for extrapolation to generate multiple model configurations (i.e., vector weights):

  python mva.py
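
To make the two steps concrete, here is a toy sketch of the general idea on three-parameter "models". It is not the code in dpo.py or mva.py, and every function name is hypothetical. It assumes that a value vector is the parameter difference between an aligned model and its base, that interference is removed with a single orthogonal projection (a stand-in for whatever decorrelation MVA actually uses), and that the search space is a plain grid over vector weights.

```python
from itertools import product


def value_vector(tuned, base):
    """Assumed definition: aligned parameters minus base parameters."""
    return [t - b for t, b in zip(tuned, base)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def decorrelate(v, reference):
    """Remove from v its component along reference (one projection step)."""
    denom = dot(reference, reference)
    if denom == 0.0:
        return list(v)
    scale = dot(v, reference) / denom
    return [a - scale * r for a, r in zip(v, reference)]


def extrapolate(base, vectors, weights):
    """Return base + sum_i weights[i] * vectors[i].

    A weight above 1 moves past the aligned model along that vector.
    """
    merged = list(base)
    for w, vec in zip(weights, vectors):
        merged = [m + w * x for m, x in zip(merged, vec)]
    return merged


def weight_grid(candidates, n_vectors):
    """Every combination of candidate weights, one weight per vector."""
    return list(product(candidates, repeat=n_vectors))


# Toy parameters standing in for flattened model weights.
base = [0.0, 0.0, 0.0]
helpful = value_vector([1.0, 1.0, 0.0], base)
harmless = decorrelate(value_vector([1.0, 0.0, 1.0], base), helpful)

# Hypothetical search space: three candidate weights per vector.
configs = weight_grid([0.5, 1.0, 1.5], n_vectors=2)
models = [extrapolate(base, [helpful, harmless], w) for w in configs]
```

After the projection, the harmless vector has zero dot product with the helpful vector, so scaling one no longer shifts the model along the other. Each entry of configs is one set of vector weights, and each entry of models is the corresponding merged parameter list that would then be evaluated.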
