Steering Large Language Models using Conceptors 🛞

This repository introduces a novel technique called "Conceptor-Based Activation Engineering," which we use to steer the behavior of a GPT-J-6B model. This method is inspired by recent advancements in activation engineering and steering techniques.

Authors: Joris Postmus, Steven Abreu

Project Overview

In this project, we explore the application of Conceptors to manipulate and control the behavior of large language models, specifically GPT-J-6B. Our work builds upon recent discoveries in the field of activation engineering, and we present a series of experiments demonstrating the efficacy of this approach.
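To show what a conceptor is, here is a minimal NumPy sketch. It assumes the standard conceptor definition (Jaeger, 2014): C = R (R + α⁻² I)⁻¹, where R is the correlation matrix of a set of activations and α is the "aperture". It also assumes a steering rule of the form h' = β·C·h. The function names, the aperture value, β, and the random toy data are illustrative only. This is not the repository's code.

```python
import numpy as np


def compute_conceptor(activations: np.ndarray, aperture: float) -> np.ndarray:
    """Return C = R (R + aperture^-2 I)^-1 for activations of shape (n_samples, d)."""
    n_samples, d = activations.shape
    # Correlation (uncentered second-moment) matrix of the activations, shape (d, d).
    R = activations.T @ activations / n_samples
    # R and (R + cI)^-1 commute, so solve() gives the same symmetric matrix as R @ inv(...).
    return np.linalg.solve(R + aperture ** -2 * np.eye(d), R)


def steer(h: np.ndarray, C: np.ndarray, beta: float) -> np.ndarray:
    """Hypothetical steering rule h' = beta * C h, applied row-wise to h of shape (..., d)."""
    return beta * h @ C.T


# Toy usage: random vectors stand in for one layer's activations.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
C = compute_conceptor(X, aperture=10.0)
steered = steer(X[:4], C, beta=1.5)
```

The eigenvalues of C lie in [0, 1). A conceptor therefore acts as a "soft projection" onto the directions the activations occupy, rather than adding one fixed vector.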

Description of the data saved in this repository

./Mean_Activation_Vectors: mean activation vectors used for mean-centering.
./data/functionvectors/*.json: tasks from the function vector paper.
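A sketch of how these two kinds of data might be used. It assumes that each task file is a JSON list of {"input": ..., "output": ...} pairs, which is the format used by the function vector datasets. It also treats mean-centering simply as subtracting a stored mean vector. The storage format of the files in ./Mean_Activation_Vectors, and whether the mean is added back after steering, are not specified here. The helper names are hypothetical.

```python
import json
from pathlib import Path

import numpy as np


def load_task(path):
    """Load one task file, assumed to be a JSON list of {"input": ..., "output": ...} pairs."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def mean_center(activations: np.ndarray, mean_vector: np.ndarray) -> np.ndarray:
    """Subtract a precomputed mean activation vector (shape (d,)) from every row."""
    return activations - mean_vector


# Lists the task files when run from the repository root (an empty list otherwise).
task_files = sorted(Path("data/functionvectors").glob("*.json"))
```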

Requirements

To run the experiments and reproduce the results, please follow these steps:

1. Set up Python environment

First, make sure a supported Python version is installed. This project has been tested with:

Python 3.10.10 (recommended)
Python 3.12

Note: The installation may not work with Python versions newer than those listed above. We recommend Python 3.10.x for the best compatibility.
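As a quick pre-install check, the following sketch warns when the interpreter is not one of the minor versions listed above. It assumes that only the major and minor version matter. The function name is hypothetical.

```python
import sys

# Minor versions this README lists as tested (assumption: patch level does not matter).
TESTED = {(3, 10), (3, 12)}


def check_python_version(version_info=sys.version_info) -> bool:
    """Return True for a tested version; otherwise print a warning and return False."""
    major_minor = (version_info[0], version_info[1])
    if major_minor not in TESTED:
        print(f"Warning: Python {major_minor[0]}.{major_minor[1]} is untested; "
              "Python 3.10.x is recommended.")
        return False
    return True


check_python_version()
```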

Then create and activate a virtual environment, and install the dependencies:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

2. Run the experiments

You can run the experiments using the Jupyter notebooks provided in the src/colabs folder.

To batch-run all experiments and store the logs, use src/functionvectors/run_experiment.py. To list the available options, run:

cd src/functionvectors
python run_experiment.py -h

You can then run an experiment, e.g. with:

cd src/functionvectors
python run_experiment.py \
    --experiment-type=conceptor <additional-arguments>
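If you prefer to launch runs from Python, a thin wrapper around the same command line could look like this. Only the --experiment-type flag comes from this README. Any other arguments are passed through unchanged, so check `python run_experiment.py -h` for what they are. The wrapper functions are hypothetical.

```python
import subprocess
import sys


def build_command(experiment_type: str, extra_args=()) -> list:
    """Assemble the run_experiment.py call; extra_args are passed through unchanged."""
    return [sys.executable, "run_experiment.py",
            f"--experiment-type={experiment_type}", *extra_args]


def run_experiment(experiment_type: str, extra_args=(), cwd="src/functionvectors"):
    """Run one experiment from the repository root; raises CalledProcessError on failure."""
    return subprocess.run(build_command(experiment_type, extra_args), cwd=cwd, check=True)


# Example (from the repository root):
# run_experiment("conceptor", ["<additional-arguments>"])
```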

Optionally, you can also set the HuggingFace cache directory:

export HF_HOME=/path/to/your/cache/directory
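In a notebook (for example, those in src/colabs) you can set the same variable from Python. It must be set before transformers or huggingface_hub is imported, because huggingface_hub reads HF_HOME when it is first imported. The path is the same placeholder as above.

```python
import os

# Set the cache location before importing transformers / huggingface_hub.
# setdefault keeps any HF_HOME already exported in the shell.
os.environ.setdefault("HF_HOME", "/path/to/your/cache/directory")

# from transformers import AutoModelForCausalLM  # import only after HF_HOME is set
```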

If you are working on a SLURM-managed cluster, you can use the *.sh scripts in the root directory to reproduce the experiments from our paper. If you are not working on a SLURM cluster, simply replace the sbatch command with source.
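The sketch below chooses between the two modes automatically. It submits each matching script with sbatch when sbatch is on PATH, and otherwise runs it with bash. Unlike source, bash runs each script in a child shell, so environment changes made by one script do not carry over to the next. The glob pattern and function names are illustrative assumptions.

```python
import shutil
import subprocess
from pathlib import Path


def launcher() -> str:
    """Use sbatch on a SLURM cluster (sbatch on PATH), otherwise plain bash."""
    return "sbatch" if shutil.which("sbatch") else "bash"


def run_scripts(pattern: str = "run_gptj_*.sh", root: str = ".") -> list:
    """Launch every matching script in root, one after another; return the scripts run."""
    scripts = sorted(Path(root).glob(pattern))
    for script in scripts:
        # check=True stops at the first script that fails (or fails to submit).
        subprocess.run([launcher(), str(script)], check=True)
    return scripts
```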

Additional Notes

Ensure that your environment is correctly set up with the recommended Python version to avoid installation issues.
If you encounter any errors or have questions, please refer to the project documentation or contact the authors.

Memory usage for different models

Function vector experiments:

GPT-J-6B: 14 GB of VRAM (RTX A6000).
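A rough sanity check on this figure, assuming the weights are loaded in 16-bit precision (the README does not state the dtype). GPT-J-6B has roughly 6.05 billion parameters, so its weights alone take about 12.1 GB in 16-bit and about 24.2 GB in 32-bit. The reported 14 GB is therefore consistent with 16-bit weights plus a couple of gigabytes for activations, conceptor matrices and framework overhead.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the model weights, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


GPTJ_PARAMS = 6.05e9  # GPT-J-6B has roughly 6.05 billion parameters

fp16_gb = weight_memory_gb(GPTJ_PARAMS, 2)  # 16-bit weights: about 12.1 GB
fp32_gb = weight_memory_gb(GPTJ_PARAMS, 4)  # 32-bit weights: about 24.2 GB

# One d x d conceptor in float32 for GPT-J's hidden size d = 4096.
conceptor_gb = 4096 ** 2 * 4 / 1e9  # about 0.067 GB
```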

Repository contents (jorispos/ConceptorSteering)

Mean_Activation_Vectors/Mean_train
analysis
data/functionvectors
results
src
.gitattributes
.gitignore
README.md
requirements.txt
run_experiment.sh
run_gptj_addition.sh
run_gptj_addition_bool.sh
run_gptj_baselines.sh
run_gptj_conceptors.sh
run_gptj_conceptors_bool.sh
run_gptj_meanaddition.sh
run_gptj_meanconceptors.sh
run_gptneox_addition.sh
run_gptneox_baselines.sh
run_gptneox_conceptors.sh
run_gptneox_meanaddition.sh
run_gptneox_meanconceptors.sh
run_llama70b.sh
run_merged_experiment.sh
