This is an official implementation of our paper, *Proving membership in LLM pretraining data via data watermarks* (ACL Findings 2024). If you use this repository, please cite:
@inproceedings{wei2024provingmembershipllmpretraining,
    title = "Proving membership in {LLM} pretraining data via data watermarks",
    author = "Wei, Johnny and
      Wang, Ryan and
      Jia, Robin",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.788",
    doi = "10.18653/v1/2024.findings-acl.788",
    pages = "13306--13320",
}

Note that the `gpt-neox` folder in this repository is a near-identical clone of the GPT-NeoX repository by EleutherAI, found here: https://github.com/EleutherAI/gpt-neox.
- Ensure that Conda is installed
- Run the following commands, which create a new Conda environment and install PyTorch and the CUDA dependencies:
conda create -n hubble python=3.8
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
conda install cudatoolkit=11.7 -c conda-forge # this is probably not necessary anymore
conda install -c conda-forge cudatoolkit-dev

`cd` into the `gpt-neox` directory and run the following commands, which install the GPT-NeoX dependencies:
pip install -r requirements/requirements.txt
pip install -r requirements/requirements-wandb.txt # optional, if logging using WandB
# optional: if you want to use FlashAttention
# for the next line, ssh into any machine with cuda version > 11.6
export CUDA_HOME=<path_to_your_conda>/envs/hubble # replace this with your conda environment path
pip install -r ./requirements/requirements-flashattention.txt
pip install triton

To watermark pre-training data, you first need a base pre-training dataset (e.g., a shard from the Pile) stored in jsonl format. Ensure that this file is stored in the `data` directory.
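The scripts presumably expect a Pile-style schema, i.e., one JSON object per line with the document text under a `"text"` key. As a quick, hypothetical sanity check (the exact fields in your shard may differ), you can inspect the first document:

```python
# Hypothetical sanity check: inspect the first document of the shard.
# Assumes a Pile-style schema with the document text under a "text" key.
import json

with open("data/pile1e8_orig.jsonl") as f:
    first_doc = json.loads(next(f))

print(first_doc.keys())          # e.g. dict_keys(['text', 'meta']) for Pile shards
print(first_doc["text"][:200])   # preview the document text
```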
`cd` to the root directory. The following command will insert a watermark of 10 characters into 32 documents of the base pre-training dataset `data/pile1e8_orig.jsonl`:
bash perturb_data.sh

To run the Unicode perturbations instead, set `exp_name` to `"unicode_properties"`.
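For intuition, the random-sequence watermark can be pictured as appending the same random string to a subset of documents. The sketch below is a simplified illustration of this idea, not the actual logic in `perturb_data.sh`; the output filename and the sampling choice are made up:

```python
# Simplified illustration of random-sequence watermarking -- not the actual
# implementation in perturb_data.sh. Output path and sampling are hypothetical.
import json
import random
import string

random.seed(0)
watermark = "".join(random.choices(string.ascii_letters + string.digits, k=10))  # 10-character watermark

with open("data/pile1e8_orig.jsonl") as f:
    docs = [json.loads(line) for line in f]

# Watermark 32 randomly chosen documents by appending the same random string.
for doc in random.sample(docs, k=32):
    doc["text"] = doc["text"] + " " + watermark

with open("data/pile1e8_watermarked.jsonl", "w") as f:  # hypothetical output path
    for doc in docs:
        f.write(json.dumps(doc) + "\n")
```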
`cd` to the root directory. The following command will tokenize the watermarked data:

bash tokenize_data.sh
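As a rough illustration only (the script itself runs GPT-NeoX's preprocessing pipeline, which produces binary training files), tokenization maps each watermarked document to a fixed sequence of token ids. The tokenizer checkpoint and example string below are assumptions:

```python
# Rough illustration only -- tokenize_data.sh runs GPT-NeoX's own preprocessing.
# The tokenizer checkpoint and the example document are assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")  # NeoX-style tokenizer used by Pythia

doc = "Some document text ... Xq7Lz0bNp2"  # hypothetical watermarked document
print(tokenizer(doc).input_ids)
```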
`cd` to the root directory. The following command will pre-train the model using the watermarked data:

bash pretrain.sh

By default, the code will run a 70M-parameter model with Pythia configs for 1 training step, for demo purposes. The following is a list of important settings to change (an illustrative config sketch follows the lists below):
In the model configs:
- global_num_gpus: the number of GPUs to use
- train_batch_size: the total batch size across all GPUs
- train_micro_batch_size_per_gpu: the batch size per GPU
- gradient_accumulation_steps: the number of steps to accumulate gradients over
- train_iters: the number of steps to train for
- seq_len: the sequence length of the model
In the setup configs:
- data_path: the path to the data (tokenized already)
- save: the path to save the model
- include: allows you to specify which GPUs to use, e.g. by setting it to the string "localhost:0,1,2,3"
- master_port: the port to use for training. Different runs on the same machine should use different ports
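For concreteness, a set of overrides touching the fields above might look roughly like the following. All values and paths are placeholders; with pure data parallelism, `train_batch_size` should equal `train_micro_batch_size_per_gpu` × `gradient_accumulation_steps` × `global_num_gpus`:

```yaml
# Illustrative values and placeholder paths only -- adapt to your hardware and data.
# Model configs:
"global_num_gpus": 4
"train_batch_size": 1024            # 32 * 8 * 4
"train_micro_batch_size_per_gpu": 32
"gradient_accumulation_steps": 8
"train_iters": 1000
"seq_len": 2048

# Setup configs:
"data_path": "data/pile1e8_watermarked_tokenized"   # placeholder for the tokenized data prefix
"save": "checkpoints/pythia70m_watermarked"          # placeholder checkpoint directory
"include": "localhost:0,1,2,3"
"master_port": 29501
```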
`cd` to the root directory. The following command will convert the model to the Hugging Face format:
bash convert_neox_to_hf.sh
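After conversion, the checkpoint should load with the standard Hugging Face `transformers` API; the path below is a placeholder for wherever `convert_neox_to_hf.sh` writes the converted model:

```python
# Placeholder path -- point this at the directory written by convert_neox_to_hf.sh.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("checkpoints/hf_model")
tokenizer = AutoTokenizer.from_pretrained("checkpoints/hf_model")

out = model.generate(**tokenizer("Hello", return_tensors="pt"), max_new_tokens=10)
print(tokenizer.decode(out[0]))
```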
`cd` to the root directory. The following command will run inference with the HF-converted model:

bash score_model.sh

For Unicode experiments, change `score_model.py` to call `calculate_scores_unicode_properties` instead of `calculate_scores_unstealthy`. Note that in order to run hypothesis testing for Unicode watermarks, you must have prepared the watermarked data with the Unicode perturbation (by setting `exp_name` to `"unicode_properties"` when running `bash perturb_data.sh`).
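For intuition, the hypothesis test behind the scoring can be sketched as follows. This is a simplified illustration rather than the exact logic in `score_model.py`; the model path, the watermark string, and the number of null samples are all placeholders:

```python
# Simplified sketch of the watermark hypothesis test -- not the exact logic in
# score_model.py. Compares the model's loss on the inserted watermark against a
# null distribution of losses on random watermarks that were never inserted.
import random
import string

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "checkpoints/hf_model"  # placeholder for the converted HF model
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH).eval()

def avg_token_loss(text: str) -> float:
    """Average cross-entropy the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

def random_watermark(length: int = 10) -> str:
    return "".join(random.choices(string.ascii_letters + string.digits, k=length))

inserted_watermark = "Xq7Lz0bNp2"  # placeholder: the 10-character watermark that was inserted
null_losses = torch.tensor([avg_token_loss(random_watermark()) for _ in range(200)])
observed = avg_token_loss(inserted_watermark)

# A watermark the model trained on should have unusually low loss relative to
# the null distribution, i.e. a large negative z-score.
z = ((observed - null_losses.mean()) / null_losses.std()).item()
print(f"observed loss: {observed:.4f}, z-score: {z:.2f}")
```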