Blackbox Model Provenance via Palimpsestic Membership Inference

Suppose Alice trains an open-weight language model, and subsequently Bob uses a blackbox derivative of Alice’s model to produce text. Can Alice prove that Bob is using her model, either by querying Bob’s derivative model (query setting) or from the text alone (observational setting)?

We propose tests using the training order of Alice's model and investigate it through the lens of palimpsestic memorization in language models: models are more likely to memorize data seen later in training, so we can test whether Bob is using Alice’s model using test statistics that capture correlation between the likelihood of tokens in Bob’s text.

Specifically, this repository provides code for independence testing:

In the query setting, compute p-values for the independence test given a model and a transcript (i.e., ordered training data samples)
In the observational setting, compute p-values for the independence test given some sampled text and a transcript
- with a partitioning method
- and reshuffling method

Requirements

Install the necessary packages using:

uv sync

Usage

We provide scripts for models we experiment on, Pythia and OLMo families, and for both settings.

`scripts/query/run_query_test.py`

This script runs the query setting test w/ a reference model (see Equation 2) using the first n samples from a given transcript. This script accepts these command-line arguments:

--model: HuggingFace model ID for the model to be audited.
--ref_model: HuggingFace model ID for the reference model.
--n_samples: Number of samples from the trascript to use for the statistic.
--transcript: Name or path to a HuggingFace dataset that contains ordered training data samples. The dataset should contain an index column and a tokens column.
--metric_column_name: If specified, uses the precomputed metrics stored at the given column (e.g. losses for the model to be audited).
--ref_metric_column_name: If specified, uses the precomputed metrics stored at the given column (e.g. losses for the reference model).

Example: Recompute metrics with

python scripts/query/run_query_test.py \
    --model EleutherAI/pythia-6.9b-deduped \
    --ref_model EleutherAI/pythia-6.9b \
    --n_samples 100000 \
    --transcript hij/sequence_samples/pythia_deduped_100k

or use pre-computed metrics with

python scripts/query/run_query_test.py \
    --n_samples 100000 \
    --transcript hij/sequence_samples/pythia_deduped_100k
    --metric_column_name loss_pythia-6.9b-deduped_main
    --ref_metric_column_name loss_pythia-6.9b_main

which prints SignificanceResult(statistic=-0.07789184431337588, pvalue=2.337829800803965e-134). We provide partial transcripts and precomputed losses for some Pythia and OLMo datasets and derivative models here: https://huggingface.co/datasets/hij/sequence_samples.

`scripts/observation/partition.py`

This script runs the observational setting test that partitions a model's training transcript based on data order. We provide a general script that accepts the path to an InfiniGram index (see https://infini-gram.readthedocs.io/en/latest/) and a list of texts (saved w/ pickle). This script accepts these command-line arguments:

--texts_paths: Path to the text samples to be audited.
--infinigram_index_dir: Path to local InfiniGram index.
--n_texts: Number of texts to use for the statistic.
--k: Max. tokens for matching k-grams.
--tokenizer_name: Tokenizer to tokenize the texts and used to build the index.

Example:

python partition.py --texts_paths gens.pkl --infinigram_index_dir /path/to/index --n_texts 100000

Data

We release the transcripts used in our experiment on HuggingFace. We also provide the pre-computed sequence logprobs.

Name		Name	Last commit message	Last commit date
Latest commit History 166 Commits
notebooks		notebooks
scripts		scripts
tracing		tracing
.gitignore		.gitignore
README.md		README.md
fig.png		fig.png
fig_update.png		fig_update.png
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Blackbox Model Provenance via Palimpsestic Membership Inference

Requirements

Usage

`scripts/query/run_query_test.py`

`scripts/observation/partition.py`

Data

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Blackbox Model Provenance via Palimpsestic Membership Inference

Requirements

Usage

scripts/query/run_query_test.py

scripts/observation/partition.py

Data

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`scripts/query/run_query_test.py`

`scripts/observation/partition.py`

Packages