This repository provides the code and resources for our research paper, Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders (arXiv:2410.06981). We study whether sparse autoencoder (SAE) feature spaces across different LLMs are geometrically similar even when individual features do not line up one-to-one. We pair features via activation correlation (with optional Hungarian or entropic OT alignment in the UI tool), then measure relational similarity of decoder weight geometry (e.g. SVCCA, RSA), including how similarity varies across semantic subspaces.
- Feature matching across models: Align and compare SAE features across LLMs using activation correlations.
- Similarity analysis: Representational similarity on paired feature weights (SVCCA, RSA, baselines).
- Visualization: Figures, notebooks, and an interactive linked UMAP HTML for exploring two SAEs side by side (see below).
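The first two bullets can be illustrated with a small NumPy sketch. This is not the repository's implementation: the function names (`correlation_matrix`, `pair_by_max_correlation`, `rsa_score`) and the synthetic data are assumptions made for illustration, and the RSA variant shown (Spearman correlation over cosine-similarity matrices, ties not averaged) is one common choice rather than the paper's exact recipe.

```python
import numpy as np


def correlation_matrix(acts_a, acts_b, eps=1e-8):
    """Pearson correlation between every feature of A and every feature of B.

    acts_a: (n_tokens, n_features_a); acts_b: (n_tokens, n_features_b).
    Both must come from the same token batch.
    """
    a = acts_a - acts_a.mean(axis=0)
    b = acts_b - acts_b.mean(axis=0)
    a = a / (np.linalg.norm(a, axis=0) + eps)
    b = b / (np.linalg.norm(b, axis=0) + eps)
    return a.T @ b  # shape (n_features_a, n_features_b)


def pair_by_max_correlation(corr):
    """For each B feature, index of the most correlated A feature (many-to-one)."""
    return corr.argmax(axis=0)


def _cosine_similarity(weights, eps=1e-8):
    unit = weights / (np.linalg.norm(weights, axis=1, keepdims=True) + eps)
    return unit @ unit.T


def rsa_score(weights_a, weights_b):
    """Rank correlation between the pairwise-similarity structure of two
    row-paired weight matrices (row i of A is paired with row i of B)."""
    upper = np.triu_indices(weights_a.shape[0], k=1)
    sim_a = _cosine_similarity(weights_a)[upper]
    sim_b = _cosine_similarity(weights_b)[upper]
    # Ranks via double argsort; ties are not averaged in this sketch.
    rank_a = sim_a.argsort().argsort()
    rank_b = sim_b.argsort().argsort()
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


# Synthetic example: model B's features are a shuffled, noisy copy of A's.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(512, 6))
perm = rng.permutation(6)
acts_b = acts_a[:, perm] + 0.05 * rng.normal(size=(512, 6))

corr = correlation_matrix(acts_a, acts_b)
matches = pair_by_max_correlation(corr)  # matches[j] = A feature paired with B feature j

# Hypothetical decoder weights: B's directions are a rotated copy of A's.
weights_a = rng.normal(size=(6, 16))
rotation = np.linalg.qr(rng.normal(size=(16, 16)))[0]
weights_b = weights_a[perm] @ rotation

score = rsa_score(weights_a[matches], weights_b)
print(matches, score)
```

Because cosine similarity is invariant to rotation, the rotated copy keeps the same relational geometry even though no individual weight vector matches its partner.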
The script run_pipeline/pythia_feature_mapping_viz.py builds a single self-contained HTML page (Plotly) with two linked UMAP panels, one per model/SAE. The same text batch is run through both base models, and features are aligned across models from a full activation correlation matrix using greedy, Hungarian, or Sinkhorn OT matching. The page reports CKA, RSA, and orthogonal Procrustes statistics on matched activations, embeds the selected decoder directions with UMAP, and provides hover and selection tools that highlight corresponding features across panels.
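As a rough illustration of the Hungarian alignment and two of the reported statistics, here is a minimal sketch using SciPy. It is not the script's actual code: `hungarian_match`, `linear_cka`, and `procrustes_residual` are hypothetical helper names, the data is synthetic, and the Sinkhorn OT option is not shown.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes
from scipy.optimize import linear_sum_assignment


def hungarian_match(corr):
    """One-to-one matching that maximizes the total correlation.

    Returns (rows, cols): A feature rows[i] is matched to B feature cols[i].
    """
    return linear_sum_assignment(corr, maximize=True)


def linear_cka(x, y):
    """Linear CKA between two (n_samples, n_features) activation matrices."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    return float(cross / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")))


def procrustes_residual(x, y):
    """Squared residual after the best orthogonal map from x to y.

    Both inputs are centered and scaled to unit Frobenius norm first,
    so 0 means a perfect fit up to rotation/reflection.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    rotation, _ = orthogonal_procrustes(x, y)
    return float(np.linalg.norm(x @ rotation - y) ** 2)


# Synthetic activations: B's features are a shuffled, noisy copy of A's.
rng = np.random.default_rng(1)
acts_a = rng.normal(size=(256, 5))
perm = rng.permutation(5)
acts_b = acts_a[:, perm] + 0.05 * rng.normal(size=(256, 5))

# Cross-correlation block between A features (rows) and B features (columns).
corr = np.corrcoef(acts_a.T, acts_b.T)[:5, 5:]
rows, cols = hungarian_match(corr)

matched_a = acts_a[:, rows]
matched_b = acts_b[:, cols]
cka = linear_cka(matched_a, matched_b)
residual = procrustes_residual(matched_a, matched_b)
print(cols, cka, residual)
```

Unlike a per-feature argmax, the assignment step guarantees that no feature is used twice, which matters when several features in one model correlate with the same feature in the other.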
Typical workflow
- Install dependencies and SAE libraries (see Getting Started).
- From the repo root, run:
mkdir -p outputs && python run_pipeline/pythia_feature_mapping_viz.py
Defaults use Pythia 70M vs 160M with matching Eleuther SAEs and TinyStories snippets.
- Open outputs/pythia_sae_feature_map.html in a browser (Plotly loads from a CDN).
- Hover over points to see feature id, top-token hints, and cross-model correlation where available; use the toolbar to zoom, box-select, or paint a region to build a set of linked pairs and optional mesh overlays.
Full flags, JSON re-render (--from-json), semantic category JSON, and troubleshooting: run_pipeline/README_pythia_feature_mapping_viz.md.
Quick variant with fewer points:
python run_pipeline/pythia_feature_mapping_viz.py --features-per-side 800 --dataset-split "train[:256]" --output-html outputs/pythia_sae_feature_map.html
- run_pipeline/: Main experiment scripts and pythia_feature_mapping_viz.py (interactive UI generator)
- main_results_nbs/: Jupyter notebooks for experiments and analyses
- modal_scripts/: Modal cloud run helpers
- docs/images/: README media (UI preview image and short preview video)
- README.md: Project documentation
- Python 3.8 or higher
1. Open a Command Prompt with Miniconda on PATH (same pattern as the Start Menu shortcut):
%windir%\System32\cmd.exe "/K" C:\Users\mikel\miniconda3\Scripts\activate.bat C:\Users\mikel\miniconda3
If your install path differs, adjust both paths to your miniconda3 folder.
2. Go to the repo root and run the helper script. It creates the feature-space-mapping environment (Python 3.11) if needed, then installs requirements.txt, sae_lens, and sparsify:
setup_conda_env.cmd
Alternatively, create the base env yourself and install pip packages manually:
conda env create -f environment.yml
conda activate feature-space-mapping
python -m pip install -r requirements.txt
python -m pip install sae_lens git+https://github.com/wlg1/sparsify.git
3. Each new CMD session: activate Miniconda as in step 1, then:
conda activate feature-space-mapping
The script finishes by installing CPU PyTorch from the pytorch conda channel, which tends to be more reliable on Windows than the default pip wheel (fewer missing-DLL issues). It also sets KMP_DUPLICATE_LIB_OK=TRUE on the env to avoid an OpenMP duplicate-libiomp5md.dll error when importing torch with MKL-linked packages. If you use a GPU, run the optional CUDA pip install ... cu124 line from the script footer after that, so it replaces the CPU build.
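After setup, a quick sanity check can confirm that the environment variable is set and that torch imports cleanly. The sketch below is an illustration, not part of the repository; `check_torch_env` is a hypothetical helper name, and it only relies on `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util
import os


def check_torch_env():
    """Report the OpenMP workaround flag and the state of the torch install."""
    report = {
        "kmp_duplicate_lib_ok": os.environ.get("KMP_DUPLICATE_LIB_OK"),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": None,
        "import_error": None,
    }
    if report["torch_installed"]:
        try:
            import torch

            report["torch_version"] = torch.__version__
            report["cuda_available"] = torch.cuda.is_available()
        except (ImportError, OSError) as exc:
            # Missing-DLL problems on Windows typically surface here.
            report["import_error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in check_torch_env().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False on a GPU machine, the CPU build is most likely still the one installed.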
Clone the repository and install dependencies:
pip install -r requirements.txt
Install these SAE libraries:
pip install sae_lens
pip install git+https://github.com/wlg1/sparsify.git
If running Gemma models, log in to Hugging Face using:
huggingface-cli login
Recommended hardware: allocate ~100 GB of disk space when renting an A100 on Vast.ai (needed for large models such as gemma-2-9b).
In run_pipeline/, run:
chmod +x run_pythia.sh
./run_pythia.sh --batch_size 300 --max_length 300 --num_rand_runs 1 --oneToOne_bool --model_A_endLayer 6 --model_B_endLayer 12 --layer_step_size 2
TBD: update the .sh script to evaluate separate model pairs in one run.
If you use this code or our findings in your research, please cite our paper:
@misc{lan2025sparseautoencodersrevealuniversal,
title={Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders},
author={Michael Lan and Philip Torr and Austin Meek and Ashkan Khakzar and David Krueger and Fazl Barez},
year={2025},
eprint={2410.06981},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2410.06981},
}
This repo is currently being restructured.
