Code for ICML 2026 "Krause Synchronization Transformers"
This repository contains the implementation for the paper "Krause Synchronization Transformers". In our work, we introduce Krause Attention, a principled attention mechanism inspired by bounded-confidence consensus dynamics. Krause Attention replaces similarity-based global aggregation with distance-based, localized, and selectively sparse interactions, promoting structured local synchronization instead of global mixing. We relate this behavior to recent theory modeling Transformer dynamics as interacting particle systems, and show how bounded-confidence interactions naturally moderate attention concentration and alleviate attention sinks. Restricting interactions to local neighborhoods also reduces runtime complexity from quadratic to linear in sequence length. Empirically, we validate Krause Attention across diverse settings, including vision (ViT on CIFAR/ImageNet), autoregressive image generation (MNIST/CIFAR-10), large language models (Llama/Qwen), and language models trained from scratch at multiple scales (100M/200M). Across these domains, Krause Attention achieves consistent performance gains while improving computational efficiency, highlighting bounded-confidence dynamics as a scalable and effective inductive bias for attention.
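As a rough illustration of the distance-based, localized, and selectively sparse aggregation described above, here is a minimal dependency-free sketch. It is not the repository's implementation: the function name `krause_style_attention`, the RBF weighting, and the parameters `sigma`, `window`, and `top_k` are assumptions chosen for illustration (loosely echoing the `--sigma` and `--top_k` flags of the run scripts).

```python
import math


def krause_style_attention(queries, keys, values, sigma=1.0, window=2, top_k=2):
    """Illustrative sketch: distance-based, local, top-k sparse attention.

    Each argument is a list of equal-length float vectors, one per token.
    This is an assumed formulation, not the authors' code.
    """
    n = len(queries)
    dim = len(values[0])
    outputs = []
    for i in range(n):
        # Locality: only positions within `window` of token i are candidates,
        # so the cost per token is O(window) rather than O(n).
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Squared Euclidean distance from query i to each candidate key.
        dists = [
            (sum((q - k) ** 2 for q, k in zip(queries[i], keys[j])), j)
            for j in range(lo, hi)
        ]
        # Selective sparsity: keep only the top_k closest keys.
        nearest = sorted(dists)[:top_k]
        # RBF weights, shifted by the smallest distance for numerical stability.
        d_min = nearest[0][0]
        weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d, _ in nearest]
        total = sum(weights)
        # Weighted average of the selected value vectors.
        outputs.append([
            sum(w * values[j][c] for w, (_, j) in zip(weights, nearest)) / total
            for c in range(dim)
        ])
    return outputs
```

Because every token aggregates over at most `top_k` neighbors inside a fixed window, tokens that are far apart in feature space never mix, which is the bounded-confidence behavior the paragraph above refers to.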
The animations demonstrate the difference between the two attention dynamics. We simulate 200 particles (tokens) on a unit circle and a unit sphere. Initialized at random positions, the particles evolve according to each dynamic. Krause Attention promotes stable multi-cluster synchronization, whereas standard attention drives the particles toward a single consensus, ultimately leading to global synchronization.
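The unit-circle experiment can be approximated with a small simulation. The sketch below is an assumption-laden toy, not the script used to produce the animations: the update rule (an Euler step toward the weighted neighbor mean, followed by projection back onto the circle), the names `simulate` and `count_clusters`, and the parameters `beta`, `sigma`, `radius`, `dt`, and `gap` are all hypothetical.

```python
import math
import random


def simulate(n=200, steps=100, dt=0.1, mode="krause",
             beta=1.0, sigma=0.3, radius=0.5, seed=0):
    """Evolve n particles on the unit circle under one of two toy dynamics.

    mode="standard": softmax-like weights exp(beta * <x_i, x_j>) over all particles.
    mode="krause":   RBF weights restricted to particles within `radius` (chord distance).
    """
    rng = random.Random(seed)
    angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    pts = [(math.cos(a), math.sin(a)) for a in angles]
    for _ in range(steps):
        new_pts = []
        for xi, yi in pts:
            ax = ay = total = 0.0
            for xj, yj in pts:
                if mode == "standard":
                    w = math.exp(beta * (xi * xj + yi * yj))
                else:
                    d2 = (xi - xj) ** 2 + (yi - yj) ** 2
                    # Bounded confidence: ignore particles beyond the radius.
                    w = math.exp(-d2 / (2.0 * sigma ** 2)) if d2 <= radius ** 2 else 0.0
                ax += w * xj
                ay += w * yj
                total += w
            # total > 0 because every particle interacts with itself.
            nx, ny = xi + dt * ax / total, yi + dt * ay / total
            norm = math.hypot(nx, ny)
            new_pts.append((nx / norm, ny / norm))  # project back onto the circle
        pts = new_pts
    return pts


def count_clusters(points, gap=0.2):
    """Count groups of points separated by angular gaps larger than `gap` radians."""
    angles = sorted(math.atan2(y, x) for x, y in points)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2.0 * math.pi - angles[-1])  # wrap-around gap
    return max(1, sum(g > gap for g in gaps))
```

Calling `count_clusters(simulate(mode="krause"))` and `count_clusters(simulate(mode="standard"))` gives a crude way to compare the two regimes; the resulting counts depend on the assumed parameters and the number of steps.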
To get started with Krause-Synchronization-Transformers, follow these steps:
git clone https://github.com/Jingkun-Liu/Krause-Synchronization-Transformers.git
cd Krause-Synchronization-Transformers
pip install -r requirements.txt
The repository is organized as follows:
Krause-Synchronization-Transformers/
├── autoregressive_transformers/
│ ├── cifar10/
│ │ │── cifar10_generate.sh
│ │ │── cifar10_train.sh
│ │ │── completion_cifar10.py
│ │ │── generate_cifar10.py
│ │ └── train_cifar10.py
│ ├── mnist/
│ │ │── mnist_generate.sh
│ │ │── mnist_train.sh
│ │ │── completion_mnist.py
│ │ │── generate_mnist.py
│ │ └── train_mnist.py
├── vision_transformers/
│ ├── cifar10/
│ │ │── ViT-S/
│ │ │ │── data.py
│ │ │ │── module.py
│ │ │ └── vit_s_main.py
│ │ │── KViT-S/
│ │ │ │── data.py
│ │ │ │── module.py
│ │ │ └── kvit_s_main.py
│ │ │── Swin-T/
│ │ │ │── data.py
│ │ │ │── module.py
│ │ │ └── swin_t_main.py
│ │ │── KSwin-T/
│ │ │ │── data.py
│ │ │ │── module.py
│ │ │ └── kswin_t_main.py
│ │ │── run_kswin.sh
│ │ │── run_kvit.sh
│ │ │── run_swin.sh
│ │ └── run_vit.sh
│ ├── imagenet1k/
│ │ │── KViT-S-16/
│ │ │ │── data.py
│ │ │ │── module.py
│ │ │ └── kvit_s_16_main.py
│ │ │── ViT-S-16/
│ │ │ │── data.py
│ │ │ │── module.py
│ │ │ └── vit_s_16_main.py
│ │ │── KViT-B-16/
│ │ │ │── data.py
│ │ │ │── module.py
│ │ │ └── kvit_b_16_main.py
│ │ │── ViT-B-16/
│ │ │ │── data.py
│ │ │ │── module.py
│ │ │ └── vit_b_16_main.py
│ │ │── run_kvit.sh
│ │ └── run_vit.sh
├── lora_llms/
│ ├── llama/
│ │ │── module.py
│ │ │── util.py
│ │ │── run_llama3_8b.sh
│ │ └── llama3_8b_main.py
│ ├── qwen/
│ │ │── module.py
│ │ │── util.py
│ │ │── run_qwen1.5_7b.sh
│ │ └── qwen1.5_7b_main.py
│ └── evaluation/
│ │── benchmark.py
│ │── util.py
│ │── evaluation.sh
│ └── main.py
├── language_models_100m/
│ ├── train/
│ │ │── module.py
│ │ │── run_train_100m.sh
│ │ │── train_100m.py
│ │ └── training_utils.py
│ ├── evaluation/
│ │ │── eval.py
│ │ └── run_eval.sh
│ └── build_fwe10bt.py
└── images/ # images/gifs used in readme and our website
- Automatic Download: The CIFAR-10 and MNIST datasets will be automatically downloaded upon running the scripts.
- Manual Download Required:
  - ImageNet-1K: Please download from https://www.image-net.org/download.php.
  - LLM Datasets: Relevant datasets can be found at https://huggingface.co/datasets/SirNeural/flan_v2/tree/main.
  - Language Model Training Datasets: Subsets of FineWeb-Edu are available at https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu.
  - LLMs: Llama3-8B can be found at https://huggingface.co/meta-llama/Meta-Llama-3-8B. Qwen1.5-7B can be found at https://huggingface.co/Qwen/Qwen1.5-7B.
Checkpoints are available at https://drive.google.com/drive/folders/1wZ4MvuzXHPQO4IPaT2tANtnqlaNSCiZa?usp=sharing.
For every task, we provide run scripts that can be submitted directly with sbatch. For example, to run the ImageNet-1K classification task for KViT-S-16, use the following command:
/Krause-Synchronization-Transformers-main/vision_transformers/imagenet1k/run_kvit.sh
Please ensure you modify the script's configuration (such as batch size, learning rate, model implementation path, or GPU requirements) before execution.
For instance, to run ImageNet-1K with KViT-S-16, the script should be adjusted as shown below:
# 1. Parameter settings
# SIGMA and DROPOUT are the names referenced below in FILE_SUFFIX and the torchrun flags.
SIGMA="4.5"
DROPOUT="0.0"
EPOCHS=300
LR=5e-4
WEIGHT_DECAY=0.05
BATCH_SIZE=512
NPROC_PER_NODE=2
FILE_SUFFIX="topk8-16_s${SIGMA}_d${DROPOUT}_w${WEIGHT_DECAY}_batchsize${BATCH_SIZE}"
LOG_FILE="log_kvits16_ImageNet_lr5e-4_${FILE_SUFFIX}.out"
SAVE_PATH="analysis_kvits16_ImageNet_lr5e-4_${FILE_SUFFIX}.png"
# 2. torchrun command (runs in the background; stdout and stderr go to $LOG_FILE)
CUDA_VISIBLE_DEVICES=6,7 torchrun --nproc_per_node=$NPROC_PER_NODE --master_port=28888 kvit_s_16_main.py \
--top_k 8 \
--warmup_epochs 10 \
--sigma $SIGMA \
--dropout $DROPOUT \
--epochs $EPOCHS \
--lr $LR \
--weight_decay $WEIGHT_DECAY \
--batch_size $BATCH_SIZE \
--save_path "$SAVE_PATH" \
> "$LOG_FILE" 2>&1 &
If you find this research useful, please consider citing our work!
@article{liukrause2026,
title={Krause Synchronization Transformers},
author={Jingkun Liu and Yisong Yue and Max Welling and Yue Song},
journal={ICML},
year={2026},
url={https://arxiv.org/abs/2602.11534}
}
If you have any questions, feel free to contact me at [email protected]




