PyLO Examples complements PyLO by providing benchmark code for standard pre-training tasks that are of interest to the ML community. Key features include:
- Implementations of practical, full-scale training runs (not just toy problems)
- Benchmarks comparing learned optimizers against state-of-the-art optimizers like AdamW with cosine annealing
- Reproducible evaluation protocols for fair comparison
- Examples showing how to integrate PyLO optimizers into real training workflows (see the sketch below)
While learned optimizers have traditionally been evaluated on small, artificial tasks that do not reflect practical training scenarios, PyLO Examples enables rigorous evaluation on realistic workloads that matter to practitioners.
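As a taste of that integration, the snippet below shows how a PyLO optimizer might drop into a standard PyTorch training loop. This is a minimal sketch: the `pylo.optim.VeLO` import path and constructor signature are assumptions, so consult the PyLO repository for the exact API.

```python
# Sketch: using a PyLO learned optimizer as a drop-in replacement for a
# hand-designed optimizer in a standard PyTorch training loop.
# The pylo.optim.VeLO import path and constructor are assumptions.
import torch
import torch.nn as nn
from pylo.optim import VeLO  # assumed import path; check the PyLO docs

model = nn.Linear(128, 10)
optimizer = VeLO(model.parameters())  # learned optimizer: no hand-tuned schedule
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    x = torch.randn(32, 128)
    y = torch.randint(0, 10, (32,))
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```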
Run the following commands to install Miniconda, PyLO, and its dependencies:
```bash
install_dir=$PWD/pylo_install
mkdir -p $install_dir
cd $install_dir
wget https://repo.anaconda.com/miniconda/Miniconda3-py311_24.7.1-0-Linux-x86_64.sh
bash Miniconda3-py311_24.7.1-0-Linux-x86_64.sh -b -p $PWD/miniconda3
source $PWD/miniconda3/bin/activate

# Install custom MUP for MuLO support
pip install git+https://github.com/bentherien/mup.git

# Install PyLO with CUDA support
git clone https://github.com/Belilovsky-Lab/pylo.git
cd pylo
pip install .
python setup.py install --cuda  # builds the custom CUDA kernels

# For logging, set the WANDB environment variables
export WANDB_API_KEY=YOUR_KEY
export WANDB_PROJECT=pylo_examples
export WANDB_MODE=online
```
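Before launching any runs, a quick sanity check can confirm the install. This is a sketch; it only assumes that the package installed above is importable as `pylo`.

```python
# Quick post-install sanity check (sketch). Verifies that PyTorch sees a
# GPU and that the pylo package imports; adjust names if the package
# layout differs in your installed version.
import torch
import pylo  # assumed top-level package name from the install above

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("pylo imported from:", pylo.__file__)
```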
Here are some example commands for single-GPU GPT pre-training with different optimizers.
Training a GPT model with µLO_M
```bash
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 \
    train.py --config 'config' \
    --override \
    optimizer_name MuLO_cuda \
    compile True \
    init_lr 2
```
Training a GPT model with VeLO
```bash
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 \
    train.py --config 'config' \
    --override \
    optimizer_name VeLO \
    compile True \
    init_lr 2
```
Training a GPT model with AdamW
```bash
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 \
    train.py --config 'config' \
    --override \
    optimizer_name AdamW \
    compile True \
    init_lr 2
```
Note: For a more detailed setup, see ./language_model_pretraining
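The `--override` flags above swap optimizers without editing the config file. The sketch below illustrates how such a dispatch on `optimizer_name` could look; it is not the repository's actual `train.py`, and the `pylo.optim` class names and `lr` keyword are assumptions.

```python
# Sketch of optimizer dispatch driven by the optimizer_name override.
# This mirrors the command-line examples above but is NOT the repo's
# actual train.py; class names and keyword arguments are assumptions.
import torch

def build_optimizer(name: str, params, init_lr: float):
    if name == "AdamW":
        return torch.optim.AdamW(params, lr=init_lr)
    elif name == "VeLO":
        from pylo.optim import VeLO  # assumed import path
        return VeLO(params, lr=init_lr)
    elif name == "MuLO_cuda":
        from pylo.optim import MuLO_CUDA  # assumed import path
        return MuLO_CUDA(params, lr=init_lr)
    raise ValueError(f"Unknown optimizer: {name}")
```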
Training ViT with VeLO
```bash
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 train.py \
    --model vit_base_patch16_224 \
    --data-dir path_to_imagenet \
    --dataset imagenet \
    --opt velo \
    --lr 1.0
```
Training ViT with µLO_M
```bash
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 train.py \
    --model vit_base_patch16_224 \
    --data-dir path_to_imagenet \
    --dataset imagenet \
    --opt mulo \
    --lr 1.0
```
Training ViT with AdamW
```bash
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 train.py \
    --model vit_base_patch16_224 \
    --data-dir path_to_imagenet \
    --dataset imagenet \
    --opt adamw \
    --lr 4e-3
```
Note: For a more detailed setup, see ./image_classification/vision_transformer
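These commands follow timm's training-script conventions (`--model`, `--opt`, `--lr`). The sketch below reproduces the same pairing directly with timm's public `create_model` API; the `pylo.optim.VeLO` import and the multiplier-style `lr=1.0` keyword are assumptions based on the flags above.

```python
# Sketch: pairing a timm ViT with a PyLO learned optimizer outside the
# provided train.py. timm.create_model is timm's public API; the pylo
# import path and lr keyword are assumptions, as above.
import timm
import torch
from pylo.optim import VeLO  # assumed import path

model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=1000)
optimizer = VeLO(model.parameters(), lr=1.0)  # lr ~1.0 acts as a multiplier, not a raw step size

x = torch.randn(2, 3, 224, 224)
y = torch.randint(0, 1000, (2,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```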
If you use PyLO in your research, please consider citing our work:
```bibtex
@software{pylo2025,
  author = {Paul Janson and Benjamin Therien and Quentin Anthony and Xialong Huang and Abhinav Moudgil and Eugene Belilovsky},
  title  = {PyLO: Towards Accessible Learned Optimizers in PyTorch},
  year   = {2025},
  url    = {https://github.com/Belilovsky-Lab/pylo}
}
```