ScalingOpt: Optimization at Scale
Discover, compare, and contribute to cutting-edge optimization algorithms designed for large-scale deep learning.
Latest News
Recent updates from the ScalingOpt community
How to Set the Batch Size for Large-Scale Pre-training?
A new paper on batch size scheduling for large-scale pre-training, proposing a revised framework for the WSD scheduler and a dynamic batch-size scheduling strategy.
Read Paper
Jianlin Su's Blog Collection
Added a collection of in-depth articles from Jianlin Su (Scientific Spaces) covering optimization theory, Muon, and scaling laws.
Explore Blogs
ScalingOpt Community Growth
Our optimizer database has grown to over 60 implementations! Join us in building the most comprehensive optimization resource.
Join Us
Our Team
Meet the members behind ScalingOpt; we are grateful for their contributions.
Team member information is updated continuously, and we welcome collaboration inquiries by email.
Featured Optimizers
Discover the most powerful and innovative optimization algorithms powering modern AI
Apollo (2)
2024 · SGD-like Memory, AdamW-level Performance
Conda
2025 · Column-Normalized Adam for Training LLMs Faster
Muon
2024 · Orthogonal weight updates via Newton-Schulz iteration (see the sketch after this list)
SOAP
2024 · Improving and Stabilizing Shampoo using Adam
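The Muon entry above refers to approximately orthogonalizing each 2-D weight update with a few Newton-Schulz iterations before applying it. The sketch below illustrates that idea in PyTorch; the quintic coefficients follow publicly available Muon implementations, but the function name and defaults here are illustrative rather than the reference code.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2-D update matrix G via Newton-Schulz iteration.

    A minimal sketch of the idea behind Muon's update; not the reference implementation.
    """
    a, b, c = 3.4445, -4.7750, 2.0315        # quintic iteration coefficients used in public Muon code
    X = G / (G.norm() + 1e-7)                # scale so the spectral norm is at most ~1
    transposed = X.shape[0] > X.shape[1]
    if transposed:                           # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X  # push all singular values toward 1
    return X.T if transposed else X

# Usage: orthogonalize a momentum buffer's raw update before applying it to a weight
# matrix (learning-rate and shape-dependent scaling omitted for brevity).
update = newton_schulz_orthogonalize(torch.randn(256, 512))
```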
Industry-Optimized Implementations
Production-ready libraries with first-class distributed-training support and hardware-specific optimization
Hugging Face
Optimizers integrated into Transformers (AdamW, Adafactor) with native support for distributed training and mixed precision.
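As a quick orientation, the sketch below shows one way to select these optimizers through the Transformers Trainer configuration; the specific `optim` string values and hyperparameters are illustrative and may vary with the installed Transformers version.

```python
from transformers import TrainingArguments

# Minimal sketch: choosing the optimizer via TrainingArguments rather than
# constructing it by hand. Values shown are placeholders, not recommendations.
args = TrainingArguments(
    output_dir="./checkpoints",
    optim="adamw_torch",            # or "adafactor" for the memory-efficient variant
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    bf16=True,                      # mixed precision, if the hardware supports it
)
```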
Meta Research
Cutting-edge optimization algorithms such as Distributed Shampoo, developed by Meta for large-scale model training.
NVIDIA TensorRT
Advanced model optimization toolkit for NVIDIA GPUs, focusing on quantization and inference acceleration.
Why Choose ScalingOpt?
Everything you need to understand, implement, and scale optimization algorithms for modern AI
Extensive Optimizer Library
Explore optimization algorithms, from foundational SGD to cutting-edge Adam-mini and Muon, with detailed implementations and PyTorch code; a minimal usage sketch follows below.
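For readers new to these libraries, the sketch below shows the standard PyTorch training loop that the listed optimizer implementations plug into; the model and data are placeholders.

```python
import torch
from torch import nn

# A minimal sketch of the optimizer interface shared by SGD, AdamW, and the
# library's other entries; swap in any torch.optim-compatible optimizer.
model = nn.Linear(128, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
for _ in range(3):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
```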
Research & Learning Hub
Access research papers, tutorials, and educational content covering optimization theory, implementation guides, and latest developments.
Open Source & Community
Contribute to open-source implementations, join GitHub discussions, and collaborate with researchers worldwide on optimization algorithms.
Join the Optimization Community
Connect with researchers and practitioners exploring efficient AI and optimization algorithms. Discover, learn, and contribute to the future of machine learning optimization.