A curated list of awesome papers on acceleration techniques for Generative AI.
- Cache Methods
- Token Merging & Sparse
- Optimal Timestep
- Distillation
- Image Generation
- Video Generation
- Others
- Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model, CVPR 2025 [Paper] [Code]
- Adaptive Caching for Faster Video Generation with Diffusion Transformers, arXiv 2024 [Paper] [Code]
- PAB: Real-Time Video Generation with Pyramid Attention Broadcast, ICLR 2025 [Paper] [Code]
- SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers, arXiv 2024 [Paper] [Code]
- Δ-DiT: Training-Free Acceleration for Diffusion Transformers, arXiv 2024 [Paper]
- T-Gate: Faster Diffusion via Temporal Attention Decomposition, TMLR 2025 [Paper] [Code]
- Unveiling Redundancy in Diffusion Transformers (DiTs): A Systematic Study, arXiv 2024 [Paper] [Code]
- Accelerating Diffusion Transformer via Error-Optimized Cache, arXiv 2025 [Paper]
- ToCa: Token-wise Feature Caching for Diffusion Transformers, ICLR 2025 [Paper] [Code]
- DuCa: Dual Feature Caching for Diffusion Transformers, arXiv 2024 [Paper] [Code]
- UniCP: Unified Caching and Pruning Framework for Efficient Video Generation, arXiv 2025 [Paper]
- FasterCache: Training-Free Video Diffusion Acceleration with High Quality, arXiv 2024 [Paper] [Code]
- Region-Adaptive Sampling for Diffusion Transformers, arXiv 2025 [Paper] [Code]
- CacheQuant: Comprehensively Accelerated Diffusion Models, CVPR 2025 [Paper] [Code]
- Cache Me if You Can: Accelerating Diffusion Models through Block Caching, CVPR 2024 [Paper] [Code]
- BlockDance: Reuse Structurally Similar Spatio-Temporal Features, arXiv 2025 [Paper]
- Learning-to-Cache: Layer Caching for Diffusion Transformers, NeurIPS 2024 [Paper] [Code]
- TaylorSeers: Forecasting for Accelerating Diffusion Models, arXiv 2025 [Paper] [Code]
- Profiling-Based Feature Reuse for Video Diffusion Models, arXiv 2025 [Paper] [Code]
- SRDiffusion: Sketching-Rendering Cooperation, arXiv 2025 [Paper]
- AB-Cache: Adams-Bashforth Cached Feature Reuse, arXiv 2025 [Paper]
- EEdit: Rethinking Spatial and Temporal Redundancy for Efficient Image Editing, ICCV 2025 [Paper] [Code]
- CacheDiT: Training-free Cache Acceleration Toolbox for Diffusion Transformers, arXiv 2025 [Code]
- Confidence-Gated Taylor Forecasting for Acceleration, arXiv 2025 [Paper]
- HiCache: Hermite Polynomial-based Feature Caching, arXiv 2025 [Paper]
- ERTACache: Error Rectification and Timesteps Adjustment, arXiv 2025 [Paper]
- DiCache: Let Diffusion Model Determine Its Own Cache, arXiv 2025 [Paper]
- OmniCache: Global Trajectory-Oriented Cache Reuse, arXiv 2025 [Paper]
- EasyCache: Runtime-Adaptive Caching for Video Diffusion, arXiv 2025 [Paper] [Code]
- Forecast then Calibrate: Caching as ODE for Diffusion Transformers, arXiv 2025 [Paper]
- SpecDiff: Accelerating Diffusion Model Inference with Self-Speculation, arXiv 2025 [Paper]
- Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model, arXiv 2025 [Paper] [Code]
- SemCache: Adaptive Semantic-Aware Caching for Efficient Video Diffusion, arXiv 2025 [Paper] [Code]
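
Most of the cache-method entries above share one recipe: features at adjacent denoising steps are highly similar, so a block's output can be computed occasionally and reused in between. Below is a minimal, hypothetical sketch of that shared idea in PyTorch; `CachedBlock` and its fixed `refresh_every` policy are illustrative assumptions, not any specific paper's method (TeaCache, ToCa, FasterCache, etc. each decide what to cache and when to refresh very differently).

```python
# Hypothetical sketch of training-free feature caching for a diffusion
# transformer. CachedBlock and the fixed refresh_every policy are
# assumptions for illustration only.
import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    """Wraps a transformer block; reuses its residual on skip steps."""

    def __init__(self, block: nn.Module, refresh_every: int = 3):
        super().__init__()
        self.block = block
        self.refresh_every = refresh_every  # recompute every N denoising steps
        self.cached_residual = None

    def forward(self, x: torch.Tensor, step: int) -> torch.Tensor:
        if step % self.refresh_every == 0 or self.cached_residual is None:
            out = self.block(x)
            self.cached_residual = out - x   # store the block's residual
            return out
        # Skip step: adjacent timesteps yield very similar features, so the
        # stale residual is a cheap stand-in for a full forward pass.
        return x + self.cached_residual
```

In a real sampler you would wrap each DiT block once and thread the step index through the denoising loop; the adaptive methods above replace the fixed schedule with learned or error-driven refresh decisions.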
- AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration, arXiv 2024 [Paper]
- Dynamic Token Carving (Jenga): Training-Free Efficient Video Generation, arXiv 2025 [Paper] [Code]
- VORTA: Efficient Video Diffusion via Routing Sparse Attention, arXiv 2025 [Paper]
- Sparse VideoGen: Accelerating with Spatial-Temporal Sparsity, ICML 2025 [Paper] [Code]
- Sparse VideoGen2: Semantic-Aware Permutation, arXiv 2025 [Paper] [Code]
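
The token merging & sparse entries above exploit a complementary redundancy: within one step, only a fraction of tokens change meaningfully. A hedged sketch of that idea follows, with the drift-based selection heuristic and all names assumed for illustration (Jenga, VORTA, and Sparse VideoGen use far more careful selection and restoration).

```python
# Hypothetical token-sparsity sketch: run the expensive block only on the
# tokens that drifted most since the previous step, keep stale features
# for the rest. The selection heuristic is a placeholder.
import torch

def sparse_update(block, x, prev_x, prev_out, keep_ratio=0.25):
    b, n, d = x.shape
    k = max(1, int(n * keep_ratio))
    change = (x - prev_x).norm(dim=-1)                # (b, n) per-token drift
    idx = change.topk(k, dim=1).indices               # (b, k) most active tokens
    gather_idx = idx.unsqueeze(-1).expand(-1, -1, d)  # (b, k, d)
    selected = torch.gather(x, 1, gather_idx)
    updated = block(selected)                         # full compute on k tokens
    out = prev_out.clone()                            # stale features elsewhere
    out.scatter_(1, gather_idx, updated)
    return out
```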
- Shortest Path in Denoising Diffusion, CVPR 2025 [Paper] [Code]
- BOSS: Bellman Optimal StepSize, ICLR 2024 [Paper] [Code]
- Optimal Stepsize for Diffusion Sampling, arXiv 2025 [Paper] [Code]
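
The optimal-timestep entries above treat the sampling schedule itself as the search space: given a fixed step budget, which timesteps minimize discretization error? Below is a toy dynamic-programming sketch under the assumption that a pairwise jump-cost matrix is already available; the principled cost models are the actual contributions of BOSS and "Optimal Stepsize for Diffusion Sampling" and are not reproduced here.

```python
# Toy dynamic program over a *given* jump-cost matrix, where cost[i, j]
# estimates the error of jumping from timestep i to j (i > j). The cost
# model itself is a placeholder assumption.
import numpy as np

def best_schedule(cost: np.ndarray, num_steps: int) -> list[int]:
    T = cost.shape[0]                      # timesteps 0 .. T-1
    INF = float("inf")
    dp = np.full((num_steps + 1, T), INF)  # dp[s, j]: best cost reaching j in s jumps
    parent = np.zeros((num_steps + 1, T), dtype=int)
    dp[0, T - 1] = 0.0                     # sampling starts at the noisiest timestep
    for s in range(1, num_steps + 1):
        for j in range(T - 1):             # jump to a less noisy timestep j ...
            for i in range(j + 1, T):      # ... from any later timestep i
                c = dp[s - 1, i] + cost[i, j]
                if c < dp[s, j]:
                    dp[s, j], parent[s, j] = c, i
    path, t = [0], 0                       # backtrack from timestep 0
    for s in range(num_steps, 0, -1):
        t = parent[s, t]
        path.append(t)
    return [int(v) for v in path[::-1]]    # noisiest -> cleanest
```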
- DKDM: Data-Free Knowledge Distillation for Diffusion Models, CVPR 2025 [Paper] [Code]
- Distilling Diversity and Control in Diffusion Models, arXiv 2025 [Paper] [Code]
- Glance: Accelerating Diffusion Models with 1 Sample, arXiv 2025 [Paper] [Code]
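
The distillation entries above compress a many-step teacher into a few-step student. A minimal, hypothetical sketch of the generic step-distillation loop, where `sampler_step(model, x, t_from, t_to)` is an assumed one-step sampler update supplied by the caller and everything else is illustrative (DKDM additionally removes the need for real training data):

```python
# Hypothetical step-distillation loop: the student learns to match in one
# step what the teacher produces in two.
import torch
import torch.nn.functional as F

def distill_step(student, teacher, sampler_step, x_t, t_hi, t_mid, t_lo, opt):
    with torch.no_grad():                          # teacher takes two small steps
        x_mid = sampler_step(teacher, x_t, t_hi, t_mid)
        target = sampler_step(teacher, x_mid, t_mid, t_lo)
    pred = sampler_step(student, x_t, t_hi, t_lo)  # student spans both in one step
    loss = F.mse_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```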
- Impossible Videos, arXiv 2025 [Paper] [Code]
- Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps, CVPR 2025 [Paper]
- Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding, arXiv 2025 [Paper] [Code]
✨ This list is continuously updated with the latest works.