OPT2025
We welcome you to participate in the 17th International OPT Workshop on Optimization for Machine Learning, to be held as part of the NeurIPS 2025 conference. This year we particularly encourage (but do not limit ourselves to) submissions with a focus on "Statistics Meets Optimization".
We are looking forward to an exciting OPT!
Accepted Papers
- All accepted papers are available on the OpenReview submission page.
Accepted Papers (oral)
- Data Generation without Function Estimation — Hadi Daneshmand (University of Virginia, Charlottesville), Ashkan Soleymani (Massachusetts Institute of Technology)
- Flat Minima and Generalization: Insights from Stochastic Convex Optimization — Matan Schliserman (Tel Aviv University), Shira Vansover-Hager (Tel Aviv University), Tomer Koren (Tel Aviv University)
- Can SGD Handle Heavy-Tailed Noise? — Ilyas Fatkhullin (ETH Zurich), Florian Hübler (ETH Zurich), Guanghui Lan (Georgia Institute of Technology)
- Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression — Tingkai Yan (Peking University), Haodong Wen (Tsinghua University), Binghui Li (Peking University), Kairong Luo (Tsinghua University), Wenguang Chen (Tsinghua University), Kaifeng Lyu (Tsinghua University)
- Muon Optimizes Under Spectral Norm Constraints — Lizhang Chen (University of Texas at Austin), Jonathan Li (University of Texas at Austin), Qiang Liu (University of Texas at Austin)
- Provable Benefit of Sign Descent: A Minimal Model Under Heavy-Tail Class Imbalance — Robin Yadav (University of British Columbia), Shuo Xie (Toyota Technological Institute at Chicago), Tianhao Wang (University of California, San Diego), Zhiyuan Li (Toyota Technological Institute at Chicago)
Accepted Papers (poster)
- Sharpness-Aware Minimization with Z-Score Gradient Filtering — Vincent-Daniel Yun (University of Southern California)
- OrthoGrad Improves Neural Calibration — C. Evans Hedges (University of Denver)
- EMA Without the Lag: Bias-Corrected Iterate Averaging Schemes — Adam Block (Columbia University), Cyril Zhang (Microsoft)
- Why Does Stochastic Gradient Descent Slow Down in Low-Precision Training? — Vincent-Daniel Yun (University of Southern California)
- Atlas – Rethinking Optimizer Design for Stability and Speed — Janos Horvath (Visionary Tech & Event Solution)
- On Riemannian Gradient Descent Algorithm using gradient averaging — Saugata Purkayastha (Universität des Saarlandes), Sukannya Purkayastha (Technische Universität Darmstadt)
- Learning by solving differential equations — Benoit Dherin (Google Research), Michael Munn (Google), Hanna Mazzawi (Google Research), Michael Wunder (Google), Sourabh Medapati (Google DeepMind), Javier Gonzalvo (Google)
- New Optimization Methods for Very Large Scale SVMs — Yifan Kang (Clemson University), Yarui Cao (Clemson University), Kai Liu (Clemson University)
- Curriculum-Learning PIELMs for Hemodynamic Flows — Vikas Dwivedi (Indian Institute of Technology, Madras), Monica Sigovan (CNRS), Sixou Bruno (Institut National des Sciences Appliquées de Lyon)
- DRO: A Python Library for Distributionally Robust Optimization in Machine Learning — Jiashuo Liu (Tsinghua University), Tianyu Wang (Columbia University), Henry Lam (Columbia University), Hongseok Namkoong (LinkedIn), Jose Blanchet (Stanford University)
- On Optimizing Large Scale Multi-Class Logistic Regression — Yifan Kang (Clemson University), Yarui Cao (Clemson University), Kai Liu (Clemson University)
- Asymptotic and Finite-Time Guarantees for Langevin-Based Temperature Annealing in InfoNCE — Faris Chaudhry (Imperial College London)
- HiSo: Efficient Federated Zeroth-Order Optimization via Hessian-Informed Acceleration and Scalar-Only Communication — Zhe Li (Rochester Institute of Technology), Bicheng Ying (Google), Zidong Liu (Combocurve Inc.), Chaosheng Dong (Amazon), Haibo Yang (Rochester Institute of Technology)
- From Emergence to Intention: A Statistical Inductive Bias for Tractable Optimization in Multi-Agent Coordination — Brennen Hill (University of Wisconsin - Madison), Mant Koh En Wei (National University of Singapore), Jishnuanandh Thangavel (National University of Singapore)
- What really matters in matrix-whitening optimizers? — Kevin Frans (University of California, Berkeley), Pieter Abbeel (Amazon), Sergey Levine (University of California Berkeley)
- Distributionally Robust Optimization via Diffusion Ambiguity Modeling — Jiaqi Wen (University of Houston), Jianyi Yang (University of Houston)
- Revisiting the Geometrically Decaying Step Size: Linear Convergence for Smooth or Non-Smooth Functions — Jihun Kim (University of California, Berkeley)
- Fast decentralized gradient tracking for federated learning with local updates — Chris Junchi Li (University of California Berkeley)
- Implicit Bias of Polyak and Line-Search Step Sizes on Linear Classification with Separable Data — Chen Fan (University of British Columbia), Reza Babanezhad Harikandeh (Samsung), Christos Thrampoulidis (University of British Columbia), Mark Schmidt (University of Alberta), Sharan Vaswani (Simon Fraser University)
- Exploring Magnitude Preservation and Rotation Modulation in Diffusion Transformers — Eric Tillmann Bill (ETH Zurich), Cristian Perez Jensen (ETH Zurich)
- Benefits of Learning Rate Annealing for Tuning-Robustness in Stochastic Optimization — Amit Attia (Tel Aviv University), Tomer Koren (Tel Aviv University)
- On the Limits of Momentum in Decentralized and Federated Optimization — Riccardo Zaccone (Polytechnic Institute of Turin), Sai Praneeth Karimireddy (University of Southern California), Carlo Masone (Polytechnic Institute of Turin)
- Lipschitz Optimization via Weighted Sampling Based on Expected Potential Maximizers Reduction — Hideyuki Masui (Mitsubishi Electric Corporation), Koki Nakane (Mitsubishi Electric Corporation), Renshi Nagasawa (Mitsubishi Electric Corporation)
- On the Benefits of Weight Normalization for Overparameterized Matrix Sensing — Yudong Wei (ETH Zurich), Liang Zhang (Department of Computer Science, ETH Zurich), Bingcong Li (ETH Zurich), Niao He (Swiss Federal Institute of Technology)
- Block-Diagonal K-FAC: A Trade-off Between Curvature Information and Resource Efficiency — Mingzhe Yu (University of Tsukuba), Osamu Tatebe (University of Tsukuba)
- Projected Compression — Maciej Stefaniak (University of Warsaw), Michał Krutul (University of Warsaw), Mikołaj Dziok (University of Warsaw), Jan Małaśnicki (University of Warsaw), Maciej Pióro (Polish Academy of Sciences), Jakub Krajewski (University of Warsaw), Sebastian Jaszczur (Anthropic), Marek Cygan (University of Warsaw), Kamil Adamczewski (Technical University of Wroclaw), Jan Ludziejewski (University of Warsaw)
- Convergence for Discrete Parameter Update Schemes — Paul W Wilson (Independent), Fabio Zanasi (University College London, University of London), George Anthony Constantinides (Imperial College London)
- Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization — Lesi Chen (Tsinghua University), Junru Li (Tsinghua University), El Mahdi Chayti (Swiss Federal Institute of Technology Lausanne), Jingzhao Zhang (Tsinghua University)
- Multi-Timescale Gradient Sliding for Distributed Optimization — Junhui Zhang (Massachusetts Institute of Technology), Patrick Jaillet (Massachusetts Institute of Technology)
- Weight Decay may matter more than µP for Learning Rate Transfer in Practice — Atli Kosson (EPFL), Jeremy Welborn (Amazon), Yang Liu (University of Illinois Urbana-Champaign), Martin Jaggi (EPFL), Xi Chen (Amazon)
- FineAMP: Optimization-Based Automatic Mixed Precision Quantization for Efficient Diffusion Model Inference — Burak Bartan (Qualcomm), Ruizhong Qiu (University of Illinois Urbana-Champaign), Rafael Esteves (Qualcomm), Yuwei Ren (Qualcomm), Weiliang Will Zeng (Qualcomm), An Chen (Qualcomm)
- Evolution of the Spectral Dimension of Transformer Activations — Andy Zeyi Liu (Yale University), Elliot Paquette (McGill University), John Sous (Yale University)
- Primal-dual hybrid algorithms for chi-squared regularized Optimal Transport: statistical-computational trade-offs and applications to Wasserstein Barycenters — Denys Ruban (University of Ottawa), Augusto Gerolin (University of Ottawa)
- Contextual-Dueling with Offline Regression: Near-Optimal Personalized Recommendation with Realizable Preferences — Aadirupa Saha (University of Illinois at Chicago)
- A Simplified Analysis of SGD for Linear Regression with Weight Averaging — Alexandru Meterez (School of Engineering and Applied Sciences, Harvard University), Depen Morwani (Harvard University), Costin-Andrei Oncescu (Harvard University), Jingfeng Wu (University of California, Berkeley), Cengiz Pehlevan (School of Engineering and Applied Sciences, Harvard University), Sham M. Kakade (Harvard University)
- Towards Quantifying the Hessian Structure of Neural Networks — Zhaorui Dong (The Chinese University of Hong Kong, Shenzhen), Yushun Zhang (The Chinese University of Hong Kong, Shenzhen), Jianfeng Yao (The Chinese University of Hong Kong, Shenzhen), Ruoyu Sun (The Chinese University of Hong Kong)
- Analysis of Schedule Free Non-Convex Optimization — Connor Brown (Department of Computer Science, Princeton University), Ahmed Khaled (Princeton University), Chi Jin (Princeton University)
- Entropy Meets Importance: A Unified Head Importance–Entropy Score for Stable and Efficient Transformer Pruning — Minsik Choi (Korea University), Hyegang Son (Korea University), Joohun Hyun (Korea University), Seokmin Kim (Korea University), Young Geun Kim (Korea University)
- Parameter-Agnostic Error Feedback Enhanced With Hessian-Corrected Momentum — Abdurakhmon Sadiev (King Abdullah University of Science and Technology), Yury Demidovich (King Abdullah University of Science and Technology), Grigory Malinovsky (King Abdullah University of Science and Technology), Igor Sokolov (King Abdullah University of Science and Technology), Sarit Khirirat (King Abdullah University of Science and Technology), Peter Richtárik (King Abdullah University of Science and Technology (KAUST))
- A Non-Convex Method for Polynomial Manifold Learning — Param Mody (University of British Columbia), Elina Robeva (University of British Columbia)
- A stochastic Lagrangian-based method for nonconvex empirical risk minimization with nonlinear constraints — Dimitri Papadimitriou (Math. and Computational Optimization Inst.)
- First Provable Guarantees for Practical Private FL: Beyond Restrictive Assumptions — Egor Shulgin (KAUST), Grigory Malinovsky (King Abdullah University of Science and Technology), Sarit Khirirat (King Abdullah University of Science and Technology), Peter Richtárik (King Abdullah University of Science and Technology (KAUST))
- Simultaneous Fine-Tuning and Pruning of LLMs — Finn Reinecke (Albert-Ludwigs-Universität Freiburg), Jörg K.H. Franke (ELLIS Institute Tübingen), Frank Hutter (Prior Labs), Michael Hefenbrock (perspix.ai)
- The Hidden Cost of Approximation in Online Mirror Descent — Ofir Schlisselberg (Tel Aviv University), Uri Sherman (Tel Aviv University), Tomer Koren (Tel Aviv University), Yishay Mansour (School of Computer Science, Tel Aviv University)
- Aligning Distributionally Robust Optimization with Practical Deep Learning Needs — Dmitrii Feoktistov (Yandex), Igor Ignashin (Independent), Andrey Veprikov (Mohamed bin Zayed University of Artificial Intelligence), Nikita Borovko (Lomonosov Moscow State University), Aleksandr Bogdanov (Independent), Savelii Chezhegov (Independent), Aleksandr Beznosikov (Independent)
- Hessian-Dependent Sample Complexity in Zeroth-Order Stochastic Optimization: Nonconvex Support Sampling Is Necessary for Optimality — Mengtian Hong (University of Glasgow), Jason D. Lee (Princeton University), Qian Yu (University of California, Santa Barbara)
- A Theoretical Analysis for CUR Decomposition based Active Learning and Feature Selection — Zhong Chen (Southern Illinois University-Carbondale), Chen Zhao (Baylor University), Yi He (College of William and Mary)
- One-Sided Matrix Completion from Ultra-Sparse Samples — Hongyang R. Zhang (Northeastern University), Zhenshuo Zhang (Northeastern University), Huy Nguyen (Northeastern University), Guanghui Lan (Georgia Institute of Technology)
- Partial Parameter Updates for Efficient Distributed Training — Anastasiia Filippova (Apple), Angelos Katharopoulos (Apple), David Grangier (Apple), Ronan Collobert (Apple)
- Graph-theoretic perspectives on splitting methods for sparse optimal transport — Jacob Lindbäck (KTH Royal Institute of Technology), Mikael Johansson (KTH Royal Institute of Technology, Stockholm, Sweden)
- DSGD-AC: controlled consensus errors improve generalization in decentralized training — Zesen Wang (KTH Royal Institute of Technology), Mikael Johansson (KTH Royal Institute of Technology, Stockholm, Sweden)
- Connecting Membership Inference Privacy and Generalization through Instance-Wise Measurements — Leah Woldemariam (Cornell University), Anna Scaglione (Cornell University)
- Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization — Fangzhao Zhang (Stanford University), Mert Pilanci (Stanford University)
- Gradient Descent’s Last Iterate is Often (slightly) Suboptimal — Guy Kornowski (Weizmann Institute of Science), Ohad Shamir (Weizmann Institute)
- Algorithm design and sharper bounds for improving bandits — Avrim Blum (Toyota Technological Institute at Chicago), Marten Garicano (University of Chicago), Kavya Ravichandran (Toyota Technological Institute at Chicago), Dravyansh Sharma (Toyota Technological Institute at Chicago)
- Augmented Normalization: Differentiating the Generalized Geometric Median — Tyler King (Cornell University), Ser-Nam Lim (University of Central Florida)
- Designing Algorithms for Entropic Optimal Transport from an Optimisation Perspective — Vishwak Srinivasan (Massachusetts Institute of Technology), Qijia Jiang (University of California, Davis)
- A Monte Carlo Approach to Nonsmooth Convex Optimization via Proximal Splitting Algorithms — Nicholas Di (Rice University), Eric Chi (University of Minnesota - Twin Cities), Samy Wu Fung (Colorado School of Mines)
- EDIT: Early Diffusion Inference Termination for dLLMs Based on Dynamics of Training Gradients — He-Yen Hsieh (Harvard University), Hong Wang (Intel), H. T. Kung (Harvard University)
- Data Geometry Determines Generalization Below the Edge-of-Stability — Tongtong Liang (University of California, San Diego), Alex Cloninger (University of California, San Diego), Rahul Parhi (University of California, San Diego), Yu-Xiang Wang (University of California, San Diego)
- Quantum Optimal Transport: Regularization and Algorithms — Pavlo Pelikh (University of Ottawa), Augusto Gerolin (University of Ottawa)
- OptiBridge: Multi-Scale Multi-Shift Bridging for Conditioning Optimization Landscapes — Farnaz Salehi Sadaghiani (University of Illinois at Chicago), Mojtaba Soltanalian (University of Illinois at Chicago)
- Spiking Brain Compression: Exploring One-Shot Post-Training Pruning and Quantization for Spiking Neural Networks — Lianfeng Shi (University of Bristol), Ao Li (University of Bristol), Benjamin Ward-Cherrier (University of Bristol)
- Automatic mixed precision for optimizing gained time with constrained loss mean-squared-error based on model partition to sequential sub-graphs — Shmulik Markovich-Golan (Intel), Daniel Ohayon (Technion - Israel Institute of Technology), Itay Niv (Tel Aviv University), Yair Hanani (Intel)
- Hessian Spectrum is Constant Across Minimizers in Regularized Deep Scalar Factorization — Anıl Kamber (University of California, San Diego), Rahul Parhi (University of California, San Diego)
- Data-Aware Training Quality Monitoring and Certification for Deep Learning — Farhang Yeganegi (University of Illinois at Chicago), Arian Eamaz (University of Illinois at Chicago), Mojtaba Soltanalian (University of Illinois at Chicago)
- Switching Gradient Methods for Constrained Federated Optimization — Antesh Upadhyay (Purdue University), Sang Bin Moon (Purdue University), Abolfazl Hashemi (Purdue University)
- Distributionally Robust Nash Equilibria via Variational Inequalities — Zeinab Alizadeh (University of Arizona), Azadeh Farsi (University of Arizona), Afrooz Jalilzadeh (University of Arizona)
- PiKE: Adaptive Data Mixing for Large-Scale Multi-Task Learning Under Low Gradient Conflicts — Zeman Li (Google), Yuan Deng (Google Research), Peilin Zhong (Meta), Meisam Razaviyayn (University of Southern California), Vahab Mirrokni (Google Research)
- Grassmannian Optimization Drives Generalization in Overparameterized DNN — Changfeng Wang (Boston Data Science)
- Domain-Aware Scaling Laws Uncover Data Synergy — Kimia Hamidieh (Massachusetts Institute of Technology), Lester Mackey (Microsoft Research New England), David Alvarez-Melis (School of Engineering and Applied Sciences, Harvard University)
- Revisiting Stochastic Proximal Point Methods: Generalized Smoothness and Similarity — Zhirayr Tovmasyan (King Abdullah University of Science and Technology), Grigory Malinovsky (King Abdullah University of Science and Technology), Laurent Condat (KAUST), Peter Richtárik (King Abdullah University of Science and Technology (KAUST))
- Feature Learning as a Virtual Covariance Learning — Taehun Cha (Korea University), Donghun Lee (Korea University)
- LeonArDBO: Fast and Prior-Driven Bayesian Optimization without Surrogate Modeling — Efe Mert Karagözlü (School of Computer Science, Carnegie Mellon University), Conor Igoe (Carnegie Mellon University), Barnabas Poczos (Machine Learning Department, Carnegie Mellon University), Jeff Schneider (Carnegie Mellon University)
- AdaGrad Meets Muon: Adaptive Stepsizes for Orthogonal Updates — Minxin Zhang (University of California, Los Angeles), Yuxuan Liu (University of California, Los Angeles), Hayden Schaeffer (University of California, Los Angeles)
- Quantum Non-Linear Bandit Optimization — Zakaria Shams Siam (University at Albany, State University of New York), Chaowen Guan (University of Cincinnati), Chong Liu (State University of New York at Albany)
- Optimal Implicit Bias in Linear Regression — K Nithin Varma (California Institute of Technology), Babak Hassibi (California Institute of Technology)
- FairPO: Fair Preference Optimization for Multi-Label Learning — Soumen Kumar Mondal (Indian Institute of Technology Bombay), Prateek Chanda (Indian Institute of Technology Bombay), Akshit Varmora (Indian Institute of Technology Bombay), Ganesh Ramakrishnan (Indian Institute of Technology Bombay)
- Error Feedback for Muon and Friends — Kaja Gruntkowska (King Abdullah University of Science and Technology), Alexander Gaponov (King Abdullah University of Science and Technology), Zhirayr Tovmasyan (King Abdullah University of Science and Technology), Peter Richtárik (King Abdullah University of Science and Technology (KAUST))
- LOTION: Smoothing the Optimization Landscape for Quantized Training — Mujin Kwun (School of Engineering and Applied Sciences, Harvard University), Depen Morwani (Harvard University), Huangyuan Su (School of Engineering and Applied Sciences, Harvard University), Stephanie Gil (Harvard University), Nikhil Anand (Harvard University), Sham M. Kakade (Harvard University)
- Faster Than SVD, Smarter Than SGD: The OPLoRA Alternating Update — Abdulla Jasem Almansoori (Mohamed bin Zayed University of Artificial Intelligence), Maria Ivanova (Yandex School of Data Analysis), Andrey Veprikov (Mohamed bin Zayed University of Artificial Intelligence), Aleksandr Beznosikov (Independent), Samuel Horváth (MBZUAI), Martin Takáč (Mohamed bin Zayed University of Artificial Intelligence)
- Achieving First-Order Statistical Improvements in Data-Driven Optimization — Henry Lam (Columbia University), Tianyu Wang (Columbia University)
- Towards Characterizing the Complexity of Riemannian Online Convex Optimization — Hibiki Fukushima (The University of Tokyo), Hiroshi Hirai (Nagoya University), Shinji Ito (The University of Tokyo)
- Central Limit Theorems for Asynchronous Averaged Q-Learning — Xingtu Liu (Simon Fraser University)
- Can We Estimate The Entropy Of Arbitrary Distributions Known Up To A Normalization Constant? — Safa Messaoud (Qatar Computing Research Institute), Skander Charni (Hamad Bin Khalifa University), Elaa Bouazza (Qatar Computing Research Institute), Ali Pourghasemi Fatideh (University of Maine), Halima Bensmail (Qatar Computing Research Institute)
- On the Finite-Sample Bias of Minimizing Expected Wasserstein Loss Between Empirical Distributions — Cheongjae Jang (Hanyang University), Yung-Kyun Noh (Hanyang University)
- Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime — Beomhan Baek (Seoul National University), Minhak Song (Korea Advanced Institute of Science & Technology), Chulhee Yun (Google)
- On the Rollout-Training Mismatch in Modern RL Systems — Feng Yao (Thinking Machines Lab), Liyuan Liu (Microsoft), Dinghuai Zhang (Microsoft Research), Chengyu Dong (University of California, San Diego), Jingbo Shang (University of California, San Diego), Jianfeng Gao (Microsoft Research)
- Understanding and Improving Shampoo via Kullback–Leibler Minimization — Wu Lin (Vector Institute), Scott C. Lowe (Vector Institute), Felix Dangel (Vector Institute, Toronto), Runa Eschenhagen (University of Cambridge), Zikun Xu (Microsoft), Roger Baker Grosse (Department of Computer Science, University of Toronto)
- Communication Efficient LLM Pre-training with SparseLoCo — Amir Sarfi (Templar), Benjamin Thérien (Université de Montréal), Joel Lidin (Volvo Cars), Eugene Belilovsky (Concordia University, Montreal)
- Sparse Adversarial Perturbation-Driven Scalable Coreset Optimization — Tushar Shinde (Indian Institute of Technology Madras Zanzibar), Manasa Madabhushi (Indian Institute of Technology Madras Zanzibar)
- Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game — Barna Pásztor (ETH Zurich), Thomas Kleine Buening (ETH Zurich), Andreas Krause (ETH Zurich)
- Extending µP: Spectral Conditions for Feature Learning Across Optimizers — Akshita Gupta (Purdue University), Marieme Ngom (Argonne National Laboratory), Sam Foreman (Argonne National Laboratory), Venkatram Vishwanath (Argonne National Laboratory)
- Efficient Training of CNN Ensembles via Feature-Prioritized Boosting — Biyi Fang, Truong Vo (Northwestern University), Jean Utke (Allstate), Diego Klabjan (Northwestern University)
- PEARL-Prox: Proximal Algorithm for Resolving Player Drift in Multiplayer Federated Learning — TaeHo Yoon (Johns Hopkins University), Nicolas Loizou (Johns Hopkins University)
- Delayed Momentum Aggregation: Communication-efficient Byzantine-robust Federated Learning with Partial Participation — Kaoru Otsuka (Okinawa Institute of Science and Technology (OIST)), Yuki Takezawa (Kyoto University), Makoto Yamada (Okinawa Institute of Science and Technology (OIST))
- Who to Trust? Aggregating Client Knowledge in Logit-Based Federated Learning — Viktor Kovalchuk (Mohamed bin Zayed University of Artificial Intelligence), Nikita Kotelevskii (Mohamed bin Zayed University of Artificial Intelligence), Maxim Panov (Mohamed bin Zayed University of Artificial Intelligence), Samuel Horváth (MBZUAI), Martin Takáč (Mohamed bin Zayed University of Artificial Intelligence)
- Optimized Statistical Ranking is All You Need for Robust Coreset Selection in Efficient Transformer-Based Spam Detection — Aisha Hamad Hassan (Indian Institute of Technology Madras Zanzibar), Tushar Shinde (Indian Institute of Technology Madras Zanzibar)
- Adaptive acceleration without strong convexity priors or restarts — Joao V. Cavalcanti (Massachusetts Institute of Technology), Laurent Lessard (Northeastern University), Ashia C. Wilson (Massachusetts Institute of Technology)
- High-dimensional isotropic scaling dynamics of Muon and SGD — Guangyuan Wang (Mila - Quebec Artificial Intelligence Institute), Elliot Paquette (McGill University), Atish Agarwala (Google)
- Balanced Locality-Sensitive Hashing for Online Data Selection — Hoang Phan (New York University), Yijun Dong (New York University), Andrew Gordon Wilson (New York University), Qi Lei (New York University)
- Cautious Optimism: A Meta-Algorithm for Near-Constant Regret in General Games — Ashkan Soleymani (Massachusetts Institute of Technology), Georgios Piliouras (Google DeepMind), Gabriele Farina (Massachusetts Institute of Technology)
- BatchNorm Layers have an Outsized Effect on Adversarial Robustness — Noam Zeise (University of Glasgow), Tiffany Joyce Vlaar (University of Glasgow)
- Incentivizing Permissionless Distributed Learning of LLMs — Joel Lidin (Volvo Cars), Amir Sarfi (Templar), Evangelos Pappas, Samuel Dare (Templar AI), Eugene Belilovsky (Concordia University, Montreal), Jacob Steeves (Simon Fraser University)
- Towards Robust Unroll Generalization in Learned Optimizers — Xiaolong Huang (Concordia University), Benjamin Thérien (Université de Montréal), Eugene Belilovsky (Concordia University, Montreal)
- Convex Neural Networks For Robust ASR Language Detection — Miria Feng (Stanford University), Mert Pilanci (Stanford University)
- M+Adam: Stable Low-Precision Training with Combined Adam–Madam Updates — Xiaoyuan Liang (California Institute of Technology), Sebastian Loeschcke (University of Copenhagen), Mads Toftrup (Aarhus University), Anima Anandkumar (California Institute of Technology)
- Toward the First Optimization Framework for Low-Rank Adaptation — Grigory Malinovsky (King Abdullah University of Science and Technology), Umberto Michieli (Samsung), Hasan Abed Al Kader Hammoud (KAUST), Taha Ceritli (Samsung Research UK), Hayder Elesedy (Samsung), Mete Ozay (Samsung Research), Peter Richtárik (King Abdullah University of Science and Technology (KAUST))
- The Limits of large learning rates: A Case Study in Single Index Models — Bhavesh Kumar (University of Washington), Libin Zhu (University of Washington)
- Stochastic Neural Tangent Kernel: Revisiting the NTK For SGD — Bhavesh Kumar (University of Washington), Dan Mikulincer (University of Washington)
- The Hebbian Forward-Forward Algorithm — Andrii Krutsylo (Institute of Computer Science of the Polish Academy of Sciences)
- Quasi-Newton Methods for Federated Learning with Error Feedback — Yanlin Wu (Mohamed bin Zayed University of Artificial Intelligence), Dmitry Kamzolov (Toulouse School of Economics), Martin Takáč (Mohamed bin Zayed University of Artificial Intelligence)
- Aligning Theory with Practice for Muon-type Optimizers: A Layer-wise Framework — Artem Riabinin (King Abdullah University of Science and Technology), Egor Shulgin (KAUST), Kaja Gruntkowska (King Abdullah University of Science and Technology), Peter Richtárik (King Abdullah University of Science and Technology (KAUST))
- Empirical-Bayes XTFC for Inverse Parameter Estimation — Vikas Dwivedi (Indian Institute of Technology, Madras), Monica Sigovan (CNRS), Sixou Bruno (Institut National des Sciences Appliquées de Lyon)
- A Unified Noise-Curvature View of Loss of Trainability — Gunbir Singh Baveja (University of British Columbia), Alex Lewandowski (University of Alberta), Mark Schmidt (University of Alberta)
- Hyperparameter-Free Auto-Scaled Gradient Normalization via Global Standard Deviation Dynamics — Vincent-Daniel Yun (University of Southern California)
- HyperPALoRA: Parameter-Efficient Pareto Hypernetworks via Preference-Based Diverse Low-Rank Adaptations — Ashmita Bhattacharya (Pennsylvania State University), Malyaban Bal (Pennsylvania State University)
- Efficient Algorithms for Combinatorial-Bandits with Monotonicity — Aniket Wagde (University of Illinois at Chicago), Aadirupa Saha (University of Illinois at Chicago)
- Foundations of Top-k Decoding for Language Models — Georgy Noarov (School of Engineering and Applied Science, University of Pennsylvania), Soham Mallick (The Wharton School, University of Pennsylvania), Tao Wang (The Wharton School, University of Pennsylvania), Sunay Joshi (University of Pennsylvania), Yan Sun (New Jersey Institute of Technology), Yangxinyu Xie (University of Pennsylvania), Mengxin Yu (Washington University in St. Louis), Edgar Dobriban (The Wharton School, University of Pennsylvania)
- Policy Gradient Methods Converge Globally in Imperfect-Information Extensive-Form Games — Fivos Kalogiannis (University of California, San Diego), Gabriele Farina (Massachusetts Institute of Technology)
- Data Source Adaptive Online Learning under Heteroscedastic Noise — Amith Bhat Hosadurga Anand (University of Illinois at Chicago), Aadirupa Saha (University of Illinois at Chicago), Thomas Kleine Buening (ETH Zurich), Haipeng Luo (University of Southern California)
- Per-Group Distributionally Robust Optimization (Per-GDRO) with Learnable Ambiguity Set Sizes via Bilevel Optimization — Seobeom Jung (Sungkyunkwan University), Woojae Lee (Sungkyunkwan University), Jihun Hamm (Tulane University), Jangho Park (Sungkyunkwan University)
- Chebyshev Moment Regularization (CMR): Condition-Number Control with Moment Shaping — Jinwoo Baek (Oregon State University)
- Zero-Infinity GAN: Stable Dynamics and Implicit Bias of Extragradient — Kyungjae Lee (Korea Advanced Institute of Science & Technology), Donghwan Kim (Korea Advanced Institute of Science & Technology)
- How Does Layer Normalization Improve Deep Q-learning? — Braham Snyder (University of Virginia, Charlottesville), Hadi Daneshmand (University of Virginia, Charlottesville), Chen-Yu Wei (University of Virginia, Charlottesville)
- On the Potential of the Four-Point Model for Studying the Role of Optimization in Robustness to Spurious Correlations — Mahdi Ghaznavi (Sharif University of Technology), Hesam Asadollahzadeh (University of Melbourne)
- Regularizing the Entropy Landscape of Self-Attention: Towards a Soft Inductive Bias in LLMs — Nandan Kumar Jha (New York University), Brandon Reagen (LG Corporation)