ACM SIGMETRICS 2026 Workshop
Organized by Yuxin Chen (UPenn), Laixi Shi (JHU), Yingying Li (UIUC)
This one-day workshop will feature recent advances in generative AI, broadly covering foundational and algorithmic aspects as well as applications. The workshop aims to combine theory and practice, bringing together researchers across academia and industry. The event will be organized to promote meaningful interactions and discussions, and foster interdisciplinary collaboration, through a series of invited talks and networking breaks.
If there are any questions, please contact [email protected].
Harvard University
Title: Theory for Discrete Diffusions: Parallel Decoding and Variable-Length Generation
Abstract: Compared to autoregressive models and even to continuous diffusions, diffusion language models offer a fundamentally different design space for crafting efficient and flexible generation processes. This talk discusses work along two axes of this design space: parallel decoding and variable-length generation. In the first half, an exact characterization of the optimal inference schedule for masked diffusion models is given, which depends on a certain "information profile" specific to the data distribution. From this characterization, simple schedules are derived that enable sampling provably more efficiently than autoregressive models for any distribution with bounded correlations. In the second half, FlexMDM is presented, a theoretically principled and empirically lightweight method for equipping diffusion language models with the ability to generate sequences of arbitrary length, while provably preserving their any-order generation capabilities.
Biography: Sitan Chen is an Assistant Professor of Computer Science at Harvard University, where he is a member of the Theory of Computation group, the ML Foundations group, and the Harvard Quantum Initiative. Previously, he was an NSF math postdoc at UC Berkeley, after completing his PhD in EECS at MIT in 2021. He is broadly interested in algorithmic questions about learning from data, most recently related to the science and theory of localization-based generative modeling, and the design of quantum protocols for learning about the physical universe. His work has been recognized with an NSF CAREER award, an ICML Outstanding Paper Award, and the Harvard Dean's Competitive Fund for Promising Scholarship.
The Ohio State University
Title: Breaking the Sampling Barrier in Discrete Diffusion: Sharp Theory and Accelerated Sampling
Abstract: Diffusion models have become a central paradigm in modern generative AI, and in discrete domains such as natural language, code, and molecular design, discrete diffusion models have emerged as especially compelling due to their strong empirical performance and their natural fit to discrete data. Despite this rapid empirical progress, the theoretical understanding of their convergence behavior and sampling error remains limited. Characterizing how quickly discrete diffusion samplers approach realistic data distributions is not only a fundamental question, but also a practical one, as it directly guides the design of faster samplers that reduce inference-time computation and power consumption, both of which are critical to the real-world deployability of generative AI systems.
In this talk, I will present our recent analytical framework for establishing non-asymptotic error bounds and convergence guarantees for discrete diffusion models. Our results sharpen the current state of the art, as evidenced by matching lower bounds that characterize the fundamental error scaling. Building on these insights, I will introduce our recently developed Gibbs-based accelerated sampler, which, for the first time, breaks the polynomial sampling-complexity barrier in target accuracy and achieves a poly-logarithmic rate for uniform-rate discrete diffusion, thereby substantially reducing sampling cost. I will conclude with open directions at the intersection of foundational theory and practical sampler design, including fine-tuning and test-time design of discrete diffusion models toward downstream objectives and constraints.
Biography: Dr. Yingbin Liang is currently a Professor in the Department of Electrical and Computer Engineering at the Ohio State University (OSU), and a core faculty member of the Ohio State Translational Data Analytics Institute (TDAI). She also serves as the Deputy Director of the NSF AI-EDGE Institute and the Co-Lead for the Foundational AI Pillar of the OSU AI^X Hub. Dr. Liang received her Ph.D. degree in Electrical Engineering from the University of Illinois at Urbana-Champaign in 2005, and served on the faculty of the University of Hawaii and Syracuse University before she joined OSU. Dr. Liang's research lies at the intersection of machine learning, large-scale optimization, statistical signal processing, information theory, and wireless networks, with growing applications to other scientific domains. She received the National Science Foundation CAREER Award and the State of Hawaii Governor Innovation Award in 2009. She also received the EURASIP Best Paper Award in 2014. She is currently an Information Theory Society Distinguished Lecturer for 2026–2027. Dr. Liang is an IEEE Fellow.
Carnegie Mellon University
Title: Dynamical System Perspectives of LLMs
Abstract: Large Language Models are opaque black boxes that are extremely difficult to interpret. Motivated by thermodynamics, where macroscopic variables (such as temperature and pressure) are identified to describe complex systems consisting of billions of molecules, we propose Representational Effective Theory (RET), a framework for describing large language model computation in terms of learned macrostates rather than microscopic details. RET learns these macrostates from hidden-state trajectories using a BYOL/JEPA-style self-supervised objective, coarse-graining activations into macrovariables that preserve higher-level structure relevant for prediction and interpretation. We evaluate whether these macrovariables are practically relevant for interpretability: RET yields temporally consistent states that reveal "mental-state" trajectories of reasoning, capture high-level semantic structure, support early prediction of behavioral outcomes such as sycophancy, and provide causal handles for steering generations toward interpretable computational phases. Together, these results suggest that LLM computation admits useful effective descriptions via RET: high-level, dynamically meaningful variables that support interpretation, prediction, and intervention.
Biography: Guannan Qu is an Associate Professor in the Electrical and Computer Engineering Department of Carnegie Mellon University. He received his B.S. degree in Electrical Engineering from Tsinghua University in Beijing, China in 2014, and his Ph.D. in Applied Mathematics from Harvard University in Cambridge, MA in 2019. He was a postdoctoral scholar in the Department of Computing and Mathematical Sciences at California Institute of Technology from 2019 to 2021. He is a recipient of the NSF CAREER Award, the Caltech Simoudis Discovery Award, the PIMCO Fellowship, the Amazon AI4Science Fellowship, and the IEEE SmartGridComm Best Student Paper Award, and was a finalist for the ICRA 2025 Best Conference Paper Award. His research interests lie in control, optimization, and machine/reinforcement learning.
University of Michigan
Title: The Effect of Training Task Diversity on In-Context Learning through the Lens of Low-Dimensional Subspaces
Abstract: The transformer's emergent ability to perform in-context learning (ICL) has sparked a wide range of studies designed to understand its underlying mechanism. Existing works often study how training task diversity, defined either as the number of ICL training task vectors or as the number of function classes from which the task vectors are drawn, shapes both the learning dynamics and generalization capabilities of ICL. While both definitions have uncovered many interesting phenomena, many observations under the latter definition remain theoretically unexplained. This paper presents a minimal analytical model under which these phenomena provably emerge from the properties of the pre-training data. By modeling the pre-training task vectors as a mixture of low-rank Gaussians, we show how pre-training task diversity, defined by the number of non-overlapping columns between subspaces that parameterize the covariance matrices, improves both the generalization and optimization trajectory of ICL with linear attention. In particular, we show that our model can explain (i) why pre-training with multiple tasks can shorten the ICL training plateau (Kim et al., 2025) and (ii) why ICL appears to achieve out-of-distribution generalization. We conclude by showing how our results empirically extend to nonlinear transformers and nonlinear function classes. Overall, our work presents a mathematically tractable framework to unify existing observations.
Biography: Qing Qu is an Assistant Professor in EECS at the University of Michigan. He works at the intersection of the foundations of machine learning, numerical optimization, and signal/image processing, with a current focus on the theory of deep generative models and representation learning. Prior to joining Michigan in 2021, he was a Moore-Sloan Data Science Fellow at the Center for Data Science, New York University (2018-2020). He received his Ph.D. in Electrical Engineering from Columbia University in October 2018 and his B.Eng. in Electrical and Computer Engineering from Tsinghua University in July 2011. His work has been recognized with multiple honors, including the Best Student Paper Award at SPARS 2015, a Microsoft PhD Fellowship in Machine Learning (2016), the NSF CAREER Award (2022), the Best Paper Award at the NeurIPS Diffusion Models Workshop (2023), an Amazon Research Award (AWS AI, 2023), the UM CHS Junior Faculty Award (2025), a Google Research Scholar Award (2025), and the 1938E Award in Michigan Engineering (2026). He has led and delivered multiple tutorials at ICASSP, CPAL, CVPR, ICCV, and ICML. He was one of the founding organizers and Program Chair of the new Conference on Parsimony & Learning (CPAL). He regularly serves as an Area Chair for NeurIPS, ICML, and ICLR, is a Senior Area Chair for ICASSP'26, and is an Action Editor for TMLR.
University of Michigan
Title: Constrained and Controllable Diffusion Models for Solving Scientific Inverse Problems
Abstract: Diffusion models have recently demonstrated remarkable promise as generative priors for addressing inverse problems. However, their training remains data-intensive and computationally demanding, posing significant challenges for high-dimensional and high-resolution domains such as medical and scientific imaging. In this talk, I will present our recent contributions toward improving the data and computational efficiency of diffusion-based generative models for general inverse problems via constrained posterior sampling. Specifically, I will introduce two principled frameworks we have developed to facilitate the learning of diffusion priors in high-dimensional settings: latent diffusion models and patch diffusion models. The effectiveness of these approaches is validated across a range of inverse problems involving both natural and medical images, including 3D medical image reconstruction, demonstrating substantial gains in both model efficiency and reconstruction performance. If time permits, I will further discuss our recent investigations into the generalization and controllability of diffusion-based sampling. Our analysis of the initial noise space reveals novel opportunities for uncertainty quantification, strengthening weak priors, and fine-grained control in image restoration and reconstruction tasks. Together, this research work establishes a principled foundation for deploying diffusion-based generative models in complex, real-world settings, for addressing critical inverse problems across broad biomedical and scientific applications.
Biography: Liyue Shen is an Assistant Professor in the EECS department at the University of Michigan. Prior to that, she received her B.E. degree in Electronic Engineering from Tsinghua University in 2016, and obtained her Ph.D. degree from the Department of Electrical Engineering, Stanford University in 2022. She also spent one year as a postdoctoral research fellow at the Department of Biomedical Informatics, Harvard Medical School. Her research interest is in Biomedical AI, which lies at the intersection of machine learning, computer vision, signal and image processing, biomedical imaging, and medical data analysis. Her recent research focuses on generative diffusion models, multimodal agentic models, and implicit neural representation learning. She is the recipient of the Stanford Bio-X Bowes Graduate Student Fellowship (2019-2022), and was selected as a Rising Star in EECS by MIT and a Rising Star in Data Science by the University of Chicago in 2021. She has served as an area chair for ICLR, ICML, NeurIPS, and MLHC, and has delivered multiple tutorials and organized workshops at CVPR, ICCV, ICML, and CPAL. Website: https://liyueshen.engin.umich.edu/
University of Illinois Urbana-Champaign
Title: Noise Schedule Design for Diffusion Models: An Optimal Control Perspective
Abstract: We develop a principled framework for analyzing and designing noise schedules in diffusion models. We show that one can recast this design problem as an optimal control problem whose state is the Fisher information of the diffusion process, which evolves according to an ODE, and whose control input is the noise schedule. The objective of the optimal control problem is a functional involving the Fisher information, which is shown to be an upper bound on the Kullback-Leibler sampling error. By solving this optimal control problem, we obtain sufficient conditions on noise schedules under which state-of-the-art O~(d/n) sampling error is achievable, where d is the data dimension and n is the number of discretization steps. While existing theoretical work also proves that O~(d/n) sampling error bounds are achievable, these results hold only for specific noise schedules, which do not include the schedules used in practice. Under a further parametric assumption on the data distribution, we show that one can obtain closed-form expressions for the noise schedules. These noise schedules generalize standard empirical schedules such as exponential and sigmoid schedules by allowing additional parameters that can be tuned. Systematically tuning the parameters of these schedules yields new schedules that achieve superior FID scores on image generation benchmarks.
Biography: R. Srikant is the Director of the National Center for Supercomputing Applications, the Grainger Distinguished Chair in Engineering, and a Professor of Electrical and Computer Engineering and the Coordinated Science Lab at the University of Illinois Urbana-Champaign.
University of Pennsylvania
Title: How Geometry Shapes Optimization in Deep Generative Models
Abstract: Deep generative models have achieved remarkable empirical success, but their theoretical foundations remain poorly understood. In this talk, I present recent progress on the geometry and optimization of modern generative models, focusing on three representative settings. First, I analyze generative model inversion and establish linear convergence of gradient descent under two geometric conditions on the loss landscape, avoiding unrealistic random-weight assumptions. Second, I study transformer-based diffusion models trained on multi-token Gaussian mixture data, showing that gradient descent converges to the Bayes-optimal denoiser and that self-attention approximates the optimal MMSE estimator. Third, I introduce Parsimonious Flow Matching (PFM), which replaces the standard isotropic Gaussian latent with a multimodal mixture aligned with data structure, yielding better-conditioned optimization and faster convergence. Together, these results highlight how geometric structure in data and latent spaces enables sharper theoretical guarantees and more efficient generative modeling.
Biography: René Vidal is the Penn Integrates Knowledge and Rachleff University Professor of Electrical and Systems Engineering and Radiology at the University of Pennsylvania, where he directs the Center for Innovation in Data Engineering and Science (IDEAS) and serves as Co-Chair of Penn AI. He is also an Amazon Scholar, Affiliated Chief Scientist at NORCE, and former Associate Editor-in-Chief of IEEE Transactions on Pattern Analysis and Machine Intelligence. His research advances the mathematical foundations of deep learning and trustworthy AI, with broad impact across computer vision and biomedical data science. His contributions have been recognized with major honors, including the IEEE Edward J. McCluskey Technical Achievement Award, the D’Alembert Faculty Award, the J.K. Aggarwal Prize, the ONR Young Investigator Award, the NSF CAREER Award, and best paper awards in machine learning, computer vision, signal processing, control, and medical robotics. He is a Fellow of ACM, AIMBE, IEEE, IAPR, and the Sloan Foundation.
University of Michigan
Title: Stochastic Zeroth-Order Policy Optimization for RLHF
Abstract: TBA
Biography: TBA
Tentative schedule — each invited talk is 45 minutes, and shared coffee and lunch breaks follow the general workshop plan.
Sitan Chen
Theory for Discrete Diffusions: Parallel Decoding and Variable-Length Generation
Qing Qu
The Effect of Training Task Diversity on In-Context Learning through the Lens of Low-Dimensional Subspaces
Idea Hub
Yingbin Liang
Breaking the Sampling Barrier in Discrete Diffusion: Sharp Theory and Accelerated Sampling
R. Srikant
Noise Schedule Design for Diffusion Models: An Optimal Control Perspective
Rogel Ballroom
Lei Ying
Stochastic Zeroth-Order Policy Optimization for RLHF
René Vidal
How Geometry Shapes Optimization in Deep Generative Models
Idea Hub
Liyue Shen
Constrained and Controllable Diffusion Models for Solving Scientific Inverse Problems
Guannan Qu