The One World Seminar Series on the Mathematics of Machine Learning is an online platform for research seminars, workshops and seasonal schools in theoretical machine learning. The series focuses on theoretical advances in machine learning and deep learning. It was started during the COVID-19 pandemic in 2020 to bring together researchers from all over the world for presentations and discussions in a virtual environment. It follows in the footsteps of other community projects under the One World umbrella which originated around the same time.
We welcome suggestions for speakers concerning new and exciting developments and are also committed to providing a platform for junior researchers. We recognize the advantages that online seminars provide in terms of flexibility. Feedback on any of our events is welcome.
Zoom talks are held on Wednesdays at 12:00 pm New York time (9:00 am Pacific).
A list of past seminars can be found here and recordings can be viewed on our YouTube channel. The invitation to future seminars will be shared on this site before the talk and distributed via email.
Wed 1 July
Minshuo Chen
Unlocking Adaptive Generative Decision-Making with Diffusion Models: From Reward Steering to Latency-Aware Meta-Control
Abstract: Diffusion models have emerged as an expressive generative policy class for decision making, capable of representing rich distributions over actions and trajectories. Yet deploying these models in real-world sequential tasks raises two fundamental control questions: how to align a pretrained generative policy with task-specific rewards, and how to schedule inference when sample generation introduces non-negligible latency. In this talk, we present two principled approaches to these challenges. The first part develops a training-free reward steering framework based on the classical Doob $h$-transform. The method adapts a pre-trained model at inference time by modifying its sampling dynamics toward reward-preferred distributions, without requiring reward differentiability. We establish convergence guarantees for a prototypical version of the method and propose practical implementations that incur only modest additional sampling cost while achieving state-of-the-art empirical performance. The second part shifts from reward steering to meta-control of diffusion policies. Because diffusion policies generate actions through iterative denoising, inference latency becomes a central issue in real-time decision making. This makes inference timing critical for both task performance and computational efficiency. We formulate latency-aware inference scheduling as a semi-Markov decision process, in which a meta-controller decides when to invoke the diffusion policy while accounting for delayed execution and action buffering. We then develop a learning algorithm with theoretical guarantees. Together, these works suggest a broader view of diffusion models for decision making: they are not merely expressive policy parameterizations, but controllable components inside sequential decision-making systems.
Sign up here to join our mailing list and receive announcements. If your browser automatically signs you into a Google account, it may be easiest to join with a university account by using an incognito window. For other concerns, please reach out to one of the organizers.
Ricardo Baptista (University of Toronto)
Wuyang Chen (Simon Fraser University)
Bin Dong (Peking University)
Lyudmila Grigoryeva (University of St. Gallen)
Boumediene Hamzi (Caltech)
Yuka Hashimoto (NTT)
Qianxiao Li (National University of Singapore)
Lizao Li (Google)
George Stepaniants (Caltech)
Zhiqin Xu (Shanghai Jiao Tong University)
Simon Shaolei Du (University of Washington)
Franca Hoffmann (Caltech)
Surbhi Goel (Microsoft Research NY)
Issa Karambal (Quantum Leap Africa)
Tiffany Vlaar (University of Glasgow)
Chao Ma (Stanford University)
Song Mei (UC Berkeley)
Philipp Petersen (University of Vienna)
Matthew Thorpe (University of Warwick)
Stephan Wojtowytsch (University of Pittsburgh)