I am a final-year PhD candidate at Mila and Concordia University, advised by Eugene Belilovsky. My research is supported by the FRQNT and Frederick Lowy Scholars fellowships.
My PhD research focuses on learnable optimization algorithms that can improve by leveraging data and compute. My recent work, Celo2, shows that learned update rules can scale to practical billion-scale tasks such as GPT-3 pretraining, which are six orders of magnitude larger than the meta-training distribution. This unlocks a scalable path towards improving optimization algorithms, and I'm excited to push learned optimizers beyond academic research to evolve the training pipelines of modern AI systems!
I have been fortunate to work with wonderful folks in diverse areas of machine learning. Most recently, I worked at Meta Superintelligence Labs (FAIR) with Yoram Bachrach and Jakob Foerster on AI research agents. At Apple MLR, I interned with Federico Danieli, working on state-space models (Mamba). Before starting my PhD, I spent 1.5 years as a Visiting Scholar with Devi Parikh and Dhruv Batra at Georgia Tech, where I worked on multi-modal embodied agents.
| Date | News |
| --- | --- |
| Oct 2025 | Our work, Celo, has been awarded a J2C Certification (Top 10% in TMLR)! |
| Sep 2025 | Thanks to the Google TRC program for the generous TPU support for my PhD research! |
| Aug 2025 | Started at Meta Superintelligence Labs (FAIR) in London, working on the AI Scientist! |
| Jun 2025 | Released assayer for automatic ML model checkpoint monitoring and evaluation. |
| Jun 2025 | Celo accepted to TMLR! JAX code is released here. |
| May 2025 | Project with Apple MLR on understanding input selectivity in Mamba accepted to ICML 2025! |
| Mar 2025 | Finished my thesis proposal; I am a PhD candidate now! |
(* denotes equal contribution)
Celo2: Towards Learned Optimization Free Lunch
Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky
Under review
PyLO: Towards Accessible Learned Optimizers in PyTorch
Paul Janson, Benjamin Thérien, Quentin Anthony, Xiaolong Huang, Abhinav Moudgil, Eugene Belilovsky
Under review
Celo: Training Versatile Learned Optimizers on a Compute Diet
Abhinav Moudgil, Boris Knyazev, Guillaume Lajoie, Eugene Belilovsky
TMLR 2025 (ICLR 2026, Journal-to-Conference Track)
Accelerating Training with Neuron Interaction and Nowcasting Networks
Boris Knyazev, Abhinav Moudgil, Guillaume Lajoie, Eugene Belilovsky, Simon Lacoste-Julien
ICLR 2025
Understanding Input Selectivity in Mamba: Impact on Approximation Power, Memorization, and Associative Recall Capacity
Ningyuan Teresa Huang, Miguel Sarabia, Abhinav Moudgil, Pau Rodriguez, Luca Zappella, Federico Danieli
ICML 2025
Meta-learning Optimizers for Communication-Efficient Learning
Charles-Étienne Joseph*, Benjamin Thérien*, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky
TMLR 2025
Learning to Optimize with Recurrent Hierarchical Transformers
Abhinav Moudgil, Boris Knyazev, Guillaume Lajoie, Eugene Belilovsky
Frontiers4LCD Workshop, ICML 2023
Towards Scaling Difference Target Propagation by Learning Backprop Targets
Maxence Ernoult, Fabrice Normandin*, Abhinav Moudgil*, Sean Spinney, Eugene Belilovsky, Irina Rish, Blake Richards, Yoshua Bengio
ICML 2022
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra
NeurIPS 2021
Contrast and Classify: Alternate Training for Robust VQA
Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal
ICCV 2021, NeurIPS Self-Supervised Learning Workshop 2020
Exploring 3Rs of Long-term Tracking: Re-detection, Recovery and Reliability
Shyamgopal Karthik, Abhinav Moudgil, Vineet Gandhi
WACV 2020
Long-Term Visual Object Tracking Benchmark
Abhinav Moudgil, Vineet Gandhi
ACCV 2018 (Oral Presentation)
Python RQ watchdog to automatically monitor and evaluate ML model checkpoints offline during training.
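A minimal sketch of the underlying pattern, assuming a running Redis instance; the function and queue names here are hypothetical illustrations, not assayer's actual API:

```python
import time
from pathlib import Path

from redis import Redis
from rq import Queue


def evaluate_checkpoint(path: str) -> None:
    """Placeholder eval job (RQ workers import and run this).
    assayer's actual evaluation logic differs."""
    print(f"evaluating {path}")


def watch(ckpt_dir: str, poll_secs: float = 30.0) -> None:
    """Poll a checkpoint directory and enqueue an offline eval job
    (via RQ / Redis) for every new checkpoint that appears."""
    queue = Queue("checkpoint-eval", connection=Redis())
    seen: set[Path] = set()
    while True:
        for ckpt in sorted(Path(ckpt_dir).glob("*.pt")):
            if ckpt not in seen:
                seen.add(ckpt)
                queue.enqueue(evaluate_checkpoint, str(ckpt))
        time.sleep(poll_secs)


if __name__ == "__main__":
    watch("checkpoints/")
```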
Implements a custom distributed scheme for our DTP algorithm (ICML 2022) in PyTorch, parallelizing feedback weight training across GPUs.
Fast PyTorch implementation of the visual tracker GOTURN (Held et al., ECCV 2016), which tracks an object through a video at 100 FPS using a deep siamese convolutional network.
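For illustration, a minimal siamese regression tracker in the spirit of GOTURN, with a tiny stand-in backbone rather than the original CaffeNet stack or the repo's exact architecture:

```python
import torch
import torch.nn as nn


class MiniGOTURN(nn.Module):
    """Toy GOTURN-style tracker: shared conv features for the
    previous-frame target crop and the current-frame search crop,
    concatenated and regressed to a bounding box."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in for the CaffeNet convs
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(6),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(2 * 64 * 6 * 6, 256), nn.ReLU(),
            nn.Linear(256, 4),  # (x1, y1, x2, y2) in search-crop coords
        )

    def forward(self, target_crop, search_crop):
        feats = torch.cat(
            [self.backbone(target_crop), self.backbone(search_crop)], dim=1
        )
        return self.head(feats)


boxes = MiniGOTURN()(torch.rand(1, 3, 112, 112), torch.rand(1, 3, 112, 112))
print(boxes.shape)  # torch.Size([1, 4])
```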
MATLAB implementation of the MOSSE tracker (Bolme et al., CVPR 2010), which forms the basis for correlation filter-based object tracking algorithms.
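The core of MOSSE is a closed-form correlation filter in the frequency domain, H = Σ G·conj(F) / Σ F·conj(F), updated online with a running average. A rough NumPy rendering of that update (the repo itself is MATLAB):

```python
import numpy as np


def mosse_update(F, G, A=None, B=None, lr=0.125, eps=1e-5):
    """One online MOSSE update. F: FFT of the (preprocessed) patch,
    G: FFT of the desired Gaussian response centred on the target.
    Returns running numerator/denominator and the filter H = A / B."""
    A_new = G * np.conj(F)          # correlation with the desired output
    B_new = F * np.conj(F) + eps    # energy spectrum of the patch
    A = A_new if A is None else lr * A_new + (1 - lr) * A
    B = B_new if B is None else lr * B_new + (1 - lr) * B
    return A, B, A / B


# Toy usage: the peak of the response map gives the new target position.
patch = np.random.rand(64, 64)
yy, xx = np.arange(64)[:, None], np.arange(64)[None, :]
gauss = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 2.0 ** 2))
A, B, H = mosse_update(np.fft.fft2(patch), np.fft.fft2(gauss))
response = np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
```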
Python implementation that reproduces the results of the paper “A computational model of linguistic humor in puns” (Kao et al., CogSci 2015). It employs a probabilistic model to compute a funniness rating for a given sentence.
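The model's two key quantities are ambiguity (entropy of the posterior over a sentence's latent meanings) and distinctiveness (symmetrized KL divergence between the word-support distributions of the two meanings). A toy sketch with made-up numbers, not the repo's actual inference code:

```python
import numpy as np


def entropy(p):
    """Shannon entropy in bits; zero-probability entries are dropped."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))


def sym_kl(p, q):
    """Symmetrized KL divergence; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)

    def kl(a, b):
        m = a > 0
        return np.sum(a[m] * np.log2(a[m] / b[m]))

    return 0.5 * (kl(p, q) + kl(q, p))


# Hypothetical numbers for a pun: posterior over the two latent meanings
# and, per meaning, how strongly each observed word supports it.
meaning_posterior = [0.55, 0.45]       # near-uniform -> high ambiguity
support_m1 = [0.40, 0.10, 0.30, 0.20]  # word-support distributions that
support_m2 = [0.05, 0.45, 0.10, 0.40]  # differ -> high distinctiveness

print(entropy(meaning_posterior), sym_kl(support_m1, support_m2))
```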
Collection of Python scripts for building the Short Jokes dataset, containing 231,657 jokes scraped from websites such as Reddit and Twitter.
Implementations of algorithms such as Deep Q-learning, Policy Gradient, Simulated Annealing, and Hill Climbing in TensorFlow / PyTorch, tested on OpenAI Gym environments.
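As a flavour of what's inside, a minimal REINFORCE (vanilla policy gradient) loop on CartPole; this is a generic sketch assuming the classic pre-0.26 gym API, not the repo's exact code:

```python
import gym
import torch
import torch.nn as nn

# Classic gym API: env.reset() -> obs, env.step() -> (obs, r, done, info).
env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(500):
    obs, log_probs, rewards, done = env.reset(), [], [], False
    while not done:
        dist = torch.distributions.Categorical(
            logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        obs, reward, done, _ = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)

    # Discounted returns, then the REINFORCE loss: -sum(log pi * G_t).
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + 0.99 * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    loss = -(torch.stack(log_probs) * returns).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```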