I am a final-year PhD candidate at Mila and Concordia University, advised by Eugene Belilovsky. My research is supported by the FRQNT and Frederick Lowy Scholars fellowships.
My PhD research focuses on learnable optimization algorithms that can improve by leveraging data and compute. My recent work, Celo2, shows that learned update rules can scale to practical billion-scale tasks such as GPT-3 pretraining, which are six orders of magnitude larger than the meta-training distribution. This unlocks a scalable path towards improving optimization algorithms, and I'm excited to push learned optimizers beyond academic research to evolve the training pipelines of modern AI systems!
I have been fortunate to work with wonderful folks in diverse areas of machine learning. Most recently, I worked at Meta Superintelligence Labs (FAIR) with Yoram Bachrach and Jakob Foerster on AI research agents. At Apple MLR, I interned with Federico Danieli, working on state-space models (Mamba). Before starting my PhD, I spent 1.5 years as a Visiting Scholar with Devi Parikh and Dhruv Batra at Georgia Tech, where I worked on multi-modal embodied agents.
| Date | News |
| --- | --- |
| Oct 2025 | Our work, Celo, has been awarded a J2C Certification (Top 10% in TMLR)! |
| Sep 2025 | Thanks to the Google TRC program for the generous TPU support for my PhD research! |
| Aug 2025 | Started at Meta Superintelligence Labs (FAIR) in London, working on the AI Scientist! |
| Jun 2025 | Released assayer for automatic ML model checkpoint monitoring and evaluation. |
| Jun 2025 | Celo accepted to TMLR! JAX code is released here. |
| May 2025 | Project with Apple MLR on understanding input selectivity in Mamba accepted to ICML 2025! |
| Mar 2025 | Finished my thesis proposal; I am a PhD candidate now! |
(* denotes equal contribution)
Celo2: Towards Learned Optimization Free Lunch
Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky
Under review
PyLO: Towards Accessible Learned Optimizers in PyTorch
Paul Janson, Benjamin Thérien, Quentin Anthony, Xiaolong Huang, Abhinav Moudgil, Eugene Belilovsky
Under review
Celo: Training Versatile Learned Optimizers on a Compute Diet
Abhinav Moudgil, Boris Knyazev, Guillaume Lajoie, Eugene Belilovsky
TMLR 2025 (ICLR 2026, Journal-to-Conference Track)
Accelerating Training with Neuron Interaction and Nowcasting Networks
Boris Knyazev, Abhinav Moudgil, Guillaume Lajoie, Eugene Belilovsky, Simon Lacoste-Julien
ICLR 2025
Understanding Input Selectivity in Mamba: Impact on Approximation Power, Memorization, and Associative Recall Capacity
Ningyuan Teresa Huang, Miguel Sarabia, Abhinav Moudgil, Pau Rodriguez, Luca Zappella, Federico Danieli
ICML 2025
Meta-learning Optimizers for Communication-Efficient Learning
Charles-Étienne Joseph*, Benjamin Thérien*, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky
TMLR 2025
Learning to Optimize with Recurrent Hierarchical Transformers
Abhinav Moudgil, Boris Knyazev, Guillaume Lajoie, Eugene Belilovsky
Frontiers4LCD Workshop, ICML 2023
Towards Scaling Difference Target Propagation by Learning Backprop Targets
Maxence Ernoult, Fabrice Normandin*, Abhinav Moudgil*, Sean Spinney, Eugene Belilovsky, Irina Rish, Blake Richards, Yoshua Bengio
ICML 2022
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra
NeurIPS 2021
Contrast and Classify: Alternate Training for Robust VQA
Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal
ICCV 2021, NeurIPS Self-Supervised Learning Workshop 2020
Exploring 3Rs of Long-term Tracking: Re-detection, Recovery and Reliability
Shyamgopal Karthik, Abhinav Moudgil, Vineet Gandhi
WACV 2020
Long-Term Visual Object Tracking Benchmark
Abhinav Moudgil, Vineet Gandhi
ACCV 2018 (Oral Presentation)
Python RQ watchdog to automatically monitor and evaluate ML model checkpoints offline during training.
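A minimal sketch of the underlying pattern, assuming a running Redis instance; the function and queue names here are hypothetical illustrations, not assayer's actual API:

```python
import time
from pathlib import Path

from redis import Redis
from rq import Queue


def evaluate_checkpoint(path: str) -> None:
    """Placeholder eval job (RQ workers import and run this).
    assayer's actual evaluation logic differs."""
    print(f"evaluating {path}")


def watch(ckpt_dir: str, poll_secs: float = 30.0) -> None:
    """Poll a checkpoint directory and enqueue an offline eval job
    (via RQ / Redis) for every new checkpoint that appears."""
    queue = Queue("checkpoint-eval", connection=Redis())
    seen: set[Path] = set()
    while True:
        for ckpt in sorted(Path(ckpt_dir).glob("*.pt")):
            if ckpt not in seen:
                seen.add(ckpt)
                queue.enqueue(evaluate_checkpoint, str(ckpt))
        time.sleep(poll_secs)


if __name__ == "__main__":
    watch("checkpoints/")
```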
Implements a custom distributed scheme for our DTP algorithm (ICML 2022) in PyTorch, parallelizing feedback weight training across GPUs.
Fast PyTorch implementation of the visual tracker GOTURN (Held et al., ECCV 2016), which tracks an object through a video at 100 FPS using a deep siamese convolutional network.
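For illustration, a minimal siamese regression tracker in the spirit of GOTURN, with a tiny stand-in backbone rather than the original CaffeNet stack or the repo's exact architecture:

```python
import torch
import torch.nn as nn


class MiniGOTURN(nn.Module):
    """Toy GOTURN-style tracker: shared conv features for the
    previous-frame target crop and the current-frame search crop,
    concatenated and regressed to a bounding box."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in for the CaffeNet convs
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(6),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(2 * 64 * 6 * 6, 256), nn.ReLU(),
            nn.Linear(256, 4),  # (x1, y1, x2, y2) in search-crop coords
        )

    def forward(self, target_crop, search_crop):
        feats = torch.cat(
            [self.backbone(target_crop), self.backbone(search_crop)], dim=1
        )
        return self.head(feats)


boxes = MiniGOTURN()(torch.rand(1, 3, 112, 112), torch.rand(1, 3, 112, 112))
print(boxes.shape)  # torch.Size([1, 4])
```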
MATLAB implementation of the MOSSE tracker (Bolme et al., CVPR 2010), which forms the basis for correlation filter-based object tracking algorithms.
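The core of MOSSE is a closed-form correlation filter in the frequency domain, H = Σ G·conj(F) / Σ F·conj(F), updated online with a running average. A rough NumPy rendering of that update (the repo itself is MATLAB):

```python
import numpy as np


def mosse_update(F, G, A=None, B=None, lr=0.125, eps=1e-5):
    """One online MOSSE update. F: FFT of the (preprocessed) patch,
    G: FFT of the desired Gaussian response centred on the target.
    Returns running numerator/denominator and the filter H = A / B."""
    A_new = G * np.conj(F)          # correlation with the desired output
    B_new = F * np.conj(F) + eps    # energy spectrum of the patch
    A = A_new if A is None else lr * A_new + (1 - lr) * A
    B = B_new if B is None else lr * B_new + (1 - lr) * B
    return A, B, A / B


# Toy usage: the peak of the response map gives the new target position.
patch = np.random.rand(64, 64)
yy, xx = np.arange(64)[:, None], np.arange(64)[None, :]
gauss = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 2.0 ** 2))
A, B, H = mosse_update(np.fft.fft2(patch), np.fft.fft2(gauss))
response = np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
```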
Python implementation that reproduces the results of the paper “A computational model of linguistic humor in puns” (Kao et al., CogSci 2015). It employs a probabilistic model to compute a funniness rating for a given sentence.
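The model's two key quantities are ambiguity (entropy of the posterior over a sentence's latent meanings) and distinctiveness (symmetrized KL divergence between the word-support distributions of the two meanings). A toy sketch with made-up numbers, not the repo's actual inference code:

```python
import numpy as np


def entropy(p):
    """Shannon entropy in bits; zero-probability entries are dropped."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))


def sym_kl(p, q):
    """Symmetrized KL divergence; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)

    def kl(a, b):
        m = a > 0
        return np.sum(a[m] * np.log2(a[m] / b[m]))

    return 0.5 * (kl(p, q) + kl(q, p))


# Hypothetical numbers for a pun: posterior over the two latent meanings
# and, per meaning, how strongly each observed word supports it.
meaning_posterior = [0.55, 0.45]       # near-uniform -> high ambiguity
support_m1 = [0.40, 0.10, 0.30, 0.20]  # word-support distributions that
support_m2 = [0.05, 0.45, 0.10, 0.40]  # differ -> high distinctiveness

print(entropy(meaning_posterior), sym_kl(support_m1, support_m2))
```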
Collection of Python scripts for building the Short Jokes dataset, containing 231,657 jokes scraped from websites such as Reddit and Twitter.
Implementations of algorithms such as Deep Q-learning, Policy Gradient, Simulated Annealing, and Hill Climbing in TensorFlow / PyTorch, tested on OpenAI Gym environments.
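As a flavour of what's inside, a minimal REINFORCE (vanilla policy gradient) loop on CartPole; this is a generic sketch assuming the classic pre-0.26 gym API, not the repo's exact code:

```python
import gym
import torch
import torch.nn as nn

# Classic gym API: env.reset() -> obs, env.step() -> (obs, r, done, info).
env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(500):
    obs, log_probs, rewards, done = env.reset(), [], [], False
    while not done:
        dist = torch.distributions.Categorical(
            logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        obs, reward, done, _ = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)

    # Discounted returns, then the REINFORCE loss: -sum(log pi * G_t).
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + 0.99 * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    loss = -(torch.stack(log_probs) * returns).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```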