Sarthak Kumar Maharana

Email: [email protected]
CV & Google Scholar & GitHub & LinkedIn

I'm a third-year CS PhD candidate at the University of Texas at Dallas (UTD), advised by Dr. Yunhui Guo. I'm also part of the Data Efficient Intelligent Learning Lab. Before this, I obtained a Master of Science in Electrical Engineering from the University of Southern California (USC) and a Bachelor's degree from IIIT Bhubaneswar (IIIT-Bh), India.

My research primarily lies in computer vision, where I study continual learning, i.e., how modern models can accumulate knowledge over time at test time while remaining robust and adaptable in open and dynamic environments. This is increasingly critical in today’s AI landscape, where deployed systems need to be more sample- and compute-efficient.

During my Master's, I worked closely with Dr. Yonggang Shi. Previously, I also worked with Dr. Shri Narayanan. As an undergraduate, I was fortunate to work with Dr. Ren Hongliang (NUS), Dr. Prasanta Kumar Ghosh (IISc), and Dr. Aurobinda Routray (IIT-Kharagpur).

I have published at top-tier ML, computer vision, and signal processing conferences such as ICCV, NeurIPS (3x), AAAI, ECCV, and ICASSP (2x).

I'm happy to chat and discuss potential collaborations. Feel free to contact me.

News

Papers (Preprints included)

AVROBUSTBENCH
AVROBUSTBENCH: Benchmarking the Robustness of Audio-Visual Recognition Models at Test-Time
In NeurIPS (Datasets and Benchmarks), 2025

A comprehensive benchmark designed to evaluate the test-time robustness of audio-visual models. We hope this will drive future research on robust, adaptable audio-visual systems in real-world settings.

SELECT
SELECT: A Submodular Approach for Active LiDAR Semantic Segmentation
Ruiyu Mao, Sarthak Kumar Maharana, Xulong Tang, Yunhui Guo
Under Review

A voxel-centric submodular approach tailored for active LiDAR semantic segmentation.

BATCLIP
BATCLIP: Bimodal Online Test-Time Adaptation for CLIP
Sarthak Kumar Maharana, Baoming Zhang, Leonid Karlinsky, Rogerio Schmidt Feris, Yunhui Guo
In ICCV, 2025

A bimodal online test-time adaptation method that improves CLIP's robustness to common corruptions and also extends to domain generalization settings.

PALM
PALM: Pushing Adaptive Learning Rate Mechanisms for Continual Test-Time Adaptation
Sarthak Kumar Maharana, Baoming Zhang, Yunhui Guo
In AAAI, 2025 (Oral)

A continual test-time adaptation method with adaptive learning rates, based on model prediction uncertainty and parameter sensitivity to rapid distributional shifts.

VDU
Variational Diffusion Unlearning: A Variational Inference Framework for Unlearning in Diffusion Models
Subhodip Panda, MS Varun, Shreyans Jain, Sarthak Kumar Maharana, Prathosh AP
In NeurIPS Safe Generative AI Workshop, 2024

Machine unlearning of user-specific classes/concepts in pre-trained diffusion models (DDPMs).

STONE
STONE: A Submodular Optimization Framework for Active 3D Object Detection
Ruiyu Mao, Sarthak Kumar Maharana, Rishabh K Iyer, Yunhui Guo
In NeurIPS, 2024

A submodular optimization scheme to handle data imbalance and label distributional coverage for active 3D object detection.

MAT
Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data
Yuxuan Li, Sarthak Kumar Maharana, Yunhui Guo
In ECCV, 2024

Novel watermarking technique based on multi-view data for defending against model extraction attacks.

SASB
Acoustic-to-Articulatory Inversion for Dysarthric Speech: Are Pre-Trained Self-Supervised Representations Favorable?
In ICASSP 2024 Workshop on Self-supervision in Audio, Speech, and Beyond (SASB), 2024

Effectiveness of pre-trained self-supervised learning representations for acoustic-to-articulatory inversion of dysarthric speech.

ICASSP
Acoustic-to-Articulatory Inversion for Dysarthric Speech by Using Cross-Corpus Acoustic-Articulatory Data
Sarthak Kumar Maharana, Aravind Illa, Renuka Mannem, Yamini Bellur, Veeramani Preethish Kumar, Seena Vengalil, Kiran Polavarapu, Nalini Atchayaram, Prasanta Kumar Ghosh
In ICASSP, 2021

Joint and multi-corpus training for acoustic-to-articulatory inversion of dysarthric speech, using x-vectors, under low-resource data conditions.

Academic/Volunteer Work

Miscellaneous


Source code by Jon Barron, with a few added elements.