Jack (Hao) Bai

haob2 AT illinois DOT edu


Hi there! I’m Jack. I’m a third-year Ph.D. student at UIUC CS, advised by Prof. Tong Zhang. I work closely with Prof. Aviral Kumar @ CMU MLD. I also spend some time at Microsoft Research (AIF Lab).

Recently, my research has focused on scaling the reasoning and planning capabilities of intelligent agents with foundation models and reinforcement learning (RL). I identify as an empirical RL person but still try to keep my methods principled.

I was previously a visiting scholar advised by Sergey Levine @ BAIR. I received my dual undergraduate degrees from UIUC and Zhejiang University. During those wonderful years, I was lucky enough to work with great minds like Yi Ma @ BAIR and Chengxiang Zhai @ UIUC.

In my free time, I practice guitar and produce J-pop/J-rock. Check out my portfolio.

A public up-to-date resume can be found here.


News

Jan 09, 2026 Today, we proudly announce the release of WebGym, the largest open-source RL training environment for visual web agents to date. The preprint is available on arXiv. We propose (1) the RL framework with the highest rollout speed, (2) a recipe that supports training agents on long-horizon tasks, and (3) scaling dimensions that effectively improve RL performance on the proposed task set.
Jun 11, 2025 My first paper on web agents with RL, TTI, has been released! Check out the preprint! I am super proud of this work and believe it will lead to a paradigm shift in multi-step agent reasoning with RL+VLM.
Jan 23, 2025 My second paper on building device control agents with RL, Digi-Q, has been accepted to ICLR 2025! Check out the preprint! This work was done while I was visiting BAIR, advised by Sergey Levine and Aviral Kumar.

Latest Posts

Selected Publications

  1. Preprint
    Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
    Hao Bai, Junhong Shen, Lunjun Zhang, Yifei Zhou, Amrith Setlur, Shengbang Tong, Diego Caples, Nan Jiang, Tong Zhang, Ameet Talwalkar, and Aviral Kumar
    May 2025
  2. ICLR 2025
    Digi-Q: Transforming VLMs to Device-Control Agents via Value-Based Offline RL
    Hao Bai, Yifei Zhou, Erran Li, Sergey Levine, and Aviral Kumar
    Jan 2025
  3. Oral @ CPAL 2025
    Improving Neuron-level Interpretability with White-box Language Models
    Hao Bai and Yi Ma
    Oct 2024
  4. NeurIPS 2024, Oral @ ICML WS
    DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
    Hao Bai, Yifei Zhou, Jiayi Pan, Mert Cemri, Alane Suhr, Sergey Levine, and Aviral Kumar
    Jun 2024
  5. NeurIPS 2024
    Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
    Yuexiang Zhai, Hao Bai, Zipeng Lin, Jiayi Pan, Shengbang Tong, Yifei Zhou, Alane Suhr, Saining Xie, Yann LeCun, Yi Ma, and Sergey Levine
    May 2024
  6. JMLR
    White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?
    Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Hao Bai, Yuexiang Zhai, Benjamin D Haeffele, and Yi Ma
    Apr 2024
  7. EMNLP 2023
    Social Commonsense-Guided Search Query Generation for Open-Domain Knowledge-Powered Conversations
    Revanth Reddy, Hao Bai, Wentao Yao, Sharath Chandra Etagi Suresh, Heng Ji, and ChengXiang Zhai
    Oct 2023