Welcome!
I am a first-year CS Ph.D. student at the University of California San Diego (UCSD), advised by Prof. Lianhui Qin. During my undergraduate studies, I was also a Visiting Researcher at the Berkeley NLP Group, working closely with Prof. Alane Suhr and Zineng Tang.
My research focuses on natural language processing, machine learning, and computer vision. Currently, I am working on building intelligent agents (e.g., embodied agents and coding agents), as well as developing realistic world simulators (e.g., SimWorld) for agent training. In the long term, I aim to leverage increasingly realistic simulated worlds to systematically study the capability boundaries of current models in complex environments, and to explore how such environments can facilitate the learning and generalization of intelligent agents.
You can find my CV here. I am always open to any form of collaboration. If you have any ideas for potential collaboration, or just feel like having a casual chat, please feel free to reach out!
🔥 News
- 2025.04: Thrilled to join UCSD as a CS Ph.D. student. Looking forward to starting this new journey! 🌴🌊☀️
- 2025.02: Our work on evaluating VLMs on photorealistic color illusion scenes has been accepted to CVPR 2025.
- 2024.09: Our work on multi-perspective communication has been accepted to the EMNLP 2024 main conference.
- 2024.09: Our work on multimodal instruction-tuning for biomedicine has been accepted to NeurIPS D&B 2024!
📝 Publications

SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds
Jiawei Ren, Yan Zhuang, Xiaokang Ye*, Lingjun Mao, Xuhong He, Jianzhi Shen, …, Tianmin Shu†, Zhiting Hu†, Lianhui Qin†
Technical Report
- We propose SimWorld, a simulator featuring three key designs: (1) realistic, open-ended world simulation, (2) a rich interface for LLM/VLM agents, and (3) diverse physical and social reasoning scenarios.

Evaluating Model Perception of Color Illusions in Photorealistic Scenes
Lingjun Mao, Zineng Tang, Alane Suhr
CVPR 2025
- We propose an automated framework for generating realistic color illusion images, build a large-scale dataset (RCID), and systematically investigate the underlying mechanisms by which VLMs are misled by color illusions.

Grounding Language in Multi-Perspective Referential Communication
Zineng Tang, Lingjun Mao, Alane Suhr
EMNLP 2024 (Main Conference)
- We introduce a task and dataset for referring expression generation and comprehension in multi-agent embodied environments.

Biomedical Visual Instruction Tuning with Clinician Preference Alignment
Hejie Cui*, Lingjun Mao*, Xin Liang, Jieyu Zhang, Hui Ren, Quanzheng Li, Xiang Li, Carl Yang
NeurIPS 2024 (Datasets and Benchmarks Track)
- We propose a data-centric framework (BioMed-VITAL) that incorporates clinician preferences into both stages of generating and selecting instruction data for tuning biomedical multimodal foundation models.
📖 Education
- 2024.09 - 2024.10, Visiting Student at the University of California, Berkeley, USA
- 2020.09 - 2025.07, Software Engineering (GPA: 4.0/4.0), Tongji University, Shanghai, China
© 2025 Lingjun Mao