Xiaotian Han


Redmond, WA 98052

I’m currently a Researcher at OpenAI, focusing on multimodal research. Previously, I was a Senior Research Scientist at ByteDance Seed and a Senior Applied Scientist on the Microsoft Azure AI Computer Vision team. Before joining Microsoft, I received my M.S. degree from Duke University and my B.S. degree from the University of Science and Technology of China (USTC).

My research experience spans computer vision, multimodal learning, reinforcement learning, and deep learning.

news

Nov 3, 2024 🎉 Exciting News! 🎉 Two papers, DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation and Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model, have been accepted at the 2024 Conference on Neural Information Processing Systems (NeurIPS 2024). One paper, InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning, has been accepted at the 4th MATH-AI Workshop at NeurIPS 2024.
Jul 30, 2024 🎉 Exciting News! 🎉 We’ve been focusing on enhancing the capabilities of multimodal language models in math, coding, and STEM. We’ve summarized some of the latest research papers and are thrilled to share them with the community. GitHub repo: Awesome-Multimodal-LLM-for-Math-STEM.
Mar 12, 2024 Our paper COCO is "ALL" You Need for Visual Instruction Fine-tuning has been accepted at the 2024 IEEE International Conference on Multimedia and Expo (ICME 2024).


selected publications

  1. DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
    Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, and 4 more authors
    2024
  2. InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
    Xiaotian Han, Yiren Jian, Xuefeng Hu, and 8 more authors
    2024
  3. InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model
    Haogeng Liu, Quanzeng You, Xiaotian Han, and 9 more authors
    In Findings of the Association for Computational Linguistics ACL 2024, 2024
  4. ViTAR: Vision Transformer with Any Resolution
    Qihang Fan, Quanzeng You, Xiaotian Han, and 5 more authors
    2024
  5. InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
    Haogeng Liu, Quanzeng You, Xiaotian Han, and 7 more authors
    2024
  6. COCO is "ALL" You Need for Visual Instruction Fine-tuning
    Xiaotian Han, Yiqi Wang, Bohan Zhai, and 2 more authors
    2024
  7. Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
    Yiqi Wang, Wentao Chen, Xiaotian Han, and 7 more authors
    2024
  8. InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
    Xiaotian Han, Quanzeng You, Yongfei Liu, and 9 more authors
    2023
  9. MMPTRACK: Large-Scale Densely Annotated Multi-Camera Multiple People Tracking Benchmark
    Xiaotian Han, Quanzeng You, Chunyu Wang, and 5 more authors
    In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023
  10. Image Scene Graph Generation (SGG) Benchmark
    Xiaotian Han, Jianwei Yang, Houdong Hu, and 3 more authors
    2021