Xiaotian Han


Redmond, WA 98052

I’m currently a Researcher at OpenAI, focusing on multimodal research. Previously, I was a Senior Research Scientist at ByteDance Seed and a Senior Applied Scientist on the Microsoft Azure AI Computer Vision team. Before joining Microsoft, I received my M.S. degree from Duke University and my B.S. degree from the University of Science and Technology of China (USTC).

My research experience spans computer vision, multimodal learning, reinforcement learning, and deep learning.

news

Nov 3, 2024 🎉 Exciting News! 🎉 Two papers, DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation and Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model, have been accepted at the 2024 Conference on Neural Information Processing Systems (NeurIPS 2024). One paper, InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning, has been accepted at the 4th MATH-AI Workshop at NeurIPS 2024.
Jul 30, 2024 🎉 Exciting News! 🎉 We’ve been focusing on enhancing the capabilities of multimodal language models in math, coding, and STEM. We’ve summarized some of the latest research papers and are thrilled to share them with the community. GitHub repo: Awesome-Multimodal-LLM-for-Math-STEM.
Mar 12, 2024 Our paper COCO is "ALL" You Need for Visual Instruction Fine-tuning has been accepted at the 2024 IEEE International Conference on Multimedia and Expo (ICME 2024).


selected publications

  1. DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
    Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, and 4 more authors
    2024
  2. InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
    Xiaotian Han, Yiren Jian, Xuefeng Hu, and 8 more authors
    2024
  3. InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model
    Haogeng Liu, Quanzeng You, Xiaotian Han, and 9 more authors
    In Findings of the Association for Computational Linguistics ACL 2024, 2024
  4. ViTAR: Vision Transformer with Any Resolution
    Qihang Fan, Quanzeng You, Xiaotian Han, and 5 more authors
    2024
  5. InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
    Haogeng Liu, Quanzeng You, Xiaotian Han, and 7 more authors
    2024
  6. COCO is "ALL" You Need for Visual Instruction Fine-tuning
    Xiaotian Han, Yiqi Wang, Bohan Zhai, and 2 more authors
    2024
  7. Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
    Yiqi Wang, Wentao Chen, Xiaotian Han, and 7 more authors
    2024
  8. InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
    Xiaotian Han, Quanzeng You, Yongfei Liu, and 9 more authors
    2023
  9. MMPTRACK: Large-Scale Densely Annotated Multi-Camera Multiple People Tracking Benchmark
    Xiaotian Han, Quanzeng You, Chunyu Wang, and 5 more authors
    In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023
  10. Image Scene Graph Generation (SGG) Benchmark
    Xiaotian Han, Jianwei Yang, Houdong Hu, and 3 more authors
    2021