Lijun Wu is a Researcher at Shanghai AI Laboratory. Previously, he was a Research Scientist at ByteDance and a Senior Researcher at Microsoft Research. He received his Ph.D. from Sun Yat-sen University (SYSU) as a member of the joint Ph.D. program between SYSU and MSRA, advised by Dr. Tie-Yan Liu and Prof. Jianhuang Lai.
His research interests include AI/LLMs (e.g., data-centric intelligence, SFT/RL) and AI4Science (e.g., LLM4Science, scientific reasoning). His work has been published in top conferences and journals, such as Nature Communications, Nature Machine Intelligence, TPAMI, NeurIPS, ICML, ICLR, ACL, and KDD, with more than 8,500 citations. He has served as AC/SPC for top conferences, including ICLR, NeurIPS, ACL, EMNLP, NAACL, AAAI, and IJCAI.
He has received numerous prestigious awards, including the 2018 MSRA Ph.D. Fellowship. He secured 8 championships in the WMT2019 Competition. He led his team to develop the BioT5 series of multimodal biomolecular models, winning 1st and 2nd place in the ACL 2024 Language+Molecule Shared Task. In 2025, he guided students to 2nd place in the NeurIPS 2025 CURE-Bench Internal Reasoning Competition. Many of his research innovations have been translated into practical products. Notably, his R-Drop algorithm was deployed in Microsoft Translator across more than 20 translation tasks and is widely used in business scenarios at companies such as Meituan. His CT4Rec model was applied to Tencent News recommendation products. Furthermore, he participated in the development of the world's first Chinese-English translation system to achieve human parity, in 2018.
We are hiring AI researchers working on LLM/MLLM and AI4Science. Contact me if you are interested!
News
- 2025.9: Caco is accepted by NeurIPS-2025. It aims to scale reasoning data through code-assisted verification.
- 2025.8: 3 papers are accepted by EMNLP-2025; topics cover math reasoning and advanced data synthesis. Check CFT, MetaLadder, Middo.
- 2025.8: Invited to serve as Area Chair for ICLR-2026.
- 2025.7: μFormer is accepted by Nature Machine Intelligence!
- 2025.7: Invited to serve as Area Chair for the NeurIPS-2025 workshops AI4Science and SEA.
- 2025.7: Invited to serve as Area Chair for AAAI-2026.
- 2025.6: CovDocker is accepted by KDD-2025.
- 2025.5: 6 papers are accepted by ACL-2025; topics cover math reasoning, data synthesis, and LLM benchmarks. Check Mathfusion, GRA, Lemma, CipherBank.
- 2025.3: Invited to serve as Area Chair for NeurIPS-2025.
- 2025.3: NatureLM, a large scientific foundation model, is released.
Open-source Projects
- OpenDataArena, a fair, open, and transparent arena for data value benchmarking.
- InternVL, a series of leading VLM models developed by Shanghai AI Laboratory.
Surveys/Repos
- 2024.3: We have updated our comprehensive survey, Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey. Check it!
- 2023.11: We have released a comprehensive report on Large Language Models (GPT-4) on Scientific Discovery. Check it!
- 2022.4: We have released a comprehensive survey about Non-Autoregressive Generation for Neural Machine Translation and Beyond. Check it!
- Awesome-LLM-Ready-Datasets
- Awesome-Biomolecule-Language-Cross-Modeling
- Awesome-Bio-Foundation-Models
- Awesome-Docking
Selected Publications
LLM/MLLMs
- NeurIPS 2025: Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning, Honglin Lin, Qizhi Pei, Xin Gao, Zhuoshi Pan, Yu Li, Juntao Li, Conghui He, Lijun Wu
- arXiv 2025: OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value, Mengzhang Cai, Xin Gao, Yu Li, Honglin Lin, Zheng Liu, Zhuoshi Pan, Qizhi Pei, Xiaoran Shang, Mengyuan Sun, Zinan Tang, Xiaoyang Wang, Zhanping Zhong, Yun Zhu, Dahua Lin, Conghui He, Lijun Wu | Project Page | (technical report for OpenDataArena)
AI4Science
- EMNLP 2023: BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations, Qizhi Pei, Wei Zhang, Jinhua Zhu, Kehan Wu, Kaiyuan Gao, Lijun Wu, Yingce Xia, Rui Yan (>200K downloads)
- ACL 2024: BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning, Qizhi Pei, Lijun Wu, Kaiyuan Gao, Xiaozhuan Liang, Yin Fang, Jinhua Zhu, Shufang Xie, Tao Qin, Rui Yan (won 1st/2nd place in the ACL 2024 workshop shared tasks)
- NeurIPS 2023: FABind: Fast and Accurate Protein-Ligand Binding, Qizhi Pei (co-first author), Kaiyuan Gao, Lijun Wu, Jinhua Zhu, Yingce Xia, Shufang Xie, Tao Qin, Kun He, Tie-Yan Liu, Rui Yan | Project Page
AI
- FL@FM NeurIPS 2024: Hot Pluggable Federated Learning, Lei Shen, Zhenheng Tang, Lijun Wu, Yonggang Zhang, Xiaowen Chu, Tao Qin, Bo Han (Outstanding Student Paper Award, Oral)
- NeurIPS 2020: R-Drop: Regularized Dropout for Neural Networks, Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Min Zhang, Tie-Yan Liu (R-Drop has been shipped in Microsoft Translator for 20+ language translation tasks!)
- EMNLP 2019: Exploiting Monolingual Data at Scale for Neural Machine Translation, Lijun Wu, Yiren Wang, Yingce Xia, Tao Qin, Jianhuang Lai, and Tie-Yan Liu (Helped win the WMT-19 championship!)
Honors and Awards
- 2nd place in Internal Reasoning Track of CURE-Bench@NeurIPS2025, 2025
- 1st place in Text2Molecule and 2nd place in Molecule2Text in the Language+Molecule@ACL2024 shared task, 2024
- Runner-up of OGB-LSC @ KDD Cup, 2021, Solution
- Outstanding Graduate Awards of SYSU, 2020
- Outstanding Reviewer of EMNLP, 2019
- 1st Place of WMT 2019 in 5 translation directions: En->De, De->En, De->Fr, Fr->De and Ru->En, 2019
- Microsoft Research Asia Ph.D. Fellowship, 2018
- Graduate Student National Scholarship, 2018
- Stars of Tomorrow Internship Award of Microsoft Research Asia, 2018
- Outstanding Undergraduate Awards of SYSU, 2015
- 1st Place of Global IBM/IEEE Smarter Planet Challenge, 2013
- Undergraduate Student National Scholarship, 2012, 2013
- First Class Scholarship of SYSU, 2012, 2013, 2014
Experience
- 2025.08-Now, Young Scientist, Shanghai Artificial Intelligence Laboratory
- 2024.05-2024.08, Research Scientist, ByteDance
- 2022.07-2024.05, Senior Researcher, MSR AI4Science
- 2020.06-2022.07, Senior Researcher, MSRA
- 2014.07-2020.06, Research Intern, MSRA
Academic Services
- AC: ICLR-26, NeurIPS-25, ACL-21/22/23/24/25, EMNLP-23/24/25, NAACL-22/23/24/25, EACL-24, COLING-23, ARR-21/22/23/24/25
- SPC: AAAI-22/23/24/25/26, IJCAI-21
- Conference reviewers: ICLR, ICML, NeurIPS, AAAI, IJCAI, ACL, CVPR, EMNLP, KDD, NAACL, COLING, EACL, AACL
- Journal reviewers: TPAMI, TASLP, KBS, Neurocomputing, CSL