


Zhe Chen 0017
Person information
- affiliation: Nanjing University, OpenGVLab, Shanghai AI Laboratory, China
Other persons with the same name
- Zhe Chen (disambiguation page)
- Zhe Chen 0001 (aka: Zhe S. Chen 0001, Zhe (Sage) Chen, Zhe Sage Chen): New York University School of Medicine, Department of Psychiatry / Neuroscience and Physiology, NY, USA (and 1 more)
- Zhe Chen 0002: Technical University of Munich, Institute for Electrical Drive Systems and Power Electronics, Germany
- Zhe Chen 0003: Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences / Picower Institute for Learning and Memory, Cambridge, MA, USA
- Zhe Chen 0004: Hohai University, College of Computer and Information, Nanjing, China (and 1 more)
- Zhe Chen 0005: Dalian University of Technology, Faculty of Electronic Information and Electrical Engineering, School of Information and Communication Engineering, China
- Zhe Chen 0006: Huawei Technologies, Beijing, China (and 1 more)
- Zhe Chen 0007: Aalborg University, Institute of Energy Technology, Denmark
- Zhe Chen 0008: Tsinghua University, State Key Laboratory of Automotive Safety and Energy, Beijing, China
- Zhe Chen 0009: University of Edinburgh, Institute for Digital Communications School of Engineering, Li-Fi R&D Centre, UK
- Zhe Chen 0010: Chinese University of Hong Kong, Department of Electronic Engineering, Hong Kong
- Zhe Chen 0011: Nanjing University of Aeronautics and Astronautics, China (and 1 more)
- Zhe Chen 0012: Yale University, Department of Diagnostic Radiology, New Haven, CT, USA
- Zhe Chen 0013: University of Sydney, School of Computer Science, NSW, Australia (and 1 more)
- Zhe Chen 0014: University of Michigan, USA
- Zhe Chen 0015: China-Singapore International Joint Research Institute, Nanyang Technological University, Singapore (and 1 more)
- Zhe Chen 0016: Monash University, VIC, Australia
- Zhe Chen 0018: Jiangnan University, School of Artificial Intelligence and Computer Science, Wuxi, China
- Zhe Chen 0019: Sichuan University, Business School, Chengdu, China (and 2 more)
- Zhe Chen 0020: Chengdu University of Technology, College of Mathematics and Physics, Digital Hu Line Research Institute, China (and 2 more)
- Zhe Chen 0021: Southeast University, State Key Laboratory of Millimeter-Waves, School of Information Science and Engineering, China (and 2 more)
- Zhe Chen 0022: Beihang University, School of Economics and Management, Beijing, China (and 1 more)
- Zhe Chen 0023: Innovation Lab, PayPal, Singapore (and 1 more)
- Zhe Chen 0024: Shanghai Jiao Tong University, Cooperative Medianet Innovation Center, China
- Zhe Chen 0025: ORCID 0000-0003-4715-5651
- Zhe Chen 0026: ORCID 0000-0003-2286-4988
- Zhe Chen 0027: ORCID 0000-0002-9992-5251
- Zhe Chen 0028: ORCID 0000-0003-4410-5261
- Zhe Chen 0029: ORCID 0009-0005-4935-8921
- Zhe Chen 0030: ORCID 0000-0002-5371-2058
- Zhe Chen 0031: ORCID 0000-0002-8635-6109
- Zhe Chen 0032: ORCID 0000-0002-4413-1606
- Zhe Chen 0033: ORCID 0000-0003-3539-4049
- Zhe Chen 0034: ORCID 0000-0002-8661-5107
2020 – today
- 2026
[j4]Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Jiahao Wang, Zhe Chen, Zhiqi Li, Tong Lu, Limin Wang:
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding. Int. J. Comput. Vis. 134(1): 20 (2026)
[i41]Chuxue Cao, Jinluan Yang, Haoran Li, Kunhao Pan, Zijian Zhao, Zhe Chen, Yuchen Tian, Lijun Wu, Conghui He, Sirui Han, Yike Guo:
Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification. CoRR abs/2601.22642 (2026)
- 2025
[c19]Yuchen Duan, Zhe Chen, Yusong Hu, Weiyun Wang, Shenglong Ye, Botian Shi, Lewei Lu, Qibin Hou, Tong Lu, Hongsheng Li, Jifeng Dai, Wenhai Wang:
Docopilot: Improving Multimodal Models for Document-Level Understanding. CVPR 2025: 4026-4037
[c18]Chenxin Tao, Shiqian Su, Xizhou Zhu, Chenyu Zhang, Zhe Chen, Jiawen Liu, Wenhai Wang, Lewei Lu, Gao Huang, Yu Qiao, Jifeng Dai:
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding. CVPR 2025: 14559-14569
[c17]Chenyu Yang, Xuan Dong, Xizhou Zhu, Weijie Su, Jiahao Wang, Hao Tian, Zhe Chen, Wenhai Wang, Lewei Lu, Jifeng Dai:
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models. CVPR 2025: 24939-24949
[c16]Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang:
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures. ICLR 2025
[c15]Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al.:
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text. ICLR 2025
[i40]Weiyun Wang, Zhangwei Gao, Lianjie Chen, Zhe Chen, Jinguo Zhu, Xiangyu Zhao, Yangzhou Liu, Yue Cao, Shenglong Ye, Xizhou Zhu, Lewei Lu, Haodong Duan, Yu Qiao, Jifeng Dai, Wenhai Wang:
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning. CoRR abs/2503.10291 (2025)
[i39]Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, Zhangwei Gao, Erfei Cui, Xuehui Wang, Yue Cao, Yangzhou Liu, Xingguang Wei, Hongjie Zhang, Haomin Wang, Weiye Xu, Hao Li, Jiahao Wang, Nianchen Deng, Songze Li, Yinan He, Tan Jiang, Jiapeng Luo, Yi Wang, Conghui He, Botian Shi, Xingcheng Zhang, Wenqi Shao, Junjun He, Yingtong Xiong, Wenwen Qu, Peng Sun, Penglong Jiao, Han Lv, Lijun Wu, Kaipeng Zhang, Huipeng Deng, Jiaye Ge, Kai Chen, Limin Wang, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang:
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models. CoRR abs/2504.10479 (2025)
[i38]Weiye Xu, Jiahao Wang, Weiyun Wang, Zhe Chen, Wengang Zhou, Aijun Yang, Lewei Lu, Houqiang Li, Xiaohua Wang, Xizhou Zhu, Wenhai Wang, Jifeng Dai, Jinguo Zhu:
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models. CoRR abs/2504.15279 (2025)
[i37]Gen Luo, Ganlin Yang, Ziyang Gong, Guanzhou Chen, Haonan Duan, Erfei Cui, Ronglei Tong, Zhi Hou, Tianyi Zhang, Zhe Chen, Shenglong Ye, Lewei Lu, Jingbo Wang, Wenhai Wang, Jifeng Dai, Yu Qiao, Rongrong Ji, Xizhou Zhu:
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces. CoRR abs/2506.00123 (2025)
[i36]Nianchen Deng, Lixin Gu, Shenglong Ye, Yinan He, Zhe Chen, Songze Li, Haomin Wang, Xingguang Wei, Tianshuo Yang, Min Dou, Tong He, Wenqi Shao, Kaipeng Zhang, Yi Wang, Botian Shi, Yanting Zhang, Jifeng Dai, Yu Qiao, Hongjie Zhang, Wenhai Wang:
InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models. CoRR abs/2506.18385 (2025)
[i35]Yuchen Duan, Zhe Chen, Yusong Hu, Weiyun Wang, Shenglong Ye, Botian Shi, Lewei Lu, Qibin Hou, Tong Lu, Hongsheng Li, Jifeng Dai, Wenhai Wang:
Docopilot: Improving Multimodal Models for Document-Level Understanding. CoRR abs/2507.14675 (2025)
[i34]Xuehui Wang, Zhenyu Wu, JingJing Xie, Zichen Ding, Bowen Yang, Zehao Li, Zhaoyang Liu, Qingyun Li, Xuan Dong, Zhe Chen, Weiyun Wang, Xiangyu Zhao, Jixuan Chen, Haodong Duan, Tianbao Xie, Chenyu Yang, Shiqian Su, Yue Yu, Yuan Huang, Yiqian Liu, Xiao Zhang, Yanting Zhang, Xiangyu Yue, Weijie Su, Xizhou Zhu, Wei Shen, Jifeng Dai, Wenhai Wang:
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents. CoRR abs/2507.19478 (2025)
[i33]Weiyun Wang, Zhangwei Gao, Lixin Gu, Hengjun Pu, Long Cui, Xingguang Wei, Zhaoyang Liu, Linglin Jing, Shenglong Ye, Jie Shao, Zhaokai Wang, Zhe Chen, Hongjie Zhang, Ganlin Yang, Haomin Wang, Qi Wei, Jinhui Yin, Wenhao Li, Erfei Cui, Guanzhou Chen, Zichen Ding, Changyao Tian, Zhenyu Wu, JingJing Xie, Zehao Li, Bowen Yang, Yuchen Duan, Xuehui Wang, Zhi Hou, Haoran Hao, Tianyi Zhang, Songze Li, Xiangyu Zhao, Haodong Duan, Nianchen Deng, Bin Fu, Yinan He, Yi Wang, Conghui He, Botian Shi, Junjun He, Yingtong Xiong, Han Lv, Lijun Wu, Wenqi Shao, Kaipeng Zhang, Huipeng Deng, Biqing Qi, Jiaye Ge, Qipeng Guo, Wenwei Zhang, Songyang Zhang, Maosong Cao, Junyao Lin, Kexian Tang, Jianfei Gao, Haian Huang, Yuzhe Gu, Chengqi Lyu, Huanze Tang, Rui Wang, Haijun Lv, Wanli Ouyang, Limin Wang, Min Dou, Xizhou Zhu, Tong Lu, Dahua Lin, Jifeng Dai, Weijie Su, Bowen Zhou, Kai Chen, Yu Qiao, Wenhai Wang, Gen Luo:
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency. CoRR abs/2508.18265 (2025)
[i32]Yangzhou Liu, Yue Cao, Hao Li, Gen Luo, Zhe Chen, Weiyun Wang, Xiaobo Liang, Biqing Qi, Lijun Wu, Changyao Tian, Yanting Zhang, Yuqiang Li, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang:
Sequential Diffusion Language Models. CoRR abs/2509.24007 (2025)
[i31]MiroMind Team, Song Bai, Lidong Bing, Carson Chen, Guanzheng Chen, Yuntao Chen, Zhe Chen, Ziyi Chen, Jifeng Dai, Xuan Dong, Wenhan Dou, Yue Deng, Yunjie Fu, Junqi Ge, Chenxia Han, Tammy Huang, Zhenhang Huang, Jerry Jiao, Shilei Jiang, Tianyu Jiao, Xiaoqi Jian, Lei Lei, Ruilin Li, Ryan Luo, Tiantong Li, Xiang Lin, Ziyuan Liu, Zhiqi Li, Jie Ni, Qiang Ren, Pax Sun, Shiqian Su, Chenxin Tao, Bin Wang, Hellen Wang, Haonan Wang, James Wang, Jin Wang, Jojo Wang, Letian Wang, Shizun Wang, Weizhi Wang, Zixuan Wang, Jinfan Xu, Sen Xing, Chenyu Yang, Hai Ye, Jiaheng Yu, Yue Yu, Muyan Zhong, Tianchen Zhao, Xizhou Zhu, Yanpeng Zhou, Yifan Zhang, Zhi Zhu:
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling. CoRR abs/2511.11793 (2025)
- 2024
[j3]Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang:
How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites. Sci. China Inf. Sci. 67(12) (2024)
[j2]Yangzhou Liu, Yue Cao, Zhangwei Gao, Weiyun Wang, Zhe Chen, Wenhai Wang, Hao Tian, Lewei Lu, Xizhou Zhu, Tong Lu, Yu Qiao, Jifeng Dai:
MMInstruct: a high-quality multi-modal instruction tuning dataset with extensive diversity. Sci. China Inf. Sci. 67(12) (2024)
[j1]Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang:
Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance. Vis. Intell. 2(1): 32 (2024)
[c14]Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai:
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks. CVPR 2024: 24185-24198
[c13]Weiyun Wang, Yiming Ren, Haowen Luo, Tiantong Li, Chenxiang Yan, Zhe Chen, Wenhai Wang, Qingyun Li, Lewei Lu, Xizhou Zhu, Yu Qiao, Jifeng Dai:
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World. ECCV (33) 2024: 471-490
[c12]Weiyun Wang, Min Shi, Qingyun Li, Wenhai Wang, Zhenhang Huang, Linjie Xing, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai, Yu Qiao:
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World. ICLR 2024
[c11]Yang Yang, Wenhai Wang, Zhe Chen, Jifeng Dai, Liang Zheng:
Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments. ICLR 2024
[c10]Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang:
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD. NeurIPS 2024
[c9]Weiyun Wang, Shuibo Zhang, Yiming Ren, Yuchen Duan, Tiantong Li, Shuo Liu, Mengkang Hu, Zhe Chen, Kaipeng Zhang, Lewei Lu, Xizhou Zhu, Ping Luo, Yu Qiao, Jifeng Dai, Wenqi Shao, Wenhai Wang:
Needle In A Multimodal Haystack. NeurIPS 2024
[c8]Jiannan Wu, Muyan Zhong, Sen Xing, Zeqiang Lai, Zhaoyang Liu, Zhe Chen, Wenhai Wang, Xizhou Zhu, Lewei Lu, Tong Lu, Ping Luo, Yu Qiao, Jifeng Dai:
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks. NeurIPS 2024
[i30]Changyao Tian, Xizhou Zhu, Yuwen Xiong, Weiyun Wang, Zhe Chen, Wenhai Wang, Yuntao Chen, Lewei Lu, Tong Lu, Jie Zhou, Hongsheng Li, Yu Qiao, Jifeng Dai:
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer. CoRR abs/2401.10208 (2024)
[i29]Weiyun Wang, Yiming Ren, Haowen Luo, Tiantong Li, Chenxiang Yan, Zhe Chen, Wenhai Wang, Qingyun Li, Lewei Lu, Xizhou Zhu, Yu Qiao, Jifeng Dai:
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World. CoRR abs/2402.19474 (2024)
[i28]Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang:
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures. CoRR abs/2403.02308 (2024)
[i27]Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, Limin Wang:
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding. CoRR abs/2403.09626 (2024)
[i26]Yang Yang, Wenhai Wang, Zhe Chen, Jifeng Dai, Liang Zheng:
Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments. CoRR abs/2403.13803 (2024)
[i25]Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang:
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD. CoRR abs/2404.06512 (2024)
[i24]Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang:
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites. CoRR abs/2404.16821 (2024)
[i23]Weiyun Wang, Shuibo Zhang, Yiming Ren, Yuchen Duan, Tiantong Li, Shuo Liu, Mengkang Hu, Zhe Chen, Kaipeng Zhang, Lewei Lu, Xizhou Zhu, Ping Luo, Yu Qiao, Jifeng Dai, Wenqi Shao, Wenhai Wang:
Needle In A Multimodal Haystack. CoRR abs/2406.07230 (2024)
[i22]Jiannan Wu, Muyan Zhong, Sen Xing, Zeqiang Lai, Zhaoyang Liu, Wenhai Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Ping Luo, Yu Qiao, Jifeng Dai:
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks. CoRR abs/2406.08394 (2024)
[i21]Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, Min Dou, Changyao Tian, Xizhou Zhu, Lewei Lu, Yushi Chen, Junjun He, Zhongying Tu, Tong Lu, Yali Wang, Limin Wang, Dahua Lin, Yu Qiao, Botian Shi, Conghui He, Jifeng Dai:
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text. CoRR abs/2406.08418 (2024)
[i20]Yangzhou Liu, Yue Cao, Zhangwei Gao, Weiyun Wang, Zhe Chen, Wenhai Wang, Hao Tian, Lewei Lu, Xizhou Zhu, Tong Lu, Yu Qiao, Jifeng Dai:
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity. CoRR abs/2407.15838 (2024)
[i19]Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang:
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance. CoRR abs/2410.16261 (2024)
[i18]Weiyun Wang, Zhe Chen, Wenhai Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Jinguo Zhu, Xizhou Zhu, Lewei Lu, Yu Qiao, Jifeng Dai:
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization. CoRR abs/2411.10442 (2024)
[i17]Tianbin Li, Yanzhou Su, Wei Li, Bin Fu, Zhe Chen, Ziyan Huang, Guoan Wang, Chenglong Ma, Ying Chen, Ming Hu, Yanjun Li, Pengcheng Chen, Xiaowei Hu, Zhongying Deng, Yuanfeng Ji, Jin Ye, Yu Qiao, Junjun He:
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI. CoRR abs/2411.14522 (2024)
[i16]Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang:
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling. CoRR abs/2412.05271 (2024)
[i15]Chenyu Yang, Xuan Dong, Xizhou Zhu, Weijie Su, Jiahao Wang, Hao Tian, Zhe Chen, Wenhai Wang, Lewei Lu, Jifeng Dai:
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models. CoRR abs/2412.09613 (2024)
[i14]Chenxin Tao, Shiqian Su, Xizhou Zhu, Chenyu Zhang, Zhe Chen, Jiawen Liu, Wenhai Wang, Lewei Lu, Gao Huang, Yu Qiao, Jifeng Dai:
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding. CoRR abs/2412.16158 (2024)
- 2023
[c7]Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao:
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions. CVPR 2023: 14408-14419
[c6]Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo:
DDP: Diffusion Model for Dense Visual Prediction. ICCV 2023: 21684-21695
[c5]Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, Yu Qiao:
Vision Transformer Adapter for Dense Predictions. ICLR 2023
[c4]Guo Chen, Yin-Dong Zheng, Zhe Chen, Jiahao Wang, Tong Lu:
ELAN: Enhancing Temporal Action Detection with Location Awareness. ICME 2023: 1020-1025
[c3]Zhe Chen, Hao Tan, Tao Wang, Tianrun Shen, Tong Lu, Qiuying Peng, Cheng Cheng, Yue Qi:
Graph Propagation Transformer for Graph Representation Learning. IJCAI 2023: 3559-3567
[c2]Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai:
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks. NeurIPS 2023
[i13]Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu:
Champion Solution for the WSDM2023 Toloka VQA Challenge. CoRR abs/2301.09045 (2023)
[i12]Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo:
DDP: Diffusion Model for Dense Visual Prediction. CoRR abs/2303.17559 (2023)
[i11]Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Zeqiang Lai, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, Limin Wang, Ping Luo, Jifeng Dai, Yu Qiao:
InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language. CoRR abs/2305.05662 (2023)
[i10]Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai:
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks. CoRR abs/2305.11175 (2023)
[i9]Zhe Chen, Hao Tan, Tao Wang, Tianrun Shen, Tong Lu, Qiuying Peng, Cheng Cheng, Yue Qi:
Graph Propagation Transformer for Graph Representation Learning. CoRR abs/2305.11424 (2023)
[i8]Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu:
AVSegFormer: Audio-Visual Segmentation with Transformer. CoRR abs/2307.01146 (2023)
[i7]Weiyun Wang, Min Shi, Qingyun Li, Wenhai Wang, Zhenhang Huang, Linjie Xing, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai, Yu Qiao:
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World. CoRR abs/2308.01907 (2023)
[i6]Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai:
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks. CoRR abs/2312.14238 (2023)
- 2022
[c1]Zhe Chen, Wenhai Wang, Enze Xie, Tong Lu, Ping Luo:
Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization. AAAI 2022: 393-400
[i5]Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, Yu Qiao:
Vision Transformer Adapter for Dense Predictions. CoRR abs/2205.08534 (2022)
[i4]Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao:
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions. CoRR abs/2211.05778 (2022)
[i3]Guo Chen, Sen Xing, Zhe Chen, Yi Wang, Kunchang Li, Yizhuo Li, Yi Liu, Jiahao Wang, Yin-Dong Zheng, Bingkun Huang, Zhiyu Zhao, Junting Pan, Yifei Huang, Zun Wang, Jiashuo Yu, Yinan He, Hongjie Zhang, Tong Lu, Yali Wang, Limin Wang, Yu Qiao:
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges. CoRR abs/2211.09529 (2022)
- 2021
[i2]Zhe Chen, Wenhai Wang, Enze Xie, Tong Lu, Ping Luo:
Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization. CoRR abs/2103.11784 (2021)
[i1]Zhe Chen, Wenhai Wang, Enze Xie, Zhibo Yang, Tong Lu, Ping Luo:
FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation. CoRR abs/2111.02394 (2021)
last updated on 2026-03-21 01:19 CET by the dblp team
all metadata released as open data under CC0 1.0 license









