


Yu Qiao 0001
Person information
- affiliation: Shanghai AI Laboratory, OpenGVLab, China
- affiliation: Chinese Academy of Sciences, Shenzhen Institutes of Advanced Technology, China
- affiliation (former): University of Tokyo, Graduate School of Information Science and Technology, Japan
- affiliation (PhD 2006): University of Electro-Communications, Tokyo, Japan
Other persons with the same name
- Yu Qiao — disambiguation page
- Yu Qiao 0002 — Biomedical Imaging Lab, Singapore
- Yu Qiao 0003 — Shanghai Jiao Tong University, Department of Automation, Institute of Image Processing and Pattern Recognition, China (and 1 more)
- Yu Qiao 0004 — Kyung Hee University, School of Computing, Department of Artificial Intelligence, Yongin, South Korea (and 1 more)
- Yu Qiao 0005 — RWTH Aachen University, Germany
- Yu Qiao 0006 — Nanjing University, National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, China
2020 – today
- 2026
[j142]Xiangyu Chen, Xintao Wang, Wenlong Zhang, Xiangtao Kong, Yu Qiao, Jiantao Zhou, Chao Dong:
HAT: Hybrid Attention Transformer for Image Restoration. IEEE Trans. Pattern Anal. Mach. Intell. 48(3): 2676-2694 (2026)
[j141]Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, Yaohui Wang, Xinyuan Chen, Ying-Cong Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu:
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models. IEEE Trans. Pattern Anal. Mach. Intell. 48(3): 3268-3285 (2026)
[i526]Junhao Cai, Zetao Cai, Jiafei Cao, Yilun Chen, Zeyu He, Lei Jiang, Hang Li, Hengjie Li, Yang Li, Yufei Liu, Yanan Lu, Qi Lv, Haoxiang Ma, Jiangmiao Pang, Yu Qiao, Zherui Qiu, Yanqing Shen, Xu Shi, Yang Tian, Bolun Wang, Hanqing Wang, Jiaheng Wang, Tai Wang, Xueyuan Wei, Chao Wu, Yiman Xie, Boyang Xing, Yuqiang Yang, Yuyin Yang, Qiaojun Yu, Feng Yuan, Jia Zeng, Jingjing Zhang, Shenghan Zhang, Shi Zhang, Zhuoma Zhaxi, Bowen Zhou, Yuanzhen Zhou, Yunsong Zhou, Hongrui Zhu, Yangkun Zhu, Yuchen Zhu:
InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation. CoRR abs/2601.02456 (2026)
[i525]Bowen Yang, Kaiming Jin, Zhenyu Wu, Zhaoyang Liu, Qiushi Sun, Zehao Li, JingJing Xie, Zhoumianze Liu, Fangzhi Xu, Kanzhi Cheng, Qingyun Li, Yian Wang, Yu Qiao, Zun Wang, Zichen Ding:
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent. CoRR abs/2601.07779 (2026)
[i524]Yiming Ren, Junjie Wang, Yuxin Meng, Yihang Shi, Zhiqiang Lin, Ruihang Chu, Yiran Xu, Ziming Li, Yunfei Zhao, Zihan Wang, Yu Qiao, Ruiming Tang, Minghao Liu, Yujiu Yang:
SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature. CoRR abs/2601.10108 (2026)
- 2025
[j140]Xuhong Wang, Haoyu Jiang, Yi Yu, Jingru Yu, Yilun Lin, Ping Yi, Yingchun Wang, Yu Qiao, Li Li, Fei-Yue Wang:
Building intelligence identification system via large language model watermarking: a survey and beyond. Artif. Intell. Rev. 58(8): 249 (2025)
[j139]Kunchang Li, Yinan He, Yi Wang, Yizhuo Li, Wenhai Wang, Ping Luo, Yali Wang, Limin Wang, Yu Qiao:
VideoChat: chat-centric video understanding. Sci. China Inf. Sci. 68(10) (2025)
[j138]Shixiang Wu, Chao Dong, Yu Qiao:
Exploring Contextual Priors for Real-World Image Super-Resolution. Comput. Vis. Media 11(1): 159-177 (2025)
[j137]Yu Qiao, Xiaohui Yang, Jing Wang, Tongzhen Si, Qingbei Guo:
Driver Cognitive Distraction Detection based on eye movement behavior and integration of multi-view space-channel feature. Expert Syst. Appl. 266: 125975 (2025)
[j136]Yaohui Wang, Xin Ma, Xinyuan Chen, Cunjian Chen, Antitza Dantcheva, Bo Dai, Yu Qiao:
LEO: Generative Latent Image Animator for Human Video Synthesis. Int. J. Comput. Vis. 133(3): 1277-1289 (2025)
[j135]Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu:
LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models. Int. J. Comput. Vis. 133(5): 3059-3078 (2025)
[j134]Yuanhan Zhang, Qinghong Sun, Yichun Zhou, Zexin He, Zhenfei Yin, Kun Wang, Lu Sheng, Yu Qiao, Jing Shao, Ziwei Liu:
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy. Int. J. Comput. Vis. 133(8): 5806-5821 (2025)
[j133]Baoqi Pei, Yifei Huang, Guo Chen, Jilan Xu, Yali Wang, Limin Wang, Tong Lu, Yu Qiao, Fei Wu:
Guiding Audio-Visual Question Answering with Collective Question Reasoning. Int. J. Comput. Vis. 133(10): 6912-6929 (2025)
[j132]Hongjie Zhang, Lu Dong, Yi Liu, Yifei Huang, Yali Wang, Limin Wang, Yu Qiao:
LvBench: A Benchmark for Long-form Video Understanding with Versatile Multi-modal Question Answering. Int. J. Comput. Vis. 133(11): 7726-7747 (2025)
[j131]Yifei Huang, Jilan Xu, Baoqi Pei, Lijin Yang, Mingfang Zhang, Yuping He, Guo Chen, Xinyuan Chen, Yaohui Wang, Zheng Nie, Jinyao Liu, Dechen Lin, Fang Fang, Kunpeng Li, Chang Yuan, Yu Qiao, Yali Wang, Limin Wang:
Vinci: A Real-time Smart Assistant Based on Egocentric Vision-language Model for Portable Devices. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 9(3): 88:1-88:33 (2025)
[j130]Ziyan Huang, Zhongying Deng, Jin Ye, Haoyu Wang, Yanzhou Su, Tianbin Li, Hui Sun, Junlong Cheng, Jianpin Chen, Junjun He, Yun Gu, Shaoting Zhang, Lixu Gu, Yu Qiao:
A-Eval: A benchmark for cross-dataset and cross-modality evaluation of abdominal multi-organ segmentation. Medical Image Anal. 101: 103499 (2025)
[j129]Peng Xu, Wenqi Shao, Kaipeng Zhang, Peng Gao, Shuo Liu, Meng Lei, Fanqing Meng, Siyuan Huang, Yu Qiao, Ping Luo:
LVLM-EHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models. IEEE Trans. Pattern Anal. Mach. Intell. 47(3): 1877-1893 (2025)
[j128]Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai:
BEVFormer: Learning Bird's-Eye-View Representation From LiDAR-Camera via Spatiotemporal Transformers. IEEE Trans. Pattern Anal. Mach. Intell. 47(3): 2020-2036 (2025)
[j127]Xiaowei Hu, Min Shi, Weiyun Wang, Sitong Wu, Linjie Xing, Wenhai Wang, Xizhou Zhou, Lewei Lu, Jie Zhou, Xiaogang Wang, Yu Qiao, Jifeng Dai:
Demystify Transformers & Convolutions in Modern Image Deep Networks. IEEE Trans. Pattern Anal. Mach. Intell. 47(4): 2416-2428 (2025)
[j126]Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang:
PonderV2: Improved 3D Representation With a Universal Pre-Training Paradigm. IEEE Trans. Pattern Anal. Mach. Intell. 47(8): 6550-6565 (2025)
[j125]Xiangchao Yan, Runjian Chen, Bo Zhang, Hancheng Ye, Renqiu Xia, Jiakang Yuan, Hongbin Zhou, Xinyu Cai, Botian Shi, Wenqi Shao, Ping Luo, Yu Qiao, Tao Chen, Junchi Yan:
SPOT: Scalable 3D Pre-Training via Occupancy Prediction for Learning Transferable 3D Representations. IEEE Trans. Pattern Anal. Mach. Intell. 47(11): 9609-9625 (2025)
[j124]Zhaokai Wang, Xizhou Zhu, Xue Yang, Gen Luo, Hao Li, Changyao Tian, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai:
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding. IEEE Trans. Pattern Anal. Mach. Intell. 47(11): 10142-10159 (2025)
[j123]Zihan Li, Diping Song, Zefeng Yang, Deming Wang, Fei Li, Xiulan Zhang, Paul E. Kinahan, Yu Qiao:
VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced With Clinical Knowledge. IEEE Trans. Pattern Anal. Mach. Intell. 47(12): 11848-11862 (2025)
[j122]Boyu Chen, Siran Chen, Kunchang Li, Qinglin Xu, Yu Qiao, Yali Wang:
Percept, Chat, Adapt: Knowledge transfer of foundation models for open-world video recognition. Pattern Recognit. 160: 111189 (2025)
[j121]Xiaohui Yang, Yu Qiao, Tongzhen Si, Jing Wang, Tao Xu:
Eye-SCAN: Eye-Movement-Attention-based Spatial Channel Adaptive Network for traffic accident prediction. Pattern Recognit. 165: 111590 (2025)
[j120]Qingsong Zhao, Yi Wang, Yinan He, Yu Qiao, Cairong Zhao:
Learning Discriminative Representations in Videos via Active Embedding Distance Correlation. IEEE Signal Process. Lett. 32: 56-60 (2025)
[j119]Wenqi Shao, Meng Lei, Yutao Hu, Peng Gao, Peng Xu, Kaipeng Zhang, Fanqing Meng, Siyuan Huang, Hongsheng Li, Yu Qiao, Ping Luo:
TinyLVLM-eHub: Towards Comprehensive and Efficient Evaluation for Large Vision-Language Models. IEEE Trans. Big Data 11(3): 933-947 (2025)
[j118]Lu Dong, Haiyu Zhang, Hongjie Zhang, Yifei Huang, Zhen-Hua Ling, Yu Qiao, Limin Wang, Yali Wang:
Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining. IEEE Trans. Circuits Syst. Video Technol. 35(10): 10396-10409 (2025)
[j117]Hao Zhang, Wenqi Shao, Hong Liu, Yongqiang Ma, Ping Luo, Yu Qiao, Nanning Zheng, Kaipeng Zhang:
B-AVIBench: Toward Evaluating the Robustness of Large Vision-Language Model on Black-Box Adversarial Visual-Instructions. IEEE Trans. Inf. Forensics Secur. 20: 1434-1446 (2025)
[j116]Weidong Zhang, Yu Qiao, Ying Liu, Ran Song, Wei Zhang:
Fast 3D Room Layout Estimation Based on Compact High-Level Representation. IEEE Trans. Image Process. 34: 3930-3943 (2025)
[j115]Xin Ma, Yaohui Wang, Xinyuan Chen, Gengyun Jia, Ziwei Liu, Yuan-Fang Li, Cunjian Chen, Yu Qiao:
Latte: Latent Diffusion Transformer for Video Generation. Trans. Mach. Learn. Res. 2025 (2025)
[j114]Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong:
LASP: Linear Attention Sequence Parallelism. Trans. Mach. Learn. Res. 2025 (2025)
[j113]Xiangyu Chen, Zheyuan Li, Zhengwen Zhang, Jimmy S. Ren, Yihao Liu, Jingwen He, Yu Qiao, Jiantao Zhou, Chao Dong:
Towards Efficient SDRTV-to-HDRTV by Learning From Image Formation. IEEE Trans. Multim. 27: 8340-8354 (2025)
[j112]Xu Liu, Tong Zhou, Chong Wang, Yuping Wang, Yuanxin Wang, Qinjingwen Cao, Weizhi Du, Yonghuan Yang, Junjun He, Yu Qiao, Yiqing Shen:
Toward the unification of generative and discriminative visual foundation model: a survey. Vis. Comput. 41(5): 3371-3412 (2025)
[c429]Junyi Chen, Weicai Ye, Yifan Wang, Danpeng Chen, Di Huang, Wanli Ouyang, Guofeng Zhang, Yu Qiao, Tong He:
GigaGS: 3D Gaussian Based Planar Representation for Large-Scene Surface Reconstruction. AAAI 2025: 2088-2096
[c428]Siran Chen, Yuxiao Luo, Yue Ma, Yu Qiao, Yali Wang:
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving. AAAI 2025: 2212-2220
[c427]Yanbo Ding, Shaobin Zhuang, Kunchang Li, Zhengrong Yue, Yu Qiao, Yali Wang:
Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration. AAAI 2025: 2753-2761
[c426]Qiushi Sun, Kanzhi Cheng, Zichen Ding, Chuanyang Jin, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu:
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis. ACL (1) 2025: 5555-5579
[c425]Fan Zhang, Shulin Tian, Ziqi Huang, Yu Qiao, Ziwei Liu:
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models. ACL (1) 2025: 7561-7582
[c424]Fangzhi Xu, Qiushi Sun, Kanzhi Cheng, Jun Liu, Yu Qiao, Zhiyong Wu:
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models. ACL (1) 2025: 12975-12993
[c423]Jiakang Yuan, Xiangchao Yan, Bo Zhang, Tao Chen, Botian Shi, Wanli Ouyang, Yu Qiao, Lei Bai, Bowen Zhou:
Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback. ACL (1) 2025: 21768-21789
[c422]Haochen Xue, Feilong Tang, Ming Hu, Yexin Liu, Qidong Huang, Yulong Li, Chengzhi Liu, Zhongxing Xu, Chong Zhang, Chun-Mei Feng, Yutong Xie, Imran Razzak, Zongyuan Ge, Jionglong Su, Junjun He, Yu Qiao:
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation. ACL (1) 2025: 22477-22503
[c421]Qibing Ren, Hao Li, Dongrui Liu, Zhanxu Xie, Xiaoya Lu, Yu Qiao, Lei Sha, Junchi Yan, Lizhuang Ma, Jing Shao:
LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts. ACL (1) 2025: 24763-24785
[c420]Yu Qiao, Lun Li, Feng Cheng, Jie Zhang, Jin Gao, Hongsong Zhu:
SecRAG: A Graph-Enhanced RAG Framework with Dynamic Prompt for Cybersecurity Applications. CSCWD 2025: 2053-2058
[c419]Pengfei Zhou, Xiaopeng Peng, Jiajun Song, Chuanhao Li, Zhaopan Xu, Yue Yang, Ziyao Guo, Hao Zhang, Yuqi Lin, Yefei He, Lirui Zhao, Shuo Liu, Tianhua Li, Yuxuan Xie, Xiaojun Chang, Yu Qiao, Wenqi Shao, Kaipeng Zhang:
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation. CVPR 2025: 56-66
[c418]Bingjie Gao, Xinyu Gao, Xiaoxue Wu, Yujie Zhou, Yu Qiao, Li Niu, Xinyuan Chen, Yaohui Wang:
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation. CVPR 2025: 3173-3183
[c417]Ying Chen, Guoan Wang, Yuanfeng Ji, Yanjun Li, Jin Ye, Tianbin Li, Ming Hu, Rongshan Yu, Yu Qiao, Junjun He:
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding. CVPR 2025: 5134-5143
[c416]Chenxin Tao, Shiqian Su, Xizhou Zhu, Chenyu Zhang, Zhe Chen, Jiawen Liu, Wenhai Wang, Lewei Lu, Gao Huang, Yu Qiao, Jifeng Dai:
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding. CVPR 2025: 14559-14569
[c415]Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao:
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models. CVPR 2025: 19867-19878
[c414]Gen Luo, Xue Yang, Wenhan Dou, Zhaokai Wang, Jiawen Liu, Jifeng Dai, Yu Qiao, Xizhou Zhu:
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training. CVPR 2025: 24960-24971
[c413]Ziang Yan, Zhilin Li, Yinan He, Chenting Wang, Kunchang Li, Xinhao Li, Xiangyu Zeng, Zilei Wang, Yali Wang, Yu Qiao, Limin Wang, Yi Wang:
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment. CVPR 2025: 29880-29892
[c412]Mingzhou Liu, Ching-Wen Lee, Xinwei Sun, Xueqing Yu, Yu Qiao, Yizhou Wang:
Learning Causal Alignment for Reliable Disease Diagnosis. ICLR 2025
[c411]Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, Yu Qiao:
OS-ATLAS: Foundation Action Model for Generalist GUI Agents. ICLR 2025
[c410]Hengwei Bian, Lingdong Kong, Haozhe Xie, Liang Pan, Yu Qiao, Ziwei Liu:
DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes. ICLR 2025
[c409]Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang:
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures. ICLR 2025
[c408]Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xie, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, Tong He, Jingwen He, Junjun He, Yu Qiao, Hongsheng Li:
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation. ICLR 2025
[c407]Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong:
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality. ICLR 2025
[c406]Fanqing Meng, Jin Wang, Chuanhao Li, Quanfeng Lu, Hao Tian, Tianshuo Yang, Jiaqi Liao, Xizhou Zhu, Jifeng Dai, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao:
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models. ICLR 2025
[c405]Baoqi Pei, Yifei Huang, Jilan Xu, Guo Chen, Yuping He, Lijin Yang, Yali Wang, Weidi Xie, Yu Qiao, Fei Wu, Limin Wang:
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning. ICLR 2025
[c404]Chongjie Si, Xuehui Wang, Xue Yang, Zhengqin Xu, Qingyun Li, Jifeng Dai, Yu Qiao, Xiaokang Yang, Wei Shen:
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning. ICLR 2025
[c403]Zun Wang, Jialu Li, Yicong Hong, Songze Li, Kunchang Li, Shoubin Yu, Yi Wang, Yu Qiao, Yali Wang, Mohit Bansal, Limin Wang:
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel. ICLR 2025
[c402]Xiangyu Zeng, Kunchang Li, Chenting Wang, Xinhao Li, Tianxiang Jiang, Ziang Yan, Songze Li, Yansong Shi, Zhengrong Yue, Yi Wang, Yali Wang, Yu Qiao, Limin Wang:
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning. ICLR 2025
[c401]Jie Zhang, Dongrui Liu, Chen Qian, Linfeng Zhang, Yong Liu, Yu Qiao, Jing Shao:
REEF: Representation Encoding Fingerprints for Large Language Models. ICLR 2025
[c400]Kaiwen Zhu, Jinjin Gu, Zhiyuan You, Yu Qiao, Chao Dong:
An Intelligent Agentic System for Complex Image Restoration Problems. ICLR 2025
[c399]Yu Qiao, Tianyu Meng, Huilin Ge, Xinning Wang, Jiayue Zhao, Qianchen Xia, Xin Yang:
Localization Hints Exploration for Object Matting. ICME 2025: 1-6
[c398]Guoqing Zhao, Qi Zhang, Shaopeng Zhai, Dazhong Shen, Tianyi Zhang, Yu Qiao, Tong Xu:
I-Lora: Iterative Merging of Routing-Tuned Low-Rank Adapters for Multi-Task Learning. ICME 2025: 1-6
[c397]Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Enshen Zhou, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng:
RH20T-P: A Primitive-Level Robotic Manipulation Dataset towards Composable Generalization Agents in Real-world Scenarios. IROS 2025: 20532-20539
[c396]Zhaodong Wu, Qiaochu Zhao, Ming Hu, Yulong Li, Haochen Xue, Zhengyong Jiang, Angelos Stefanidis, Qiufeng Wang, Imran Razzak, Zongyuan Ge, Junjun He, Yu Qiao, Zhong Zheng, Feilong Tang, Kang Dang, Jionglong Su:
MSWAL: 3D Multi-class Segmentation of Whole Abdominal Lesions Dataset. MICCAI (2) 2025: 378-388
[c395]Yangyang Xu, Shengfeng He, Wenqi Shao, Yong Du, Kwan-Yee K. Wong, Yu Qiao, Jun Yu, Ping Luo:
DiffusionMat: Alpha Matting as Deterministic Sequential Refinement Learning. ACM Multimedia 2025: 9454-9462
[c394]Jingwen He, Hongbo Liu, Jiajun Li, Ziqi Huang, Yu Qiao, Wanli Ouyang, Ziwei Liu:
Cut2Next: Generating Next Shot via In-Context Tuning. SIGGRAPH Asia 2025: 22:1-22:11
[i523]Xinhao Li, Yi Wang, Jiashuo Yu, Xiangyu Zeng, Yuhan Zhu, Haian Huang, Jianfei Gao, Kunchang Li, Yinan He, Chenting Wang, Yu Qiao, Yali Wang, Limin Wang:
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling. CoRR abs/2501.00574 (2025)
[i522]Jiakang Yuan, Xiangchao Yan, Botian Shi, Tao Chen, Wanli Ouyang, Bo Zhang, Lei Bai, Yu Qiao, Bowen Zhou:
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback. CoRR abs/2501.03916 (2025)
[i521]Siran Chen, Yuxiao Luo, Yue Ma, Yu Qiao, Yali Wang:
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving. CoRR abs/2501.04302 (2025)
[i520]Zhaokai Wang, Xizhou Zhu, Xue Yang, Gen Luo, Hao Li, Changyao Tian, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai:
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding. CoRR abs/2501.07783 (2025)
[i519]Weichen Fan, Chenyang Si, Junhao Song, Zhenyu Yang, Yinan He, Long Zhuo, Ziqi Huang, Ziyue Dong, Jingwen He, Dongwei Pan, Yi Wang, Yuming Jiang, Yaohui Wang, Peng Gao, Xinyuan Chen, Hengjie Li, Dahua Lin, Yu Qiao, Ziwei Liu:
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models. CoRR abs/2501.08453 (2025)
[i518]Chenyang Si, Weichen Fan, Zhengyao Lv, Ziqi Huang, Yu Qiao, Ziwei Liu:
RepVideo: Rethinking Cross-Layer Representation for Video Generation. CoRR abs/2501.08994 (2025)
[i517]Xiaohui Li, Yihao Liu, Shuo Cao, Ziyan Chen, Shaobin Zhuang, Xiangyu Chen, Yinan He, Yi Wang, Yu Qiao:
DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency. CoRR abs/2501.10110 (2025)
[i516]Yi Wang, Xinhao Li, Ziang Yan, Yinan He, Jiashuo Yu, Xiangyu Zeng, Chenting Wang, Changlian Ma, Haian Huang, Jianfei Gao, Min Dou, Kai Chen, Wenhai Wang, Yu Qiao, Yali Wang, Limin Wang:
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling. CoRR abs/2501.12386 (2025)
[i515]Jia Yu, Fei Yuan, Rui Min, Jing Yu, Pei Chu, Jiayang Li, Wei Li, Ruijie Zhang, Zhenxiang Li, Zhifei Ren, Dong Zheng, Wenjian Zhang, Yan Teng, Lingyu Meng, Zhenjiang Jin, Jiantao Qiu, ShaSha Wang, Zhongying Tu, Dahua Lin, Yu Wang, Yu Qiao, Yanfeng Wang, Conghui He:
WanJuanSiLu: A High-Quality Open-Source Webtext Dataset for Low-Resource Languages. CoRR abs/2501.14506 (2025)
[i514]Dongyang Liu, Shicheng Li, Yutong Liu, Zhen Li, Kai Wang, Xinyue Li, Qi Qin, Yufei Liu, Yi Xin, Zhongyu Li, Bin Fu, Chenyang Si, Yuewen Cao, Conghui He, Ziwei Liu, Yu Qiao, Qibin Hou, Hongsheng Li, Peng Gao:
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT. CoRR abs/2502.06782 (2025)
[i513]Daocheng Fu, Naiting Zhong, Xu Han, Pinlong Cai, Licheng Wen, Song Mao, Botian Shi, Yu Qiao:
LimSim Series: An Autonomous Driving Simulation Platform for Validation and Enhancement. CoRR abs/2502.09170 (2025)
[i512]Haochen Xue, Feilong Tang, Ming Hu, Yexin Liu, Qidong Huang, Yulong Li, Chengzhi Liu, Zhongxing Xu, Chong Zhang, Chun-Mei Feng, Yutong Xie, Imran Razzak, Zongyuan Ge, Jionglong Su, Junjun He, Yu Qiao:
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation. CoRR abs/2502.11903 (2025)
[i511]Baoqi Pei, Yifei Huang, Jilan Xu, Guo Chen, Yuping He, Lijin Yang, Yali Wang, Weidi Xie, Yu Qiao, Fei Wu, Limin Wang:
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning. CoRR abs/2503.00986 (2025)
[i510]Yifei Huang, Jilan Xu, Baoqi Pei, Yuping He, Guo Chen, Mingfang Zhang, Lijin Yang, Zheng Nie, Jinyao Liu, Guoshun Fan, Dechen Lin, Fang Fang, Kunpeng Li, Chang Yuan, Xinyuan Chen, Yaohui Wang, Yali Wang, Yu Qiao, Limin Wang:
An Egocentric Vision-Language Model based Portable Real-time Smart Assistant. CoRR abs/2503.04250 (2025)
[i509]AgiBot-World-Contributors, Qingwen Bu, Jisong Cai, Li Chen, Xiuqi Cui, Yan Ding, Siyuan Feng, Shenyuan Gao, Xindong He, Xu Huang, Shu Jiang, Yuxin Jiang, Cheng Jing, Hongyang Li, Jialu Li, Chiming Liu, Yi Liu, Yuxiang Lu, Jianlan Luo, Ping Luo, Yao Mu, Yuehan Niu, Yixuan Pan, Jiangmiao Pang, Yu Qiao, Guanghui Ren, Cheng Ruan, Jiaqi Shan, Yongjian Shen, Chengshi Shi, Mingkang Shi, Modi Shi, Chonghao Sima, Jianheng Song, Huijie Wang, Wenhao Wang, Dafeng Wei, Chengen Xie, Guo Xu, Junchi Yan, Cunbiao Yang, Lei Yang, Shukai Yang, Maoqing Yao, Jia Zeng, Chi Zhang, Qinglin Zhang, Bin Zhao, Chengyue Zhao, Jiaqi Zhao, Jianchao Zhu:
AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems. CoRR abs/2503.06669 (2025)
[i508]Fanqing Meng, Lingxiao Du, Zongkai Liu, Zhixiang Zhou, Quanfeng Lu, Daocheng Fu, Botian Shi, Wenhai Wang, Junjun He, Kaipeng Zhang, Ping Luo, Yu Qiao, Qiaosheng Zhang, Wenqi Shao:
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning. CoRR abs/2503.07365 (2025)
[i507]Weiyun Wang, Zhangwei Gao, Lianjie Chen, Zhe Chen, Jinguo Zhu, Xiangyu Zhao, Yangzhou Liu, Yue Cao, Shenglong Ye, Xizhou Zhu, Lewei Lu, Haodong Duan, Yu Qiao, Jifeng Dai, Wenhai Wang:
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning. CoRR abs/2503.10291 (2025)
[i506]Zhaodong Wu, Qiaochu Zhao, Ming Hu, Yulong Li, Haochen Xue, Kang Dang, Zhengyong Jiang, Angelos Stefanidis, Qiufeng Wang, Imran Razzak, Zongyuan Ge, Junjun He, Yu Qiao, Zhong Zheng, Feilong Tang, Jionglong Su:
MSWAL: 3D Multi-class Segmentation of Whole Abdominal Lesions Dataset. CoRR abs/2503.13560 (2025)
[i505]Haiyu Zhang, Xinyuan Chen, Yaohui Wang, Xihui Liu, Yunhong Wang, Yu Qiao:
AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset. CoRR abs/2503.19462 (2025)
[i504]Zhi Hou, Tianyi Zhang, Yuwen Xiong, Haonan Duan, Hengjun Pu, Ronglei Tong, Chengyang Zhao, Xizhou Zhu, Yu Qiao, Jifeng Dai, Yuntao Chen:
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy. CoRR abs/2503.19757 (2025)
[i503]Shitian Zhao, Qilong Wu, Xinyue Li, Bo Zhang, Ming Li, Qi Qin, Dongyang Liu, Kai Zhang, Hongsheng Li, Yu Qiao, Peng Gao, Bin Fu, Zhen Li:
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis. CoRR abs/2503.21749 (2025)
[i502]Dian Zheng, Ziqi Huang, Hongbo Liu, Kai Zou, Yinan He, Fan Zhang, Yuanhan Zhang, Jingwen He, Wei-Shi Zheng, Yu Qiao, Ziwei Liu:
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness. CoRR abs/2503.21755 (2025)
[i501]Qi Qin, Le Zhuo, Yi Xin, Ruoyi Du, Zhen Li, Bin Fu, Yiting Lu, Jiakang Yuan, Xinyue Li, Dongyang Liu, Xiangyang Zhu, Manyuan Zhang, Will Beddow, Erwann Millon, Victor Perez, Wenhai Wang, Conghui He, Bo Zhang, Xiaohong Liu, Hongsheng Li, Yu Qiao, Chang Xu, Peng Gao:
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework. CoRR abs/2503.21758 (2025)
[i500]Ruifeng Luo, Zhengjie Liu, Tianxiao Cheng, Jie Wang, Tongjie Wang, Xingguang Wei, Haomin Wang, Yanpeng Li, Fu Chai, Fei Cheng, Shenglong Ye, Wenhai Wang, Yanting Zhang, Yu Qiao, Hongjie Zhang, Xianzhong Zhao:
ArchCAD-400K: An Open Large-Scale Architectural CAD Dataset and New Baseline for Panoptic Symbol Spotting. CoRR abs/2503.22346 (2025)
[i499]Yuandong Pu, Le Zhuo, Kaiwen Zhu, Liangbin Xie, Wenlong Zhang, Xiangyu Chen, Peng Gao, Yu Qiao, Chao Dong, Yihao Liu:
Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision. CoRR abs/2504.04903 (2025)
[i498]Xinhao Li, Ziang Yan, Desen Meng, Lu Dong, Xiangyu Zeng, Yinan He, Yali Wang, Yu Qiao, Yi Wang, Limin Wang:
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning. CoRR abs/2504.06958 (2025)
[i497]Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, Zhangwei Gao, Erfei Cui, Xuehui Wang, Yue Cao, Yangzhou Liu, Xingguang Wei, Hongjie Zhang, Haomin Wang, Weiye Xu, Hao Li, Jiahao Wang, Nianchen Deng, Songze Li, Yinan He, Tan Jiang, Jiapeng Luo, Yi Wang, Conghui He, Botian Shi, Xingcheng Zhang, Wenqi Shao, Junjun He, Yingtong Xiong, Wenwen Qu, Peng Sun, Penglong Jiao, Han Lv, Lijun Wu, Kaipeng Zhang, Huipeng Deng, Jiaye Ge, Kai Chen, Limin Wang, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang:
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models. CoRR abs/2504.10479 (2025)
[i496]Bingjie Gao, Xinyu Gao, Xiaoxue Wu, Yujie Zhou, Yu Qiao, Li Niu, Xinyuan Chen, Yaohui Wang:
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation. CoRR abs/2504.11739 (2025)
[i495]Daocheng Fu, Zijun Chen, Renqiu Xia, Qi Liu, Yuan Feng, Hongbin Zhou, Renrui Zhang, Shiyang Feng, Peng Gao, Junchi Yan, Botian Shi, Bo Zhang, Yu Qiao:
TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving. CoRR abs/2504.15780 (2025)
[i494]Siqi Li, Yufan Shen, Xiangnan Chen, Jiayi Chen, Hengwei Ju, Haodong Duan, Song Mao, Hongbin Zhou, Bo Zhang, Bin Fu, Pinlong Cai, Licheng Wen, Botian Shi, Yong Liu, Xinyu Cai, Yu Qiao:
GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling. CoRR abs/2505.00063 (2025)
[i493]Lu Dong, Haiyu Zhang, Hongjie Zhang, Yifei Huang, Zhen-Hua Ling, Yu Qiao, Limin Wang, Yali Wang:
Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining. CoRR abs/2505.06557 (2025)
[i492]Jianbiao Mei, Tao Hu, Daocheng Fu, Licheng Wen, Xuemeng Yang, Rong Wu, Pinlong Cai, Xinyu Cai, Xing Gao, Yu Yang, Chengjun Xie, Botian Shi, Yong Liu, Yu Qiao:
O2-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering. CoRR abs/2505.16582 (2025)
[i491]Xingguang Wei, Haomin Wang, Shenglong Ye, Ruifeng Luo, Yanting Zhang, Lixin Gu, Jifeng Dai, Yu Qiao, Wenhai Wang, Hongjie Zhang:
Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings. CoRR abs/2505.23395 (2025)
[i490]Chenyu Yang, Shiqian Su, Shi Liu, Xuan Dong, Yue Yu, Weijie Su, Xuehui Wang, Zhaoyang Liu, Jinguo Zhu, Hao Li, Wenhai Wang, Yu Qiao, Xizhou Zhu, Jifeng Dai:
ZeroGUI: Automating Online GUI Learning at Zero Human Cost. CoRR abs/2505.23762 (2025)
[i489]Gen Luo, Ganlin Yang, Ziyang Gong, Guanzhou Chen, Haonan Duan, Erfei Cui, Ronglei Tong, Zhi Hou, Tianyi Zhang, Zhe Chen, Shenglong Ye, Lewei Lu, Jingbo Wang, Wenhai Wang, Jifeng Dai, Yu Qiao, Rongrong Ji, Xizhou Zhu:
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces. CoRR abs/2506.00123 (2025)
[i488]Yue Yang, MingKang Chen, Qihua Liu, Mengkang Hu, Qiguang Chen, Gengrui Zhang, Shuyue Hu, Guangtao Zhai, Yu Qiao, Yu Wang, Wenqi Shao, Ping Luo:
Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation. CoRR abs/2506.02648 (2025)
[i487]Zhengyao Lv, Chenyang Si, Tianlin Pan, Zhaoxi Chen, Kwan-Yee K. Wong, Yu Qiao, Ziwei Liu:
DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation. CoRR abs/2506.03123 (2025)
[i486]Zikang Wang, Boyu Chen, Zhengrong Yue, Yi Wang, Yu Qiao, Limin Wang, Yali Wang:
VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning. CoRR abs/2506.06097 (2025)
[i485]Boyu Chen, Siran Chen, Kunchang Li, Qinglin Xu, Yu Qiao, Yali Wang:
Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding. CoRR abs/2506.07576 (2025)
[i484]

