


Bowen Zhang 0002
Person information
- affiliation: Apple, USA
- affiliation (PhD 2022): University of Southern California, Department of Computer Science, CA, USA
- affiliation (former): Tongji University, Department of Computer Science and Technology, Shanghai, China
Other persons with the same name
- Bowen Zhang (aka: Bo-Wen Zhang) — disambiguation page
- Bowen Zhang 0001 — Boston University, Department of Mechanical Engineering, MA, USA
- Bowen Zhang 0003 — Xidian University, School of Economics and Management, Xi'an, China
- Bowen Zhang 0004 — Xidian University, State Key Laboratory of Integrated Service Networks, Xi'an, China
- Bowen Zhang 0005 — Shenzhen Technology University, College of Big Data and Internet, Shenzhen, China (and 1 more)
- Bowen Zhang 0006 — Shanghai Jiao Tong University, Shanghai, China
- Bowen Zhang 0007 — Bytedance Inc, Beijing, China
- Bowen Zhang 0008 — Delft University of Technology, Delft, The Netherlands
- Bowen Zhang 0009 — University of Adelaide, SA, Australia
- Bowen Zhang 0010 — University of Science and Technology of China, USTC, China
- Bowen Zhang 0011 (aka: Bo-Wen Zhang 0011) — Beijing Academy of Artificial Intelligence (BAAI), China (and 1 more)
- Bowen Zhang 0012 — Shanghai Jiao Tong University, China
2020 – today
- 2025
[c21]Ruohong Zhang, Bowen Zhang, Yanghao Li, Haotian Zhang, Zhiqing Sun, Zhe Gan, Yinfei Yang, Ruoming Pang, Yiming Yang:
Improve Vision Language Model Chain-of-thought Reasoning. ACL (1) 2025: 1631-1662
[c20]Zhengfeng Lai, Vasileios Saveris, Chen Chen, Hong-You Chen, Haotian Zhang, Bowen Zhang, Wenze Hu, Juan Lao Tebar, Zhe Gan, Peter Grasch, Meng Cao, Yinfei Yang:
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models. ICLR 2025
[c19]Hanrong Ye, Haotian Zhang, Erik A. Daxberger, Lin Chen, Zongyu Lin, Yanghao Li, Bowen Zhang, Haoxuan You, Dan Xu, Zhe Gan, Jiasen Lu, Yinfei Yang:
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA. ICLR 2025
[c18]Haotian Zhang, Mingfei Gao, Zhe Gan, Philipp Dufter, Nina Wenzel, Forrest Huang, Dhruti Shah, Xianzhi Du, Bowen Zhang, Yanghao Li, Sam Dodge, Keen You, Zhen Yang, Aleksei Timofeev, Mingze Xu, Hong-You Chen, Jean-Philippe Fauconnier, Zhengfeng Lai, Haoxuan You, Zirui Wang, et al.:
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning. ICLR 2025
[c17]Hong-You Chen, Zhengfeng Lai, Haotian Zhang, Xinze Wang, Marcin Eichner, Keen You, Meng Cao, Bowen Zhang, Yinfei Yang, Zhe Gan:
Contrastive Localized Language-Image Pre-Training. ICML 2025
[i28]Chen Chen, Rui Qian, Wenze Hu, Tsu-Jui Fu, Jialing Tong, Xinze Wang, Lezhi Li, Bowen Zhang, Alex Schwing, Wei Liu, Yinfei Yang:
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation. CoRR abs/2503.10618 (2025)
[i27]Mark Lee, Tom Gunter, Chang Lan, John Peebles, Hanzhi Zhou, Kelvin Zou, Sneha Bangalore, Chung-Cheng Chiu, Nan Du, Xianzhi Du, Philipp Dufter, Ruixuan Hou, Haoshuo Huang, Dongseong Hwang, Xiang Kong, Jinhao Lei, Tao Lei, Meng Li, Li Li, Jiarui Lu, Zhiyun Lu, Yiping Ma, David Qiu, Vivek Rathod, Senyu Tong, Zhucheng Tu, Jianyu Wang, Yongqiang Wang, Zirui Wang, Floris Weers, Sam Wiseman, Guoli Yin, Bowen Zhang, Xiyou Zhou, Danyang Zhuo, Cheng Leong, Ruoming Pang:
AXLearn: Modular Large Model Training on Heterogeneous Infrastructure. CoRR abs/2507.05411 (2025)
[i26]Yanghao Li, Rui Qian, Bowen Pan, Haotian Zhang, Haoshuo Huang, Bowen Zhang, Jialing Tong, Haoxuan You, Xianzhi Du, Zhe Gan, Hyunjik Kim, Chao Jia, Zhenbang Wang, Yinfei Yang, Mingfei Gao, Zi-Yi Dou, Wenze Hu, Chang Gao, Dongxu Li, Philipp Dufter, Zirui Wang, Guoli Yin, Zhengdong Zhang, Chen Chen, Yang Zhao, Ruoming Pang, Zhifeng Chen:
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer. CoRR abs/2509.16197 (2025)
- 2024
[c16]Zhengfeng Lai, Haotian Zhang, Bowen Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao:
VeCLIP: Improving CLIP Training via Visual-Enriched Captions. ECCV (42) 2024: 111-127
[c15]Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang:
MM1: Methods, Analysis and Insights from Multimodal LLM Pre-training. ECCV (29) 2024: 304-323
[c14]Wentao Wu, Aleksei Timofeev, Chen Chen, Bowen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jonathon Shlens, Xianzhi Du, Yinfei Yang:
MOFI: Learning Image Representations from Noisy Entity Annotated Images. ICLR 2024
[c13]Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang:
Ferret: Refer and Ground Anything Anywhere at Any Granularity. ICLR 2024
[i25]Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang:
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training. CoRR abs/2403.09611 (2024)
[i24]Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang:
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models. CoRR abs/2404.07973 (2024)
[i23]Haotian Zhang, Mingfei Gao, Zhe Gan, Philipp Dufter, Nina Wenzel, Forrest Huang, Dhruti Shah, Xianzhi Du, Bowen Zhang, Yanghao Li, Sam Dodge, Keen You, Zhen Yang, Aleksei Timofeev, Mingze Xu, Hong-You Chen, Jean-Philippe Fauconnier, Zhengfeng Lai, Haoxuan You, Zirui Wang, Afshin Dehghan, Peter Grasch, Yinfei Yang:
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning. CoRR abs/2409.20566 (2024)
[i22]Zhengfeng Lai, Vasileios Saveris, Chen Chen, Hong-You Chen, Haotian Zhang, Bowen Zhang, Juan Lao Tebar, Wenze Hu, Zhe Gan, Peter Grasch, Meng Cao, Yinfei Yang:
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models. CoRR abs/2410.02740 (2024)
[i21]Hong-You Chen, Zhengfeng Lai, Haotian Zhang, Xinze Wang, Marcin Eichner, Keen You, Meng Cao, Bowen Zhang, Yinfei Yang, Zhe Gan:
Contrastive Localized Language-Image Pre-Training. CoRR abs/2410.02746 (2024)
[i20]Hanrong Ye, Haotian Zhang, Erik A. Daxberger, Lin Chen, Zongyu Lin, Yanghao Li, Bowen Zhang, Haoxuan You, Dan Xu, Zhe Gan, Jiasen Lu, Yinfei Yang:
MM-Ego: Towards Building Egocentric Multimodal LLMs. CoRR abs/2410.07177 (2024)
[i19]Ruohong Zhang, Bowen Zhang, Yanghao Li, Haotian Zhang, Zhiqing Sun, Zhe Gan, Yinfei Yang, Ruoming Pang, Yiming Yang:
Improve Vision Language Model Chain-of-thought Reasoning. CoRR abs/2410.16198 (2024)
[i18]Zongyu Lin, Wei Liu, Chen Chen, Jiasen Lu, Wenze Hu, Tsu-Jui Fu, Jesse Allardice, Zhengfeng Lai, Liangchen Song, Bowen Zhang, Cha Chen, Yiran Fei, Yifan Jiang, Lezhi Li, Yizhou Sun, Kai-Wei Chang, Yinfei Yang:
STIV: Scalable Text and Image Conditioned Video Generation. CoRR abs/2412.07730 (2024)
- 2023
[c12]Chen Chen, Bowen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Yantao Zheng, Jonathon Shlens, Ruoming Pang, Yinfei Yang:
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens. EMNLP 2023: 15079-15094
[i17]Chen Chen, Bowen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Jonathon Shlens, Ruoming Pang, Yinfei Yang:
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens. CoRR abs/2301.13081 (2023)
[i16]Liangliang Cao, Bowen Zhang, Chen Chen, Yinfei Yang, Xianzhi Du, Wencong Zhang, Zhiyun Lu, Yantao Zheng:
Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness. CoRR abs/2305.05095 (2023)
[i15]Wentao Wu, Aleksei Timofeev, Chen Chen, Bowen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jonathon Shlens, Xianzhi Du, Zhe Gan, Yinfei Yang:
MOFI: Learning Image Representations from Noisy Entity Annotated Images. CoRR abs/2306.07952 (2023)
[i14]Erik A. Daxberger, Floris Weers, Bowen Zhang, Tom Gunter, Ruoming Pang, Marcin Eichner, Michael Emmersberger, Yinfei Yang, Alexander Toshev, Xianzhi Du:
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts. CoRR abs/2309.04354 (2023)
[i13]Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang:
Ferret: Refer and Ground Anything Anywhere at Any Granularity. CoRR abs/2310.07704 (2023)
- 2021
[c11]Bowen Zhang, Hexiang Hu, Linlu Qiu, Peter Shaw, Fei Sha:
Visually Grounded Concept Composition. EMNLP (Findings) 2021: 201-215
[c10]Linlu Qiu, Hexiang Hu, Bowen Zhang, Peter Shaw, Fei Sha:
Systematic Generalization on gSCAN: What is Nearly Solved and What is Next? EMNLP (1) 2021: 2180-2188
[c9]Yichao Zhou, Wei-Ting Chen, Bowen Zhang, David Lee, J. Harry Caufield, Kai-Wei Chang, Yizhou Sun, Peipei Ping, Wei Wang:
CREATe: Clinical Report Extraction and Annotation Technology. ICDE 2021: 2677-2680
[i12]Yichao Zhou, Wei-Ting Chen, Bowen Zhang, David Lee, J. Harry Caufield, Kai-Wei Chang, Yizhou Sun, Peipei Ping, Wei Wang:
CREATe: Clinical Report Extraction and Annotation Technology. CoRR abs/2103.00562 (2021)
[i11]Linlu Qiu, Hexiang Hu, Bowen Zhang, Peter Shaw, Fei Sha:
Systematic Generalization on gSCAN: What is Nearly Solved and What is Next? CoRR abs/2109.12243 (2021)
[i10]Bowen Zhang, Hexiang Hu, Linlu Qiu, Peter Shaw, Fei Sha:
Visually Grounded Concept Composition. CoRR abs/2109.14115 (2021)
[i9]Bowen Zhang, Jiahui Yu, Christopher Fifty, Wei Han, Andrew M. Dai, Ruoming Pang, Fei Sha:
Co-training Transformer with Videos and Images Improves Action Recognition. CoRR abs/2112.07175 (2021)
- 2020
[c8]Bowen Zhang, Hexiang Hu, Vihan Jain, Eugene Ie, Fei Sha:
Learning to Represent Image and Text with Denotation Graph. EMNLP (1) 2020: 823-839
[i8]Bowen Zhang, Hexiang Hu, Fei Sha:
Visual Storytelling via Predicting Anchor Word Embeddings in the Stories. CoRR abs/2001.04541 (2020)
[i7]Bowen Zhang, Hexiang Hu, Vihan Jain, Eugene Ie, Fei Sha:
Learning to Represent Image and Text with Denotation Graph. CoRR abs/2010.02949 (2020)
[i6]Bowen Zhang, Hexiang Hu, Joonseok Lee, Ming Zhao, Sheide Chammas, Vihan Jain, Eugene Ie, Fei Sha:
A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus. CoRR abs/2011.09046 (2020)
2010 – 2019
- 2019
[i5]Melissa Ailem, Bowen Zhang, Fei Sha:
Topic Augmented Generator for Abstractive Summarization. CoRR abs/1908.07026 (2019)
- 2018
[j3]Bowen Zhang, Limin Wang, Zhe Wang, Yu Qiao, Hanli Wang:
Real-Time Action Recognition With Deeply Transferred Motion Vector CNNs. IEEE Trans. Image Process. 27(5): 2326-2339 (2018)
[c7]Bowen Zhang, Hexiang Hu, Fei Sha:
Cross-Modal and Hierarchical Modeling of Video and Text. ECCV (13) 2018: 385-401
[c6]Melissa Ailem, Bowen Zhang, Aurélien Bellet, Pascal Denis, Fei Sha:
A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images. EMNLP 2018: 1478-1487
[i4]Bowen Zhang, Hexiang Hu, Fei Sha:
Cross-Modal and Hierarchical Modeling of Video and Text. CoRR abs/1810.07212 (2018)
- 2017
[j2]Yun Yi, Hanli Wang, Bowen Zhang:
Learning correlations for human action recognition in videos. Multim. Tools Appl. 76(18): 18891-18913 (2017)
[j1]Zhe Wang, Limin Wang, Yali Wang, Bowen Zhang, Yu Qiao:
Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition. IEEE Trans. Image Process. 26(4): 2028-2041 (2017)
- 2016
[c5]Bowen Zhang, Limin Wang, Zhe Wang, Yu Qiao, Hanli Wang:
Real-Time Action Recognition with Enhanced Motion Vector CNNs. CVPR 2016: 2718-2726
[i3]Bowen Zhang, Limin Wang, Zhe Wang, Yu Qiao, Hanli Wang:
Real-time Action Recognition with Enhanced Motion Vector CNNs. CoRR abs/1604.07669 (2016)
[i2]Yuanjun Xiong, Limin Wang, Zhe Wang, Bowen Zhang, Hang Song, Wei Li, Dahua Lin, Yu Qiao, Luc Van Gool, Xiaoou Tang:
CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016. CoRR abs/1608.00797 (2016)
[i1]Zhe Wang, Limin Wang, Yali Wang, Bowen Zhang, Yu Qiao:
Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition. CoRR abs/1609.00153 (2016)
- 2015
[c4]Yun Yi, Hanli Wang, Bowen Zhang, Jian Yu:
MIC-TJU in MediaEval 2015 Affective Impact of Movies Task. MediaEval 2015
[c3]Bowen Zhang, Hanli Wang:
Encoding scale into fisher vector for human action recognition. VCIP 2015: 1-4
- 2014
[c2]Bowen Zhang, Yun Yi, Hanli Wang, Jian Yu:
MIC-TJU at MediaEval Violent Scenes Detection (VSD) 2014. MediaEval 2014
[c1]Lei Wang, Yun Yi, Bowen Zhang, Fengkuangtian Zhu, Bo Xiao, Tianyao Sun, Hanli Wang:
MIC_TJ at TRECVID 2014. TRECVID 2014
last updated on 2026-03-21 23:40 CET by the dblp team
all metadata released as open data under CC0 1.0 license







