


Zhen Ye 0006
Person information
- affiliation: Hong Kong University of Science and Technology, Hong Kong, SAR, China
Other persons with the same name
- Zhen Ye (disambiguation page)
- Zhen Ye 0001: University College London, London, UK
- Zhen Ye 0002: Beihang University, Beijing, China
- Zhen Ye 0003: Anhui Normal University, Wuhu, China
- Zhen Ye 0004: Chengdu University of Technology, Chengdu, China
- Zhen Ye 0005: Lishui University, Lishui, China
- Zhen Ye 0007: Chang'an University, Xi'an, China
- Zhen Ye 0008: Hubei University of Technology, Wuhan, China
- Zhen Ye 0009: Technische Universität München, Munich, Germany
2020 – today
- 2025
[c11]Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo, Wei Xue:
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model. AAAI 2025: 25697-25705
[c10]Zixin Chen, Hongzhan Lin, Kaixin Li, Ziyang Luo, Zhen Ye, Guang Chen, Zhiyong Huang, Jing Ma:
AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness. ACL (1) 2025: 4234-4253
[c9]Chi-Min Chan, Chunpu Xu, Junqi Zhu, Jiaming Ji, Donghai Hong, Pengcheng Wen, Chunyang Jiang, Zhen Ye, Yaodong Yang, Wei Xue, Sirui Han, Yike Guo:
Boosting Policy and Process Reward Models with Monte Carlo Tree Search in Open-Domain QA. ACL (Findings) 2025: 7433-7451
[c8]Peiwen Sun, Sitong Cheng, Xiangtai Li, Zhen Ye, Huadai Liu, Honggang Zhang, Wei Xue, Yike Guo:
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation. ICLR 2025
[c7]Rao Fu, Ziyang Luo, Hongzhan Lin, Zhen Ye, Jing Ma:
ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges. NAACL (Short Papers) 2025: 689-699
[i18]Zhen Ye, Xinfa Zhu, Chi-Min Chan, Xinsheng Wang, Xu Tan, Jiahe Lei, Yi Peng, Haohe Liu, Yizhu Jin, Zheqi Dai, Hongzhan Lin, Jianyi Chen, Xingjian Du, Liumeng Xue, Yunlin Chen, Zhifei Li, Lei Xie, Qiuqiang Kong, Yike Guo, Wei Xue:
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis. CoRR abs/2502.04128 (2025)
[i17]Boyi Kang, Xinfa Zhu, Zihan Zhang, Zhen Ye, Mingshuai Liu, Ziqian Wang, Yike Zhu, Guobin Ma, Jun Chen, Longshuai Xiao, Chao Weng, Wei Xue, Lei Xie:
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement. CoRR abs/2503.00493 (2025)
[i16]Xinsheng Wang, Mingqi Jiang, Ziyang Ma, Ziyu Zhang, Songxiang Liu, Linqin Li, Zheng Liang, Qixi Zheng, Rui Wang, Xiaoqin Feng, Weizhen Bian, Zhen Ye, Sitong Cheng, Ruibin Yuan, Zhixian Zhao, Xinfa Zhu, Jiahao Pan, Liumeng Xue, Pengcheng Zhu, Yunlin Chen, Zhifei Li, Xie Chen, Lei Xie, Yike Guo, Wei Xue:
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens. CoRR abs/2503.01710 (2025)
[i15]Ruibin Yuan, Hanfeng Lin, Shuyue Guo, Ge Zhang, Jiahao Pan, Yongyi Zang, Haohe Liu, Yiming Liang, Wenye Ma, Xingjian Du, Xinrun Du, Zhen Ye, Tianyu Zheng, Yinghao Ma, Minghao Liu, Zeyue Tian, Ziya Zhou, Liumeng Xue, Xingwei Qu, Yizhi Li, Shangda Wu, Tianhao Shen, Ziyang Ma, Jun Zhan, Chunhui Wang, Yatian Wang, Xiaowei Chi, Xinyue Zhang, Zhenzhu Yang, Xiangzhou Wang, Shansong Liu, Lingrui Mei, Peng Li, Junjie Wang, Jianwei Yu, Guojian Pang, Xu Li, Zihao Wang, Xiaohuan Zhou, Lijun Yu, Emmanouil Benetos, Yong Chen, Chenghua Lin, Xie Chen, Gus Xia, Zhaoxiang Zhang, Chao Zhang, Wenhu Chen, Xinyu Zhou, Xipeng Qiu, Roger B. Dannenberg, Zheng-Jia Liu, Jian Yang, Wenhao Huang, Wei Xue, Xu Tan, Yike Guo:
YuE: Scaling Open Foundation Models for Long-Form Music Generation. CoRR abs/2503.08638 (2025)
[i14]Chi-Min Chan, Chunpu Xu, Jiaming Ji, Zhen Ye, Pengcheng Wen, Chunyang Jiang, Yaodong Yang, Wei Xue, Sirui Han, Yike Guo:
J1: Exploring Simple Test-Time Scaling for LLM-as-a-Judge. CoRR abs/2505.11875 (2025)
[i13]Zixin Chen, Hongzhan Lin, Kaixin Li, Ziyang Luo, Zhen Ye, Guang Chen, Zhiyong Huang, Jing Ma:
AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness. CoRR abs/2507.01702 (2025)
[i12]Yizhu Jin, Zhen Ye, Zeyue Tian, Haohe Liu, Qiuqiang Kong, Yike Guo, Wei Xue:
Inference-time Scaling for Diffusion-based Audio Super-resolution. CoRR abs/2508.02391 (2025)
[i11]Wenjie Tian, Xinfa Zhu, Hanke Xie, Zhen Ye, Wei Xue, Lei Xie:
Llasa+: Free Lunch for Accelerated and Streaming Llama-Based Speech Synthesis. CoRR abs/2508.06262 (2025)
[i10]Pengyu Wang, Shaojun Zhou, Chenkun Tan, Xinghao Wang, Wei Huang, Zhen Ye, Zhaowei Li, Botian Jiang, Dong Zhang, Xipeng Qiu:
UnifiedVisual: A Framework for Constructing Unified Vision-Language Datasets. CoRR abs/2509.14738 (2025)
- 2024
[c6]Jianyi Chen, Zheqi Dai, Zhen Ye, Xu Tan, Qifeng Liu, Yike Guo, Wei Xue:
PyramidCodec: Hierarchical Codec for Long-form Music Generation in Audio Domain. EMNLP (Findings) 2024: 4253-4263
[c5]Jianyi Chen, Wei Xue, Xu Tan, Zhen Ye, Qifeng Liu, Yike Guo:
FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation. IJCAI 2024: 7618-7626
[c4]Yiwen Lu, Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo:
COMOSVC: Consistency Model-Based Singing Voice Conversion. ISCSLP 2024: 184-188
[c3]Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Wei Xue, Qifeng Liu, Yike Guo:
FlashSpeech: Efficient Zero-Shot Speech Synthesis. ACM Multimedia 2024: 6998-7007
[i9]Yiwen Lu, Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo:
CoMoSVC: Consistency Model-based Singing Voice Conversion. CoRR abs/2401.01792 (2024)
[i8]Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Qifeng Liu, Yike Guo, Wei Xue:
FlashSpeech: Efficient Zero-Shot Speech Synthesis. CoRR abs/2404.14700 (2024)
[i7]Jianyi Chen, Wei Xue, Xu Tan, Zhen Ye, Qifeng Liu, Yike Guo:
FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation. CoRR abs/2405.07682 (2024)
[i6]Shengkang Wang, Hongzhan Lin, Ziyang Luo, Zhen Ye, Guang Chen, Jing Ma:
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models. CoRR abs/2406.11288 (2024)
[i5]Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo, Wei Xue:
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model. CoRR abs/2408.17175 (2024)
[i4]Peiwen Sun, Sitong Cheng, Xiangtai Li, Zhen Ye, Huadai Liu, Honggang Zhang, Wei Xue, Yike Guo:
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation. CoRR abs/2410.10676 (2024)
[i3]Rao Fu, Ziyang Luo, Hongzhan Lin, Zhen Ye, Jing Ma:
ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges. CoRR abs/2411.18932 (2024)
- 2023
[c2]Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo:
NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis Based on Frequency Modulation. IJCAI 2023: 5869-5877
[c1]Zhen Ye, Wei Xue, Xu Tan, Jie Chen, Qifeng Liu, Yike Guo:
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model. ACM Multimedia 2023: 1831-1839
[i2]Zhen Ye, Wei Xue, Xu Tan, Jie Chen, Qifeng Liu, Yike Guo:
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model. CoRR abs/2305.06908 (2023)
[i1]Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo:
NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis based on Frequency Modulation. CoRR abs/2305.12868 (2023)
last updated on 2026-01-27 03:54 CET by the dblp team
all metadata released as open data under CC0 1.0 license







