


Haizhou Li 0001
李海洲
Person information
- unicode name: 李海洲
- affiliation: Chinese University of Hong Kong (Shenzhen), China
- affiliation: National University of Singapore, Department of Electrical and Computer Engineering, Singapore
- affiliation (2006 - 2016): Nanyang Technological University, Singapore
- affiliation (2003 - 2016): Institute for Infocomm Research, A*STAR, Singapore
- affiliation (2011): University of New South Wales, Sydney, Australia
- affiliation (2009): University of Eastern Finland, Kuopio, Finland
- affiliation (PhD 1990): South China University of Technology, Guangzhou, China
Other persons with the same name
- Haizhou Li 0002 — Blaise Pascal University, Clermont-Ferrand, France
- Haizhou Li 0003 — City University of Hong Kong, Department of Computer Science, Hong Kong
- Haizhou Li 0004 — Beijing Institute of Technology, China
2020 – today
- 2026
[j210]Siqi Cai, Zheyuan Lin, Xiaoli Liu, Wenjie Wei, Shuai Wang, Malu Zhang, Tanja Schultz, Haizhou Li:
Spiking neural networks for EEG signal analysis: From theory to practice. Neural Networks 194: 108127 (2026)
[j209]Rui Liu, Zhenqi Jia, Jie Yang, Yifan Hu, Haizhou Li:
Emphasis rendering for conversational text-to-speech with multi-modal multi-scale context modeling. Speech Commun. 178: 103353 (2026)
[j208]Ziyang Jiang, Xueyan Chen, Shuai Wang, Xinyuan Qian, Haizhou Li:
TPEech: Target Speaker Extraction and Noise Suppression With Historical Dialogue Text Cues. IEEE Signal Process. Lett. 33: 351-355 (2026)
[j207]Zeyang Song, Shimin Zhang, Yuhong Chou, Jibin Wu, Haizhou Li:
IML-Spikeformer: Input-Aware Multilevel Spiking Transformer for Speech Processing. IEEE Trans. Neural Networks Learn. Syst. 37(3): 1377-1389 (2026)
[c794]Rui Ke, Jiahui Xu, Shenghao Yang, Kuang Wang, Feng Jiang, Haizhou Li:
CATCH: A Controllable Theme Detection Framework with Contextualized Clustering and Hierarchical Generation. AAAI 2026: 31419-31428
[i282]Zhixian Zhao, Shuiyuan Wang, Guojian Li, Hongfei Xue, Chengyou Wang, Shuai Wang, Longshuai Xiao, Zihan Zhang, Hui Bu, Xin Xu, Xinsheng Wang, Hexin Liu, Eng Siong Chng, Hung-yi Lee, Haizhou Li, Lei Xie:
The ICASSP 2026 HumDial Challenge: Benchmarking Human-like Spoken Dialogue Systems in the LLM Era. CoRR abs/2601.05564 (2026)
[i281]Li Zhou, Hao Jiang, Junjie Li, Tianrui Wang, Haizhou Li:
EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis. CoRR abs/2601.22873 (2026)
- 2025
[j206]Rui Liu, Zhenqi Jia, Feilong Bao, Haizhou Li:
Retrieval-Augmented Dialogue Knowledge Aggregation for expressive conversational speech synthesis. Inf. Fusion 118: 102948 (2025)
[j205]Rui Liu, Hongyu Yuan, Guanglai Gao, Haizhou Li:
Listening and seeing again: Generative error correction for audio-visual speech recognition. Inf. Fusion 120: 103077 (2025)
[j204]Rui Liu, Jinhua Zhang, Haizhou Li:
Hierarchical multi-source cues fusion for mono-to-binaural based Audio Deepfake Detection. Inf. Fusion 120: 103097 (2025)
[j203]Xinyuan Qian, Jiaran Gao, Yaodan Zhang, Qiquan Zhang, Hexin Liu, Leibny Paola García-Perera, Haizhou Li:
SAV-SE: Scene-Aware Audio-Visual Speech Enhancement With Selective State Space Model. IEEE J. Sel. Top. Signal Process. 19(4): 623-634 (2025)
[j202]Wenxuan Wu, Xueyuan Chen, Shuai Wang, Jiadong Wang, Lingwei Meng, Xixin Wu, Helen Meng, Haizhou Li:
$C^{2}$AV-TSE: Context and Confidence-Aware Audio Visual Target Speaker Extraction. IEEE J. Sel. Top. Signal Process. 19(4): 646-657 (2025)
[j201]Kristen Grauman, Andrew Westbury, Eugene Byrne, Vincent Cartillier, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Devansh Kukreja, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina González, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jáchym Kolár, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran K. Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbeláez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard A. Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik:
Ego4D: Around the World in 3,600 Hours of Egocentric Video. IEEE Trans. Pattern Anal. Mach. Intell. 47(11): 9468-9509 (2025)
[j200]Xinyuan Qian, Xianghu Yue, Jiadong Wang, Huiping Zhuang, Haizhou Li:
Analytic Class Incremental Learning for Sound Source Localization With Privacy Protection. IEEE Signal Process. Lett. 32: 726-730 (2025)
[j199]Yi Ma, Shuai Wang, Tianchi Liu, Haizhou Li:
ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification. IEEE Signal Process. Lett. 32: 731-735 (2025)
[j198]Qiyuan Sun, Haolin Zuo, Rui Liu, Haizhou Li:
Connecting Cross-Modal Representations for Compact and Robust Multimodal Sentiment Analysis With Sentiment Word Substitution Error. IEEE Trans. Affect. Comput. 16(3): 1265-1276 (2025)
[j197]Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li:
Hierarchical Control of Emotion Rendering in Speech Synthesis. IEEE Trans. Affect. Comput. 16(4): 3316-3328 (2025)
[j196]Qianhui Liu, Jiadong Wang, Yang Wang, Xin Yang, Gang Pan, Haizhou Li:
Human-Inspired Computing for Robust and Efficient Audio-Visual Speech Recognition. IEEE Trans. Computers 74(9): 2950-2961 (2025)
[j195]Siqi Cai, Ran Zhang, Hongxu Zhu, Haizhou Li:
Modeling the Temporal Dynamics of EEG Signals in Selective Listening. IEEE Trans. Consumer Electron. 71(1): 1115-1124 (2025)
[j194]Hui Tian, Yiqin Qiu, Haizhou Li, Xinpeng Zhang, Athanasios V. Vasilakos:
Universal Low Bit-Rate Speech Steganalysis Integrating Domain-Specific and Domain-Shared Knowledge. IEEE Trans. Dependable Secur. Comput. 22(5): 5382-5396 (2025)
[j193]Tianchi Liu, Duc-Tuan Truong, Rohan Kumar Das, Kong Aik Lee, Haizhou Li:
Nes2Net: A Lightweight Nested Architecture for Foundation Model Driven Speech Anti-Spoofing. IEEE Trans. Inf. Forensics Secur. 20: 12005-12018 (2025)
[j192]Jiqing Zhang, Malu Zhang, Yuanchen Wang, Qianhui Liu, Baocai Yin, Haizhou Li, Xin Yang:
Spiking Neural Networks With Adaptive Membrane Time Constant for Event-Based Tracking. IEEE Trans. Image Process. 34: 1009-1021 (2025)
[j191]Ruijie Tao, Xinyuan Qian, Rohan Kumar Das, Xiaoxue Gao, Jiadong Wang, Haizhou Li:
Enhancing Real-World Active Speaker Detection With Multi-Modal Extraction Pre-Training. IEEE Trans. Multim. 27: 2362-2373 (2025)
[j190]Ruihang Ji, Dongyu Li, Shuzhi Sam Ge, Haizhou Li:
Tunnel Prescribed Control of Nonlinear Systems With Unknown Control Directions. IEEE Trans. Neural Networks Learn. Syst. 36(1): 1383-1395 (2025)
[j189]Malu Zhang, Xiaoling Luo, Jibin Wu, Ammar Belatreche, Siqi Cai, Yang Yang, Haizhou Li:
Toward Building Human-Like Sequential Memory Using Brain-Inspired Spiking Neural Models. IEEE Trans. Neural Networks Learn. Syst. 36(6): 10143-10155 (2025)
[j188]Yan Xiao, Yaochu Jin, Bin Wang, Yan Zhang, Kuangrong Hao, Haizhou Li:
Zero-Shot Relation Classification Through Inference on Category Attributes. IEEE Trans. Neural Networks Learn. Syst. 36(7): 13135-13148 (2025)
[c793]Rui Liu, Shuwei He, Yifan Hu, Haizhou Li:
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech. AAAI 2025: 24632-24640
[c792]Chenyu Yang, Shuai Wang, Hangting Chen, Jianwei Yu, Wei Tan, Rongzhi Gu, Yaoxun Xu, Yizhi Zhou, Haina Zhu, Haizhou Li:
SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor. AAAI 2025: 25597-25605
[c791]Chen Zhang, Dading Chong, Feng Jiang, Chengguang Tang, Anningzhe Gao, Guohua Tang, Haizhou Li:
Aligning Language Models Using Follow-up Likelihood as Reward Signal. AAAI 2025: 25832-25841
[c790]Yifan Hu, Rui Liu, Yi Ren, Xiang Yin, Haizhou Li:
Chain-Talker: Chain Understanding and Rendering for Empathetic Conversational Speech Synthesis. ACL (Findings) 2025: 1988-2003
[c789]Jianqing Zhu, Huang Huang, Zhihang Lin, Juhao Liang, Zhengyang Tang, Khalid Almubarak, Mosen Alharthi, Bang An, Juncai He, Xiangbo Wu, Fei Yu, Junying Chen, Zhuoheng Ma, Yuhao Du, He Zhang, Saied Alshahrani, Emad A. Alghamdi, Lian Zhang, Ruoyu Sun, Haizhou Li, Benyou Wang, Jinchao Xu:
Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion. ACL (1) 2025: 2025-2042
[c788]Yuhao Zhang, Zhiheng Liu, Fan Bu, Ruiyu Zhang, Benyou Wang, Haizhou Li:
Soundwave: Less is More for Speech-Text Alignment in LLMs. ACL (1) 2025: 18718-18738
[c787]Yidi Jiang, Qian Chen, Shengpeng Ji, Yu Xi, Wen Wang, Chong Zhang, Xianghu Yue, Shiliang Zhang, Haizhou Li:
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook. ACL (1) 2025: 19112-19124
[c786]Kuang Wang, Xianfei Li, Shenghao Yang, Li Zhou, Feng Jiang, Haizhou Li:
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles. ACL (1) 2025: 21082-21107
[c785]Tianchi Liu, Ruijie Tao, Qiongqiong Wang, Yidi Jiang, Hardik B. Sailor, Ke Zhang, Jingru Lin, Haizhou Li:
Interpolating Speaker Identities in Embedding Space for Data Expansion. APSIPA 2025: 589-594
[c784]Mehmet Sinan Yildirim, Ruijie Tao, Wupeng Wang, Junyi Ao, Haizhou Li:
Leveraging Language Information for Target Language Extraction. APSIPA 2025: 837-842
[c783]Huhong Xian, Rui Liu, Berrak Sisman, Haizhou Li:
NE-PADD: Leveraging Named Entity Knowledge for Robust Partial Audio Deepfake Detection via Attention Aggregation. APSIPA 2025: 2199-2204
[c782]Ruijie Tao, Zhan Shi, Yidi Jiang, Tianchi Liu, Haizhou Li:
Voice Conversion Augmentation for Speaker Recognition on Defective Datasets. APSIPA 2025: 2529-2534
[c781]Liping Chen, Kong-Aik Lee, Zhen-Hua Ling, Xin Wang, Rohan Kumar Das, Tomoki Toda, Haizhou Li:
Speaker Privacy and Security in the Big Data Era: Protection and Defense Against Deepfake. APSIPA 2025: 2570-2575
[c780]Zhenqi Jia, Rui Liu, Berrak Sisman, Haizhou Li:
Multimodal Fine-grained Context Interaction Graph Modeling for Conversational Speech Synthesis. EMNLP 2025: 8852-8858
[c779]Simin Chen, Yiming Chen, Zexin Li, Yifan Jiang, Zhongwei Wan, Yixin He, Dezhi Ran, Tianle Gu, Haizhou Li, Tao Xie, Baishakhi Ray:
Benchmarking Large Language Models Under Data Contamination: A Survey from Static to Dynamic Evaluation. EMNLP 2025: 10080-10098
[c778]Xunlian Dai, Li Zhou, Benyou Wang, Haizhou Li:
From Word to World: Evaluate and Mitigate Culture Bias in LLMs via Word Association Test. EMNLP 2025: 24510-24526
[c777]Li Zhou, Lutong Yu, Dongchu Xie, Shaohuan Cheng, Wenyan Li, Haizhou Li:
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation. EMNLP 2025: 24616-24638
[c776]Rui Liu, Xiaofen Xing, Zheng Lian, Haizhou Li, Björn W. Schuller, Haolin Zuo:
MEIJU - The 1st Multimodal Emotion and Intent Joint Understanding Challenge. ICASSP 2025: 1-2
[c775]Marvin Borsdorf, Zexu Pan, Pascal Himmelmann, Haizhou Li, Tanja Schultz:
Speech Separation for Low-Resource Languages. ICASSP 2025: 1-5
[c774]Sho Inoue, Shuai Wang, Wanxing Wang, Pengcheng Zhu, Mengxiao Bi, Haizhou Li:
MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion. ICASSP 2025: 1-5
[c773]Zhijun Liu, Shuai Wang, Pengcheng Zhu, Mengxiao Bi, Haizhou Li:
E1 TTS: Simple and Fast Non-Autoregressive TTS. ICASSP 2025: 1-5
[c772]Saurav Pahuja, Gabriel Ivucic, Siqi Cai, Dashanka De Silva, Tanja Schultz, Haizhou Li:
ATGnet: Adaptive Temporal Graph Network for EEG-enabled Sound Source Tracking in Cocktail Party Scenarios. ICASSP 2025: 1-5
[c771]Ke Zhang, Junjie Li, Shuai Wang, Yangjie Wei, Yi Wang, Yannan Wang, Haizhou Li:
Multi-Level Speaker Representation for Target Speaker Extraction. ICASSP 2025: 1-5
[c770]Xuerui Qiu, Malu Zhang, Jieyuan Zhang, Wenjie Wei, Honglin Cao, Junsheng Guo, Rui-Jie Zhu, Yimeng Shan, Yang Yang, Haizhou Li:
Quantized Spike-driven Transformer. ICLR 2025
[c769]Junjie Li, Ke Zhang, Shuai Wang, Kong Aik Lee, Man-Wai Mak, Haizhou Li:
MoMuSE: Momentum Multi-modal Target Speaker Extraction for Real-time Scenarios with Impaired Visual Cues. ICME 2025: 1-6
[c768]Honglin Cao, Zijian Zhou, Wenjie Wei, Yu Liang, Ammar Belatreche, Dehao Zhang, Malu Zhang, Yang Yang, Haizhou Li:
Binary Event-Driven Spiking Transformer. IJCAI 2025: 4110-4118
[c767]Rui Liu, Pu Gao, Jiatian Xi, Berrak Sisman, Carlos Busso, Haizhou Li:
Towards Emotionally Consistent Text-Based Speech Editing: Introducing EmoCorrector and The ECD-TSE Dataset. INTERSPEECH 2025
[c766]Qibing Bai, Sho Inoue, Shuai Wang, Zhongjie Jiang, Yannan Wang, Haizhou Li:
Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data. INTERSPEECH 2025
[c765]Sho Inoue, Shuai Wang, Haizhou Li:
PersonaTAB: Predicting Personality Traits using Textual, Acoustic, and Behavioral Cues in Fully-Duplex Speech Dialogs. INTERSPEECH 2025
[c764]Shaole Li, Shuai Wang, Jiangyu Han, Ke Zhang, Wupeng Wang, Haizhou Li:
REAL-T: Real Conversational Mixtures for Target Speaker Extraction. INTERSPEECH 2025
[c763]Sirui Li, Shuai Wang, Zhijun Liu, Zhongjie Jiang, Yannan Wang, Haizhou Li:
SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms. INTERSPEECH 2025
[c762]Zheyuan Lin, Siqi Cai, Haizhou Li:
Decoding Listener's Identity: Person Identification from EEG Signals Using a Lightweight Spiking Transformer. INTERSPEECH 2025
[c761]Saurav Pahuja, Gabriel Ivucic, Siqi Cai, Dashanka De Silva, Haizhou Li, Tanja Schultz:
GTAnet: Geometry-Guided Temporal Attention for EEG-Based Sound Source Tracking in Cocktail Party Scenarios. INTERSPEECH 2025
[c760]Dashanka De Silva, Siqi Cai, Saurav Pahuja, Tanja Schultz, Haizhou Li:
NeuroSpex+: Dual-Task Training of Neuro-Guided Speaker Extraction with Speech Envelope and Waveform. INTERSPEECH 2025
[c759]Wenxuan Wu, Shuai Wang, Xixin Wu, Helen Meng, Haizhou Li:
Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction. INTERSPEECH 2025
[c758]Chenyu Yang, Hangting Chen, Shuai Wang, Haina Zhu, Haizhou Li:
TVC-MusicGen: Time-Varying Structure Control for Background Music Generation via Self-Supervised Training. INTERSPEECH 2025
[c757]Xueyi Zhang, Peiyin Zhu, Yuan Liao, Xiyu Wang, Mingrui Lao, Siqi Cai, Yanming Guo, Haizhou Li:
TrustCLIP: Learning from Noisy Labels via Semantic Label Verification and Trust-aligned Gradient Projection. ACM Multimedia 2025: 4388-4397
[c756]Xueyi Zhang, Jialu Sun, Chengwei Zhang, Xianghu Yue, Tianfang Xiao, Siqi Cai, Mingrui Lao, Haizhou Li:
EventLip: Enhancing Event-Based Lip Reading via Frequency-Aware Spatiotemporal Hypergraph Modeling. ACM Multimedia 2025: 8263-8272
[c755]Yifan Hu, Rui Liu, Yi Ren, Xiang Yin, Haizhou Li:
UniTalker: Conversational Speech-Visual Synthesis. ACM Multimedia 2025: 10248-10257
[c754]Chuang Li, Yang Deng, Hengchang Hu, Min-Yen Kan, Haizhou Li:
ChatCRS: Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems. NAACL (Findings) 2025: 295-312
[c753]Ziche Liu, Rui Ke, Yajiao Liu, Feng Jiang, Haizhou Li:
Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models. NAACL (Long Papers) 2025: 6595-6611
[c752]Li Zhou, Taelin Karidi, Wanlong Liu, Nicolas Garneau, Yong Cao, Wenyu Chen, Haizhou Li, Daniel Hershcovich:
Does Mapo Tofu Contain Coffee? Probing LLMs for Food-related Cultural Knowledge. NAACL (Long Papers) 2025: 9840-9867
[c751]Mingrui Lao, Zheng Li, Yanming Guo, Xueyi Zhang, Siqi Cai, Zhaoyun Ding, Haizhou Li:
Boosting Discriminability for Robust Multimodal Entity Linking with Visual Modality Missing. SIGIR 2025: 989-999
[c750]Qiquan Zhang, Moran Chen, Zeyang Song, Hexin Liu, Xiangyu Zhang, Haizhou Li:
Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study. WASPAA 2025: 1-5
[c749]Zihao Cheng, Li Zhou, Feng Jiang, Benyou Wang, Haizhou Li:
Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement. WWW 2025: 2677-2688
[e25]Haizhou Li, Tanja Schultz, Yalei Bi, Jian Zhu, Hongsheng He, Jun Ma, Siqi Cai, Wanyue Jiang, Shuzhi Sam Ge:
Social Robotics - 16th International Conference, ICSR + InnoBiz 2024, Shenzhen, China, September 25-28, 2024, Proceedings. Lecture Notes in Computer Science 15170, Springer 2025, ISBN 978-981-96-1150-8
[i280]Rui Liu, Hongyu Yuan, Haizhou Li:
Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition. CoRR abs/2501.04038 (2025)
[i279]Yi Ma, Shuai Wang, Tianchi Liu, Haizhou Li:
ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification. CoRR abs/2501.05729 (2025)
[i278]Honglin Cao, Zijian Zhou, Wenjie Wei, Ammar Belatreche, Yu Liang, Dehao Zhang, Malu Zhang, Yang Yang, Haizhou Li:
Binary Event-Driven Spiking Transformer. CoRR abs/2501.05904 (2025)
[i277]Rui Liu, Zhenqi Jia, Feilong Bao, Haizhou Li:
Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis. CoRR abs/2501.06467 (2025)
[i276]Xianghu Yue, Yiming Chen, Xueyi Zhang, Xiaoxue Gao, Mengling Feng, Mingrui Lao, Huiping Zhuang, Haizhou Li:
PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning. CoRR abs/2501.09352 (2025)
[i275]Xuerui Qiu, Jieyuan Zhang, Wenjie Wei, Honglin Cao, Junsheng Guo, Rui-Jie Zhu, Yimeng Shan, Yang Yang, Malu Zhang, Haizhou Li:
Quantized Spike-driven Transformer. CoRR abs/2501.13492 (2025)
[i274]Qiquan Zhang, Buddhi Wickramasinghe, Eliathamby Ambikairajah, Vidhyasaharan Sethu, Haizhou Li:
Should Audio Front-ends be Adaptive? Comparing Learnable and Adaptive Front-ends. CoRR abs/2502.03260 (2025)
[i273]Li Zhou, Ruijie Zhang, Xunlian Dai, Daniel Hershcovich, Haizhou Li:
Large Language Models Penetration in Scholarly Writing and Peer Review. CoRR abs/2502.11193 (2025)
[i272]Yuhao Zhang, Zhiheng Liu, Fan Bu, Ruiyu Zhang, Benyou Wang, Haizhou Li:
Soundwave: Less is More for Speech-Text Alignment in LLMs. CoRR abs/2502.12900 (2025)
[i271]Simin Chen, Yiming Chen, Zexin Li, Yifan Jiang, Zhongwei Wan, Yixin He, Dezhi Ran, Tianle Gu, Haizhou Li, Tao Xie, Baishakhi Ray:
Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation. CoRR abs/2502.17521 (2025)
[i270]Kuang Wang, Xianfei Li, Shenghao Yang, Li Zhou, Feng Jiang, Haizhou Li:
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles. CoRR abs/2502.18968 (2025)
[i269]Yidi Jiang, Qian Chen, Shengpeng Ji, Yu Xi, Wen Wang, Chong Zhang, Xianghu Yue, Shiliang Zhang, Haizhou Li:
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook. CoRR abs/2502.20067 (2025)
[i268]Feng Jiang, Zhiyu Lin, Fan Bu, Yuhao Du, Benyou Wang, Haizhou Li:
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information. CoRR abs/2503.05085 (2025)
[i267]Wupeng Wang, Zexu Pan, Jingru Lin, Shuai Wang, Haizhou Li:
Context-Aware Two-Step Training Scheme for Domain Invariant Speech Separation. CoRR abs/2503.12589 (2025)
[i266]Junyi Ao, Dekun Chen, Xiaohai Tian, Wenjie Feng, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu:
Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context. CoRR abs/2503.15338 (2025)
[i265]Wenxuan Wu, Xueyuan Chen, Shuai Wang, Jiadong Wang, Lingwei Meng, Xixin Wu, Helen Meng, Haizhou Li:
C2/AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction. CoRR abs/2504.00750 (2025)
[i264]Wupeng Wang, Zexu Pan, Xinke Li, Shuai Wang, Haizhou Li:
Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation. CoRR abs/2504.02302 (2025)
[i263]Tianchi Liu, Duc-Tuan Truong, Rohan Kumar Das, Kong Aik Lee, Haizhou Li:
Nes2Net: A Lightweight Nested Architecture for Foundation Model Driven Speech Anti-spoofing. CoRR abs/2504.05657 (2025)
[i262]Renjie Li, Wenjie Wei, Qi Xin, Xiaoli Liu, Sixuan Mao, Erik Ma, Zijian Chen, Malu Zhang, Haizhou Li, Zhaoyu Zhang:
What Is Next for LLMs? Next-Generation AI Computing Hardware Using Photonic Chips. CoRR abs/2505.05794 (2025)
[i261]Yifan Hu, Rui Liu, Yi Ren, Xiang Yin, Haizhou Li:
Chain-Talker: Chain Understanding and Rendering for Empathetic Conversational Speech Synthesis. CoRR abs/2505.12597 (2025)
[i260]Sho Inoue, Shuai Wang, Haizhou Li:
PersonaTAB: Predicting Personality Traits using Textual, Acoustic, and Behavioral Cues in Fully-Duplex Speech Dialogs. CoRR abs/2505.14356 (2025)
[i259]Xunlian Dai, Li Zhou, Benyou Wang, Haizhou Li:
From Word to World: Evaluate and Mitigate Culture Bias via Word Association Test. CoRR abs/2505.18562 (2025)
[i258]Rui Liu, Pu Gao, Jiatian Xi, Berrak Sisman, Carlos Busso, Haizhou Li:
Towards Emotionally Consistent Text-Based Speech Editing: Introducing EmoCorrector and The ECD-TSE Dataset. CoRR abs/2505.20341 (2025)
[i257]Li Zhou, Lutong Yu, Dongchu Xie, Shaohuan Cheng, Wenyan Li, Haizhou Li:
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation. CoRR abs/2506.01565 (2025)
[i256]Chenyu Yang, Shuai Wang, Hangting Chen, Wei Tan, Jianwei Yu, Haizhou Li:
SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement. CoRR abs/2506.07634 (2025)
[i255]Wenxuan Wu, Shuai Wang, Xixin Wu, Helen Meng, Haizhou Li:
Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction. CoRR abs/2506.09792 (2025)
[i254]Sirui Li, Shuai Wang, Zhijun Liu, Zhongjie Jiang, Yannan Wang, Haizhou Li:
SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms. CoRR abs/2506.13709 (2025)
[i253]Li Zhou, Hao Jiang, Junjie Li, Zefeng Zhao, Feng Jiang, Wenyu Chen, Haizhou Li:
Do We Really Need GNNs with Explicit Structural Modeling? MLPs Suffice for Language Model Representations. CoRR abs/2506.21682 (2025)
[i252]Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li:
Multi-Step Prediction and Control of Hierarchical Emotion Distribution in Text-to-Speech Synthesis. CoRR abs/2507.04598 (2025)
[i251]Yu Chen, Xinyuan Qian, Hongxu Zhu, Jiadong Wang, Kainan Chen, Haizhou Li:
VP-SelDoA: Visual-prompted Selective DoA Estimation of Target Sound via Semantic-Spatial Matching. CoRR abs/2507.07384 (2025)
[i250]Zeyang Song, Shimin Zhang, Yuhong Chou, Jibin Wu, Haizhou Li:
IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing. CoRR abs/2507.07396 (2025)
[i249]Junjie Li, Wenxuan Wu, Shuai Wang, Zexu Pan, Kong Aik Lee, Helen Meng, Haizhou Li:
MeMo: Attentional Momentum for Real-time Audio-visual Speaker Extraction under Impaired Visual Conditions. CoRR abs/2507.15294 (2025)
[i248]Qibing Bai, Sho Inoue, Shuai Wang, Zhongjie Jiang, Yannan Wang, Haizhou Li:
Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data. CoRR abs/2507.17735 (2025)
[i247]Chuang Li, Yang Deng, Hengchang Hu, See-Kiong Ng, Min-Yen Kan, Haizhou Li:
CARE: Contextual Adaptation of Recommenders for LLM-based Conversational Recommendation. CoRR abs/2508.13889 (2025)
[i246]Junying Chen, Zhenyang Cai, Zhiheng Liu, Yunjin Yang, Rongsheng Wang, Qingying Xiao, Xiangyi Feng, Zhan Su, Jing Guo, Xiang Wan, Guangjun Yu, Haizhou Li, Benyou Wang:
ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine. CoRR abs/2508.14706 (2025)
[i245]Tianchi Liu, Ruijie Tao, Qiongqiong Wang, Yidi Jiang, Hardik B. Sailor, Ke Zhang, Jingru Lin, Haizhou Li:
Interpolating Speaker Identities in Embedding Space for Data Expansion. CoRR abs/2508.19210 (2025)
[i244]Huhong Xian, Rui Liu, Berrak Sisman, Haizhou Li:
NE-PADD: Leveraging Named Entity Knowledge for Robust Partial Audio Deepfake Detection via Attention Aggregation. CoRR abs/2509.03829 (2025)
[i243]Zhenqi Jia, Rui Liu, Berrak Sisman, Haizhou Li:
Multimodal Fine-grained Context Interaction Graph Modeling for Conversational Speech Synthesis. CoRR abs/2509.06074 (2025)
[i242]Yuhao Zhang, Yuhao Du, Zhanchen Dai, Xiangnan Ma, Kaiqi Kou, Benyou Wang, Haizhou Li:
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs. CoRR abs/2509.09174 (2025)
[i241]Mingchen Shao, Bingshen Mu, Chengyou Wang, Haizhou Li, Ying Yan, Zhonghua Fu, Lei Xie:
Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages. CoRR abs/2509.14804 (2025)
[i240]Dehao Zhang, Malu Zhang, Shuai Wang, Jingya Wang, Wenjie Wei, Zeyu Ma, Guoyin Wang, Yang Yang, Haizhou Li:
Dendritic Resonate-and-Fire Neuron for Effective and Efficient Long Sequence Modeling. CoRR abs/2509.17186 (2025)
[i239]Wenjie Wei, Malu Zhang, Jieyuan Zhang, Ammar Belatreche, Shuai Wang, Yimeng Shan, Hanwen Liu, Honglin Cao, Guoqing Wang, Yang Yang, Haizhou Li:
S2NN: Sub-bit Spiking Neural Networks. CoRR abs/2509.24266 (2025)
[i238]Suli Wang, Yang-yang Li, Siqi Cai, Haizhou Li:
A Robust Multi-Scale Framework with Test-Time Adaptation for sEEG-Based Speech Decoding. CoRR abs/2509.24700 (2025)
[i237]Ziyi Zeng, Zhenyang Cai, Yixi Cai, Xidong Wang, Junying Chen, Rongsheng Wang, Yipeng Liu, Siqi Cai, Benyou Wang, Zhiguo Zhang, Haizhou Li:
WaveMind: Towards a Conversational EEG Foundation Model Aligned to Textual and Visual Modalities. CoRR abs/2510.00032 (2025)
[i236]Jingru Lin, Chen Zhang, Stephen Y. Liu, Haizhou Li:
RAGCap-Bench: Benchmarking Capabilities of LLMs in Agentic Retrieval Augmented Generation Systems. CoRR abs/2510.13910 (2025)
[i235]Zheyuan Lin, Siqi Cai, Haizhou Li:
Decoding Listeners Identity: Person Identification from EEG Signals Using a Lightweight Spiking Transformer. CoRR abs/2510.17879 (2025)
[i234]Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Qiquan Zhang, Haizhou Li:
Adaptive Per-Channel Energy Normalization Front-end for Robust Audio Signal Processing. CoRR abs/2510.18206 (2025)
[i233]Jieyuan Zhang, Xiaolong Zhou, Shuai Wang, Wenjie Wei, Hanwen Liu, Qian Sun, Malu Zhang, Yang Yang, Haizhou Li:
Unveiling the Spatial-temporal Effective Receptive Fields of Spiking Neural Networks. CoRR abs/2510.21403 (2025)
[i232]Li Zhou, Lutong Yu, You Lyu, Yihang Lin, Zefeng Zhao, Junyi Ao, Yuhao Zhang, Benyou Wang, Haizhou Li:
EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models. CoRR abs/2510.22758 (2025)
[i231]Mehmet Sinan Yildirim, Ruijie Tao, Wupeng Wang, Junyi Ao, Haizhou Li:
Leveraging Language Information for Target Language Extraction. CoRR abs/2511.01652 (2025)
[i230]Wenxuan Wu, Shuai Wang, Xixin Wu, Helen Meng, Haizhou Li:
ELEGANCE: Efficient LLM Guidance for Audio-Visual Target Speech Extraction. CoRR abs/2511.06288 (2025)
[i229]Qianhui Liu, Jing Yang, Miao Yu, Trevor E. Carlson, Gang Pan, Haizhou Li, Zhumin Chen:
Efficient Eye-based Emotion Recognition via Neural Architecture Search of Time-to-First-Spike-Coded Spiking Neural Networks. CoRR abs/2512.02459 (2025)
[i228]Rui Ke, Jiahui Xu, Shenghao Yang, Kuang Wang, Feng Jiang, Haizhou Li:
CATCH: A Controllable Theme Detection Framework with Contextualized Clustering and Hierarchical Generation. CoRR abs/2512.21715 (2025)
- 2024
[j187]Qianhui Liu, Meng Ge, Haizhou Li:
Intelligent event-based lip reading word classification with spiking neural networks using spatio-temporal attention features and triplet loss. Inf. Sci. 675: 120660 (2024)
[j186]Jiaqi Yan, Qianhui Liu, Malu Zhang, Lang Feng, De Ma, Haizhou Li, Gang Pan:
Efficient spiking neural network design via neural architecture search. Neural Networks 173: 106172 (2024)
[j185]Xinyi Chen, Qu Yang, Jibin Wu, Haizhou Li, Kay Chen Tan:
A Hybrid Neural Coding Approach for Pattern Recognition With Spiking Neural Networks. IEEE Trans. Pattern Anal. Mach. Intell. 46(5): 3064-3078 (2024)
[j184]Shuai Wang, Zhengyang Chen, Bing Han, Hongji Wang, Chengdong Liang, Binbin Zhang, Xu Xiang, Wen Ding, Johan Rohdin, Anna Silnova, Yanmin Qian, Haizhou Li:
Advancing speaker embedding learning: Wespeaker toolkit for research and production. Speech Commun. 162: 103104 (2024)
[j183]Jingru Lin, Meng Ge, Wupeng Wang, Haizhou Li, Mengling Feng:
Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture Speech. IEEE Signal Process. Lett. 31: 1014-1018 (2024)
[j182]Duo Ma, Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li:
Text-Guided HuBERT: Self-Supervised Speech Pre-Training via Generative Adversarial Networks. IEEE Signal Process. Lett. 31: 2055-2059 (2024)
[j181]Xiaoxue Gao, Zexin Li, Yiming Chen, Cong Liu, Haizhou Li:
Transferable Adversarial Attacks Against ASR. IEEE Signal Process. Lett. 31: 2200-2204 (2024)
[j180]Rui Liu, Haolin Zuo, Zheng Lian, Björn W. Schuller, Haizhou Li:
Contrastive Learning Based Modality-Invariant Feature Acquisition for Robust Multimodal Emotion Recognition With Missing Modalities. IEEE Trans. Affect. Comput. 15(4): 1856-1873 (2024)
[j179]Qu Yang, Malu Zhang, Jibin Wu, Kay Chen Tan, Haizhou Li:
LC-TTFS: Toward Lossless Network Conversion for Spiking Neural Networks With TTFS Coding. IEEE Trans. Cogn. Dev. Syst. 16(5): 1626-1639 (2024)
[j178]Siqi Cai, Ran Zhang, Malu Zhang, Jibin Wu, Haizhou Li:
EEG-Based Auditory Attention Detection With Spiking Graph Convolutional Network. IEEE Trans. Cogn. Dev. Syst. 16(5): 1698-1706 (2024)
[j177]Koichiro Yoshino, Yun-Nung Chen, Paul A. Crook, Satwik Kottur, Jinchao Li, Behnam Hedayatnia, Seungwhan Moon, Zhengcong Fei, Zekang Li, Jinchao Zhang, Yang Feng, Jie Zhou, Seokhwan Kim, Yang Liu, Di Jin, Alexandros Papangelis, Karthik Gopalakrishnan, Dilek Hakkani-Tur, Babak Damavandi, Alborz Geramifard, Chiori Hori, Ankit Shah, Chen Zhang, Haizhou Li, João Sedoc, Luis F. D'Haro, Rafael E. Banchs, Alexander Rudnicky:
Overview of the Tenth Dialog System Technology Challenge: DSTC10. IEEE ACM Trans. Audio Speech Lang. Process. 32: 765-778 (2024)
[j176]Lei Liu, Li Liu, Haizhou Li:
Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1559-1572 (2024)
[j175]Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li:
Accented Text-to-Speech Synthesis With Limited Data. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1699-1711 (2024)
[j174]Rui Liu, Berrak Sisman, Guanglai Gao, Haizhou Li:
Controllable Accented Text-to-Speech Synthesis With Fine and Coarse-Grained Intensity Rendering. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2188-2201 (2024)
[j173]Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li:
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2324-2337 (2024)
[j172]Congcong Sun, Hui Tian, Peng Tian, Haizhou Li, Zhenxing Qian:
Multi-Agent Deep Learning for the Detection of Multiple Speech Steganography Methods. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2957-2972 (2024)
[j171]Mingyang Zhang, Yi Zhou, Yi Ren, Chen Zhang, Xiang Yin, Haizhou Li:
RefXVC: Cross-Lingual Voice Conversion With Enhanced Reference Leveraging. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4146-4156 (2024)
[j170]Wupeng Wang, Zexu Pan, Xinke Li, Shuai Wang, Haizhou Li:
Speech Separation With Pretrained Frontend to Minimize Domain Mismatch. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4184-4198 (2024)
[j169]Zexu Pan, Marvin Borsdorf, Siqi Cai, Tanja Schultz, Haizhou Li:
NeuroHeed: Neuro-Steered Speaker Extraction Using EEG Signals. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4456-4470 (2024)
[j168]Yicheng Gu, Xueyao Zhang, Liumeng Xue, Haizhou Li, Zhizheng Wu:
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoders. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4569-4579 (2024)
[j167]Shuai Wang, Zhengyang Chen, Kong Aik Lee, Yanmin Qian, Haizhou Li:
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4971-4998 (2024)
[j166]Siqi Cai, Tanja Schultz, Haizhou Li:
Brain Topology Modeling With EEG-Graphs for Auditory Spatial Attention Detection. IEEE Trans. Biomed. Eng. 71(1): 171-182 (2024)
[j165]Miao Liu, Jing Wang, Xinyuan Qian, Haizhou Li:
Audio-Visual Temporal Forgery Detection Using Embedding-Level Fusion and Multi-Dimensional Contrastive Loss. IEEE Trans. Circuits Syst. Video Technol. 34(8): 6937-6948 (2024)
[j164]Zhenyu Weng, Huiping Zhuang, Fulin Luo, Haizhou Li, Zhiping Lin:
Few-Shot Contrastive Transfer Learning With Pretrained Model for Masked Face Verification. IEEE Trans. Multim. 26: 3871-3883 (2024)
[j163]Xinyuan Qian, Wei Xue, Qiquan Zhang, Ruijie Tao, Haizhou Li:
Deep Cross-Modal Retrieval Between Spatial Image and Acoustic Speech. IEEE Trans. Multim. 26: 4480-4489 (2024)
[j162]Siqi Cai, Peiwen Li, Haizhou Li:
A Bio-Inspired Spiking Attentional Neural Network for Attentional Selection in the Listening Brain. IEEE Trans. Neural Networks Learn. Syst. 35(12): 17387-17397 (2024)
[j161]Ruihang Ji, Shuzhi Sam Ge, Kai Zhao, Haizhou Li:
Event-Triggered Tracking Control for Nonlinear Systems With Prescribed Performance. IEEE Trans. Syst. Man Cybern. Syst. 54(6): 3547-3557 (2024)
[c748]Shimin Zhang, Qu Yang, Chenxiang Ma, Jibin Wu, Haizhou Li, Kay Chen Tan:
TC-LIF: A Two-Compartment Spiking Neuron Model for Long-Term Sequential Modelling. AAAI 2024: 16838-16847
[c747]Rui Liu, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li:
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling. AAAI 2024: 18698-18706
[c746]Jiadong Wang, Zexu Pan, Malu Zhang, Robby T. Tan, Haizhou Li:
Restoring Speaking Lips from Occlusion for Audio-Visual Speech Recognition. AAAI 2024: 19144-19152
[c745]Chen Zhang, Luis Fernando D'Haro, Yiming Chen, Malu Zhang, Haizhou Li:
A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators. AAAI 2024: 19515-19524
[c744]Yiming Chen, Chen Zhang, Danqing Luo, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li:
Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models. ACL (Findings) 2024: 1359-1375
[c743]Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li:
Fine-Grained Quantitative Emotion Editing for Speech Generation. APSIPA 2024: 1-6
[c742]Feng Jiang, Weihao Liu, Xiaomin Chu, Peifeng Li, Qiaoming Zhu, Haizhou Li:
Advancing Topic Segmentation and Outline Generation in Chinese Texts: The Paragraph-level Topic Representation, Corpus, and Benchmark. LREC/COLING 2024: 495-506
[c741]Danqing Luo, Chen Zhang, Yan Zhang, Haizhou Li:
CrossTune: Black-Box Few-Shot Classification with Label Enhancement. LREC/COLING 2024: 4185-4197
[c740]Yaxin Fan, Feng Jiang, Peifeng Li, Haizhou Li:
Uncovering the Potential of ChatGPT for Discourse Analysis in Dialogue: An Empirical Study. LREC/COLING 2024: 16998-17010
[c739]Gabriel Ivucic, Saurav Pahuja, Felix Putze, Siqi Cai, Haizhou Li, Tanja Schultz:
The Impact of Cross-Validation Schemes for EEG-Based Auditory Attention Detection with Deep Neural Networks. EMBC 2024: 1-4
[c738]Chen Zhang, Chengguang Tang, Dading Chong, Ke Shi, Guohua Tang, Feng Jiang, Haizhou Li:
TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models. EMNLP (Findings) 2024: 8926-8946
[c737]Yiming Chen, Xianghu Yue, Xiaoxue Gao, Chen Zhang, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li:
Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models. EMNLP (Findings) 2024: 10917-10930
[c736]Jiabao Pan, Yan Zhang, Chen Zhang, Zuozhu Liu, Hongwei Wang, Haizhou Li:
DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models. EMNLP 2024: 14686-14695
[c735]Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li:
SVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks. ICASSP 2024: 221-225
[c734]Zeyang Song, Jibin Wu, Malu Zhang, Mike Zheng Shou, Haizhou Li:
Spiking-Leaf: A Learnable Auditory Front-End for Spiking Neural Networks. ICASSP 2024: 226-230
[c733]Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li:
An Empirical Study on the Impact of Positional Encoding in Transformer-Based Monaural Speech Enhancement. ICASSP 2024: 1001-1005
[c732]Siqi Cai, Ran Zhang, Haizhou Li:
Robust Decoding of the Auditory Attention from EEG Recordings Through Graph Convolutional Networks. ICASSP 2024: 2320-2324
[c731]Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li:
LOCSELECT: Target Speaker Localization with an Auditory Selective Hearing Mechanism. ICASSP 2024: 8696-8700
[c730]Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li:
Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis. ICASSP 2024: 10601-10605
[c729]Junjie Li, Ruijie Tao, Zexu Pan, Meng Ge, Shuai Wang, Haizhou Li:
Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-Talker Speech. ICASSP 2024: 10666-10670
[c728]Shuai Wang, Qibing Bai, Qi Liu, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li:
Leveraging in-the-wild Data for Effective Self-supervised Pretraining in Speaker Recognition. ICASSP 2024: 10901-10905
[c727]Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, Haizhou Li:
Prompt-Driven Target Speech Diarization. ICASSP 2024: 11086-11090
[c726]Yi Ma, Kong Aik Lee, Ville Hautamäki, Meng Ge, Haizhou Li:
Gradient Weighting for Speaker Verification in Extremely Low Signal-to-Noise Ratio. ICASSP 2024: 11311-11315
[c725]Feng Jiang, Lingyi Yang, Yu Lu, Haizhou Li:
Tailored Domain-Specific Summaries: A Two-Stage Method Combining Extractive and Abstractive Summarization Models. ICONIP (9) 2024: 347-362
[c724]Qianhui Liu, Jiaqi Yan, Malu Zhang, Gang Pan, Haizhou Li:
LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization. IJCAI 2024: 3097-3105
[c723]Yang Wang, Haiyang Mei, Qirui Bao, Ziqi Wei, Mike Zheng Shou, Haizhou Li, Bo Dong, Xin Yang:
Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition. IJCAI 2024: 3160-3168
[c722]Wenxuan Wu, Xueyuan Chen, Xixin Wu, Haizhou Li, Helen Meng:
Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy. IJCNN 2024: 1-8
[c721]Tianchi Liu, Lin Zhang, Rohan Kumar Das, Yi Ma, Ruijie Tao, Haizhou Li:
How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio? INTERSPEECH 2024
[c720]Rui Liu, Jiatian Xi, Ziyue Jiang, Haizhou Li:
FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency. INTERSPEECH 2024
[c719]Marvin Borsdorf, Zexu Pan, Haizhou Li, Tanja Schultz:
wTIMIT2mix: A Cocktail Party Mixtures Database to Study Target Speaker Extraction for Normal and Whispered Speech. INTERSPEECH 2024
[c718]Iva Ewert, Marvin Borsdorf, Haizhou Li, Tanja Schultz:
Does the Lombard Effect Matter in Speech Separation? Introducing the Lombard-GRID-2mix Dataset. INTERSPEECH 2024
[c717]Jingru Lin, Meng Ge, Junyi Ao, Liqun Deng, Haizhou Li:
SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech. INTERSPEECH 2024
[c716]Zijie Lin, Tianyu He, Siqi Cai, Haizhou Li:
ASA: An Auditory Spatial Attention Dataset with Multiple Speaking Locations. INTERSPEECH 2024
[c715]Saurav Pahuja, Gabriel Ivucic, Pascal Himmelmann, Siqi Cai, Tanja Schultz, Haizhou Li:
Leveraging Graphic and Convolutional Neural Networks for Auditory Attention Detection with EEG. INTERSPEECH 2024
[c714]Zeyang Song, Qianhui Liu, Qu Yang, Yizhou Peng, Haizhou Li:
ED-sKWS: Early-Decision Spiking Neural Networks for Rapid, and Energy-Efficient Keyword Spotting. INTERSPEECH 2024
[c713]Shuai Wang, Ke Zhang, Shaoxiong Lin, Junjie Li, Xuefei Wang, Meng Ge, Jianwei Yu, Yanmin Qian, Haizhou Li:
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction. INTERSPEECH 2024
[c712]Qiquan Zhang, Hongxu Zhu, Xinyuan Qian, Eliathamby Ambikairajah, Haizhou Li:
An Exploration of Length Generalization in Transformer-Based Speech Enhancement. INTERSPEECH 2024
[c711]Qibing Bai, Shuai Wang, Zhijun Liu, Mingyang Zhang, Wei Rao, Yannan Wang, Haizhou Li:
Diffusion-Based Method with TTS Guidance for Foreign Accent Conversion. ISCSLP 2024: 284-288
[c710]Yifan Hu, Rui Liu, Guanglai Gao, Haizhou Li:
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis. ISCSLP 2024: 299-303
[c709]Peng Zhao, Ruicong Wang, Zijie Lin, Zexu Pan, Haizhou Li, Xueyi Zhang:
Ensemble Deep Learning Models for EEG-Based Auditory Attention Decoding. ISCSLP 2024: 339-343
[c708]Xianghu Yue, Xueyi Zhang, Yiming Chen, Chengwei Zhang, Mingrui Lao, Huiping Zhuang, Xinyuan Qian, Haizhou Li:
MMAL: Multi-Modal Analytic Learning for Exemplar-Free Audio-Visual Class Incremental Tasks. ACM Multimedia 2024: 2428-2437
[c707]Weizhi Liu, Yue Li, Dongdong Lin, Hui Tian, Haizhou Li:
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis. ACM Multimedia 2024: 3294-3302
[c706]Rui Liu, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li:
Generative Expressive Conversational Speech Synthesis. ACM Multimedia 2024: 4187-4196
[c705]Miao Liu, Jing Wang, Xinyuan Qian, Haizhou Li:
ListenFormer: Responsive Listening Head Generation with Non-autoregressive Transformers. ACM Multimedia 2024: 7094-7103
[c704]Ruijie Tao, Zhan Shi, Yidi Jiang, Duc-Tuan Truong, Eng Siong Chng, Massimo Alioto, Haizhou Li:
Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization. ACM Multimedia 2024: 11342-11347
[c703]Chuang Li, Yan Zhang, Min-Yen Kan, Haizhou Li:
UNO-DST: Leveraging Unlabelled Data in Zero-Shot Dialogue State Tracking. NAACL-HLT (Findings) 2024: 2972-2983
[c702]Xidong Wang, Guiming Chen, Dingjie Song, Zhiyi Zhang, Zhihong Chen, Qingying Xiao, Junying Chen, Feng Jiang, Jianquan Li, Xiang Wan, Benyou Wang, Haizhou Li:
CMB: A Comprehensive Medical Benchmark in Chinese. NAACL-HLT 2024: 6184-6205
[c701]Huang Huang, Fei Yu, Jianqing Zhu, Xuening Sun, Hao Cheng, Dingjie Song, Zhihong Chen, Mosen Alharthi, Bang An, Juncai He, Ziche Liu, Junying Chen, Jianquan Li, Benyou Wang, Lian Zhang, Ruoyu Sun, Xiang Wan, Haizhou Li, Jinchao Xu:
AceGPT, Localizing Large Language Models in Arabic. NAACL-HLT 2024: 8139-8163
[c700]Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu:
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words. NeurIPS 2024
[c699]Juhao Liang, Zhenyang Cai, Jianqing Zhu, Huang Huang, Kewei Zong, Bang An, Mosen Alharthi, Juncai He, Lian Zhang, Haizhou Li, Benyou Wang, Jinchao Xu:
Alignment at Pre-training! Towards Native Alignment for Arabic LLMs. NeurIPS 2024
[c698]Xueyi Zhang, Mingrui Lao, Peng Zhao, Jun Tang, Yanming Guo, Siqi Cai, Xianghu Yue, Haizhou Li:
Language Without Borders: A Dataset and Benchmark for Code-Switching Lip Reading. NeurIPS 2024
[c697]Kun Zhou, Berrak Sisman, Carlos Busso, Bin Ma, Haizhou Li:
Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion. Odyssey 2024: 180-186
[c696]Hongli Yang, Xinyi Chen, Junjie Li, Hao Huang, Siqi Cai, Haizhou Li:
Listen to the Speaker in Your Gaze. CIS-RAM 2024: 380-385
[c695]Junjie Li, Ke Zhang, Shuai Wang, Haizhou Li, Man-Wai Mak, Kong Aik Lee:
On the Effectiveness of Enrollment Speech Augmentation For Target Speaker Extraction. SLT 2024: 325-332
[c694]Dashanka De Silva, Siqi Cai, Saurav Pahuja, Tanja Schultz, Haizhou Li:
Neurospex: Neuro-Guided Speaker Extraction With Cross-Modal Fusion. SLT 2024: 341-348
[c693]Jiahe Wang, Shuai Wang, Junjie Li, Ke Zhang, Yanmin Qian, Haizhou Li:
Enhancing Speaker Extraction Through Rectifying Target Confusion. SLT 2024: 349-356
[c692]Xueyao Zhang, Liumeng Xue, Yicheng Gu, Yuancheng Wang, Jiaqi Li, Haorui He, Chaoren Wang, Songting Liu, Xi Chen, Junan Zhang, Zihao Fang, Haopeng Chen, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, Zhizheng Wu:
Amphion: an Open-Source Audio, Music, and Speech Generation Toolkit. SLT 2024: 879-884
[c691]Lichuan Jiang, Jiani Zhong, Muqing Jian, Xuanzhuo Liu, Siqi Cai, Haizhou Li:
The Impact of Synchronized Visual and Auditory Attention on Human Perception. ICSR + InnoBiz 2024: 41-50
[c690]Xinyuan Qian, Chen Lu, Yating Zhang, Kainan Chen, Haizhou Li:
Semi-supervised Speaker Localization with Gaussian-Like Pseudo-labeling. ICSR + InnoBiz 2024: 146-155
[c689]Shuai Wang, Pengcheng Zhu, Haizhou Li:
M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions. ICSR + InnoBiz 2024: 303-311
[c688]Ganjun Liu, Xiaohui Hou, Meng Ge, Tao Zhang, Haizhou Li:
A Non-Intrusive Approach to Assessing Dysarthria Severity: Advancing Clinical Diagnosis. WWW (Companion Volume) 2024: 1134-1137
[i227]Yi Ma, Kong Aik Lee, Ville Hautamäki, Meng Ge, Haizhou Li:
Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio. CoRR abs/2401.02626 (2024)
[i226]Feng Jiang, Kuang Wang, Haizhou Li:
Bridging Research and Readers: A Multi-Modal Automated Academic Papers Interpretation System. CoRR abs/2401.09150 (2024)
[i225]Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li:
An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement. CoRR abs/2401.09686 (2024)
[i224]Xianghu Yue, Xiaohai Tian, Malu Zhang, Zhizheng Wu, Haizhou Li:
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing. CoRR abs/2401.12264 (2024)
[i223]Qianhui Liu, Jiaqi Yan, Malu Zhang, Gang Pan, Haizhou Li:
LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization. CoRR abs/2401.14652 (2024)
[i222]Lei Liu, Li Liu, Haizhou Li:
Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition. CoRR abs/2401.17604 (2024)
[i221]Wenjie Wei, Malu Zhang, Jilin Zhang, Ammar Belatreche, Jibin Wu, Zijing Xu, Xuerui Qiu, Hong Chen, Yang Yang, Haizhou Li:
Event-Driven Learning for Spiking Neural Networks. CoRR abs/2403.00270 (2024)
[i220]Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li:
Fine-Grained Quantitative Emotion Editing for Speech Generation. CoRR abs/2403.02002 (2024)
[i219]Xidong Wang, Nuo Chen, Junyin Chen, Yan Hu, Yidong Wang, Xiangbo Wu, Anningzhe Gao, Xiang Wan, Haizhou Li, Benyou Wang:
Apollo: An Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People. CoRR abs/2403.03640 (2024)
[i218]Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li:
sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks. CoRR abs/2403.05772 (2024)
[i217]Danqing Luo, Chen Zhang, Yan Zhang, Haizhou Li:
CrossTune: Black-Box Few-Shot Classification with Label Enhancement. CoRR abs/2403.12468 (2024)
[i216]Wenxuan Wu, Xueyuan Chen, Xixin Wu, Haizhou Li, Helen Meng:
Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy. CoRR abs/2403.16078 (2024)
[i215]Yicheng Gu, Xueyao Zhang, Liumeng Xue, Haizhou Li, Zhizheng Wu:
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder. CoRR abs/2404.17161 (2024)
[i214]Ruijie Tao, Xinyuan Qian, Yidi Jiang, Junjie Li, Jiadong Wang, Haizhou Li:
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention. CoRR abs/2404.18501 (2024)
[i213]Chuang Li, Yang Deng, Hengchang Hu, Min-Yen Kan, Haizhou Li:
Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems. CoRR abs/2405.01868 (2024)
[i212]Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li:
Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis. CoRR abs/2405.09171 (2024)
[i211]Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps:
Mamba in Speech: Towards an Alternative to Self-Attention. CoRR abs/2405.12609 (2024)
[i210]Yiming Chen, Chen Zhang, Danqing Luo, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li:
Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models. CoRR abs/2405.14646 (2024)
[i209]Jiahui Xu, Feng Jiang, Anningzhe Gao, Haizhou Li:
Unsupervised Mutual Learning of Dialogue Discourse Parsing and Topic Segmentation. CoRR abs/2405.19799 (2024)
[i208]Chen Zhang, Chengguang Tang, Dading Chong, Ke Shi, Guohua Tang, Feng Jiang, Haizhou Li:
TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models. CoRR abs/2405.20215 (2024)
[i207]Tianchi Liu, Lin Zhang, Rohan Kumar Das, Yi Ma, Ruijie Tao, Haizhou Li:
How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio? CoRR abs/2406.02483 (2024)
[i206]Zhijun Liu, Shuai Wang, Sho Inoue, Qibing Bai, Haizhou Li:
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis. CoRR abs/2406.05551 (2024)
[i205]Yidi Jiang, Ruijie Tao, Zhengyang Chen, Yanmin Qian, Haizhou Li:
Target Speech Diarization with Multimodal Prompts. CoRR abs/2406.07198 (2024)
[i204]Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhiwu Li, Haizhou Li:
Multi-Scale Accent Modeling with Disentangling for Multi-Speaker Multi-Accent TTS Synthesis. CoRR abs/2406.10844 (2024)
[i203]Zeyang Song, Qianhui Liu, Qu Yang, Yizhou Peng, Haizhou Li:
ED-sKWS: Early-Decision Spiking Neural Networks for Rapid, and Energy-Efficient Keyword Spotting. CoRR abs/2406.12726 (2024)
[i202]Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu:
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words. CoRR abs/2406.13340 (2024)
[i201]Ziche Liu, Rui Ke, Feng Jiang, Haizhou Li:
Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models. CoRR abs/2406.14115 (2024)
[i200]Jiabao Pan, Yan Zhang, Chen Zhang, Zuozhu Liu, Hongwei Wang, Haizhou Li:
DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models. CoRR abs/2407.01009 (2024)
[i199]Rui Liu, Haolin Zuo, Zheng Lian, Xiaofen Xing, Björn W. Schuller, Haizhou Li:
Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset. CoRR abs/2407.02751 (2024)
[i198]Yang Wang, Haiyang Mei, Qirui Bao, Ziqi Wei, Mike Zheng Shou, Haizhou Li, Bo Dong, Xin Yang:
Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition. CoRR abs/2407.09521 (2024)
[i197]Weizhi Liu, Yue Li, Dongdong Lin, Hui Tian, Haizhou Li:
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis. CoRR abs/2407.10471 (2024)
[i196]Shuai Wang, Zhengyang Chen, Kong Aik Lee, Yanmin Qian, Haizhou Li:
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning. CoRR abs/2407.15188 (2024)
[i195]Rui Liu, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li:
Generative Expressive Conversational Speech Synthesis. CoRR abs/2407.21491 (2024)
[i194]Qianhui Liu, Jiadong Wang, Yang Wang, Xin Yang, Gang Pan, Haizhou Li:
Human-Inspired Audio-Visual Speech Recognition: Spike Activity, Cueing Interaction and Causal Processing. CoRR abs/2408.16564 (2024)
[i193]Dashanka De Silva, Siqi Cai, Saurav Pahuja, Tanja Schultz, Haizhou Li:
NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention. CoRR abs/2409.02489 (2024)
[i192]Xinyuan Qian, Xianghu Yue, Jiadong Wang, Huiping Zhuang, Haizhou Li:
Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection. CoRR abs/2409.07224 (2024)
[i191]Zhijun Liu, Shuai Wang, Pengcheng Zhu, Mengxiao Bi, Haizhou Li:
E1 TTS: Simple and Fast Non-Autoregressive TTS. CoRR abs/2409.09351 (2024)
[i190]Sho Inoue, Shuai Wang, Wanxing Wang, Pengcheng Zhu, Mengxiao Bi, Haizhou Li:
MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion. CoRR abs/2409.09352 (2024)
[i189]Junjie Li, Ke Zhang, Shuai Wang, Haizhou Li, Man-Wai Mak, Kong Aik Lee:
On the effectiveness of enrollment speech augmentation for Target Speaker Extraction. CoRR abs/2409.09589 (2024)
[i188]Chen Zhang, Dading Chong, Feng Jiang, Chengguang Tang, Anningzhe Gao, Guohua Tang, Haizhou Li:
Aligning Language Models Using Follow-up Likelihood as Reward Signal. CoRR abs/2409.13948 (2024)
[i187]Shuai Wang, Pengcheng Zhu, Haizhou Li:
M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions. CoRR abs/2409.15782 (2024)
[i186]Shuai Wang, Ke Zhang, Shaoxiong Lin, Junjie Li, Xuefei Wang, Meng Ge, Jianwei Yu, Yanmin Qian, Haizhou Li:
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction. CoRR abs/2409.15799 (2024)
[i185]Yiming Chen, Xianghu Yue, Xiaoxue Gao, Chen Zhang, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li:
Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models. CoRR abs/2409.18680 (2024)
[i184]Rui Liu, Jiatian Xi, Ziyue Jiang, Haizhou Li:
FluentEditor+: Text-based Speech Editing by Modeling Local Hierarchical Acoustic Smoothness and Global Prosody Consistency. CoRR abs/2410.03719 (2024)
[i183]Rui Liu, Zhenqi Jia, Jie Yang, Yifan Hu, Haizhou Li:
Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling. CoRR abs/2410.09524 (2024)
[i182]Fan Bu, Yuhao Zhang, Xidong Wang, Benyou Wang, Qun Liu, Haizhou Li:
Roadmap towards Superhuman Speech Understanding using Large Language Models. CoRR abs/2410.13268 (2024)
[i181]Shuwei He, Rui Liu, Haizhou Li:
Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech. CoRR abs/2410.14101 (2024)
[i180]

