


Hung-yi Lee
2026
[j30]Pei-Jun Liao, Hung-Yi Lee, Hsin-Min Wang:
Cross-Attention Reprogramming for ASR: Bridging Discrete Speech Units and Pretrained Language Models. IEEE Access 14: 662-678 (2026)
[i352]Zhixian Zhao, Shuiyuan Wang, Guojian Li, Hongfei Xue, Chengyou Wang, Shuai Wang, Longshuai Xiao, Zihan Zhang, Hui Bu, Xin Xu, Xinsheng Wang, Hexin Liu, Eng Siong Chng, Hung-yi Lee, Haizhou Li, Lei Xie:
The ICASSP 2026 HumDial Challenge: Benchmarking Human-like Spoken Dialogue Systems in the LLM Era. CoRR abs/2601.05564 (2026)
[i351]Jeff Chan-Jan Sju, Liang-Hsuan Tseng, Yi-Cheng Lin, Yen-Chun Kuo, Ju-Chieh Chou, Kai-Wei Chang, Hung-yi Lee, Carlos Busso:
On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation. CoRR abs/2601.06329 (2026)
[i350]Chun-Yi Kuan, Hung-yi Lee:
AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering. CoRR abs/2601.12248 (2026)
[i349]Chun-Yi Kuan, Kai-Wei Chang, Hung-yi Lee:
AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering. CoRR abs/2601.14728 (2026)
2025
[j29]Siddhant Arora, Kai-Wei Chang, Chung-Ming Chien, Yifan Peng, Haibin Wu, Yossi Adi, Emmanuel Dupoux, Hung-yi Lee, Karen Livescu, Shinji Watanabe:
On The Landscape of Spoken Language Models: A Comprehensive Survey. Trans. Mach. Learn. Res. 2025 (2025)
[j28]Pooneh Mousavi, Gallil Maimon, Adel Moumen, Darius Petermann, Jiatong Shi, Haibin Wu, Haici Yang, Anastasia Kuznetsova, Artem Ploujnikov, Ricard Marxer, Bhuvana Ramabhadran, Benjamin Elizalde, Loren Lugosch, Jinyu Li, Cem Subakan, Philip C. Woodland, Minje Kim, Hung-yi Lee, Shinji Watanabe, Yossi Adi, Mirco Ravanelli:
Discrete Audio Tokens: More Than a Survey! Trans. Mach. Learn. Res. 2025 (2025)
[c284]Chen-An Li, Tzu-Han Lin, Yun-Nung Chen, Hung-yi Lee:
Transferring Textual Preferences to Vision-Language Understanding through Model Merging. ACL (2) 2025: 923-943
[c283]Cheng-Han Chiang, Hung-yi Lee, Michal Lukasik:
TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for LLM-as-a-Judge. ACL (1) 2025: 2934-2952
[c282]Guan-Ting Lin, Prashanth Gurunath Shivakumar, Aditya Gourav, Yile Gu, Ankur Gandhe, Hung-yi Lee, Ivan Bulyko:
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback. ACL (1) 2025: 20395-20411
[c281]Tzu-Quan Lin, Hsi-Chun Cheng, Hung-yi Lee, Hao Tang:
Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers. APSIPA 2025: 525-530
[c280]Cheng-Han Chiang, Xiaofei Wang, Chung-Ching Lin, Kevin Lin, Linjie Li, Radu Kopetz, Yao Qian, Zhendong Wang, Zhengyuan Yang, Hung-yi Lee, Lijuan Wang:
Audio-Aware Large Language Models as Judges for Speaking Styles. EMNLP (Findings) 2025: 467-480
[c279]Chih-Kai Yang, Neo S. Ho, Hung-yi Lee:
Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey. EMNLP 2025: 10144-10170
[c278]Hua Farn, Hsuan Su, Shachi H. Kumar, Saurav Sahay, Shang-Tse Chen, Hung-yi Lee:
Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging. EMNLP (Findings) 2025: 16589-16602
[c277]Yi-Cheng Lin, Kang-Chieh Chen, Zhe-Yan Li, Tzu-Heng Wu, Tzu-Hsuan Wu, Kuan-Yu Chen, Hung-yi Lee, Yun-Nung Chen:
Creativity in LLM-based Multi-Agent Systems: A Survey. EMNLP 2025: 27584-27607
[c276]Shao-Syuan Huang, Kuan-Po Huang, Andy T. Liu, Hung-Yi Lee:
Enhancing Multilingual ASR for Unseen Languages via Language Embedding Modeling. ICASSP 2025: 1-5
[c275]Chien-Yu Huang, Min-Han Shih, Ke-Han Lu, Chi-Yuan Hsiao, Hung-Yi Lee:
SpeechCaps: Advancing Instruction-Based Universal Speech Models with Multi-Talker Speaking Style Captioning. ICASSP 2025: 1-5
[c274]Chun-Yi Kuan, Hung-Yi Lee:
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning. ICASSP 2025: 1-5
[c273]Zhe Li, Man-Wai Mak, Mert Pilanci, Hung-yi Lee, Helen Meng:
Spectral-Aware Low-Rank Adaptation for Speaker Verification. ICASSP 2025: 1-5
[c272]Hsi-Che Lin, Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee:
Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection. ICASSP 2025: 1-5
[c271]Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, Chao-Han Huck Yang, Jagadeesh Balam, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-Yi Lee:
Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data. ICASSP 2025: 1-5
[c270]Kuan-Po Huang, Shu-Wen Yang, Huy Phan, Bo-Ru Lu, Byeonggeun Kim, Sashank Macha, Qingming Tang, Shalini Ghosh, Hung-yi Lee, Chieh-Chi Kao, Chao Wang:
IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling. ICML 2025
[c269]Shu-Wen Yang, Byeonggeun Kim, Kuan-Po Huang, Qingming Tang, Huy Phan, Bo-Ru Lu, Harshavardhan Sundar, Shalini Ghosh, Hung-yi Lee, Chieh-Chi Kao, Chao Wang:
Generative Audio Language Modeling with Continuous-valued Tokens and Masked Next-Token Prediction. ICML 2025
[c268]Xuanjun Chen, I-Ming Lin, Lin Zhang, Jiawei Du, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang:
Codec-Based Deepfake Source Tracing via Neural Audio Codec Taxonomy. INTERSPEECH 2025
[c267]William Chen, Chutong Meng, Jiatong Shi, Martijn Bartelds, Shih-Heng Wang, Hsiu-Hsuan Wang, Rafael Mosquera, Sara Hincapie, Dan Jurafsky, Antonis Anastasopoulos, Hung-yi Lee, Karen Livescu, Shinji Watanabe:
The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties. INTERSPEECH 2025
[c266]Shi-Xin Fang, Liang-Yeh Shen, Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee:
Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning. INTERSPEECH 2025
[c265]Fabian Ritter Gutierrez, Yi-Cheng Lin, Jui-Chiang Wei, Jeremy H. M. Wong, Eng Siong Chng, Nancy F. Chen, Hung-yi Lee:
Distilling a speech and music encoder with task arithmetic. INTERSPEECH 2025
[c264]Chi-Yuan Hsiao, Ke-Han Lu, Kai-Wei Chang, Chih-Kai Yang, Wei-Chih Chen, Hung-yi Lee:
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models. INTERSPEECH 2025
[c263]Sung-Feng Huang, Heng-Cheng Kuo, Zhehuai Chen, Xuesong Yang, Pin-Jui Ku, Ante Jukic, Huck Yang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee, Szu-Wei Fu:
VoiceNoNG: Robust High-Quality Speech Editing Model without Hallucinations. INTERSPEECH 2025
[c262]Chun-Yi Kuan, Hung-yi Lee:
Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples. INTERSPEECH 2025
[c261]Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee:
Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach. INTERSPEECH 2025
[c260]Ke-Han Lu, Chun-Yi Kuan, Hung-yi Lee:
Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models. INTERSPEECH 2025
[c259]Yu-Xiang Luo, Yi-Cheng Lin, Ming-To Chuang, Jia-Hung Chen, I-Ning Tsai, Pei Xing Kiew, Yueh-Hsuan Huang, Chien-Feng Liu, Yu-Chen Chen, Bo-Han Feng, Wenze Ren, Hung-yi Lee:
ToxicTone: A Mandarin Audio Dataset Annotated for Toxicity and Toxic Utterance Tonality. INTERSPEECH 2025
[c258]Chih-Kai Yang, Neo Ho, Yen-Ting Piao, Hung-yi Lee:
SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information. INTERSPEECH 2025
[c257]Chun-Yi Kuan, Hung-yi Lee:
Gender Bias in Instruction-Guided Speech Synthesis Models. NAACL (Findings) 2025: 5387-5413
[c256]Shensian Syu, Hung-yi Lee:
Hierarchical Speculative Decoding with Dynamic Window. NAACL (Findings) 2025: 8260-8273
[i348]Sung-Feng Huang, Heng-Cheng Kuo, Zhehuai Chen, Xuesong Yang, Chao-Han Huck Yang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee, Szu-Wei Fu:
Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits. CoRR abs/2501.03805 (2025)
[i347]Zhe Li, Man-Wai Mak, Mert Pilanci, Hung-yi Lee, Helen Meng:
Spectral-Aware Low-Rank Adaptation for Speaker Verification. CoRR abs/2501.03829 (2025)
[i346]Jiawei Du, Xuanjun Chen, Haibin Wu, Lin Zhang, I-Ming Lin, I-Hsiang Chiu, Wenze Ren, Yuan Tseng, Yu Tsao, Jyh-Shing Roger Jang, Hung-yi Lee:
CodecFake-Omni: A Large-Scale Codec-based Deepfake Speech Dataset. CoRR abs/2501.08238 (2025)
[i345]Chao-Chung Wu, Zhi Rui Tam, Chieh-Yen Lin, Hung-yi Lee, Yun-Nung Chen:
Clear Minds Think Alike: What Makes LLM Fine-tuning Robust? A Study of Token Perplexity. CoRR abs/2501.14315 (2025)
[i344]Chan-Jan Hsu, Yi-Cheng Lin, Chia-Chun Lin, Wei-Chih Chen, Ho-Lam Chung, Chen-An Li, Yi-Chang Chen, Chien-Yu Yu, Ming-Ji Lee, Chien-Cheng Chen, Ru-Heng Huang, Hung-yi Lee, Da-Shan Shiu:
BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation - Challenges and Insights. CoRR abs/2501.17790 (2025)
[i343]Chun-Yi Kuan, Hung-yi Lee:
Gender Bias in Instruction-Guided Speech Synthesis Models. CoRR abs/2502.05649 (2025)
[i342]Yu-Xiang Lin, Chih-Kai Yang, Wei-Chih Chen, Chen-An Li, Chien-yu Huang, Xuanjun Chen, Hung-yi Lee:
A Preliminary Exploration with GPT-4o Voice Mode. CoRR abs/2502.09940 (2025)
[i341]Tzu-Quan Lin, Wei-Ping Huang, Hao Tang, Hung-yi Lee:
Speech-FT: A Fine-tuning Strategy for Enhancing Speech Representation Models Without Compromising Generalization Ability. CoRR abs/2502.12672 (2025)
[i340]Chen-An Li, Tzu-Han Lin, Yun-Nung Chen, Hung-yi Lee:
Transferring Textual Preferences to Vision-Language Understanding through Model Merging. CoRR abs/2502.13487 (2025)
[i339]Cheng-Kuang Wu, Zhi Rui Tam, Chieh-Yen Lin, Yun-Nung Chen, Hung-yi Lee:
Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models. CoRR abs/2503.01332 (2025)
[i338]Cheng-Han Chiang, Hung-yi Lee, Michal Lukasik:
TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for LLM-as-a-Judge. CoRR abs/2503.04381 (2025)
[i337]Guan-Ting Lin, Jiachen Lian, Tingle Li, Qirui Wang, Gopala Anumanchipalli, Alexander H. Liu, Hung-yi Lee:
Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities. CoRR abs/2503.04721 (2025)
[i336]Liang-Hsuan Tseng, Yi-Chang Chen, Kuan-Yi Lee, Da-Shan Shiu, Hung-yi Lee:
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling. CoRR abs/2504.07053 (2025)
[i335]Siddhant Arora, Kai-Wei Chang, Chung-Ming Chien, Yifan Peng, Haibin Wu, Yossi Adi, Emmanuel Dupoux, Hung-Yi Lee, Karen Livescu, Shinji Watanabe:
On The Landscape of Spoken Language Models: A Comprehensive Survey. CoRR abs/2504.08528 (2025)
[i334]Xuanjun Chen, I-Ming Lin, Lin Zhang, Jiawei Du, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang:
Codec-Based Deepfake Source Tracing via Neural Audio Codec Taxonomy. CoRR abs/2505.12994 (2025)
[i333]Chih-Kai Yang, Neo Ho, Yen-Ting Piao, Hung-yi Lee:
SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information. CoRR abs/2505.13237 (2025)
[i332]Fabian Ritter Gutierrez, Yi-Cheng Lin, Jui-Chiang Wei, Jeremy H. M. Wong, Eng Siong Chng, Nancy F. Chen, Hung-yi Lee:
Distilling a speech and music encoder with task arithmetic. CoRR abs/2505.13270 (2025)
[i331]Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee:
Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach. CoRR abs/2505.14449 (2025)
[i330]Chun-Yi Kuan, Hung-yi Lee:
Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples. CoRR abs/2505.14518 (2025)
[i329]Yu-Xiang Luo, Yi-Cheng Lin, Ming-To Chuang, Jia-Hung Chen, I-Ning Tsai, Pei Xing Kiew, Yueh-Hsuan Huang, Chien-Feng Liu, Yu-Chen Chen, Bo-Han Feng, Wenze Ren, Hung-yi Lee:
ToxicTone: A Mandarin Audio Dataset Annotated for Toxicity and Toxic Utterance Tonality. CoRR abs/2505.15773 (2025)
[i328]Chih-Kai Yang, Neo S. Ho, Hung-yi Lee:
Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey. CoRR abs/2505.15957 (2025)
[i327]Liang-Yeh Shen, Shi-Xin Fang, Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee:
Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning. CoRR abs/2505.16220 (2025)
[i326]Zhi Rui Tam, Cheng-Kuang Wu, Yu Ying Chiu, Chieh-Yen Lin, Yun-Nung Chen, Hung-yi Lee:
Language Matters: How Do Multilingual Input and Reasoning Paths Affect Large Reasoning Models? CoRR abs/2505.17407 (2025)
[i325]Chi-Yuan Hsiao, Ke-Han Lu, Kai-Wei Chang, Chih-Kai Yang, Wei-Chih Chen, Hung-yi Lee:
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models. CoRR abs/2505.17496 (2025)
[i324]Ke-Han Lu, Chun-Yi Kuan, Hung-yi Lee:
Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models. CoRR abs/2505.19037 (2025)
[i323]Chun-Yi Kuan, Hung-yi Lee:
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data. CoRR abs/2505.20166 (2025)
[i322]Yi-Cheng Lin, Kang-Chieh Chen, Zhe-Yan Li, Tzu-Heng Wu, Tzu-Hsuan Wu, Kuan-Yu Chen, Hung-yi Lee, Yun-Nung Chen:
Creativity in LLM-based Multi-Agent Systems: A Survey. CoRR abs/2505.21116 (2025)
[i321]Kuan-Po Huang, Shu-Wen Yang, Huy Phan, Bo-Ru Lu, Byeonggeun Kim, Sashank Macha, Qingming Tang, Shalini Ghosh, Hung-yi Lee, Chieh-Chi Kao, Chao Wang:
IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling. CoRR abs/2506.00736 (2025)
[i320]Yi-Cheng Lin, Huang-Cheng Chou, Yu-Hsuan Li Liang, Hung-yi Lee:
EMO-Debias: Benchmarking Gender Debiasing Techniques in Multi-Label Speech Emotion Recognition. CoRR abs/2506.04652 (2025)
[i319]Chih-Kai Yang, Neo Ho, Yi-Jyun Lee, Hung-yi Lee:
AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models. CoRR abs/2506.05140 (2025)
[i318]Cheng-Han Chiang, Xiaofei Wang, Chung-Ching Lin, Kevin Lin, Linjie Li, Radu Kopetz, Yao Qian, Zhendong Wang, Zhengyuan Yang, Hung-yi Lee, Lijuan Wang:
Audio-Aware Large Language Models as Judges for Speaking Styles. CoRR abs/2506.05984 (2025)
[i317]Yun-Shao Tsai, Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee:
CO-VADA: A Confidence-Oriented Voice Augmentation Debiasing Approach for Fair Speech Emotion Recognition. CoRR abs/2506.06071 (2025)
[i316]Tzu-Wen Hsu, Ke-Han Lu, Cheng-Han Chiang, Hung-yi Lee:
Reducing Object Hallucination in Large Audio-Language Models via Audio-Aware Decoding. CoRR abs/2506.07233 (2025)
[i315]Xuanjun Chen, I-Ming Lin, Lin Zhang, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang:
Towards Generalized Source Tracing for Codec-Based Deepfake Speech. CoRR abs/2506.07294 (2025)
[i314]Pooneh Mousavi, Gallil Maimon, Adel Moumen, Darius Petermann, Jiatong Shi, Haibin Wu, Haici Yang, Anastasia Kuznetsova, Artem Ploujnikov, Ricard Marxer, Bhuvana Ramabhadran, Benjamin Elizalde, Loren Lugosch, Jinyu Li, Cem Subakan, Philip C. Woodland, Minje Kim, Hung-yi Lee, Shinji Watanabe, Yossi Adi, Mirco Ravanelli:
Discrete Audio Tokens: More Than a Survey! CoRR abs/2506.10274 (2025)
[i313]Wei-Ping Huang, Guan-Ting Lin, Hung-yi Lee:
SUTA-LM: Bridging Test-Time Adaptation and Language Model Rescoring for Robust ASR. CoRR abs/2506.11121 (2025)
[i312]Cheng-Kang Chou, Chan-Jan Hsu, Ho-Lam Chung, Liang-Hsuan Tseng, Hsi-Chun Cheng, Yu-Kuan Fu, Kuan-Po Huang, Hung-Yi Lee:
A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data. CoRR abs/2506.11130 (2025)
[i311]Fabian Ritter Gutierrez, Yi-Cheng Lin, Jeremy H. M. Wong, Hung-yi Lee, Eng Siong Chng, Nancy F. Chen:
A correlation-permutation approach for speech-music encoders model merging. CoRR abs/2506.11403 (2025)
[i310]Tzu-Quan Lin, Heng-Cheng Kuo, Tzu-Chieh Wei, Hsi-Chun Cheng, Chun-Wei Chen, Hsien-Fu Hsiao, Yu Tsao, Hung-yi Lee:
An Exploration of Mamba for Speech Self-Supervised Models. CoRR abs/2506.12606 (2025)
[i309]Tzu-Quan Lin, Hsi-Chun Cheng, Hung-yi Lee, Hao Tang:
Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers. CoRR abs/2506.21712 (2025)
[i308]Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, Chao-Han Huck Yang, Sung-Feng Huang, Chih-Kai Yang, Chee-En Yu, Chun-Wei Chen, Wei-Chih Chen, Chien-yu Huang, Yi-Cheng Lin, Yu-Xiang Lin, Chi-An Fu, Chun-Yi Kuan, Wenze Ren, Xuanjun Chen, Wei-Ping Huang, En-Pei Hu, Tzu-Quan Lin, Yuan-Kuei Wu, Kuan-Po Huang, Hsiao-Ying Huang, Huang-Cheng Chou, Kai-Wei Chang, Cheng-Han Chiang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee:
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment. CoRR abs/2507.02768 (2025)
[i307]Yi-Cheng Lin, Jia-Hung Chen, Hung-yi Lee:
MMMOS: Multi-domain Multi-axis Audio Quality Assessment. CoRR abs/2507.04094 (2025)
[i306]Shu-Wen Yang, Byeonggeun Kim, Kuan-Po Huang, Qingming Tang, Huy Phan, Bo-Ru Lu, Harsha Sundar, Shalini Ghosh, Hung-yi Lee, Chieh-Chi Kao, Chao Wang:
Generative Audio Language Modeling with Continuous-valued Tokens and Masked Next-Token Prediction. CoRR abs/2507.09834 (2025)
[i305]Fabian Ritter Gutierrez, Yi-Cheng Lin, Jui-Chiang Wei, Jeremy H. M. Wong, Nancy F. Chen, Hung-yi Lee:
ASTAR-NTU solution to AudioMOS Challenge 2025 Track1. CoRR abs/2507.09904 (2025)
[i304]Hongchao Jiang, Yiming Chen, Yushi Cao, Hung-yi Lee, Robby T. Tan:
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks. CoRR abs/2507.10535 (2025)
[i303]Cheng-Han Chiang, Xiaofei Wang, Linjie Li, Chung-Ching Lin, Kevin Lin, Shujie Liu, Zhendong Wang, Zhengyuan Yang, Hung-yi Lee, Lijuan Wang:
STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models. CoRR abs/2507.15375 (2025)
[i302]Xuanjun Chen, Shih-Peng Cheng, Jiawei Du, Lin Zhang, Xiaoxiao Miao, Chung-Che Wang, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang:
Localizing Audio-Visual Deepfakes via Hierarchical Boundary Modeling. CoRR abs/2508.02000 (2025)
[i301]Juncheng Xie, Hung-yi Lee:
Prompt-Based One-Shot Exact Length-Controlled Generation with LLMs. CoRR abs/2508.13805 (2025)
[i300]Ting-Chun Liu, Ching Yu Hsu, Kuan-Yi Lee, Chi-An Fu, Hung-yi Lee:
AEGIS : Automated Co-Evolutionary Framework for Guarding Prompt Injections Schema. CoRR abs/2509.00088 (2025)
[i299]William Chen, Chutong Meng, Jiatong Shi, Martijn Bartelds, Shih-Heng Wang, Hsiu-Hsuan Wang, Rafael Mosquera, Sara Hincapie, Dan Jurafsky, Antonis Anastasopoulos, Hung-yi Lee, Karen Livescu, Shinji Watanabe:
The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties. CoRR abs/2509.07139 (2025)
[i298]Xuanjun Chen, Chia-Yu Hu, I-Ming Lin, Yi-Cheng Lin, I-Hsiang Chiu, You Zhang, Sung-Feng Huang, Yi-Hsuan Yang, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang:
How Does Instrumental Music Help SingFake Detection? CoRR abs/2509.14675 (2025)
[i297]Hsiao-Ying Huang, Yi-Cheng Lin, Hung-yi Lee:
MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large-Audio Language Model. CoRR abs/2509.20706 (2025)
[i296]Yi-Cheng Lin, Yu-Hua Chen, Jia-Kai Dong, Yueh-Hsuan Huang, Szu-Chi Chen, Yu-Chen Chen, Chih-Yao Chen, Yu-Jung Lin, Yu-Ling Chen, Zih-Yu Chen, I-Ning Tsai, Hsiu-Hsuan Wang, Ho-Lam Chung, Ke-Han Lu, Hung-yi Lee:
TAU: A Benchmark for Cultural Sound Understanding Beyond Semantics. CoRR abs/2509.26329 (2025)
[i295]Kai-Wei Chang, En-Pei Hu, Chun-Yi Kuan, Wenze Ren, Wei-Chih Chen, Guan-Ting Lin, Yu Tsao, Shao-Hua Sun, Hung-yi Lee, James Glass:
Game-Time: Evaluating Temporal Dynamics in Spoken Language Models. CoRR abs/2509.26388 (2025)
[i294]Chen-An Li, Tzu-Han Lin, Hung-yi Lee:
When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models. CoRR abs/2510.00626 (2025)
[i293]Yu-Xiang Lin, Chen-An Li, Sheng-Lun Wei, Po-Chun Chen, Hsin-Hsi Chen, Hung-yi Lee:
Hearing the Order: Investigating Selection Bias in Large Audio-Language Models. CoRR abs/2510.00628 (2025)
[i292]Cheng-Han Chiang, Xiaofei Wang, Linjie Li, Chung-Ching Lin, Kevin Lin, Shujie Liu, Zhendong Wang, Zhengyuan Yang, Hung-yi Lee, Lijuan Wang:
SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models. CoRR abs/2510.06917 (2025)
[i291]Yi-Cheng Lin, Yu-Hsuan Li Liang, Hsuan Su, Tzu-Quan Lin, Shang-Tse Chen, Yun-Nung Chen, Hung-yi Lee:
Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition. CoRR abs/2510.08047 (2025)
[i290]Tsung-Min Pai, Jui-I Wang, Li-Chun Lu, Shao-Hua Sun, Hung-Yi Lee, Kai-Wei Chang:
BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation. CoRR abs/2510.10157 (2025)
[i289]Kuan-Yi Lee, Tsung-En Lin, Hung-Yi Lee:
Audio-Maestro: Enhancing Large Audio-Language Models with Tool-Augmented Reasoning. CoRR abs/2510.11454 (2025)
[i288]Tsung-En Lin, Kuan-Yi Lee, Hung-Yi Lee:
Adaptive vector steering: A training-free, layer-wise intervention for hallucination mitigation in large audio and multimodal models. CoRR abs/2510.12851 (2025)
[i287]Ming-Hao Hsu, Liang-Hsuan Tseng, Hung-yi Lee, Zhizheng Wu:
TASLA: Text-Aligned Speech Tokens with Multiple Layer-Aggregation. CoRR abs/2510.14934 (2025)
[i286]Bo-Han Feng, Chien-Feng Liu, Yu-Hsuan Li Liang, Chih-Kai Yang, Szu-Wei Fu, Zhehuai Chen, Ke-Han Lu, Sung-Feng Huang, Chao-Han Huck Yang, Yu-Chiang Frank Wang, Yun-Nung Chen, Hung-yi Lee:
Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations. CoRR abs/2510.16893 (2025)
[i285]Chih-Kai Yang, Yen-Ting Piao, Tzu-Wen Hsu, Szu-Wei Fu, Zhehuai Chen, Ke-Han Lu, Sung-Feng Huang, Chao-Han Huck Yang, Yu-Chiang Frank Wang, Yun-Nung Chen, Hung-yi Lee:
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models. CoRR abs/2510.16917 (2025)
[i284]Claire Lin, Bo-Han Feng, Xuanjun Chen, Te-Lun Yang, Hung-yi Lee, Jyh-Shing Roger Jang:
A Preliminary Study of RAG for Taiwanese Historical Archives. CoRR abs/2511.07445 (2025)
[i283]Tzu-Han Lin, Wei-Lin Chen, Chen-An Li, Hung-yi Lee, Yun-Nung Chen, Yu Meng:
AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning. CoRR abs/2512.16883 (2025)
[i282]Yu-Xiang Lin, Cheng-Han Chiang, Hung-yi Lee:
Style Amnesia: Investigating Speaking Style Degradation and Mitigation in Multi-Turn Spoken Language Models. CoRR abs/2512.23578 (2025)
2024
[j27]Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee:
A Large-Scale Evaluation of Speech Foundation Models. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2884-2899 (2024)
[j26]Kai-Wei Chang, Haibin Wu, Yu-Kai Wang, Yuan-Kuei Wu, Hua Shen, Wei-Cheng Tseng, Iu-thing Kang, Shang-wen Li, Hung-Yi Lee:
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks. IEEE ACM Trans. Audio Speech Lang. Process. 32: 3730-3744 (2024)
[j25]Shensian Syu, Juncheng Xie, Hung-yi Lee:
Improving Non-Autoregressive Translation Quality With Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4121-4133 (2024)
[c255]Cheng-Han Chiang, Hung-yi Lee:
Merging Facts, Crafting Fallacies: Evaluating the Contradictory Nature of Aggregated Factual Claims in Long-Form Generations. ACL (Findings) 2024: 2734-2751
[c254]Guan-Ting Lin, Cheng-Han Chiang, Hung-yi Lee:
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations. ACL (1) 2024: 6626-6642
[c253]Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, Hung-yi Lee:
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models. ACL (Findings) 2024: 10330-10348
[c252]Shih-Cheng Huang, Pin-Zu Li, Yu-Chi Hsu, Kuang-Ming Chen, Yu-Tung Lin, Shih-Kai Hsiao, Richard Tzong-Han Tsai, Hung-yi Lee:
Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages. ACL (1) 2024: 10943-10959
[c251]Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan S. Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe:
On the Evaluation of Speech Foundation Models for Spoken Language Understanding. ACL (Findings) 2024: 11923-11938
[c250]Wenze Ren, Yi-Cheng Lin, Huang-Cheng Chou, Haibin Wu, Yi-Chiao Wu, Chi-Chun Lee, Hung-Yi Lee, Hsin-Min Wang, Yu Tsao:
EMO-Codec: An In-Depth Look at Emotion Preservation Capacity of Legacy and Neural Codec Models with Subjective and Objective Evaluations. APSIPA 2024: 1-6
[c249]Haibin Wu, Huang-Cheng Chou, Kai-Wei Chang, Lucas Goncalves, Jiawei Du, Jyh-Shing Roger Jang, Chi-Chun Lee, Hung-Yi Lee:
Empower Typed Descriptions by Large Language Models for Speech Emotion Recognition. APSIPA 2024: 1-6
[c248]Cheng-Han Chiang, Hung-yi Lee:
Over-Reasoning and Redundant Calculation of Large Language Models. EACL (2) 2024: 161-169
[c247]Zhi Rui Tam, Cheng-Kuang Wu, Yi-Lin Tsai, Chieh-Yen Lin, Hung-yi Lee, Yun-Nung Chen:
Let Me Speak Freely? A Study On The Impact Of Format Restrictions On Large Language Model Performance. EMNLP (Industry Track) 2024: 1218-1236
[c246]Cheng-Kuang Wu, Zhi Rui Tam, Chao-Chung Wu, Chieh-Yen Lin, Hung-yi Lee, Yun-Nung Chen:
I Need Help! Evaluating LLM's Ability to Ask for Users' Support: A Case Study on Text-to-SQL Generation. EMNLP 2024: 2191-2199
[c245]Cheng-Han Chiang, Wei-Chih Chen, Chun-Yi Kuan, Chienchou Yang, Hung-yi Lee:
Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course. EMNLP 2024: 2489-2513
[c244]Hsuan Su, Hua Farn, Fan-Yun Sun, Shang-Tse Chen, Hung-yi Lee:
Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition. EMNLP 2024: 8905-8915
[c243]Guan-Ting Lin, Hung-yi Lee:
Can LLMs Understand the Implication of Emphasized Sentences in Dialogue? EMNLP (Findings) 2024: 13391-13401
[c242]Hung-Ting Su, Ya-Ching Hsu, Xudong Lin, Xiang Qian Shi, Yulei Niu, Han-Yuan Hsu, Hung-yi Lee, Winston H. Hsu:
Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses. EMNLP (Findings) 2024: 14839-14854
[c241]Tzu-Han Lin, Chen-An Li, Hung-yi Lee, Yun-Nung Chen:
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging. EMNLP 2024: 15506-15524
[c240]Guan-Ting Lin, Wei Huang, Hung-yi Lee:
Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech. EMNLP 2024: 20003-20015
[c239]Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-Yi Lee, Hsin-Min Wang, David Harwath:
SpeechCLIP+: Self-Supervised Multi-Task Representation Learning for Speech Via Clip and Speech-Image Data. ICASSP Workshops 2024: 465-469
[c238]Fabian Ritter Gutierrez, Kuan-Po Huang, Dianwen Ng, Jeremy H. M. Wong, Hung-Yi Lee, Eng Siong Chng, Nancy F. Chen:
Noise Robust Distillation of Self-Supervised Speech Models via Correlation Metrics. ICASSP Workshops 2024: 495-499
[c237]Chih-Kai Yang, Kuan-Po Huang, Ke-Han Lu, Chun-Yi Kuan, Chi-Yuan Hsiao, Hung-Yi Lee:
Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR And Speech-to-Text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision. ICASSP Workshops 2024: 540-544
[c236]Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-Yi Lee, David Harwath:
Integrating Self-Supervised Speech Model with Pseudo Word-Level Targets from Visually-Grounded Speech Model. ICASSP Workshops 2024: 645-649
[c235]Tzu-Han Lin, How-Shing Wang, Hao-Yung Weng, Kuang-Chen Peng, Zih-Ching Chen, Hung-Yi Lee:
PEFT for Speech: Unveiling Optimal Placement, Merging Strategies, and Ensemble Techniques. ICASSP Workshops 2024: 705-709
[c234]Haibin Wu, Heng-Cheng Kuo, Yu Tsao, Hung-Yi Lee:
Scalable Ensemble-Based Detection Method Against Adversarial Attacks For Speaker Verification. ICASSP 2024: 4670-4674
[c233]Yuan Tseng, Layne Berry, Yiting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Poyao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Abdelrahman Mohamed, Chi-Luen Feng, Hung-Yi Lee:
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models. ICASSP 2024: 6890-6894
[c232]Xuanjun Chen, Haibin Wu, Chung-Che Wang, Hung-Yi Lee, Jyh-Shing Roger Jang:
Multimodal Transformer Distillation for Audio-Visual Synchronization. ICASSP 2024: 7755-7759
[c231]Kuan-Po Huang, Chih-Kai Yang, Yu-Kuan Fu, Ewan Dunbar, Hung-Yi Lee:
Zero Resource Code-Switched Speech Benchmark Using Speech Utterance Pairs for Multiple Spoken Languages. ICASSP 2024: 10006-10010
[c230]Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, Chao-Han Huck Yang, Yile Gu, Shalini Ghosh, Andreas Stolcke, Hung-Yi Lee, Ivan Bulyko:
Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue. ICASSP 2024: 10316-10320
[c229]Chien-Yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan S. Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee:
Dynamic-Superb: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech. ICASSP 2024: 12136-12140
[c228]Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Lin-Shan Lee:
SpeechDPR: End-To-End Spoken Passage Retrieval For Open-Domain Spoken Question Answering. ICASSP 2024: 12476-12480
[c227]Kevin Everson, Yile Gu, Chao-Han Huck Yang, Prashanth Gurunath Shivakumar, Guan-Ting Lin, Jari Kolehmainen, Ivan Bulyko, Ankur Gandhe, Shalini Ghosh, Wael Hamza, Hung-Yi Lee, Ariya Rastrow, Andreas Stolcke:
Towards ASR Robust Spoken Language Understanding Through in-Context Learning with Word Confusion Networks. ICASSP 2024: 12856-12860
[c226]Kai-Wei Chang, Ming-Hao Hsu, Shang-Wen Li, Hung-yi Lee:
Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks. INTERSPEECH 2024
[c225]Xuanjun Chen, Jiawei Du, Haibin Wu, Jyh-Shing Roger Jang, Hung-yi Lee:
Neural Codec-based Adversarial Sample Detection for Speaker Verification. INTERSPEECH 2024
[c224]Xuanjun Chen, Haibin Wu, Roger Jang, Hung-yi Lee:
Singing Voice Graph Modeling for SingFake Detection. INTERSPEECH 2024
[c223]Fabian Ritter Gutierrez, Kuan-Po Huang, Jeremy H. M. Wong, Dianwen Ng, Hung-yi Lee, Nancy F. Chen, Eng Siong Chng:
Dataset-Distillation Generative Model for Speech Emotion Recognition. INTERSPEECH 2024
[c222]Chun-Yi Kuan, Wei-Ping Huang, Hung-yi Lee:
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models. INTERSPEECH 2024