


14th ISCSLP 2024: Beijing, China
- Yanmin Qian, Qin Jin, Zhijian Ou, Zhenhua Ling, Zhiyong Wu, Ya Li, Lei Xie, Jianhua Tao:

14th IEEE International Symposium on Chinese Spoken Language Processing, ISCSLP 2024, Beijing, China, November 7-10, 2024. IEEE 2024, ISBN 979-8-3315-1682-6 - Jingchen Li, Xin Liu, Xueliang Zhang:

OPC-KWS: Optimizing Keyword Spotting with Path Retrieval Decoding and Contrastive Learning. 1-5 - Binqiang Wang, Gang Dong, Yaqian Zhao, Rengang Li:

Personalized Multimodal Emotion Recognition: Integrating Temporal Dynamics and Individual Traits for Enhanced Performance. 408-412 - Shuoyi Zhou, Yixuan Zhou, Weiqing Li, Jun Chen, Runchuan Ye, Weihao Wu, Zijian Lin, Shun Lei, Zhiyong Wu:

The Codec Language Model-Based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024. 496-500 - Fang Hu:

Gradiency in Obstruent Devoicing in the Varieties of Wu Chinese. 111-115 - Qibing Bai, Shuai Wang, Zhijun Liu, Mingyang Zhang, Wei Rao, Yannan Wang, Haizhou Li:

Diffusion-Based Method with TTS Guidance for Foreign Accent Conversion. 284-288 - Yiwen Lu, Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo:

COMOSVC: Consistency Model-Based Singing Voice Conversion. 184-188 - Xiyao Lu, Yukai Wan, Ruishan Li, Jinsong Zhang:

Statistical Analysis of F0 Characteristics of "Grade A Level 1" Mandarin Tones: On the Application of the T-Value Method. 1-5 - Xinrui Yan, Jiangyan Yi, Jianhua Tao, Yujie Chen, Hao Gu, Guanjun Li, Junzuo Zhou, Yong Ren, Tao Xu:

Reject Threshold Adaptation for Open-Set Model Attribution of Deepfake Audio. 476-480 - Guolun Sun, Li Wang:

Constant Q Transform for Audio-Visual Dysarthria Severity Assessment. 146-150 - Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Yukun Liu, Guanjun Li, Xin Qi, Yi Lu, Xuefei Liu, Yongwei Li:

A Noval Feature via Color Quantisation for Fake Audio Detection. 1-5 - Aijun Li, Zhiwei Wang, Sichen Zhang, Jun Gao, Xin Zhou:

The Development of Speech Rhythm in Mandarin-Speaking Children. 274-278 - Juan Liu, Yudong Yang, Xiaokang Liu, Xiaoyi Zuo, Junfeng Li, Lan Wang, Nan Yan:

The ISCSLP 2024 Multimodal Dysarthria Severity Assessment (MDSA) Challenge: Dataset, Tracts, Baseline and Results. 136-140 - Sabrina Chow, Lilian Guo, Jonathan Chow, Chelsea Chia, Sarah Li, Dong-Yan Huang:

Semantic Search Using LLM-Aided Topic Generation on Knowledge Graphs for Paper Discovery. 353-357 - Zhongxuan Mao, Chenyu Li, Shanpeng Li:

Speech Rate Influence on Rhythm Alterations in Mandarin. 521-525 - Chengyuan Qin, Wenmeng Xiong, Maoshen Jia, Haoyang Zhou, Jing Zhang, Xianhong Chen, Qi Wang:

Robust Coherent sources Localization Based on Hankel Matrix Reconstruction. 706-710 - Yubo Jiang, Zhihua Huang:

Fast Sampling Based on Policy Gradient for Diffusion-Based Speech Enhancement. 576-580 - Shuang Zhou, Yinghao Li:

Categorical Perception of Tone 2 and Tone 3 of Standard Chinese by Bilingual Korean Ethnic Speakers in China. 551-555 - Zhuojun Wu, Dong Liu, Ming Li:

Lightweight Language Model for Speech Synthesis: Attempts and Analysis. 501-505 - Zhihan Yang, Chunfeng Wang, Zhiyong Wu, Jia Jia:

Inferring Agent Speaking Styles for Auditory-Visual User-Agent Conversation. 421-425 - Jinghua Liang, Bo Wang, Xihong Wu, Jing Chen:

Encoding and Decoding of Chinese Phonemes Based on MEG Signals. 224-228 - Yifan Hu, Rui Liu, Guanglai Gao, Haizhou Li:

FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis. 299-303 - Haoxiang Hou, Xun Gong, Yanmin Qian:

ConMamba: A Convolution-Augmented Mamba Encoder Model for Efficient End-to-End ASR Systems. 711-715 - Tong Lee Chung, Jianxin Pang, Jun Cheng:

Empowering Robots with Multimodal Language Models for Task Planning with Interaction. 358-362 - Ruibo Fu, Rui Liu, Chunyu Qiang, Yingming Gao, Yi Lu, Shuchen Shi, Tao Wang, Ya Li, Zhengqi Wen, Chen Zhang, Hui Bu, Yukun Liu, Xin Qi, Guanjun Li:

ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024. 626-630 - Siyin Wang, Chao Zhang:

Speaker Diarization for Unlimited Number of Speakers Using Dynamic Linear. 368-372 - Yiwei Liang, Ming Li:

Vivid Background Audio Generation Based on Large Language Models and AudioLDM. 621-625 - Yuhang Yang, Yizhou Peng, Eng Siong Chng, Xionghu Zhong:

Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs. 646-650 - Wenqian Bao, Yuchen Yan, Jinsong Zhang:

Enhancing Mispronunciation Detection with WavLM and Mixture-of-Experts Network. 189-193 - Yifeng Sun, Yanlu Xie, Jinsong Zhang, Dengfeng Ke:

Arti-Invar: A Pre-trained Model for Enhancing Acoustic-to-Articulatory Inversion Performance. 154-158 - Yin-Long Liu, Rui Feng, Jia-Hong Yuan, Zhen-Hua Ling:

Leveraging Prompt Learning and Pause Encoding for Alzheimer's Disease Detection. 486-490 - Xin Zhou, Wangyou Zhang, Chenda Li, Yanmin Qian:

Insights from Hyperparameter Scaling of Online Speech Separation. 561-565 - Qixin Li, Gaowu Wang:

Focus and Gender Affect Sentence Type Perception: Observations on Mandarin Sentence-Final Particle Ba. 511-515 - Xiaoke Qi, Hao Gu, Jiangyan Yi, Jianhua Tao, Yong Ren, Jiayi He, Siding Zeng:

MADD: A Multi-Lingual Multi-Speaker Audio Deepfake Detection Dataset. 466-470 - Xiangzhu Kong, Tianqi Ning, Hao Huang, Zhijian Ou:

Cuside-Array: A Streaming Multi-Channel End-to-End Speech Recognition System with Realistic Evaluations. 721-725 - Bin Zhao, Gaoyan Zhang, Jianwu Dang, Aijun Li:

Bi-Directional Oscillatory Interaction in the Neural Networks Engaged in Sentence Oral Reading. 56-60 - Hongwu Ding, Yiquan Zhou, Wenyu Wang, Jiacheng Xu, Jiaqi Mei:

Hola-TTS: A Cross-Lingual Zero-Shot Text-to-Speech System for Chinese, English, Japanese, and Korean. 601-605 - Wei Dai, Menglong Li, Yingqi He, Yongqiang Zhu:

Fine-Tuning Pre-Trained Audio Models for Dysarthria Severity Classification: A Second Place Solution in the Multimodal Dysarthria Severity Classification Challenge. 151-153 - Pincheng Lu, Liang Xu, Jing Wang:

A Differential Quantization Based End-to-End Neural Speech Codec. 71-75 - Yicong Jiang, Youjun Chen, Tianzi Wang, Zengrui Jin, Xurong Xie, Hui Chen, Xunying Liu, Feng Tian:

Investigation of Cross Modality Feature Fusion for Audio-Visual Dysarthric Speech Assessment. 141-145 - Yuancheng Wang, Haoran Zheng, Qi Sun, Yong Ma, Shihu Zhu, Le Zhang, Wei-Qiang Zhang:

Cross-Lingual Alzheimer's Disease Detection Based on Scale Criteria. 491-495 - Jingran Xie, Yang Xiang, Hui Wang, Xixin Wu, Zhiyong Wu, Helen Meng:

ERVQ: Leverage Residual Vector Quantization for Speech Emotion Recognition. 456-460 - Tao Zhuang, Jiaxin Zhong, Jing Lu:

The Feasibility of Sound Zone Control Using an Array of Parametric Array Loudspeakers. 66-70 - Juan Liu, Xiaokang Liu, Yudong Yang, Rukiye Ruzi, Xiaoyi Zuo, Changqing Xu, Chaojinzi Li, Xinyu Li, Rongfeng Su, An-Ming Hu, Yu-Mei Zhang, Shaofeng Zhao, Xiaoxia Du, Lan Wang, Nan Yan:

The Open-Access Mandarin Subacute Stroke Dysarthria Multimodal (MSDM) Database for Intelligent Assessment. 131-135 - Shuang Liang, Yu Gu:

Multi-Modal Dysarthria Severity Assessment Using Dual-Branch Feature Decoupling Network and Mixed Expert Framework. 126-130 - Wenyi Yu, Chao Zhang:

An Optimizer for Conformer Based on Conjugate Gradient Method. 1-5 - Peng Zhao, Ruicong Wang, Xueyi Zhang, Mingrui Lao, Siqi Cai:

Binary-Temporal Convolutional Neural Network for Multi-Class Auditory Spatial Attention Detection. 1-5 - Huijun Lian, Keqi Chen, Zekai Sun, Yingming Gao, Ya Li:

G2DiaR: Enhancing Commonsense Reasoning of LLMs with Graph-to-Dialogue & Reasoning. 214-218 - Yi Han, Hang Chen, Jun Du, Chang-Qing Kong, Shifu Xiong, Jia Pan:

Layer-Adaptive Low-Rank Adaptation of Large ASR Model for Low-Resource Multilingual Scenarios. 696-700 - Kangxiang Xia, Dake Guo, Jixun Yao, Liumeng Xue, Hanzhao Li, Shuai Wang, Zhao Guo, Lei Xie, Qingqing Zhang, Lei Luo, Minghui Dong, Peng Sun:

The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings. 506-510 - Junan Li, Yunxiang Li, Yuren Wang, Xixin Wu, Helen Meng:

Devising a Set of Compact and Explainable Spoken Language Feature for Screening Alzheimer's Disease. 471-475 - Moyang Liu, Yukun Liu, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Xuefei Liu, Guanjun Li:

Exploring the Role of Audio in Multimodal Misinformation Detection. 204-208 - Cong Cai, Shan Liang, Xuefei Liu, Kang Zhu, Zhenhua Cheng, Zheng Lian, Bin Liu:

IERP 2024: Induced Emotion Recognition with Personality Characteristics Challenge 2024. 413-416 - Grace Wenling Cao, Vincent Hughes, Bruce Xiao Wang, Peggy Mok:

Cross-Language Forensic Voice Comparison of Hong Kong Trilingual Speakers using Filled Pauses and an Automatic Speaker Recognition System. 279-283 - Tong Lee Chung, Jun Cheng, Jianxin Pang:

Mitigating Hallucination in Visual Language Model Segmentation with Negative Sampling. 344-348 - Kang Zhu, Xuefei Liu, Heng Xie, Cong Cai, Ruibo Fu, Guanjun Li, Zhengqi Wen, Jianhua Tao, Cunhang Fan, Zhao Lv, Le Wang, Hao Lin:

Transferring Personality Knowledge to Multimodal Sentiment Analysis. 431-435 - Yuxun Tang, Jiatong Shi, Yuning Wu, Qin Jin:

An Exploration on Singing MOS Prediction. 651-655 - Jizhou Cui, Xuefei Liu, Yongwei Li, Xiaoying Xu, Ruibo Fu, Jianhua Tao, Zhengqi Wen, Yukun Liu, Guanjun Li, Le Wang, Hao Lin:

Unlocking the Power of Emotions: Enhancing Personality Trait Recognition Through Utilization of Emotional Cues. 566-570 - Sinan Sun, Longxiang Zhang, Bo Wang, Xihong Wu, Jing Chen:

Representation of Articulatory Features in EEG During Speech Production Tasks. 219-223 - Yuetong Zhao, Hongyu Cao, Xianyu Zhao, Zhijian Ou:

An Empirical Study of Retrieval Augmented Generation with Chain-of-Thought. 436-440 - Dongrui Han, Mingyu Cui, Jiawen Kang, Xixin Wu, Xunying Liu, Helen Meng:

Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models. 631-635 - Peng Zhao, Ruicong Wang, Zijie Lin, Zexu Pan, Haizhou Li, Xueyi Zhang:

Ensemble Deep Learning Models for EEG-Based Auditory Attention Decoding. 339-343 - Yuan Jia, Xintong Zuo:

Acoustic Features of Standard Chinese Consonants by Uyghur Primary School Teachers. 1-5 - Muhammad Sharif, Jiangyan Yi, Muhammad Shoaib:

Unification of Balti and Trans-Border Sister Dialects in the Essence of LLMs and AI Technology. 244-248 - Xinwen Yue, Yupei Zhang, Jianqian Zhang, Zhiyu Li, Jing Wang, Shenghui Zhao:

Non-Intrusive Audio Quality Assessment Based on Deep Neural Network for Subjective MOS Prediction. 76-80 - Jiawei Ru, Maoshen Jia, Yuhao Zhao, Liang Tao:

A Dual-path Conformer-Based Network for Neural Speech Coding. 661-665 - Di Zhou, Daisuke Mizuguchi, Takeshi Yamamoto, Yasuhiro Omiya:

A Study on Depression Detection Through Explainable Features of Speech. 36-40 - Dake Guo, Jixun Yao, Xinfa Zhu, Kangxiang Xia, Zhao Guo, Ziyu Zhang, Yao Wang, Jie Liu, Lei Xie:

The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge. 616-620 - Chen Chen, Xiaolou Li, Zehua Liu, Lantian Li, Dong Wang:

Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective. 426-430 - Yuting Zhang, Xiaoying Xu:

The Effect of Focus Position on Downstep in Chinese Non-Interrogative Sentences. 1-5 - Xin Qi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Shuchen Shi, Yi Lu, Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Yukun Liu, Guanjun Li, Xuefei Liu, Yongwei Li:

EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech. 294-298 - Yu Chen, Yu Bai, Ju Zhang:

A Study on the Effectiveness of Mandarin Seven-Sound Test Across Multiple Speakers. 106-110 - Shuwen Chen, Jun Gao, Zixuan Jia:

Production of Mandarin Chinese R-Suffix by Mandarin-Speaking Children: A Preliminary Study. 101-105 - Hengzhi Zhou, Mingyue Shi, Qinglin Meng:

Evaluating Speech Intelligibility for Cochlear Implants Using Automatic Speech Recognition. 1-5 - Yumei Zhang, Maoshen Jia, Xuan Cao, Zichen Zhao:

Speech Emotion Recognition Based on Shallow Structure of Wav2vec 2.0 and Attention Mechanism. 398-402 - Tingxiao Zhou, Leying Zhang, Yanmin Qian:

Knowledge Distillation from Discriminative Model to Generative Model with Parallel Architecture for Speech Enhancement. 179-183 - Jin Li, Lirong Dai:

Optimizing Deep Speaker Embeddings with a Dynamic Cross Triplet Framework. 378-382 - Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie:

Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets. 26-30 - Qilong Yuan, Di Zhu, Enze Shi, Kui Zhao:

The NWPU-BBIC System for the ISCSLP 2024 Chinese Auditory Attention Decoding Challenge. 329-333 - Rui Niu, Changhe Song, Zhiyong Wu:

NLPP: A Natural Language Prosodic Prominence Dataset Assisted by ChatGPT. 441-445 - Yubang Zhang, Jie Zhang, Zhenhua Ling:

The NERCSLIP-USTC System for Track2 of the First Chinese Auditory Attention Decoding Challenge. 319-323 - Hongfei Xue, Yuhao Liang, Bingshen Mu, Shiliang Zhang, Mengzhe Chen, Qian Chen, Lei Xie:

E-Chat: Emotion-Sensitive Spoken Dialogue System with Large Language Models. 586-590 - Zhiqiang Duan, Jian Zhou, Cunhang Fan, Liang Tao, Zhao Lv:

CATAD: Conformer-Based Adversarial Training with Adaptive Diffusion for Bone-Conducted Speech Enhancement. 159-163 - Yi Zhang, Lishan Li, Xiaoying Xu:

Individual Differences in Tone Perception and Production in Emerging Dialect: A Case Study of Elementary School Children in Changsha. 269-273 - Wenbo Zhao, Ziwei Li, Chuan Yu, Zhijian Ou:

CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer Based Streaming ASR. 11-15 - Cheng Chi, Xiaoyu Li, Rui Zhang, Xiaodong Li, Chengshi Zheng:

The Impact of Dynamic Cue and Audio Stimulus Type on Subjective Localization in VR Headsets. 86-90 - Yubo Zhou, Weizhen Bian, Kaitai Zhang, Xiaohan Gu:

Advancing Music Therapy: Integrating Eastern Five-Element Music Theory and Western Techniques with AI in the Novel Five-Element Harmony System. 234-238 - Rui Feng, Yin-Long Liu, Zhen-Hua Ling, Jia-Hong Yuan:

Wav2f0: Exploring the Potential of Wav2vec 2.0 for Speech Fundamental Frequency Extraction. 169-173 - Yuning Wu, Yifeng Yu, Jiatong Shi, Tao Qian, Qin Jin:

A Systematic Exploration of Joint-Training for Singing Voice Synthesis. 289-293 - Yu-Fei Shi, Yang Ai, Ye-Xin Lu, Hui-Peng Du, Zhen-Hua Ling:

SAMOS: A Neural MOS Prediction Model Leveraging Semantic Representations and Acoustic Features. 199-203 - Nan Li, Yadong Niu, Liushuai Yuan, Xihong Wu, Jing Chen:

A Spectral Change Enhancement Method Based on Self-Supervised Learning Framework. 571-575 - Chenyu Li, Zhongxuan Mao, Shanpeng Li:

Analysis of Normal and Slow Speech Rate on the F0 Contour of Tones in Mandarin Broadcasting Speech. 526-530 - Ruishan Li, Yanlu Xie:

Acoustic Features at Intonational Phrase Boundaries: Comparative Study of Native Speakers and L2 Learners of Chinese Mandarin. 671-675 - Honghong Wang, Xupeng Jia, Jing Deng, Rong Zheng:

Speech Emotion Recognition using Fine-Tuned DWFormer: A Study on Track 1 of the IERP Challenge 2024. 403-407 - Shuanghong Liu, Zhida Song, Zhihua Fang, Liang He:

LE-CAM++: A Lighter and More Efficient CAM++ for Speaker Verification. 393-397 - Rui Feng, Yu-Ang Chen, Yin-Long Liu, Jia-Hong Yuan, Zhen-Hua Ling:

Wav2Nas: An Exploratory Approach to Nasalance Estimation in Speech. 1-5 - Linfeng Feng, Xiao-Lei Zhang, Xuelong Li:

Quantization-Error-Free Soft Label for 2D Sound Source Localization. 194-198 - Yuejiao Wang, Xianmin Gong, Xixin Wu, Patrick C. M. Wong, Hoi-lam Helene Fung, Man-Wai Mak, Helen Meng:

Naturalistic Language-Related Movie-Watching fMRI Task for Detecting Neurocognitive Decline and Disorder. 31-35 - Shihao Chen, Yu Gu, Jianwei Cui, Jie Zhang, Rilin Chen, Lirong Dai:

LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation. 309-313 - Kaixin Yang, Gaofeng Cheng, Ta Li, Qingwei Zhao, Yonghong Yan:

Query-by-Example Speech Search using Mamba and Random Offset Mixed Padding. 726-730 - Shixin Jiang, Ming Liu, Bing Qin:

Fusion Pruning for Large Language Models. 349-352 - Tianyou Cheng, Maokui He, Gaobin Yang, Shutong Niu, Yanqiang Lei, Limei Peng, Jun Du:

Online Neural Speaker Diarization with Spectral Clustering for Meeting Scenarios. 373-377 - Zhengyang Chen, Shuai Wang, Bing Han, Yanmin Qian:

Combining Self-Supervised Learning and Adversarial Training Based Domain Adaptation for Speaker Verification. 701-705 - Yuanyuan Zhu, Jiaxu He, Ruihao Jing, Yaodong Song, Jie Lian, Xiao-Lei Zhang, Jie Li:

LLM-Based Expressive Text-to-Speech Synthesizer with Style and Timbre Disentanglement. 596-600 - Jingyu Li, Aemon Yat Fei Chiu, Tan Lee:

An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems. 388-392 - Zelin Qiu, Junfeng Li, Yingyi Luo:

Spatial Attention in Interfering Speech Perception. 61-65 - Haoyu Wang, Tianrui Wang, Cheng Gong, Yu Jiang, Qiuyu Liu, Longbiao Wang, Jianwu Dang:

Expressive Speech Synthesis with Theme-Oriented Few-Shot Learning in ICAGC 2024. 606-610 - Jiawen Kang, Junan Li, Jinchao Li, Xixin Wu, Helen Meng:

Not All Errors Are Equal: Investigation of Speech Recognition Errors in Alzheimer's Disease Detection. 254-258 - Jingran Xie, Changhe Song, Yang Xiang, Hui Wang, Xixin Wu, Zhiyong Wu, Helen Meng:

CMAST: Efficient Speech-Text Joint Training Method to Enhance Linguistic Features Learning of Speech Representations. 656-660 - Xingguang Dong, Cunhang Fan, Hongyu Zhang, Xiaoke Yang, Sheng Zhang, Jian Zhou, Zhao Lv:

CSDA: Cross-Session Domain Adaptation in Auditory Attention Decoding of EEG for a Single Subject. 451-455 - Wei Chen, Xintao Zhao, Jun Chen, Binzhu Sha, Zhiwei Lin, Zhiyong Wu:

RobustSVC: HuBERT-Based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion. 164-168 - Xiaowang Liu, Xiaolin Wu, Jinsong Zhang:

A Study on the Information Mechanism of the 3rd Tone Sandhi Rule in Mandarin Across Word Boundaries. 546-550 - Dekun Chen, Zhizheng Wu:

Zh-Paral: Benchmark Dataset for Comprehension of Chinese Paralinguistic Speech. 363-367 - Ruibo Liu, Shuai-Xin Wang, Zhuang-Zhuang Liu, Jiang-Jiang Zhao, Yuling Ren, Yu Liu:

Convincing Audio Generation Based on LLM and Speech Tokenization. 591-595 - Yu Jiang, Tianrui Wang, Haoyu Wang, Cheng Gong, Qiuyu Liu, Zikang Huang, Longbiao Wang, Jianwu Dang:

Expressive Text-to-Speech with Contextual Background for ICAGC 2024. 611-615 - Fuqian Wu, Xiyu Wu:

Contributions of Acoustic Factors to Tone Identification in Whispered Mandarin. 516-520 - Dawei Xiang, Yong Ma, Yiming Yang:

A Study of Brain Mechanisms by Which Sound Source Location and Amount of Masking Affect Target Perception. 239-243 - Jiahao Li, Cunhang Fan, Enrui Liu, Jian Zhou, Zhao Lv:

Dual-Strategy Fusion Method in Noise-Robust Speech Recognition. 16-20 - Zhengshun Xia, Ziyang Ma, Zhisheng Zheng, Xie Chen:

Improving Emotion Recognition with Pre-Trained Models, Multimodality, and Contextual Information. 636-640 - Zelin Qiu, Dingding Yao, Junfeng Li:

StreamAAD: Decoding Spatial Auditory Attention with a Streaming Architecture. 1-5 - Ruofan Yan, Shu Peng, Zhige Chen, Zhi-An Huang, Rui Liu, Kay Chen Tan, Jibin Wu:

Enhancing Spatio-Temporal Auditory Attention Decoding with ST-AADNet. 334-338 - Wenjun Ding, Xinsheng Wang, Lijian Gao, Qirong Mao:

TF-DiffuSE: Time-Frequency Prior-Conditioned Diffusion Model for Speech Enhancement. 581-585 - Siyi Zhao, Wei Wang, Yanmin Qian:

Band-Wise Front-End Distortion Suppression for Robust Speech Recognition. 681-685 - Yaqin Wu, Yan Chang, Yanzhang Geng, Xiaofeng Cao, Jiawei Zhao:

GM-LPC Based Multiband Analysis and Enhancement of Pathological Voice. 174-178 - Yuanming Zhang, Zeyan Song, Haoliang Du, Xia Gao, Jing Lu:

Robustness and Generalization Capability Validation of Convolutional Neural Network on a Chinese EEG Auditory Attention Decoding Dataset. 51-55 - Yuankun Xie, Chenxu Xiong, Xiaopeng Wang, Zhiyong Wang, Yi Lu, Xin Qi, Ruibo Fu, Yukun Liu, Zhengqi Wen, Jianhua Tao, Guanjun Li, Long Ye:

Does Current Deepfake Audio Detection Model Effectively Detect ALM-Based Deepfake Audio? 481-485 - Boda Xiao, Bo Wang, Xuning Chen, Xiran Xu, Xihong Wu, Jing Chen:

Comparing Human-Labeled and LLM-Generated Semantic Features via Cortical Neural Representation. 666-670 - Hui-Peng Du, Yang Ai, Rui-Chen Zheng, Zhen-Hua Ling:

APCodec+: A Spectrum-Coding-Based High-Fidelity and High-Compression-Rate Neural Audio Codec with Staged Training Paradigm. 676-680 - Weizhen Bian, Yubo Zhou, Kaitai Zhang, Xiaohan Gu:

EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations. 417-420 - Haonan Cheng, Kangyue Li, Long Ye, Jingling Wang:

EnvFake: An Initial Environmental-Fake Audio Dataset for Scene-Consistency Detection. 81-85 - Chong Cao, Qian Li:

The Role of F0 in the Recognition of Aspiration Contrasts in Mandarin. 536-540 - Pei-Jun Liao, Hung-Yi Lee, Hsin-Min Wang:

Ensemble Knowledge Distillation from Speech SSL Models Considering Inter-Teacher Differences. 716-720 - Shitong Fan, Wenbo Wang, Feiyang Xiao, Shiheng Zhang, Qiaoxi Zhu, Jian Guan:

Independent Feature Enhanced Crossmodal Fusion for Match-Mismatch Classification of Speech Stimulus and EEG Response. 209-213 - Hanzhe Xu, Xuefei Liu, Cong Cai, Kang Zhu, Jizhou Cui, Ruibo Fu, Heng Xie, Jianhua Tao, Zhengqi Wen, Ziping Zhao, Guanjun Li, Le Wang, Hao Lin:

Temporal Shift for Personality Recognition with Pre-Trained Representations. 446-450 - Zonghui Wang, Zhihua Fang, Zhida Song, Liang He:

Simplified Skip-Connected UNet for Robust Speaker Verification Under Noisy Environments. 691-695 - Fengping Wang, Bingsong Bai, Yayue Deng, Jinlong Xue, Yingming Gao, Ya Li:

ExpressiveSinger: Synthesizing Expressive Singing Voice as an Instrument. 304-308 - Lishan Li, Xiaoying Xu:

Multiple Patterns of Merging Guangzhou Cantonese Tones in Production and Perception: Study on Youth Groups. 1-5 - Yueqian Lin, Dong Liu, Yunfei Xu, Hongbin Suo, Ming Li:

Bridging Facial Imagery and Vocal Reality: Stable Diffusion-Enhanced Voice Generation. 229-233 - Wen Huang, Bing Han, Zhengyang Chen, Shuai Wang, Yanmin Qian:

Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification. 383-387 - Jia-Jyu Su, Chen-Yu Chiang, Yue-Shan Chang, Chao-Yin Lin, Jiunn-Horng Kang, Min-Yuh Day:

A Preliminary Study on Constructing Mandarin Personalized Speech Recognition Systems for the Speech Impaired. 21-25 - Xiaoming Liang, Zhihua Huang:

The Contributions of Formants to the Intelligibility in Uyghur Sine-Wave Sentences. 1-5 - Shaochuan Zhang, Fengji Li, Li Wang, Jie Zhou, Haijun Niu:

Tongue Model-Driven Method Based on Fully Connected Neural Network. 121-125 - Mewlude Nijat, Dong Wang, Askar Hamdulla:

A Fresh Review on Chinese Pronunciation Acquisition: Insights and Recommendations for L2 Foreign Children. 91-95 - Xiaoke Yang, Cunhang Fan, Hongyu Zhang, Xingguang Dong, Jian Zhou, Xinhui Li, Zhao Lv:

Cross-Subject Domain Adaptation for EEG-Based Auditory Attention Decoding via Prototypical Representation. 461-465 - Lukuan Dong, Donghong Qin, Fengbo Bai, Fanhua Song, Yan Liu, Chen Xu, Zhijian Ou:

Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-Based Multilingual Pretraining. 264-268 - Yuhao Zhao, Maoshen Jia, Jiawei Ru, Junqi Tai:

A Hybrid DFSMN and Mamba Architecture for Low Bitrate Neural Speech Coding. 1-5 - Fengyu Xu, Yongxiong Xiao, Qiang Fu:

ViT-Based EEG Analysis Method for Auditory Attention Detection. 324-328 - Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Xu Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu:

The X-Lance Technical Report for Interspeech 2024 Speech Processing using Discrete Speech Unit Challenge. 641-645
