


APSIPA 2025: Singapore
- Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2025, Singapore, October 22-24, 2025. IEEE 2025, ISBN 979-8-3315-7206-8

- Akira Tanaka:

Equivalence of Graph Signal Processing Using a Hermitian Graph Laplacian and its Corresponding Graph Laplacian with Duplicated Nodes. 1-5 - Ryoki Yamaguchi, Satoshi Miyata, Suehiro Shimauchi, Eiji Mochida, Seiji Fujiwara:

On LSTM-Based Behavioral Modeling of Radio-Frequency Power Amplifiers with a Small Training Dataset. 1-5 - Yu Morinaga, Naoto Kotake, Iori Hashimoto, Suehiro Shimauchi, Shigeaki Aoki:

Single-Channel Speech Enhancement in Spherical-Mapped Short-Time Spectral Domain. 1-5 - Chu-Chun Yang, Gwo Giun Lee, Tsung-Ying Tsai, Jie-Ren Zheng, Yue-Cong Kuo, Wei-Chieh Lee, Ryan Karthik Pary:

Algorithm-Architecture Co-Exploration of Systolic Arrays Using High-Level Synthesis. 1-5 - Hanke Xie, Dake Guo, Chengyou Wang, Yue Li, Wenjie Tian, Xinfa Zhu, Xinsheng Wang, Xiulin Li, Guanqiong Miao, Bo Liu, Lei Xie:

Dialospeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching. 1-6 - Takao Kawamura, Nobutaka Ono:

Training Acoustic Scene Classification Models Robust to Asynchrony in Distributed Microphone Arrays. 1-6 - Yuan-Jin Lin, Yu-Jen Chang, Chin-Hao Liang, Sung-Tsun Wei, Jia-Hong Weng, Shih-Lun Chen, Wei-Chen Tu:

Evaluation of Low-Resource and High-Efficiency Deep Learning Accelerator for Clinical Dental Diagnosis. 1-4 - Jetsada Arnin, Danial Kahani, Bernard A. Conway:

Enhanced Sliding Discrete Fourier Transform (eSDFT) With Error-Bound Control for Real-Time Parallel Processing. 1-5 - Kyungjune Lee, Youngjin Shin, Jungwoo Huh, Sanghoon Lee:

You Only Touch Once: One-Touch System for Personalized 3D Music Video Generation. 1-5 - Naohiro Kubota, Hideyoshi Miura, Tomotaka Kimura, Kouji Hirata:

A Reinforcement Learning-Based Approach to Cooperative Multi-UAV Task Allocation. 1-4 - Jin Xuan Teh, Yusuke Hioka:

MVDR Beamforming for Underdetermined Sound Source Separation using Iterative PSD Estimation in Beamspace. 1-6 - Xiaohan Pan, Runsen Feng, Henan Wang, Yixin Gao, Zhibo Chen:

GoP-to-Frame Encoder Adaptation for Learned Video Compression. 1-5 - Li Li, Shogo Seki:

Exploring Dual-Mode Training for Real-Time Target Speaker Extraction. 7-12 - Changda Chen, Yichen Yang, Yuehao Zhao, Shoji Makino, Jingdong Chen:

Switching Constant Separating Vector for Moving Source Extraction with Geometric Constraints. 13-18 - Yichen Yang, Chao Pan, Qiang Gao, Jacob Benesty, Shoji Makino, Jingdong Chen:

Neural Network-Assisted Joint DOA Estimation and Beamforming with First-Order Reflection Modeling. 19-23 - Rashed Iqbal, Christian H. Ritz, Jack Yang, Sarah K. Howard:

Speaker Localization in Classroom Environments Using GCC-PHAT Features and Mamba State Space Models with Ad-Hoc Microphone Arrays. 24-29 - Ryunosuke Nihei, Yoshiaki Bando, Aditya Arie Nugraha, Diego Di Carlo, Hiroyuki Ueda, Yosuke Ito, Kazuyoshi Yoshii:

Joint Separation and Tracking of Moving Sources with Distributed Microphone Arrays Based on Time-Varying Inertial Spatial Models. 30-35 - Haruaki Asano, Ryunosuke Nihei, Yoshiaki Bando, Aditya Arie Nugraha, Diego Di Carlo, Hiroyuki Ueda, Yosuke Ito, Kazuyoshi Yoshii:

Visually-Informed Multichannel Sound Source Separation Based on 3D Gaussian Primitives. 36-41 - Hayato Takeuchi, Takao Kawamura, Nobutaka Ono, Shoko Araki:

Joint Optimization of Sampling Rate Offsets and Demixing Filters Using Auxiliary Function Method. 42-47 - Shun Kotsugi, Takao Kawamura, Nobutaka Ono:

First Demonstration of Acoustic Scene Classification Based on Trained Sound-to-Light Conversion. 48-53 - Kouei Yamaoka, Katsuhiro Morita, Norihiro Takamune, Hiroshi Saruwatari:

Auxiliary-Function-Based Decentralized Independent Vector Analysis for Distributed Microphone Arrays. 54-59 - Shravan Raghunath, Kanishk AL, Sailesh S, Rishabh Gupta, Saurav Gupta, Ramesh R:

Interactive Spatial Audio Rendering on Mobile Devices: A Two-Stage User Interface with Adaptive HRTF Selection and Real-Time Room Acoustics Simulation. 60-65 - Takao Kawamura, Nobutaka Ono:

Are Identical Sounds Present in Distributed Recordings to Serve as Spatio-Temporal Anchors? A Case Study using the SINS Database. 66-71 - Ryota Imanaka, Yuting Geng, Masato Nakayama, Takanobu Nishiura:

Evaluation of Auditory and Tactile Perception for Augmented Sound-Image Enhancement Using Pre-Virtual-Leading Hypersonic Signals. 72-77 - Kenta Iwai:

Improvement in Variance Estimation in Variable-Step-Size Shared-Error NLMS Algorithm for Acoustic Echo and Noise Canceller. 78-82 - Shunxi Xu, Craig T. Jin:

Hierarchical Sparse Sound Field Reconstruction with Spherical and Linear Microphone Arrays. 83-88 - Weilong Huang, Longfei Felix Yan, Emanuël A. P. Habets:

Robust Superdirective Beamforming Using a Uniform Circular Array with Directional Microphones. 89-94 - Junwei Yeow, Ee-Leng Tan, Santi Peksi, Woon-Seng Gan, Qirui Huang:

Towards Robust Stereo 3-D SELD: A Study of Perceptual Features and Data Augmentation. 95-100 - Xiaoyang Liu, Yuma Kinoshita:

Pre-training Autoencoder for Acoustic Event Classification via Blinky. 101-106 - Mingxue Song, Jin Xuan Teh, Yusuke Hioka, Benjamin Yen, Hiroshi Saruwatari:

Sound Source Enhancement Using Power Spectral Density Estimation in Beamspace for a Dual Unmanned Aerial Vehicle System. 107-112 - Shaoheng Xu, Wei-Ting Lai, Yile Angela Zhang, Jihui Zhang, Amy Bastine, Prasanga N. Samarasinghe, Thushara D. Abhayapala:

Three-Dimensional Gradient-Based Tracking of Multiple Sound Sources. 113-118 - Ryoya Ogura, Tomoya Nishida, Yohei Kawaguchi:

Retrieval-Augmented Difference Captioning to Explain Unsupervised Anomalous Sound Detection. 119-124 - Kimihiro Hattori, Wen-Chin Huang, Kazuya Takeda, Tomoki Toda:

An Evaluation of Supervised Virtual Microphone Estimators in Reverberant Sound Fields. 125-130 - Taisei Takano, Yuki Okamoto, Yusuke Kanamori, Yuki Saito, Ryotaro Nagase, Hiroshi Saruwatari:

Human-CLAP: Human-perception-based Contrastive Language-audio Pretraining. 131-136 - Yang Xiao, Han Yin, Jisheng Bai, Rohan Kumar Das:

DG-SED: Domain Generalization for Sound Event Detection with Heterogeneous Training Data. 143-148 - Xuemai Xie, Xianrui Wang, Liyuan Zhang, Yichen Yang, Shoji Makino:

Accelerated Convolutive Transfer Function-Based Multichannel NMF Using Iterative Source Steering. 149-153 - Hao Liang, Yichen Yang, Xiao Zhang, Shoji Makino, Jingdong Chen:

DySiME: Dynamic Single-Source Multichannel Enhancement Using Time-Varying Directional Cues. 154-159 - Soushi Taninomiya, Daichi Kitamura, Norihiro Takamune, Kouei Yamaoka, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo, Hayato Yamakawa:

Demixing Filter Estimation for Bleeding-Sound Reduction of a Vocal Microphone. 160-165 - Kukuru Koiso, Taishi Nakashima, Nobutaka Ono:

Prior-Guided Source Separation with Direct Update of Back-Projected Demixing Vectors. 166-171 - Haoxiang Wu, Zhengqiao Zhao, Jingdong Chen, Jacob Benesty:

Meta-Learning with Pretrained Audio Representations Enables One-Shot Acoustic Signal Classification. 172-176 - Junkang Yang, Hongqing Liu, Liming Shi, Lu Gan, Hiromitsu Nishizaki, Chee Siang Leow:

A Semi-Supervised Acoustic Scene Classification Network Based on Multi-Modal Information Fusion. 177-181 - Bochao Sun, Dong Wang, Zhanlong Yang, Jun Yang, Han Yin:

ASCMamba: Multimodal Time-Frequency Mamba for Acoustic Scene Classification. 182-187 - Takao Kawamura, Masayuki Sera, Nobutaka Ono:

Evaluation of Low-Frequency Restriction, Pitch-Shift Augmentation, and Average Pooling for Acoustic Scene Classification Under Unseen-City Conditions. 188-192 - Jisheng Bai, Mou Wang, Haohe Liu, Bin Xiang, Ying Liu, Jianfeng Chen, Dongyuan Shi, Mark D. Plumbley, Susanto Rahardja, Woon-Seng Gan:

The APSIPA ASC 2025 Grand Challenge on City and Time-Aware Semi-Supervised Acoustic Scene Classification: Summary and Results. 193-197 - Rinka Nobukawa, Makito Kitamura, Tomohiko Nakamura, Shinnosuke Takamichi, Hiroshi Saruwatari:

Drum-to-Vocal Percussion Sound Conversion and Its Evaluation Methodology. 198-203 - Rumi Hiraga, Yuhki Shiraishi, Keiichi Yasu:

How Do Deaf and Hard of Hearing People Listen to Music Instruments? Subjective Evaluation and Acoustic Features. 204-209 - Aneeka Azmat, Li Su, ChengHsin Hsu:

Quality Assessment of DNN-Based Algorithms for Music Boundary Detection. 210-215 - Yui Uehara, Satoshi Tojo:

Note-level Nonchord-tone Identification with Graph Neural Networks. 216-221 - Sosuke Nishimura, Eita Nakamura:

Evaluation Score Prediction for Japanese Songs Based on Melody Fitness to Lyrics. 222-227 - Zih-Syuan Lin, Jun-You Wang, Li Su:

A Comparative Study of Statistical Features and Deep Learning for Orchestral Texture Classification. 228-233 - Weixing Wei, Kazuyoshi Yoshii:

Efficient Transformer-Based Piano Transcription with Sparse Attention Mechanisms. 234-239 - Hsin Ai, Yi-Hsuan Yang:

Transformer-Based Unpaired Piano Accompaniment Style Transfer. 240-245 - Hikari Miyaji, Keito Sawada, Wen-Chin Huang, Tomoki Toda:

Designing a Music Difficulty Measure for Controllable Automatic Piano Rearrangement. 246-251 - Samuel D. Bellows, Sarabeth S. Mullins, Brian F. G. Katz:

Vocal Onset Detection and Pitch Segmentation in Medieval Choral Music Guided by Original Notational Sources. 252-257 - Takaaki Nagoshi, Tetsuro Kitahara:

MORTM: MoE-Optimized Rhythmic Transformer Model for Symbolic MIDI Generation. 258-263 - Jiahao Zhao, Yunjia Li, Kazuyoshi Yoshii:

TAPA-ICL: Taxonomy-Aware Prompt Augmentation for in-Context Learning in Music Understanding. 264-269 - Anders Riddersholm Bargum, Naotake Masuda, Bogdan Teleaga, Andrew Fyfe, Cumhur Erkut:

Unified Timbre Transfer: A Compact Model for Real-Time Multi-Instrument Sound Morphing. 270-275 - Seonghyeon Go:

Real-World Music Plagiarism Detection with Music Segment Transcription System. 276-281 - Yuan Liu, Lingqing Liu, Yichen Yang, Shoji Makino:

Attention-Based Adaptive Structured Patchout Spectrogram Transformer for Music Classification. 282-287 - Ayumu Mitoma, Ken'ichi Furuya:

Accuracy Improvement of Automatic Chord Recognition with Source Separation Preprocessing. 288-292 - Chenyu Li, Ying Chen, Ruizhe Wang, Yujia Zhang:

Effects of Music Training Experience on the Production of English Rhythm by Chinese Learners. 293-297 - Keito Sawada, Wen-Chin Huang, Tomoki Toda:

Hierarchical Symbolic Music Generation with Variational Autoencoder-Based Bar-Wise Feature Sequences. 299-304 - Yu Sugimoto, Jun-You Wang, Li Su, Eita Nakamura:

Singing MIDI Transcription with Music Language Models: Formulation and Comparison. 310-315 - Leekyung Kim, Jonghun Park:

Data-Efficient Music Captioning Via Contrastive and Semantic Alignment. 311-316 - Komei Naemura, Boyu Cao, Ryotaro Nagase, Ryoichi Takashima, Yoichi Yamashita:

GAN-Enhanced InpaintNet for Music Inpainting on Limited Data. 317-322 - Minami Kawahara, Tetsuro Kitahara:

An Analysis of Singing Accuracy Towards Quantifying the Melodic Singability. 323-328 - Kuan-Yu Chen, Kuan-Lin Chen, Yu-Chieh Yu, Jian-Jiun Ding:

Guitar Tone Morphing by Diffusion-Based Model. 329-333 - Tomoki Hashida, Yuting Geng, Masato Nakayama, Takanobu Nishiura:

Design of Speech Leakage-Suppressed Audio-Spot Based on Auditory Masking Area Control with Active Masker Cancellation Using Parametric Array Loudspeakers. 334-339 - Maoto Mizutani, Kenta Iwai, Masato Nakayama, Takanobu Nishiura, Yoshiharu Soeta:

Multichannel Feedforward Active Noise Control System with Optical Laser Microphone in Reverberant Environments. 340-345 - Siyuan Lian, Xiaofeng Zeng, Ruquan Sun, Jing Lu:

Frequency-Domain Online Modeling of Multiple Secondary Paths Without Auxiliary Noise for Active Noise Control. 346-351 - Xiaoyi Shen, Dongyuan Shi, Woon-Seng Gan, Jun Yang:

Applying Model-Agnostic Meta-Learning with Iterative Dichotomiser 3 for Alternating-Switching Active Noise Control Systems. 352-357 - Junwei Ji, Dongyuan Shi, Zhengding Luo, Boxiang Wang, Ziyi Yang, Haowen Li, Woon-Seng Gan:

A Robust Proactive Communication Strategy for Distributed Active Noise Control Systems. 358-363 - Boxiang Wang, Zhengding Luo, Haowen Li, Dongyuan Shi, Junwei Ji, Ziyi Yang, Woon-Seng Gan:

Directional Selective Fixed-Filter Active Noise Control Based on a Convolutional Neural Network in Reverberant Environments. 364-369 - Harold Alexis Lao, Cheng-Yuan Chang:

An Online Secondary Path Modeling Technique in a Hybrid Active Noise Control System. 370-375 - Tianyou Li, Sipei Zhao, Haowen Li, Xiaofeng Zeng, Ruquan Sun, Jing Lu:

A Diffusion Remote Microphone Technique for Distributed Active Noise Control. 376-381 - Michael Anthony, Chih-Yen Wang, Ching En Huang, You-Siang Chen, Mingsian R. Bai:

An Integrated Active Noise Control and Crosstalk Cancellation System Designed Under a Generalized Model-Matching Framework. 382-387 - Tatsuya Murao:

Improvement of Noise Reduction in a Panel Combined with Multiple Loudspeakers Using Active Noise Control. 388-393 - Shota Toyooka, Ryo Matsuura, Kenta Iwai, Yoshinobu Kajikawa:

Selective Fixed Filter Sub-Band Active Noise Control System Based on Reference Signal Power Estimation. 394-399 - Jihui Aimee Zhang, Thushara D. Abhayapala, Naoki Murata, Prasanga N. Samarasinghe, Yu Maeno, Yuki Mitsufuji:

Performance Analysis of Active Noise Control Over a Spatial Region. 400-405 - Yuhang Yang, Liquan Shi, Ningyuan Liang, Guoyong Jin:

Electro-Acoustic Component Placement Optimization for Helicopter Cabin ANC Systems. 406-411 - Meiling Hu, Jing Lu, Qingyu Ma:

Spatial-Correlation-Based Error Weighting Method for Efficient Application of Filtered Reference Algorithm in Multichannel Active Noise Control. 412-416 - Junqing Zhang, Jingli Xie, Dongyuan Shi, Wen Zhang, Jingdong Chen, Jacob Benesty:

An Alternating Mode Strategy for Adaptive Sound Field Control and Acoustic Path Tracking. 417-422 - Haowen Li, Zhengding Luo, Dongyuan Shi, Boxiang Wang, Junwei Ji, Ziyi Yang, Woon-Seng Gan:

DOA Estimation with Lightweight Network on LLM-Aided Simulated Acoustic Scenes. 423-428 - Hanwen Zhang, Xiruo Su, Zhijuan Zhu, Bin Wu, Lingyun Ye:

Unsupervised Spectrogram Enhancement Algorithm Based on Bi-LSTM. 429-434 - Jingsong Xiao, Qirui Huang:

Continual Learning-Based Selective Fixed-Filter Active Noise Control. 435-440 - Ziyi Yang, Zhengding Luo, Dongyuan Shi, Junwei Ji, Boxiang Wang, Haowen Li, Qirui Huang, Woon-Seng Gan:

Meta-Learned Regional Initialization of Control Filters for Headphone Active Noise Control. 441-446 - Yile Angela Zhang, Wei-Ting Lai, Amy Bastine, Xingyu Chen, Lachlan Ian Birnie, Thushara D. Abhayapala, Prasanga N. Samarasinghe:

RAMDC: Room-Aware Multi-Device Clustering for Large Scale Teleconferencing. 447-452 - Hucheng Wang, Tao Liu, Junqing Zhang, Wen Zhang:

Multi-Channel ANC with Adaptive Kernel Assisted on-Line Secondary Path Modeling. 453-458 - Aoi Haneda, Yosuke Sugiura, Tetsuya Shimamura:

A Laplace Distribution-Based Variable Step-Size FxlogLMS Algorithm for Active Impulsive Noise Control. 459-464 - Wangxiaoxu Chen, Jiancheng Tao, Shuping Wang, Kai Chen, Haishan Zou, Xiaojun Qiu:

Research Progress on Active Control of Road Noise in Vehicles. 465-470 - Iori Hashimoto, Yu Morinaga, Suehiro Shimauchi, Shigeaki Aoki:

Anomalous Sound Detection Based on Derivative Features of Short-Time Holomorphic Fourier Transform. 471-476 - Yihao Zhao, Yichen Yang, Xiao Zhang, Shoji Makino:

Elastic Additive Angular Margin Loss Integrated with Mixup for Anomalous Sound Detection. 477-482 - Hui-Peng Du, Yang Ai, Zhen-Hua Ling:

A Distilled Low-Latency Neural Vocoder with Explicit Amplitude and Phase Prediction. 483-488 - Ryo Murakami, Natsuki Ueno:

Directional Filtering of Sound Fields for Emphasizing Specific Directions of Arrival and Its Applications. 489-494 - Takumi Koga, Natsuki Ueno:

Sound Field Estimation Method Robust to Microphone Position and Directivity Errors. 495-500 - Tran-Quang-Tuan Vo, Quoc-Huy Nguyen, Masashi Unoki:

Anomalous Sound Detection Using Time-Frequency Derivative of Instantaneous Phase Features. 501-506 - Ryuichi Hatakeyama, Toru Nakashika, Takuya Takahashi:

Few-Step Diffusion-Based Voice Conversion Using Consistency Trajectory Models. 507-512 - Huawei Zhang, Jihui Zhang, Huiyuan Sun, Prasanga N. Samarasinghe:

Spatial Audio Signal Enhancement: A Multi-Output MVDR Method in the Spherical Harmonic-Domain. 513-518 - Shifu Xiong, Hengshun Zhou, Kai Shen, Shi Cheng, Hang Chen, Genshun Wan, Kewei Li, Jun Du, Lirong Dai:

Language Adaptation Wake Word Spotting via Latent Space from Pre-Trained Speech Models. 519-524 - Tzu-Quan Lin, Hsi-Chun Cheng, Hung-yi Lee, Hao Tang:

Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers. 525-530 - Jiun-Ting Li, Bi-Cheng Yan, Yi-Cheng Wang, Berlin Chen:

Multi-Task Pretraining for Enhancing Interpretable L2 Pronunciation Assessment. 531-536 - Natsuo Yamashita, Masaaki Yamamoto, Yohei Kawaguchi:

End-to-End Integration of Speech Emotion Recognition and Voice Activity Detection with a Self-Supervised Model for Noise Robustness. 537-542 - Bowen Zhang, Nur Afiqah Abdul Latiff, Rong Tong, Donny Soh, Ian McLoughlin:

SCSMT: A Multilingual Children's Speech Corpus for Singapore's Mother Tongues. 543-548 - Ryu Takeda, Kazunori Komatani:

Reducing Orthographic Dependency on Paired Data by Probabilistic Integration via Syllabogram for Japanese Dialogue Speech Recognition. 549-554 - Haoyu Wang, Chunyu Qiang, Tianrui Wang, Cheng Gong, Yu Jiang, Yuheng Lu, Chen Zhang, Longbiao Wang, Jianwu Dang:

Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS. 555-560 - Yuki Sato, Sanae Yamashita, Shinnosuke Takamichi, Ryuichiro Higashinaka:

Constructing an In-the-Wild Spoken Dialogue Dataset Based on YouTube Dialogue Videos. 561-566 - Jianing Yang, Sheng Li, Takahiro Shinozaki, Yuki Saito, Hiroshi Saruwatari:

Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement. 567-572 - Atsushi Kojima, Yusuke Fujita, Hao Shi, Tomoya Mizumoto, Mengjie Zhao, Yui Sudo:

Conversation Context-Aware Direct Preference Optimization for Style-Controlled Speech Synthesis. 573-578 - Angela Catherina, Bima Prihasto, Boby Mugi Pratama, Li-Wei Kang, Jia-Ching Wang:

A Hybrid Attention Mechanism to Improve Tacotron 2 Performance for Indonesian Text-to-Speech Synthesis. 579-582 - Zhenghai You, Zhenyu Zhou, Lantian Li, Dong Wang:

SpkAugTSE: A Simple and Efficient Approach to Address Target Confusion in End-to-End Speaker Extraction. 583-588 - Tianchi Liu, Ruijie Tao, Qiongqiong Wang, Yidi Jiang, Hardik B. Sailor, Ke Zhang, Jingru Lin, Haizhou Li:

Interpolating Speaker Identities in Embedding Space for Data Expansion. 589-594 - Yibo Bai, Sizhou Chen, Michele Panariello, Xiao-Lei Zhang, Massimiliano Todisco, Nicholas W. D. Evans:

MDD: A Mask Diffusion Detector to Protect Speaker Verification Systems from Adversarial Perturbations. 595-600 - Yu Guan, Wu Guo, Jie Zhang, Zhijun Zhang:

Fusing Multi-Layer Features of the Pre-Trained Model with Grouped Cross Attention for Spoofing Speech Detection. 601-606 - Zhijun Zhang, Wu Guo, Jie Zhang, Yu Guan:

Fusing Blocked Deep Features of Pre-Trained Models for Short-Duration Speaker Verification. 607-612 - Xiaolei Zhang, Zhihua Fang, Liang He:

Multi-level Adversarial Training with Data Augmentation for Robust Speaker Verification. 613-618 - Nirmalya Mallick Thakur, Jia Qi Yip, Eng Siong Chng:

Analysis of Speaker Verification Performance Trade-Offs with Neural Audio Codec Transmission. 619-624 - Masataka Kaneko, Wen-Chin Huang, Tomoki Toda:

Estimating Speaker's Seating Position from Monaural Speech in a Simulated Vehicle Interior Sound Field. 625-629 - Tran The Anh, Azmat Adnan, Yihao Wu, Chng Eng Siong:

TS-VAD+: Modularized Target-Speaker Voice Activity Detection for Robust Speaker Diarization. 630-635 - Mohd Mujtaba Akhtar, Girish, Orchid Chetia Phukan, Swarup Ranjan Behera, Pailla Balakrishna Reddy, Ananda Chandra Nayak, Sanjib Kumar Nayak, Arun Balaji Buduru:

Are Multimodal Foundation Models All That Is Needed for Emofake Detection? 636-641 - Fei Liu, Yang Ai, Zhen-Hua Ling:

Neural Speech Separation with Parallel Amplitude and Phase Spectrum Estimation. 642-647 - Masato Nagase, Kazunori Kojima, Shi-wook Lee, Yosiaki Itoh:

Introducing Self-Supervised Learning Models for Spoken Query-Spoken Term Detection. 653-657 - Ting Dang, Trini Manoj Jeyaseelan, Eliathamby Ambikairajah, Vidhyasaharan Sethu:

Characterization of Speech Similarity Between Australian Aboriginal and High-Resource Languages: A Case Study on Dharawal. 658-663 - Yumin Kim, Seonghyeon Go:

Segment Transformer: AI-Generated Music Detection via Music Structural Analysis. 664-669 - Zirui Lin, Haris Gulzar, Monnika Roslianna Busto, Akiko Masaki, Takeharu Eda, Kazuhiro Nakadai:

Dialect Identification Using Resource-Efficient Fine-Tuning Approaches. 670-675 - En-Wei Zhang, Hui-Peng Du, Xiao-Hang Jiang, Yang Ai, Zhen-Hua Ling:

A High-Quality and Low-Complexity Streamable Neural Speech Codec with Knowledge Distillation. 676-681 - Mizuki Kurasawa, Yoshiko Arimoto:

Effectiveness of Streaming ASR for Real-Time Laughter and Screaming Detection. 682-687 - Fong-Chun Tsai, Kuan-Tang Huang, Bi-Cheng Yan, Tien-Hong Lo, Berlin Chen:

Mitigating Data Imbalance in Automated Speaking Assessment. 688-693 - Michael Evan Santoso, Bhone Tay Zar Kyaw, Valentinus Roby Hananto, Victor V. Kryssanov:

An Information-Theoretic Approach to Data Selection for Generative Topic Modeling. 694-699 - Sandipan Dhar, Md. Tousin Akhter, Nanda Dulal Jana, Swagatam Das, Monorama Swain, Saurav Chowdhury:

Collective Learning-Based Optimal Transport GAN with Multi-Level Fine-Grained and Global Discriminators for Voice Conversion. 700-705 - Zihan Zhong, Qianli Wang, Satwinder Singh, Clarion Mendes, Mark Hasegawa-Johnson, Waleed Abdulla, Seyed Reza Shahamiri:

Beyond Binary Detection: Multi-Etiology Dysarthria Classification with Pre-Trained Speech Models. 706-711 - Qiang Fang:

A Dual-Path Speaker-Independent Acoustic-to-Articulatory Inversion Model Based on Content and Speaker Information Disentanglement. 712-717 - Bagus Tris Atmaja, Sakriani Sakti:

Dementia Prediction From Speech Signal Using Optimized Prosodic Features. 718-723 - ChenYi Chua, JunKai Wong, Chengxin Chen, Xiaoxiao Miao:

Speech Emotion Recognition Via Entropy-Aware Score Selection. 724-729 - Fo-Rui Li, Hsin-Te Hwang, Ming-Chi Yen, Men-Tung Lo, Yu Tsao, Hsin-Min Wang:

Improving Exemplar-Based Electrolaryngeal Speech Voice Conversion Via Robust Content Representations. 730-735 - Haoyu Song, Ian McLoughlin, Qing Gu, Nan Jiang, Yan Song:

An Efficient Transfer Learning Method Based on Adapter with Local Attributes for Speech Emotion Recognition. 736-740 - Songting Liu, Deheng Ye, Wei Yang, Haoyang Li, Eng Siong Chng:

ASRQ-VC: ASR-Guided Speech Content Quantization for High-Fidelity Voice Conversion. 741-746 - Yu Hayashizaki, Takashi Nose, Sumiharu Kobayashi, Satoru Fukayama, Akinori Ito:

PUNSER: Large-Scale Pre-Trained and Unified Model for Practical Speech Emotion Recognition. 747-752 - Kiseki Niwa, Kazuhiro Kobayashi, Tomoki Toda:

Investigation of the Effectiveness of Converted Speech Auditory Feedback in Low-Latency Real-Time Voice Conversion. 753-758 - Nopparut Li, Candy Olivia Mawalim, Masashi Unoki:

Study on Signal Processing Techniques in Protecting Voice Personae Against Speech Synthesis Systems. 759-764 - Joonyong Park, Daisuke Saito, Nobuaki Minematsu:

MixedG2P-T5: G2P-Free Speech Synthesis for Mixed-Script Texts Using Speech Self-Supervised Learning and Language Model. 765-770 - Jiawei Zhang, Tian-Hao Zhang, Jun Wang, Jiaran Gao, Ruijie Tao, Xinyuan Qian, Xu-Cheng Yin:

I2TTS: Image-Indicated Immersive Text-to-Speech Synthesis with Spatial Perception. 771-776 - Shaomeng Yang, Jiaming Luo, Jinran Wang, Rongfeng Su, Yongjie Zhou, Lan Wang, Nan Yan:

Chain-of-Thought Distillation for ASR Error Correction with Multimodal Large Language Models. 777-782 - Shuai Nie, Yaran Chen, Shan Liang, Jiaming Xu, Runyu Shi:

Direction-Guided Spatial Attention for Multichannel Speech Enhancement. 783-788 - Issei Sakata, Tetsuo Kosaka:

A Study of Japanese Mixed Emotional Speech Synthesis Based on an End-to-End Emotional Speech Synthesis Model. 789-794 - Haoyu Wang, Jiale Chen, Jiaxun Li, Sizhe Shan, Yuehai Wang:

EFTTS: Zero-Shot Emotional Speech Synthesis via Conditional Flow Matching and Self-Supervised Representations. 795-800 - Rui Zhou, Akinori Ito, Takashi Nose:

Improving Speech-to-Speech Translation for Low-Resource Languages via Transfer Learning. 801-806 - Takuya Takahashi, Saki Kugimoto, Toru Nakashika:

VICNet: FaderNet-Based Voice Impression Conversion with Affective Dimensional Representation. 813-818 - Yuehai Zhang, Yang Li, Yuehao Zhao, Shoji Makino:

Strategic Re-Weighting of U-Net Components in Diffusion Models for Enhanced Speech Enhancement Without Retraining. 819-824 - Takaki Koshikawa, Akinori Ito, Takashi Nose:

Fast and Speaker-Independent Utterance Selection for ASR-Free CALL Systems of Minority Languages. 825-830 - Naoki Muto, Chee Siang Leow, Junichi Hoshino, Takehito Utsuro, Hiromitsu Nishizaki:

Speech-Content-Driven Highlighting of Translated Lecture Slides for Foreign Language Lecture Understanding. 831-836 - Mehmet Sinan Yildirim, Ruijie Tao, Wupeng Wang, Junyi Ao, Haizhou Li:

Leveraging Language Information for Target Language Extraction. 837-842 - Nguyen Quoc Anh, Bernard Cheng, Kelvin Soh:

VietLyrics: A Large-Scale Dataset and Models for Vietnamese Automatic Lyrics Transcription. 843-848 - Reiya Marukawa, Takeshi Yamada:

Autofocus Neural Beamformer Based on Steering Vector Estimation. 849-854 - Daichi Yukizawa, Kenta Yamamoto, Ryu Takeda, Kazunori Komatani:

Estimating User Sentiment at Sub-Exchange Granularity From Exchange-Level Annotations. 855-860 - Arth J. Shah, Hiya Chaudhari, Kavya Kumar, Arushi Srivastava, Priya J. Kaple, Ravindrakumar M. Purohit, Dharmendra H. Vaghera, Bhavna Singh, Aparna Walanj, Abhishek Srivastava, Hemant A. Patil:

DAU-KDAH Dysarthric Multi-Lingual and Multimodal Speech Corpora for Indic Languages. 861-866 - Nanako Imaichi, Takuya Takahashi, Toru Nakashika:

Gamma-VAE-VC: Voice Conversion based on VAE Assuming Gamma Distribution for Both Latent Variables and Observation. 867-872 - Changsong Liu, Yizhou Peng, Eng Siong Chng:

Zero-shot Context Biasing with Trie-based Decoding using Synthetic Multi-Pronunciation. 873-878 - Narthana Sivalingam, Uthayasanker Thayasivam:

Dimension 414 and Minimal Embedding Dimensions for Phonetic Feature Encoding in WavLM. 879-884 - Rui Zhang, Yuxuan Ke, Qunping Ni, Ge Yao, Xiaodong Li, Chengshi Zheng:

Directional Hybrid Optimization of HRTFs for Low-Order Spherical Harmonics Binaural Rendering. 885-890 - Kota Suzuki, Yosuke Sugiura, Tetsuya Shimamura:

Speech Enhancement Network with Windowed Cross Attention Using Noise-Reference Microphone. 891-896 - Siddharth Kumar, Nisarg Trivedi, Ravindrakumar M. Purohit, Hemant A. Patil:

BAANI: A 296M-Parameter Neural Vocoder for End-To-End Punjabi Speech Synthesis. 897-902 - Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari:

Active Learning for Text-to-Speech Synthesis with Informative Sample Collection. 903-908 - Tomohiro Tanaka, Ryo Masumura, Naoki Makishima, Mana Ihori, Shota Orihashi, Satoshi Suzuki, Taiga Yamane:

Semi-Supervised End-to-End Speech-to-Text Translation with Joint Text-to-Text and Speech-to-Text Decoding. 909-914 - Yu-Chen Kuan, Kuan-Yu Chen:

UTRo-NAST: Non-Autoregressive Speech Translation via Understanding, Translation, and Reordering. 915-920 - Ashley Fang Cai Xian, Ng Chen Ting, Ashley Kok Siu Cheng, Wah Yang Tan, Mohan Raj Chanthran, Lay-Ki Soon, Meisin Lee:

Laughing Across Borders: A Culturally-Aware Joke Generator for Asian Regions. 921-925 - Kaori Hashimoto, Takao Kawamura, Nobutaka Ono:

Synthesizing Vowel-Like Tones with Pitch Circularity. 926-931 - Matsuri Iwasaki, Masanobu Abe, Sunao Hara:

Error Correction Using LLMs for Sentence Estimation from Ambiguous Inputs via Wearable Keyboards. 932-937 - Sunil Kumar Kopparapu, Chitralekha Bhat, Ashish Panda:

A Robust End to End Spoken Grammar Assessment System. 938-943 - Sandipan Dhar, Mayank Gupta, Preeti Rao:

LAPS-Diff: A Diffusion-Based Framework for Hindi Singing Voice Synthesis with Language Aware Prosody-Style Guided Learning. 944-949 - Cheng Chi, Xiaoyu Li, Yuxuan Ke, Qunping Ni, Ge Yao, Xiaodong Li, Chengshi Zheng:

End-To-End Multi-Channel Speaker Extraction and Binaural Speech Synthesis. 950-955 - Tamon Mikawa, Yasuhisa Fujii, Yukoh Wakabayashi, Kengo Ohta, Ryota Nishimura, Norihide Kitaoka:

Improving Listening Head Generation Performance Using Speech Representations from Self-Supervised Learning. 956-961 - Jae Hyun Park, Seung Jae Choi, Young-Sik Eom, Allison Shindell, Min-Gwan Seo, Gyeong-Hoon Lee:

ULF-TTS: An Uncluttered Hybrid TTS System Using Language and Flow Matching Models. 962-967 - Ryuga Sugano, Hiroaki Sato, Asahi Sakuma, Tadashi Kumano, Yoshihiko Kawai, Shinji Watanabe:

Phoneme-Grapheme Dictionary-Based Prompting for Robust Proper Noun Recognition in Japanese ASR. 968-973 - Chen-Han Wu, Kuan-Yu Chen:

LLM-Driven Hypothesis Set Refinement for Enhanced ASR Post-Processing. 974-979 - Jotaro Emoto, Ryota Nishimura, Kengo Ohta, Norihide Kitaoka:

Real-time VAD-less Speech Recognition by Fine-tuning SSL Model with Data Containing Tagged Non-speech Segments. 980-985 - Ryota Uematsu, Chee Siang Leow, Norihide Kitaoka, Hiromitsu Nishizaki:

Improving Automatic Speech Recognition Model for Super-Elderly Voice Using Speech Synthesis Model. 986-991 - Yue Heng Yeo, Yuchen Hu, Shreyas Gopal, Yizhou Peng, Hexin Liu, Eng Siong Chng:

Improving Code-Switching Speech Recognition with TTS Data Augmentation. 992-997 - Yingyi Luo, Yue Huang, Qingke Sun, Shuwen Chen:

PQSR: A Speech Corpus of Polar Questions and Spontaneous Responses in Standard Chinese with Complex Intentions Annotated. 998-1003 - Kazuya Tsubokura, Yurie Iribe, Norihide Kitaoka:

Toward Natural System Repair: An Analysis of Human Other-Initiated Self-Repair Patterns in Japanese Casual Conversations. 1004-1009 - Hiya Chaudhari, Kavya Kumar, Hemant A. Patil:

Self-Supervised Learning for Classification of Normal vs. Dysarthric Speech. 1010-1015 - Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Panchal Nayak, Priyabrata Mallick, Swarup Ranjan Behera, Parabattina Bhagath, Pailla Balakrishna Reddy, Arun Balaji Buduru:

Investigating Polyglot Speech Foundation Models for Learning Collective Emotion from Crowds. 1016-1021 - Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Parabattina Bhagath, Pailla Balakrishna Reddy, Arun Balaji Buduru:

Rethinking Cross-Corpus Speech Emotion Recognition Benchmarking: Are Paralinguistic Pre-Trained Representations Sufficient? 1022-1027 - Jen-Tzung Chien, Willianto Sulaiman, Chung-Hsuan Wang:

Probabilistic Language-Aware Speech Recognition. 1028-1032 - Ryutaro Oshima, Yuya Hosoda, Youji Iiguni:

LLMs-Integrated Automatic Hate Speech Recognition Using Controllable Text Generation Models. 1033-1038 - Xiaoran Li, Zilu Guo, Jun Du:

Multi-Stage Speech Enhancement with Cascaded SNR Domain Shifts. 1039-1044 - Puneet Bawa, Virender Kadyan, Shareef Babu Kalluri:

Autoencoder-Driven Latent Representation Learning for Language-Agnostic Disordered Speech Classification Using a Universal Feature Set. 1045-1050 - Youngeun Kwon, Yeri Byun, Hyunsung Cho, Jongwon Choi:

FH-RestoreASR: Frequency-Hopping Robust Air Traffic Control Speech Restoration and Recognition. 1051-1056 - Ryandhimas E. Zezario, Dyah A. M. G. Wisnu, Hsin-Min Wang, Yu Tsao:

Speech Intelligibility Assessment with Uncertainty-Aware Whisper Embeddings and sLSTM. 1057-1061 - Keiichi Funaki:

GCI Detection and Glottal Wave Estimation based on TV-CAR Speech Analysis. 1062-1067 - Xun Lu, Xuyang Wang, Gaofeng Cheng, Lin Zheng, Pengyuan Zhang:

HIPA-MoE: A Parameter-Efficient Fine-Tuning Architecture with Hierarchical Adapter-Based Mixture-Of-Experts for Multilingual ASR. 1068-1073 - Yan-Lin Lai, Erh-Yun Chang, Yi-Wen Liu, Jung-Lung Hsu, Hui-Chuan Hsu:

Mild Cognitive Impairment Detection Via Linear Discriminant Analysis of Picture Description Speech Features: A Cross Corpus Comparison. 1074-1079 - Susmita Bhattacharjee, Jagabandhu Mishra, Hanumant Singh Shekhawat, S. R. Mahadeva Prasanna:

Parameter-Efficient Fine-Tuning of Foundation Models for CLP Speech Classification. 1080-1085 - Jen-Tzung Chien, Bobbi Aditya:

Language Awareness in Code-Switching Speech Recognition. 1086-1091 - Minghui Wu, Haitao Tang, Jiahuan Fan, Ruizhi Liao, Yanyong Zhang:

End-to-End Simultaneous Dysarthric Speech Reconstruction with Frame-Level Adaptor and Multiple Wait-k Knowledge Distillation. 1092-1097 - Yuuto Nakata, Daiki Yoshioka, Wen-Chin Huang, Tomoki Toda:

Disfluency Disentanglement Enhancement in Spoken-Text-Style Transfer for Spontaneous Speech Synthesis. 1098-1103 - Minghui Wu, Xueling Liu, Jiahuan Fan, Haitao Tang, Yanyong Zhang, Yue Zhang:

DARS: Dysarthria-Aware Rhythm-Style Synthesis for ASR Enhancement. 1104-1109 - Jen-Tzung Chien, Bryan Gautama Ngo:

Emotion-Rich Cross-Speaker TTS via Contrastive Prosody Enhancement. 1110-1115 - Umi Okamoto, Sei Ueno, Akinobu Lee:

Face-Conditioned Large-Scale Text-to-Speech via Speaker Embedding Prediction from Facial Images. 1116-1121 - Masayori Okamura, Masanobu Abe, Sunao Hara:

Few-Shot Speaker Adaptation for Text-to-Speech Synthesis Using Non-Target Speaker Corpora for Glossectomy Patients. 1122-1127 - Pan Xu, Zhongyu Zhang, Zhonghua Fu:

Personalized Bone-Conduction Bandwidth Extension with Speaker Characteristics. 1128-1133 - Hiroki Mori:

Time-Aligned Laughter Sound Event Recognition for Conversational Laughter Analysis and Synthesis. 1134-1139 - Wenyao Ma, Jun Yang:

PALGAN: A Joint Optimization-Based Preprocessing Method for Speech Restoration in Parametric Array Loudspeakers. 1140-1145 - Jan Meyer Saragih, Faisal Mehmood, Sakriani Sakti:

Beyond One-Shot Dubbing: Leveraging N-Best Translation and Prompted Paraphrasing with Synchrony-Aware Re-Ranking. 1146-1151 - Weihao Tang, Guyang Zhang, Waleed Abdulla:

Honey Adulteration Detection via Robust Diffusion Classifier and Hyperspectral Imaging. 1152-1157 - Byunghyun Kim:

Semantic-Fast-SAM: Efficient Semantic Segmenter. 1158-1163 - Yu-Chen Lin, Yi-Jing Chen, Chih-Chang Yu, Hsu-Yung Cheng:

And Regional Selective Mixup. 1164-1169 - Hitoshi Ito, Naoto Shirai, Kazutaka Kinugawa, Hideya Mino, Yoshihiko Kawai:

100× Monolingual Data Augmentation Using LLMs to Build a Parallel Corpus for Machine Translation. 1170-1175 - Songjiang Lai, Tsun-Hin Cheung, Ka-Chun Fung, Kaiwen Xue, Kwan-Ho Lin, Yan-Ming Choi, Vincent Ng, Kin-Man Lam:

Enhancing Technical Documents Retrieval for RAG. 1176-1181 - Yun-Ting Sun, Lo-Ya Li, Tien-Hong Lo, Jeih-Weih Hung, Shih-Chieh Huang, Berlin Chen:

Lightweight Zero-Shot Keyword Spotting via Multi-Granular Knowledge Distillation. 1182-1187 - Ozgur Soysal, Arda Ozdemir, Yigit Yildirim, Orhan Arikan:

Monomial Matrix Relocation on the Loss Function Level-Set of Feedforward Neural Networks. 1188-1193 - Arda Ozdemir, Ozgur Soysal, Ege Doganay, Yigit Yildirim, Orhan Arikan:

Low-Rank Compression of Neural Network Weights by Null-Space Encouragement. 1194-1199 - Jiayu Shen, Kalin Stefanov, Vee Yee Chong, Lay-Ki Soon, KokSheik Wong:

Sign-MExD: An Expert-Infused Diffusion Model for Sign Language Production. 1200-1205 - Pham Hai Anh, Tran Trong Duy, Do Hai Son, Karim Abed-Meraim, Nguyen Linh Trung:

FlowEKF: Flow-Based Extended Kalman Filter. 1206-1211 - Akira Tamamori:

Kernel Ridge Regression for Efficient Learning of High-Capacity Hopfield Networks. 1212-1217 - Ryuto Ito, Hiromu Kanauchi, Hiroyasu Yasuda, Masaaki Nagahara, Shogo Muramatsu:

Sparse-Coded Time-Delay DMD with Control for Nonlinear State-Space Modeling on Graphs. 1223-1228 - Haru Ogawa, Daichi Kitamura, Shoma Ayano:

Nonnegative Matrix Factorization Using Dirichlet-Distribution-Based Regularization. 1229-1234 - Nawara Mahmood Broti, Masaki Iwasaki, Yumie Ono:

Significance of Co-Occurring Biomarkers in Localization of Epileptic Seizure Onset Zone. 1235-1240 - Silan Hu, Yulin Huang, Arjun Agarwal, Tanya Warrier, Yuwen Wang, Haozhe Ma, Zhengding Luo:

Reinforcement Learning in Portfolio Management: A Survey of Methods and Trends. 1241-1246 - Fengpei Li, Ziping Zhao:

Large Sparse Covariance Matrix Estimation via Dual Proximal Gradient Method. 1247-1252 - Hongjun Sheng, Lanqing Guo, Xinggan Peng, Zhiping Lin, Bihan Wen:

An Improved Method for Image Shadow Removal by Combining Deterministic and Stochastic Models. 1253-1258 - Po-Chuan Chen, Jen-Tzung Chien:

Knowledge-Infused Topic Model for Empathetic Dialogue Response. 1259-1263 - Xuyang Zhao, Hidenori Sugano, Toshihisa Tanaka:

Cross-Patient Seizure Onset Zone Classification by Patient-Dependent Weight. 1264-1269 - Kun-Chih Chen, Pin-Ching Shen, Bo-Chun Chen:

NOCTUA: A High-Efficiency Reconfigurable NoC-Based Transformer Universal Accelerator. 1270-1273 - Wen-Nung Lie, Kien Truc Le, Veasna Vann, Jui-Chiu Chiang, Ngoc Dung Bui:

Skeleton-Sequence-Based Early Action Recognition by Using Graph Convolutional Neural Networks and Knowledge Distillation Techniques. 1279-1284 - Yuzhe Li, Hangjing Zhang, H. Vicky Zhao:

A State-Dependent Model for Identification of Time-Varying Directed Graphs. 1285-1290 - Haruki Yokota, Hiroshi Higashi, Yuichi Tanaka:

Unrolled Multimodal Signal Restoration with Signed Twofold Graph Learning. 1291-1296 - Jia-Hong Weng, Yuan-Jin Lin, Wan-Hsun Tsai, Yu-Jie Yang, Wei-Chen Tu:

Efficient Sparse Matrix Acceleration for Deep Learning via Two-Step Bitmap Tensor Architecture. 1297-1300 - Purui Zhang, Feng Ji, Yanan Zhao, Wee Peng Tay, Bihan Wen:

Distance-Based Laplacian Algebra for Effective Subgraph Filter Learning. 1301-1306 - Jun Hirano, Jonethe Tan Yang, Fathin Acyuta Makarim, Daham Jayasinghe, KokSheik Wong:

Quantization Index Modulation-Based Reversible Data Hiding in Compressed Neural Network. 1311-1316 - Amorntip Prayoonwong, Yang-Chun Hsu, Xin-Jie Ye, Po-Kai Lu, Chih-Hang Wang, Chih-Yi Chiu:

Dense Vector Retrieval in Data Federation. 1317-1322 - Jun-Hong Ou, Bo-Xian Wang, Yu-Hong Zheng, Sufal K. Chhabra, Guo-Shiang Lin, Shen-Lei Yan, Chen-Kuo Chiang:

Organ Detection Based on Vision-Language Model for Abdominal CT Images. 1323-1326 - Chongchong Yu, Xiaolong Xu, Zhaopeng Qian, Kejing Xiao, Yuchen Tan:

Audio-Visual Fusion Framework for Low-Resource Language Speech Recognition Based on Progressive Down-Sampling and Grouped Multi-Heads Attention Mechanism. 1332-1337 - Mei-Lin Huang, Ching-Hung Lee, Cheng-Ting Huang, Hsin-Han Chiang:

A Data-Driven Control Framework Using Deep Reinforcement Learning for Autonomous Driving. 1338-1343 - Weiyi Xia, Satoru Fujita:

Recipe Diffusion: Cross-Frame Attention and Region-Aware Diffusion for Coherent Visual Recipe Instruction Generation. 1344-1349 - Yu-Wen Tung, Mei-Chen Yeh:

Improving Few-Shot Classification via Feature-Aligned AI-Generated Images. 1350-1355 - Yiqing Li, Satoru Fujita:

Rotation Invariant Automatic Rigging for 3D Human Scan Data. 1356-1361 - Yifei Ni, Andong Li, Lingling Dai, Erwei Yin, Qunping Ni, Chengshi Zheng:

SinDiffPhase: High-Quality Phase Estimation with Ultra-Fast Single-Step Diffusion. 1362-1367 - Konosuke Kobayashi, Satoru Fujita:

MapCVAE: Probabilistic Prediction of Diverse Pedestrian Behaviors on General Roads. 1368-1373 - Guan-Yuan Tan, Arghya Pal, Sailaja Rajanala, Raphaël C.-W. Phan, Chee-Ming Ting:

Herald: Democratizing Compositional Reasoning for Visual Tasks without Any Training. 1374-1379 - Brenda Ru Yi Sim, Sue Han Lee, Chung Siung Choo, Yuen Peng Loh:

Canopy to Canopy: Evaluating Model Generalization in 3D Tropical Forest Semantic Segmentation. 1380-1385 - Anning Jiang, Dianfeng Qiao, Shun Liu, Yan Liang:

LSTM-Transformer Hybrid Network for UAV-Bird Classification Using Radar Track Information. 1386-1391 - Elias Isaac Huai-En Lim, Nicholas Heng-Loong Wong:

A Unified Framework for Interpretable and Uncertainty-Aware Battery State of Health Estimation Using Deep Neural Networks. 1392-1397 - Guyang Zhang, Iman Ardekani, Waleed Abdulla:

Class Incremental Learning Using Continual Backpropagation on Honey Botanic Origin Classification with Hyperspectral Imaging. 1398-1403 - Zexin Zhang, Chengbiao Fu, Hongwei Guo, Anhong Tian:

Multi-Strategy Improved Electric Eel Foraging Optimisation Algorithm for UAV Path Planning. 1404-1411 - Cheng-Yu Chen, Daniil Buryakov, Valentinus Roby Hananto, Victor V. Kryssanov:

A Deep Reinforcement Learning Approach to Roundabout Traffic Signal Control. 1412-1417 - Tatsuya Hasegawa, Toshiyuki Nakanishi, Koichi Fujiwara:

2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). 1418-1422 - Jingyang Mai, Zechen Guo, Zhengding Luo, Haozhe Ma:

HasRL Robot: A Heterogeneous Asynchronous Reinforcement Learning System for High-Dimensional Bipedal Control. 1423-1428 - Jinran Wang, Jiaming Luo, Shaomeng Yang, Yongjie Zhou, Xuefang Zhang, Rongfeng Su, Nan Yan, Lan Wang:

A Psychological Strategy Annotation Method Using Multiple LLMs with a Chain of Thought Based on Deductive Reasoning. 1429-1434 - Koki Nose, Hajime Yano, Tetsuya Takiguchi, Seiji Nakagawa:

Outlier Removal in MEG Data for Imagined Speech Classification. 1435-1440 - Vinitar Khettar, Nuntikorn Kitratporn, Sawarin Lerk-u-suke, Jirabhorn Chaiwongsai, Phaisarn Jeefoo, Chanika Sukawattanavijit:

Indices for Extreme Rainfall Risk Mapping in Thailand Using XGBoost. 1441-1445 - Seiyu Hitomi, Hiroyasu Yasuda, Kiyoshi Hayasaka, Shogo Muramatsu:

Riverbed Estimation Using Locally-Structured Unitary Network. 1446-1451 - Yuuki Tachioka:

Contrastive Learning of Temporal and Event-Based Behavioral Views for Universal User Embeddings. 1452-1457 - Teng-Chih Yu, Jian-Jiun Ding:

Market Forecasting Using LSTM-ARIMA Model with MACD Decomposition. 1458-1463 - Alan Dao, Norapat Buppodom:

VoxRep: Enhancing 3D Spatial Understanding in 2D Vision-Language Models via Voxel Representation. 1464-1469 - Heng Li, Cheng Cai:

Active Multi-Object Tracking for 3D Reconstruction with Hierarchical Reinforcement Learning. 1470-1475 - Weide Liu, Huijing Zhan:

Multimodal Sentiment Analysis with Missing Modality: A Knowledge-Transfer Approach. 1476-1480 - Jeffrey Wu, Gareth W. Peters:

Modeling Spatiotemporal Multimodal Data with Kernel Graph Regression Models and Copulas. 1481-1486 - Xiwei Yu, Guoshun He, Huijing Zhan:

CopeCap: A Lightweight Image Captioning Model with Collaborative Prompt Learning. 1487-1496 - Tomoki Ariga, Jun Taniguchi, Yosuke Higuchi, Sayaka Toma, Kunihiro Abe, Rie Shigyo, Tetsuji Ogawa:

Lyric-Aware Karaoke Background Video Selection Using Large Language Models and Moment Retrieval. 1497-1502 - Fumiya Kondo, Satoshi Tamura:

Audio-Visual Speech Recognition based on Cross-Lingual Transfer Learning. 1503-1508 - Javier Si Zhao Hong, Timothy Zoe Delaya, Sherwyn Chan Yin Kit, Pai Chet Ng, Xiaoxiao Miao:

Exploring Machine Learning and Language Models for Multimodal Depression Detection. 1509-1514 - Yida Wu, Caiyun Wang, Jianing Wang, Xiaofei Li, Ying Nan:

A Hierarchical Attention Model for Local and Global Feature Integration in RCS Classification. 1521-1526 - Weisi Hua, Yixin Yang, Yuxuan Chen, Xianghao Hou:

A Sliding-Window Range-Bearing Scan STAP for Underwater Active Sonar Target Detection. 1527-1531 - Yue Wang, Ruifeng Li, Changsong Liu, Liangrui Peng, Ning Ding, Gang Yao:

TH-LDV: Transformer-Based Hybrid Method for Signal Detection in Laser Doppler Velocimetry. 1532-1537 - Duc Thien Nguyen, Konstantinos Slavakis, Dimitris Pados:

Estimating Dynamic Graph Flows with Kernel Models and Hadamard-Structured Riemannian Constraints. 1538-1543 - Tsutahiro Fukuhara, Junya Hara, Hiroshi Higashi, Yuichi Tanaka:

Period Estimation for Time-Varying Graph Signals and its Application to Graph Wiener Filter. 1544-1549 - Tatsuki Tokumura, Ayano Nakai-Kasai, Tadashi Wadayama:

Computationally Efficient Sparse Signal Recovery by Deep Unfolded-Periodic Sketched ISTA. 1550-1555 - Do Nguyen Dang Thi, Le Quoc Anh, Tran Trong Duy, Le Vu Ha, Nguyen Linh Trung:

Fisher Information-Based Metrics for Representation Learning. 1556-1561 - Woramet Simrum, Paweena Kanokhong, Chakapat Chokchaisiri, Somrudee Deepaisarn, Kittipisut Chansri, Chanyut Lisawat, Waranrach Viriyavit, Akkharawoot Takhom, Phutphalla Kong, Didin Agustian Permadi, Sharifah Hafizah Syed Ariffin, Surasak Boonkla, Kasorn Galajit, Jessada Karnjana:

Wave Direction Estimation Based on Local Gradient Techniques from Satellite Imagery for Coastal Dynamics Monitoring. 1562-1567 - Yujin Han, Taewan Kim:

HIQA-DB: A Benchmark Dataset for Image Quality Assessment in Hospital Surveillance. 1568-1571 - Dipanita Chakraborty, Minoru Okada, Kosin Chamnongthai:

Semantic Neural View Synthesis for Key Content Preservation in Horizontal-to-Vertical Video Conversion. 1572-1577 - Pei-Cheng Yeh, Chieh-Li Wang, Yuan-Hao Huang:

Low-Complexity Total Variation-Based Signal Reconstruction with Adaptive Gradient Descent for Compressive Sensing. 1578-1583 - Natsuki Yoshino, Akira Tanaka:

Robust Initialization Strategies for Hankel Structured Low-Rank Approximation via Variable Projection. 1584-1589 - Jiabao Wang, Shuai Shao, Jiaqi Wei:

High-Resolution ISAR Imaging for High-Speed Targets Via Joint Intra-Pulse and Inter-Pulse Translational Motion Compensation. 1590-1595 - Mingming Jin, Jun Wang, Shaoming Wei, Peng Lei:

Sparse Echo Reconstruction of Micro-Motion Targets Under the Joint Constraints of Low-Rank and Periodic Consistency. 1596-1601 - Kaidi Yang, Wei Xia, Mengqing Zhou:

Distributed Extended Object Tracking with Adaptive Networks. 1602-1607 - Runhe Gan, Wei Xia:

Extended Object Tracking: A DNN-Aided Approach. 1608-1614 - Haruki Esaki, Towa Yasui, Seisuke Kyochi:

Non-negative Learned ISTA with Reflected-ReLU-Augmented $\ell_{1}$ Regularization. 1615-1620 - Denawati Junia, Candy Olivia Mawalim:

Phoneme-Specific Challenges to Intelligibility in Hearing Impairment Under Noisy Condition. 1621-1626 - Niteesh K. R, Pooja T. S:

Predicting Problematic Internet Use in Children Using Feature-Rich Structured Data with Ensemble Machine Learning and Bayesian Optimisation. 1627-1632 - Ira Puspasari, Tati L. R. Mengko, Agung W. Setiawan, Miftah Pramudyo, Nobuo Watanabe, Trio Adiono:

Phonocardiogram Signal Analysis for Myocardial Infarction Level Prediction using Deep Learning Model. 1633-1638 - Kotaro Nagayama, Shota Kato, Kana Eguchi, Masahide Hamaguchi, Hiroyuki Tominaga, Youji Hamaguchi, Michiaki Fukui, Manabu Kano:

Prediction of Maximum and Minimum Postprandial Blood Glucose Levels in People with Diabetes. 1639-1644 - Yifan Zhang, Yuting Ding, Fei Chen:

Towards Telepathic Communication: A Multi-Band EEG Model for Imaginary Speech Decoding. 1645-1650 - Sivaraj Nimishan, Selvarajah Thuseethan, Shanmuganathan Vasanthapriyan, Roshan G. Ragel:

Tiny-VRN: A Lightweight Variational Residual Network for EEG-Based Emotion Recognition. 1651-1656 - Aprianto Dwi Prasetyo, Bagus Tris Atmaja, Dhany Arifianto, Sakriani Sakti:

A Comparison of Solicited and Longitudinal Cough Sounds for Tuberculosis Detection. 1657-1662 - Shota Miyagawa, Toshitaka Yamakawa, Masayuki Tanabe, Kazushi Ikeda:

Detecting Defecation Premonition from the Acoustic Activity of Bowel Sounds. 1663-1668 - Asif M. S, Sagila Gangadharan K., Achutavarrier Prasad Vinod:

EegCNR: A Novel Feature for Attention Estimation From EEG. 1669-1674 - Eshan Pandey, Xiaomeng Wang, Julian Gan, Ying-Hwey Nai, Derek J. Hausenloy, Pek Lan Khong, Forest Su Lim Tan, Thiruneepan Selvakulasingam, Ryan Fraser Kirwan, Cheryl Pei Ling Lian:

Lower Limb Calf Muscle Segmentation from Diffusion-Weighted Magnetic Resonance Images Using Deep Learning. 1675-1680 - Nguyen Thi Thu, Quang-Huy Tran, Luong Thi Theu, Duc-Tan Tran:

Principal Component Regularization in Iterative Inversion of DBIM for Ultrasound Tomography. 1681-1687 - Takuma Bingo, Hajime Yano, Taichiro Ashizaki, Kazuma Koda, Masaya Togo, Riki Matsumoto, Tetsuya Takiguchi:

Reasoning Visualization for Critical Care EEG Classification with Prototypical Part Networks. 1688-1693 - Andy Desman Lo, Elvin Nur Furqon, Junaidul Islam, Isack Farady, Kahlil Muchtar, Ronnie Concepcion, Chih-Yang Lin:

Plant Species-Specific Anomaly Detection Based on Electrophysiological Signals. 1694-1699 - Arth J. Shah, Vishnu Vardhan G. V. S, Hemant A. Patil:

Freeze and Learn Using KAN for Infant Cry Classification. 1700-1705 - Wilson Tansil, Nur Ahmadi, Timothy G. Constandinou, Dessi Puji Lestari:

Investigation of Enhancement Strategies for Recurrent Spiking Neural Network based Brain-Machine Interface Decoding. 1706-1711 - Yuto Ashikawa, Yosuke Kurihara:

Detecting Deceptive Responses Due to Psychological Bias by the Probability Density Function of EEG Content Rate Dynamics During NEO-FFI Answering. 1712-1717 - Tri Huynh, Xuan Hoc Pham, Nhu Nguyen, Thi Thu Nguyen, Huong Ha, Lua Ngo:

A Comparative Analysis of Statistical, Regional CNN, and Sequential Transformer Approaches for Alzheimer's Disease Classification. 1718-1723 - Orchid Chetia Phukan, Swarup Ranjan Behera, Girish, Mohd Mujtaba Akhtar, Arun Balaji Buduru, Rajesh Sharma:

Beyond Speech and More: Investigating the Emergent Ability of Speech Pre-Trained Models for Classifying Physiological Time-Series Signals. 1724-1729 - Woo-Seok Ahn, Seung-Hwan Lee, Han-Jeong Hwang:

Channel Selection Guided by Layer-Wise Relevance Propagation for CNN-Based EEG Classification of Major Depressive Disorder. 1730-1733 - Ju-An Park, Jun-Seok Lee, Na-Ri Kim, Han-Jeong Hwang:

Development of HRV-Based Biomarkers for Predicting Blood Glucose Levels. 1734-1737 - Sang-Ho Lee, In-Su Park, Han-Jeong Hwang:

Development of 3D Textile Electrodes for Electrocardiography Measurement. 1738-1741 - Nao Maeda, Tomotaka Kimura, Kouji Hirata:

Trajectory Design of UAVs-Assisted Edge Computing Systems for Efficient Data Collection from Animal Herds. 1746-1749 - Funa Fukui, Yutaka Fukuchi, Kouji Hirata:

Priority-Based RCSA Method Considering Required Frequency Slot Width in Multi-Core Fiber Networks. 1750-1754 - Xiaoqing Tong, Kohei Mitani, Kazunori Hayashi, Koji Yamamoto, Takuto Arai, Shuki Wai, Tatsuhiko Iwakuni, Daisei Uchida:

Retraining-Free Blockage Prediction for Millimeter-Wave Communications Based on Minor Components of Angular Power Profiles. 1755-1760 - Takeru Nanjo, Osamu Takyu:

Modified Resource Allocation Algorithm Based on Co-Channel Interference Prediction in Local 5G Environments. 1761-1766 - Yuto Hayasaka, Koichi Adachi:

Implicit Interference Status Notification Through Time & Frequency Resource Selection in LoRaWAN. 1767-1771 - Kohei Yuzawa, Zhengdong Lin, Yu Kagaya, Yoshiaki Narusue, Takeo Fujii:

Wireless Environment Estimation with Directional Antennas using Radio Environment Database for Wireless Information and Power Transfer in Smart Factories. 1772-1777 - Ryoichi Kawaguchi, Shinsuke Ibi, Hisato Iwai:

Data-Driven Tuning of Neural Network Aided Least Squares for UWB-TDoA Indoor Positioning. 1778-1783 - Wei-Lin Chiang, Shu-Yu Lin, Jung-Chun Chi, Yuan-Hao Huang:

Low-Complexity Separate Channel Estimation for RIS-Aided MIMO Communications. 1784-1789 - Xianwen Ling, Kun Zhang, Rong Tong, Dianying Chen:

BFIS: Efficient Unknown Protocol Feature Extraction Method for Satellite Communication Systems. 1790-1795 - Tomoka Mori, Hiroshi Tatsukawa, Yuji Kawai, Yoshinori Shinohara, Hiroki Ikeda, Daisuke Hisano:

Outdoor Experiment of Deep Joint Source-Channel Coding Using FFT-Enabled Convolutional Neural Network for Image Transmission. 1796-1800 - Khushi Shah, Lakshit Pathak, Akshita Abrol, Kanak Jain, Rajesh Gupta, Parishi Shah, Sudeep Tanwar, Umesh Bodkhe, Tong Rong:

DL-Based Optical Fibre Fault Detection for Healthcare Telesurgery Communication System. 1806-1811 - Zhaohang Zhang, Chunzhe Wang, Zhen Huang, Yafeng Zhan:

Overcoming Imperfect Detection Limitations: Deep Learning-Based Calibration Strategy for Rotating Interferometric Arrays. 1812-1817 - Tatsuro Hidaka, Osamu Takyu, Kei Inage, Takeo Fujii, Kohei Yoshida, Masayuki Ariyoshi:

A Regional Clustering Method Based on Propagation Similarity for Modeling Cumulative Interference from Large Numbers of Terminals. 1818-1823 - Dinh Tuan Anh, Bui Tung Lam, Pham An Duy, Pham Minh Tuan, Tran Vinh Co, Nguyen Huu Tinh, Huynh Cong Bang:

Radio Frequency Fingerprinting-Based Device Identification Using Deep Metric Learning. 1824-1829 - Chaowen Tang, Tian Qin:

GNSS Spoofing Detection Based on LSTM-TNN-CVAE Network. 1830-1834 - Teh Kah Kuan, Hanwu Sun, Tran Huy Dat:

Enhancing Speech Quality in Scintillating Satellite Communications: A Rician Fading Modeling Approach. 1835-1840 - Minori Kondo, Masaki Aono, Kazuki Shimizu, Masashi Hashimoto, Takeshi Miyaji, Kei Nomura:

Ensemble Methods for Estimating the Localization of Coronary Stenosis from CT Images Using 3D CNN Models. 1841-1846 - Eliathamby Ambikairajah, Tharmakulasingam Sirojan, Vidhyasaharan Sethu:

Tiered Assessment for DSP Education: Exploring Students' Motivation and Performance. 1847-1852 - Taisei Kato, Ryo Hayakawa, Soma Furusawa, Kazunori Hayashi, Youji Iiguni:

An Investigation of Parameter Scheduling for Image Restoration in Optical Analog Circuits. 1853-1858 - Taishin Miura, Shunsuke Ono, Ryo Matsuoka:

Robust Cloud Removal from Optical Satellite Images Using Synthetic Aperture Radar and Multimodal Embedding Prior. 1859-1863 - Maharu Oda, Ryo Matsuoka:

Reflection and Noise Separation from Polarized Images Via Joint Nonnegative Matrix Factorization and Plug-And-Play Denoising. 1864-1867 - Yun Li, Hanmin Li, Kin-Man Lam:

Gated Probabilistic Diffusion for Temporal Action Segmentation. 1868-1873 - Hiroyuki Nishimoto, Toru Takahashi, Masakazu Yoshida:

Theory of Spherical VR Model for Landscape Representation. 1874-1879 - Dayan Perera, Fung Fung Ting, Vishnu Monn Baskaran:

HyTver: A Novel Loss Function for Longitudinal Multiple Sclerosis Lesion Segmentation. 1880-1885 - Nimol Thuon, Jun Du:

KH-FUNSD: A Hierarchical and Fine-Grained Layout Analysis Dataset for Low-Resource Khmer Business Document. 1886-1891 - Ming-Hsun Mo, Pin-Wen Huang, Jian-Jiun Ding:

Effective Speckle Noise Reduction Using Transformed Bayesian Likelihood with Wiener-Based and Sketch-Based Geometric Priors. 1892-1897 - Rui-Yang Ju, KokSheik Wong, Jen-Shiun Chiang:

Efficient Generative Adversarial Networks for Color Document Image Enhancement and Binarization Using Multi-Scale Feature Extraction. 1898-1903 - Zehua Liu, Xiaolou Li, Li Guo, Lantian Li, Dong Wang:

Leveraging Large Language Models in Visual Speech Recognition: Model Scaling, Context-Aware Decoding, and Iterative Polishing. 1904-1909 - Yonghui Tao, Mathis Quere, Yusuke Hioka, Stephen Marsland:

Computationally-Efficient Call Classification of New Zealand Birds Using Texture-Based Features. 1910-1915 - Yoshiaki Tanabe, Shuntaro Masuda, Gakumatsu Ryu, Naoto Tanji, Hiroyuki Seshime, Ling Xiao, Toshihiko Yamasaki:

Incorporating Semantic Visual Content into Click-Through Rate Prediction for Video Advertisements. 1916-1921 - Ragib Amin Nihal, Benjamin Yen, Takeshi Ashizawa, Katsutoshi Itoyama, Kazuhiro Nakadai:

From Blurry to Brilliant Detection: YOLO-Based Aerial Object Detection with Super Resolution. 1922-1927 - Tian Qin, Lijing Bu, Zhengpeng Zhang, Mingjun Deng, Yin Yang, Jingxue Wang, Xinyu Lan, Wenjuan Peng, Yang Hu:

ATJO: Adaptive Three-Dimensional Joint Optimization for Remote Sensing Video Super-Resolution. 1928-1933 - Hongwei Guo, Yipeng Liu, Lei Luo, Chengbiao Fu, Ce Zhu:

Block-Level Lagrange Multiplier Adaptation Based on Distortion Propagation Factors. 1934-1939 - Ibuki Muta, Yoshimitsu Kuroki:

Distributed Compressed Video Sensing with Enhanced Boundary Handling Based on Extended Convolutional Sparse Representation. 1940-1945 - Ryo Masumura, Shota Orihashi, Mana Ihori, Tomohiro Tanaka, Naoki Makishima, Taiga Yamane, Naotaka Kawata, Satoshi Suzuki, Taichi Katayama:

Joint Modeling of Big Five and HEXACO for Multimodal Apparent Personality-Trait Recognition. 1946-1951 - Jiyong Yu, Luheng Jia, Yifan Zang, Zhaoyang Yu, Shuyuan Zhu, Li Song, Kebin Jia:

Foreground-Background Segmentation Based Surveillance Video Coding. 1952-1957 - Yaya Huang, Litong Liu, KokSheik Wong:

Rain Removal Via VAE-Enhanced Transformer with Hierarchical Feature Integration. 1958-1963 - Jiaxiang Meng, Hardik B. Sailor, Qiongqiong Wang, Tianchi Liu, Kong Aik Lee, Xingmei Wang:

Exploring Audio-Visual Fusion Methods in Foundation Model-Based Deception Detection. 1964-1968 - Shintami Chusnul Hidayati, James Rafferty Lee, Kevin Davi Samuel:

Emot-CM-BERT: Adaptive Attention and Class-Aware Cross-Modal Learning for Emotion Recognition from Audio and Text. 1969-1974 - Bowen Gao, Zhicheng Lu, Mingyi He, Yuchao Dai:

DP-GS: Depth-prior & Perception-guided Gaussian Splatting for Sparse-view Novel View Synthesis. 1975-1980 - Mingjing Yi, Yuxi Wang, Ming Li:

Efficient Video to Audio Mapper with Visual Scene Detection. 1981-1985 - Yoga Tiara Wiguna, Bima Prihasto, Boby Mugi Pratama, Chia-Hung Yeh, Jia-Ching Wang:

Adversarial Learning for Duration Prediction in Indonesian Text-to-Speech: Modification to Stochastic and Deterministic Predictors. 1986-1990 - Shumpei Saito, Hiroyuki Ueda, Yosuke Ito, Kazuyoshi Yoshii:

Narrativity-Aware Video Summarization Based on Vision and Language Foundation Models. 1991-1996 - Yang Xiao, Ting Dang, Rohan Kumar Das:

RawTFNet: A Lightweight CNN Architecture for Speech Anti-Spoofing. 1997-2001 - Wenqiang Sun, Han Yin, Jisheng Bai, Jianfeng Chen:

Dynamic Fusion Multimodal Network for Speechwellness Detection. 2002-2007 - Yuanjian Chen, Han Yin:

Ensemble Confidence Calibration for Sound Event Detection in Open-Environment. 2014-2019 - Wenmiao Gao, Han Yin:

Enhancing Stereo Sound Event Detection with Bimamba and Pretrained PSELDNet. 2020-2025 - Lim Kit Michael Ye, Kaijian Zheng, N. F. Law, Jianping Li:

The Potential of LLMs for Generating Malicious Domain Names. 2026-2031 - Kosei Suayama, Kazuaki Nakamura:

Reducing Implicit Class Imbalance in Unlabeled Datasets Using Text-Specified Sensitive Attributes. 2032-2037 - Cheng-Yeh Yang, Kuan-Tang Huang, Chien-Chun Wang, Hung-Shin Lee, Hsin-Min Wang, Berlin Chen:

DRASP: A Dual-Resolution Attentive Statistics Pooling Framework for Automatic MOS Prediction. 2038-2043 - Haoran Sun, Chen Cai, Kong Aik Lee, Lap-Pui Chau, Yi Wang:

Multimodal Large Language Model for Deepfake Video Detection and Description. 2044-2049 - Parvathy Remesh, Jijomon Chettuthara Moncy, A. P. Vinod:

Biometric Identification Using Default Mode Network Features Extracted from Eyes-Open Resting-State EEG Data. 2050-2055 - Shota Iwamatsu, Koichi Ito, Takafumi Aoki:

Backdoor Poisoning Attack Against Face Spoofing Attack Detection Methods. 2056-2061 - Temma Tanaka, Kazuaki Nakamura:

Access Control for Diffusion Models by Random Masking the Covariance of Initial Noise Distribution. 2062-2067 - Shunya Ishikawa, Yuki Katsumata, Toru Nakashika:

Voice Privacy Protection with Adversarial Examples Using Anchor Speaker Embedding. 2068-2073 - Rui Wang, Liping Chen, Kong Aik Lee, Zhengpeng Zha, Zhenhua Ling:

Investigation of Perception Inconsistency in Speaker Embedding for Asynchronous Voice Anonymization. 2074-2079 - Ridwan Arefeen, Xiaoxiao Miao, Rong Tong, Aik Beng Ng, Simon See:

SegReConcat: A Data Augmentation Method for Voice Anonymization Attack. 2080-2085 - Arth J. Shah, Aniket Pandey, Satyam R. Tiwari, Hemant A. Patil:

An Enhanced Probabilistic Approach for Singfake Generation. 2086-2091 - Dohyun Yoon, Tomoki Toda:

Neural Semi-Fragile Watermarking for Proactive Deepfake Speech Detection. 2092-2097 - Takuo Yamaguchi, Sayaka Shiota, Naohiro Tawara:

Investigating Self-Supervised Learning-Based Front-End for Multi-Channel Replay Attack Detection. 2098-2103 - Kotaro Nakamura, Takuya Takahashi, Toru Nakashika:

Transferability of Adversarial Examples Across Speaker Embedding Models for Voice Privacy Protection. 2104-2109 - Kohei Tanaka, Hitoshi Kiya, Sayaka Shiota:

Voice Privacy Preservation with Multiple Random Orthogonal Secret Keys: Attack Resistance Analysis. 2110-2115 - Sumiharu Kobayashi, Takashi Nose, Akinori Ito:

CycleSiFiNF-VC: Controllable Non-Parallel Voice Conversion by Neural Formant Manipulation with Improved Cycle-Consistency Loss. 2116-2121 - Chenshuai Shu, Tianpeng Zheng, Yanxiang Chen:

Recoverable Audio Adversarial Examples for Voice Protection in One-shot Voice Conversion. 2122-2127 - Yangyang Qu, Michele Panariello, Massimiliano Todisco, Nicholas Evans:

Reference-Free Adversarial Sex Obfuscation in Speech. 2128-2133 - Yusaku Kato, Shoko Imaizumi:

Reversible Data Hiding in EtC Images with Flexible Access Privileges. 2134-2139 - Teruki Sano, Minoru Kuribayashi, Masao Sakai, Shuji Isobe, Eisuke Koizumi, Zhang Zhang:

Robust Ownership Verification of DNN Models Against JPEG Compression via Probability-Controlled Adversarial Attacks. 2140-2145 - Junsuke Takano, Kazuaki Nakamura:

Detoxification of Poisoned Recognition Models by Fine-Tuning with Out-of-Distribution Samples. 2146-2151 - Alexander Berns, Reon Akai, Minoru Kuribayashi, Rémi Cogranne:

Layer-Wise Weight Statistics for Node Classification and Defense of Federated Large Language Models. 2152-2157 - Keiichi Mori, Masaki Kawamura:

Robustness Evaluation Against Fine-Tuning in Associative Watermarking Method for CNN. 2158-2163 - Anna Yamaguchi, Shoko Imaizumi:

Lossless Image Processing for OpenEXR Images with Flexible Functions. 2164-2169 - Ou Egami, Masaki Kawamura:

Proposal of a Random Encoding Layer Compatible with Arbitrary Message Lengths for Diffusetrace. 2170-2175 - Darren Kah Hou Quek, Guang Hua, Zhiping Lin:

Automatic Dependent Surveillance-Broadcast Preamble Classification for Spoofing Detection. 2176-2180 - Hayato Shoji, Kazuaki Nakamura:

Model Extraction Attack and Its Countermeasure for Denoising Diffusion Implicit Models. 2181-2186 - Mei Hashimoto, Michiharu Niimi:

Content-Aware Dominant Color Extraction and its Application to Multiple-Key-Color Image Retrieval. 2187-2192 - Jing Liang, Yuxuan Wang, Tingting Song, Ce Zheng, Peiya Li:

Privacy-Preserving Image Retrieval Scheme Using Combined Features in Cloud Computing. 2193-2198 - Huhong Xian, Rui Liu, Berrak Sisman, Haizhou Li:

NE-PADD: Leveraging Named Entity Knowledge for Robust Partial Audio Deepfake Detection via Attention Aggregation. 2199-2204 - Hieu-Thi Luong, Inbal Rimon, Haim H. Permuter, Kong Aik Lee, Eng Siong Chng:

Robust Localization of Partially Fake Speech: Metrics and Out-of-Domain Evaluation. 2205-2210 - Janne Laakkonen, Ivan Kukanov, Ville Hautamäki:

Mixture of Low-Rank Adapter Experts in Generalizable Audio Deepfake Detection. 2211-2216 - Wangjie Li, Lin Li, Qingyang Hong:

Continual Audio Deepfake Detection via Universal Adversarial Perturbation. 2217-2222 - Suresh Veesa, Badugu Vamsi Krishna, Madhusudan Singh:

Exploring Source Features with Deep Residual Neural Networks for Replay Attack Detection. 2223-2228 - Shaoqi Tang, Zeyan Liu, Liping Chen, Kong Aik Lee, Tomoki Toda, Zhenhua Ling:

A Preliminary Study on Sectional Voice Anonymization and Detection. 2229-2234 - Soham Gangopadhyay, Inderpreet Singh, Prateek Pandya, Ashish Mani, Sumit Goswami:

ArcticEcho: A Novel Speaker-Controlled Voice Cloning Dataset for Modern Deepfake Detection Benchmarking. 2235-2240 - Siqing Qin, Kong Aik Lee, Man-Wai Mak, Pasquale Lisena, Massimiliano Todisco:

Variational Regularization for End-to-End Speech Deepfake Detection. 2241-2246 - Arth J. Shah, Aniket Pandey, Manav A. Gaikwad, Hemant A. Patil:

A Wavelet Tour of Audio Deepfake Detection. 2247-2252 - Haorui He, Yuchen Song, Yuancheng Wang, Haoyang Li, Xueyao Zhang, Li Wang, Gongping Huang, Eng Siong Chng, Zhizheng Wu:

Noro: Noise-Robust One-Shot Voice Conversion with Hidden Speaker Representation Learning. 2247-2251 - Rishith Sadashiv T. N., Abhishek Bedge, Saisha Suresh Bore, Jagabandhu Mishra, Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna:

Fusion of Modulation Spectrogram and SSL with Multi-Head Attention for Fake Speech Detection. 2253-2258 - Taejun Roh, Yejin Cho, Duong Hai Nguyen, Chul Lee:

Single-Image Pupil Localization via Implicit 3D Eye Reconstruction. 2264-2269 - Jaeseok Jang, Chang-Su Kim:

Flow-Guided Consistent Video Depth Estimation for Cross-Dataset Generalization. 2270-2275 - Tianxiang Lan, Mingyi He, Yuchao Dai:

DCB: An Efficient Approach for Building Long-Range Dependencies in CNNs. 2276-2281 - Tianwen Zhang, Ju-Won Seo, Kang-Min Kim, Keunsoo Ko:

A User-Guided and Local Motion-Adaptive Framework for Virtual Product Placement in Video. 2282-2286 - Jaekyung Ryu, Nam Ik Cho:

Shallow yet Perceptual Decoding for Neural Image Compression Through Minimal Nonlinearity. 2287-2292 - Khai Pin Ang, Iven Zi Yin Low, Yumun Hooi, Yuen Peng Loh:

Syncscore: A Framework for Synchronization Scoring in Group Sports Via Human Pose Estimation. 2293-2298 - Thanh-Phuc Dao, Huyen-Trang To, Hoang-Son Bui, Thi-Lan Le:

Data Augmentation-Driven Segmentation of Ovarian Tumor Ultrasound Images Using Vision Mamba. 2299-2304 - Shumin Jiang, Hao Qin, Tianyi Liu, Yi Wang:

Optimizing JPEG Decoder for Bitstream-Corrupted Image Restoration. 2305-2310 - Jiun Yen Ching, Lai-Kuan Wong, Fabian Wai-Lee Kung:

Semantic Scene Completion from a Single Depth Image with Coarse-Grained Segmentation. 2311-2316 - Shunta Kimura, Handie Shao, Shogo Matsumoto, Daiki Yamada, Toshihiro Kitajima, Hideki Nakayama:

Pixel-Weighted Domain Adaptation for Agricultural Segmentation. 2317-2322 - Nhat-Tuong Do-Tran, Ngoc-Hoang-Lam Le, Ian Chiu, Po-Tsun Paul Kuo, Ching-Chun Huang:

TRUST: Token-dRiven Ultrasound Style Transfer for Cross-Device Adaptation. 2323-2329 - Wo-Yen Li, Chia-Ming Lee, Chih-Chung Hsu, Volodymyr Khylenko, Li-Wei Kang:

Two-Stage Transformer-Based Deep Hyperspectral and Multispectral Image Fusion Network for Hyperspectral Image Super-Resolution. 2330-2335 - Lien-Chieh Huang, Ching-Te Chiu, Yung-Cheng Su:

Pedestrian Detection Based on Visible Guided Occlusion Handling. 2336-2341 - Chen Lo, Chia-Hung Yeh:

Spatial-Frequency Guided Moiré Removal with Multi-Stage Feature Fusion. 2342-2346 - Si Ting Lin, Chih-Hung Han, Chieh-Ling Lee, Po-Chyi Su, Feng-Tsun Chien, Min-Kuan Chang:

Registration of Infrared and Visible Images Using Style Transfer-Based Semantic Segmentation. 2347-2352 - Xian He, Wei Zeng, Ye Wang:

Peransformer: Improving Low-informed Expressive Performance Rendering with Score-aware Discriminator. 2353-2358 - Po-Kai Su, Pei-Rong Jiang, Kai-Xuan Xu, Meng-Lei Su, Jiannher Lin, Hsin-Han Chiang, Hsiao-Chi Li:

Prompt-Based Vertebral Segmentation Using a Generative AI Approach in OVCF Spinal Radiographs. 2359-2364 - Cheng-Wei Hsu, Ming-Sui Lee:

A Dual-Stream Diffusion Model with Physically-Based Rendering for Single Image Reflection Removal. 2365-2370 - Yudhistira Arditya Pratama, Theophilus Ezra Nugroho Pandin, Yi-Zeng Hsieh:

Dynamic Facial Expression Recognition in the Wild Using Mambastyle Selective SSM and Facial Attention Mechanism. 2371-2375 - Axel Päivänsalo, Ching-Chun Chang, Hanrui Wang, Futa Waseda, Isao Echizen:

Allegory of the Cave: Breakdown of Illusions in Multimodal Perception with Neural Radiance Fields. 2376-2381 - Isack Farady, Alifya Febriana, Chih-Yang Lin:

Overlapped Coffee Beans Detection and Localization Using a Low-Cost 3D Monocular Point Cloud Clustering Method. 2382-2387 - Tsung-Shan Yang, Yun-Cheng Wang, Chengwei Wei, Suya You, C.-C. Jay Kuo:

Interpretable Video-Text Alignment (VTA) for Cross-Modal Retrieval. 2388-2393 - Yuxin He, Hui Deng, Mingyi He, Yuchao Dai:

Sequence Modeling and Generative Model Driven Non-Rigid 3D Reconstruction. 2394-2399 - Akshita Abrol, Ridwan Arefeen, Haotong Yu, Alexi George, Kelvin Zhenghao Li, Zhengkui Wang, Rong Tong:

Robust Audio-Visual Speech Recognition in Noisy Clinical Environments. 2400-2405 - Xin Hui Lor, Chern Hong Lim:

Integrating Visual XAI and LLMs for Interpretable Medical Image Analysis. 2406-2411 - Zhi Hu, Liang Liao, Weisi Lin:

InternVL-VPR: Hierarchical Zero-Shot Visual Place Recognition with VLM-Driven Re-Ranking. 2412-2417 - Phat Nguyen, Ngai-Man Cheung:

Token Compression Meets Compact Vision Transformers: a Survey and Comparative Evaluation for Edge AI. 2418-2423 - Anh-Dung Do, Thanh-Ha Do:

Adapting Vision-Language Models for Information Extraction from Bilingual Medical Invoices. 2424-2429 - Tien Do, Thuyen Tran, Duy-Dinh Le, Thanh Duc Ngo:

Zero-shot Artistic Text Recognition with Multimodal Language Models. 2430-2435 - Linchen Xu, Zhikai Liu, Fan Liang:

Attention Based Deep Reference Frame Enhancement for VVC Inter Prediction. 2436-2441 - Yeoneui Kim, Je-Won Kang:

Neural Implicit Representations for Object-Centric Machine Vision Tasks. 2442-2447 - Jun Kurihara, Heming Sun:

Efficient Adversarial Attack and Training on Learned Image Compression. 2453-2458 - Jui-Chen Luo, Jiann-Jone Chen, Tien-Ying Kuo, Yi-Fan Wu, Kai-Jie Zhang:

Accelerating VVC Inter-Frame Coding: A Lightweight CNN for Fast QTMT Partitioning. 2459-2464 - Muhammad Bilal, Waleed Abdulla, Gary Cheung, Lynette Tippett, Seyed Reza Shahamiri:

Multimodal Speech Analysis for Early Detection of Mild Cognitive Impairment: A Scalable Approach. 2465-2470 - Rong Chen, Stephen Karungaru, Kenji Terada, Linhuang Wang:

Boundary-Enhanced Attention Network for Breast Mass Segmentation. 2471-2476 - Shinji Yamashita, Yuma Kinoshita, Hitoshi Kiya:

Scale and Rotation Estimation of Similarity-Transformed Images via Cross-Correlation Maximization Based on Auxiliary Function Method. 2477-2481 - Kaito Kosaki, Teppei Nakano, Mari Wakabayashi, Tomomi Sato, Tetsuji Ogawa:

Strong Eye Closure Detection in Children with Profound Intellectual and Multiple Disabilities Using Robust Temporal Difference Features. 2482-2487 - Sang NguyenQuang, Cheng-Wei Chen, Xiem HoangVan, Wen-Hsiao Peng:

A Rate-Quality Model for Learned Video Coding. 2488-2493 - Shugo Yamashita, Masaaki Ikehara:

Low-Light RAW Image Enhancement with Additive Parameterization and State Space Model. 2494-2498 - Young-Ho Go, Sung-Hak Lee:

Synthesizing and Restoring Weather-Corrupted Images with Conditional Diffusion Models. 2499-2504 - Muhammad Adi Nugroho, Jinyoung Park, Yeeun Seong, Changick Kim:

Integrating Semantic Knowledge for Enhanced Weakly-Supervised Group Activity Recognition. 2505-2510 - Hiromu Kanauchi, Ryuto Ito, Hiroyasu Yasuda, Masaaki Nagahara, Yuichi Tanaka, Shogo Muramatsu:

Directed Graph Dynamic Mode Decomposition for Nonlinear State-Space Modeling. 2511-2516 - Takumi Nishiyama, Lantian Wei, Tadashi Wadayama:

Digital-Optical Hybrid Computation for Deep Unfolding-Aided MIMO Signal Detection. 2517-2522 - Yuki Nii, Futa Waseda, Ching-Chun Chang, Isao Echizen:

Uncolorable Examples: Preventing Unauthorized AI Colorization via Perception-Aware Chroma-Restrictive Perturbation. 2523-2528 - Ruijie Tao, Zhan Shi, Yidi Jiang, Tianchi Liu, Haizhou Li:

Voice Conversion Augmentation for Speaker Recognition on Defective Datasets. 2529-2534 - Shreyas Gopal, Ashutosh Anshul, Haoyang Li, Yue Heng Yeo, Hexin Liu, Eng Siong Chng:

Explainable Disentanglement on Discrete Speech Representations for Noise-Robust ASR. 2535-2540 - Benita Angela Titalim, Faisal Mehmood, Sakriani Sakti:

Rethinking Robust ASR Strategies: Can Textual in-Context Learning Improve Acoustic Robustness? 2541-2546 - Wen-Chin Huang:

Advancing Speech Quality Assessment Through Scientific Challenges and Open-Source Activities. 2552-2557 - Erica Cooper:

Progress and Challenges in DNN-Based Objective Quality Assessment of Synthesized Speech. 2558-2563 - Huy H. Nguyen, Pride Kavumba, Tomoya Kurosawa, Koki Wataoka:

Foundation Models as Guardrails: LLM- and VLM-Based Approaches to Safety and Alignment. 2564-2569 - Liping Chen, Kong-Aik Lee, Zhen-Hua Ling, Xin Wang, Rohan Kumar Das, Tomoki Toda, Haizhou Li:

Speaker Privacy and Security in the Big Data Era: Protection and Defense Against Deepfake. 2570-2575 - Ryandhimas E. Zezario:

Non-Intrusive Intelligibility Prediction for Hearing Aids: Recent Advances, Trends, and Challenges. 2576-2581 - Yu Tsao:

From Evaluation to Optimization: Neural Speech Assessment for Downstream Applications. 2582-2586 - Yiming Wang, Jiahong Yuan:

Normalization Through Fine-Tuning: Understanding Wav2vec2.0 Embeddings for Phonetic Analysis. 2587-2591 - Bo-Hao Su, Shinji Watanabe, Chi-Chun Lee:

Enabling Internationalization of Affective Speech Technology Using LLMs. 2592-2597 - Eun-Seo Park, Xianghong Liu, Chang-Hee Han:

BiGaitNet: Deep CNN-Based Classification of Parkinson's Disease Gait Abnormalities Using a Smart Insole Robust to Fewer Plantar Sensors. 2598-2599 - Ying-Ren Chien, En-Ting Lin:

Nonlinear System Identification Approach Under Noisy Input Signals and Impulse Observed Noise by Kernel Adaptive Filtering Algorithm. 2600-2601 - Jing-Ming Guo, De-Yu Guu, Yih-Ping Luh, Yi-Chong Zeng:

Retinal Artery-Vein Segmentation via Attention-Guided W-Net and GAN-Based Boundary Refinement. 2602-2603 - Seong-Hyun Jin, Dong-Min Son, Young-Ho Go, Sung-Hak Lee:

Local Contrast Enhancement in LDR Images via Adaptive Distribution of Clipped-histogram Excess. 2604-2605 - Xuping Huang, Akinori Ito:

A Lightweight and Reversible Audio Watermarking Scheme Based on Integer Wavelet Transform. 2606-2607 - Chang-Woo Son, Young-Ho Go, Seung-Hwan Lee, Sung-Hak Lee:

Variance-Driven U-Net Training and Chroma-Scale-Based Multi-Exposure Image Fusion. 2608-2609 - Haoqian Rong, Shaojie Wang, Zining Zhao, Jiawei Zhang:

Joint Design of Low Sidelobe Radar Waveform and Filter with Hardware Platform Verification. 2610-2611
