


20th Interspeech 2019: Graz, Austria
- Gernot Kubin, Zdravko Kacic: 20th Annual Conference of the International Speech Communication Association, Interspeech 2019, Graz, Austria, September 15-19, 2019. ISCA 2019
ISCA Medal 2019 Keynote Speech
- Keiichi Tokuda: Statistical Approach to Speech Synthesis: Past, Present and Future.
Spoken Language Processing for Children’s Speech
- Fei Wu, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur: Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network. 1-5
- Gary Yeung, Abeer Alwan: A Frequency Normalization Technique for Kindergarten Speech Recognition Inspired by the Role of fo in Vowel Perception. 6-10
- Robert Gale, Liu Chen, Jill Dolata, Jan P. H. van Santen, Meysam Asgari: Improving ASR Systems for Children with Autism and Language Impairment Using Domain-Focused DNN Transfer Techniques. 11-15
- Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals: Ultrasound Tongue Imaging for Diarization and Alignment of Child Speech Therapy Sessions. 16-20
- Anastassia Loukina, Beata Beigman Klebanov, Patrick L. Lange, Yao Qian, Binod Gyawali, Nitin Madnani, Abhinav Misra, Klaus Zechner, Zuowei Wang, John Sabatini: Automated Estimation of Oral Reading Fluency During Summer Camp e-Book Reading with MyTurnToRead. 21-25
- Vanessa Lopes, João Magalhães, Sofia Cavaco: Sustained Vowel Game: A Computer Therapy Game for Children with Dysphonia. 26-30
Dynamics of Emotional Speech Exchanges in Multimodal Communication
- Anna Esposito, Terry Amorese, Marialucia Cuciniello, Maria Teresa Riviello, Antonietta Maria Esposito, Alda Troncone, Gennaro Cordasco: The Dependability of Voice on Elders' Acceptance of Humanoid Agents. 31-35
- Oliver Niebuhr, Uffe Schjoedt: God as Interlocutor - Real or Imaginary? Prosodic Markers of Dialogue Speech and Expected Efficacy in Spoken Prayer. 36-40
- Michelle Cohn, Georgia Zellou: Expressiveness Influences Human Vocal Alignment Toward voice-AI. 41-45
- Catherine Lai, Beatrice Alex, Johanna D. Moore, Leimin Tian, Tatsuro Hori, Gianpiero Francesca: Detecting Topic-Oriented Speaker Stance in Conversational Speech. 46-50
- Jilt Sebastian, Piero Pierucci: Fusion Techniques for Utterance-Level Emotion Recognition Combining Speech and Transcripts. 51-55
- Marvin Rajwadi, Cornelius Glackin, Julie A. Wall, Gérard Chollet, Nigel Cannings: Explaining Sentiment Classification. 56-60
- Ricardo Kleinlein, Cristina Luna Jiménez, Juan Manuel Montero, Zoraida Callejas, Fernando Fernández Martínez: Predicting Group-Level Skin Attention to Short Movies from Audio-Based LSTM-Mixture of Experts Models. 61-65
End-to-End Speech Recognition
- Ralf Schlüter: Survey Talk: Modeling in Automatic Speech Recognition: Beyond Hidden Markov Models.
- Ngoc-Quan Pham, Thai-Son Nguyen, Jan Niehues, Markus Müller, Alex Waibel: Very Deep Self-Attention Networks for End-to-End Speech Recognition. 66-70
- Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde: Jasper: An End-to-End Convolutional Neural Acoustic Model. 71-75
- Niko Moritz, Takaaki Hori, Jonathan Le Roux: Unidirectional Neural Network Architectures for End-to-End Automatic Speech Recognition. 76-80
- Yonatan Belinkov, Ahmed Ali, James R. Glass: Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition. 81-85
Speech Enhancement: Multi-Channel
- Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa: Multi-Channel Speech Enhancement Using Time-Domain Convolutional Denoising Autoencoder. 86-90
- Kristina Tesch, Robert Rehr, Timo Gerkmann: On Nonlinear Spatial Filtering in Multichannel Speech Enhancement. 91-95
- Juan M. Martín-Doñas, Jens Heitkaemper, Reinhold Haeb-Umbach, Angel M. Gomez, Antonio M. Peinado: Multi-Channel Block-Online Source Extraction Based on Utterance Adaptation. 96-100
- Saeed Bagheri, Daniele Giacobello: Exploiting Multi-Channel Speech Presence Probability in Parametric Multi-Channel Wiener Filter. 101-105
- Masahito Togami, Tatsuya Komatsu: Variational Bayesian Multi-Channel Speech Dereverberation Under Noisy Environments with Probabilistic Convolutive Transfer Function. 106-110
- Tomohiro Nakatani, Keisuke Kinoshita: Simultaneous Denoising and Dereverberation for Low-Latency Applications Using Frame-by-Frame Online Unified Convolutional Beamformer. 111-115
Speech Production: Individual Differences and the Brain
- Cathryn Snyder, Michelle Cohn, Georgia Zellou: Individual Variation in Cognitive Processing Style Predicts Differences in Phonetic Imitation of Device and Human Voices. 116-120
- Aravind Illa, Prasanta Kumar Ghosh: An Investigation on Speaker Specific Articulatory Synthesis with Speaker Independent Articulatory Inversion. 121-125
- Xiaohan Zhang, Chongke Bi, Kiyoshi Honda, Wenhuan Lu, Jianguo Wei: Individual Difference of Relative Tongue Size and its Acoustic Effects. 126-130
- Tsukasa Yoshinaga, Kazunori Nozaki, Shigeo Wada: Individual Differences of Airflow and Sound Generation in the Vocal Tract of Sibilant /s/. 131-135
- Shashwat Uttam, Yaman Kumar, Dhruva Sahrawat, Mansi Aggarwal, Rajiv Ratn Shah, Debanjan Mahata, Amanda Stent: Hush-Hush Speak: Speech Reconstruction Using Silent Videos. 136-140
- Pramit Saha, Muhammad Abdul-Mageed, Sidney S. Fels: SPEAK YOUR MIND! Towards Imagined Speech Recognition with Hierarchical Deep Learning. 141-145
Speech Signal Characterization 1
- Yu-An Chung, Wei-Ning Hsu, Hao Tang, James R. Glass: An Unsupervised Autoregressive Model for Speech Representation Learning. 146-150
- Feng Huang, Péter Balázs: Harmonic-Aligned Frame Mask Based on Non-Stationary Gabor Transform with Application to Content-Dependent Speaker Comparison. 151-155
- Gurunath Reddy M., K. Sreenivasa Rao, Partha Pratim Das: Glottal Closure Instants Detection from Speech Signal by Deep Features Extracted from Raw Speech and Linear Prediction Residual. 156-160
- Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio: Learning Problem-Agnostic Speech Representations from Multiple Self-Supervised Tasks. 161-165
- Bhanu Teja Nellore, Sri Harsha Dumpala, Karan Nathwani, Suryakanth V. Gangashetty: Excitation Source and Vocal Tract System Based Acoustic Features for Detection of Nasals in Continuous Speech. 166-170
- Aggelina Chatziagapi, Georgios Paraskevopoulos, Dimitris Sgouropoulos, Georgios Pantazopoulos, Malvina Nikandrou, Theodoros Giannakopoulos, Athanasios Katsamanis, Alexandros Potamianos, Shrikanth Narayanan: Data Augmentation Using GANs for Speech Emotion Recognition. 171-175
Neural Waveform Generation
- Zvi Kons, Slava Shechtman, Alexander Sorin, Carmel Rabinovitz, Ron Hoory: High Quality, Lightweight and Adaptable TTS Using LPCNet. 176-180
- Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote, Alexis Moinet, Vatsal Aggarwal: Towards Achieving Robust Universal Neural Vocoding. 181-185
- Paarth Neekhara, Chris Donahue, Miller S. Puckette, Shlomo Dubnov, Julian J. McAuley: Expediting TTS Synthesis with Adversarial Vocoding. 186-190
- Ahmed Mustafa, Arijit Biswas, Christian Bergler, Julia Schottenhamml, Andreas K. Maier: Analysis by Adversarial Synthesis - A Novel Approach for Speech Vocoding. 191-195
- Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda: Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation. 196-200
- Xiaohai Tian, Eng Siong Chng, Haizhou Li: A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data. 201-205
Attention Mechanism for Speaker State Recognition
- Kyu Jeong Han, Ramon Prieto, Tao Ma: Survey Talk: When Attention Meets Speech Applications: Speech & Speaker Recognition Perspective.
- Ziping Zhao, Zhongtian Bao, Zixing Zhang, Nicholas Cummins, Haishuai Wang, Björn W. Schuller: Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition. 206-210
- Jeng-Lin Li, Chi-Chun Lee: Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile. 211-215
- Ascensión Gallardo-Antolín, Juan Manuel Montero: A Saliency-Based Attention LSTM Model for Cognitive Load Classification from Speech. 216-220
- Adria Mallol-Ragolta, Ziping Zhao, Lukas Stappen, Nicholas Cummins, Björn W. Schuller: A Hierarchical Attention Network-Based Approach for Depression Detection from Transcribed Clinical Interviews. 221-225
ASR Neural Network Training — 1
- Andrea Carmantini, Peter Bell, Steve Renals: Untranscribed Web Audio for Low Resource Speech Recognition. 226-230
- Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney: RWTH ASR Systems for LibriSpeech: Hybrid vs Attention. 231-235
- Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe: Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition. 236-240
- Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong: Speaker Adaptation for Attention-Based End-to-End Speech Recognition. 241-245
- Peidong Wang, Jia Cui, Chao Weng, Dong Yu: Large Margin Training for Attention Based End-to-End Speech Recognition. 246-250
- Khoi-Nguyen C. Mac, Xiaodong Cui, Wei Zhang, Michael Picheny: Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition. 251-255
Zero-Resource ASR
- Benjamin Milde, Chris Biemann: SparseSpeech: Unsupervised Acoustic Unit Discovery with Memory-Augmented Sequence Autoencoders. 256-260
- Lucas Ondel, Hari Krishna Vydana, Lukás Burget, Jan Cernocký: Bayesian Subspace Hidden Markov Model for Acoustic Unit Discovery. 261-265
- Yosuke Higuchi, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa: Speaker Adversarial Training of DPGMM-Based Feature Extractor for Zero-Resource Languages. 266-270
- Manasa Prasad, Daan van Esch, Sandy Ritchie, Jonas Fromseier Mortensen: Building Large-Vocabulary ASR Systems for Languages Without Any Audio Training Data. 271-275
- Emmanuel Azuh, David Harwath, James R. Glass: Towards Bilingual Lexicon Discovery From Visually Grounded Speech Audio. 276-280
- Siyuan Feng, Tan Lee: Improving Unsupervised Subword Modeling via Disentangled Speech Representation Learning and Transformation. 281-285
Sociophonetics
- Shawn L. Nissen, Sharalee Blunck, Anita Dromey, Christopher Dromey: Listeners' Ability to Identify the Gender of Preadolescent Children in Different Linguistic Contexts. 286-290
- Wiebke Ahlers, Philipp Meer: Sibilant Variation in New Englishes: A Comparative Sociophonetic Study of Trinidadian and American English /s(tr)/-Retraction. 291-295
- Michele Gubian, Jonathan Harrington, Mary Stevens, Florian Schiel, Paul Warren: Tracking the New Zealand English NEAR/SQUARE Merger Using Functional Principal Components Analysis. 296-300
- Iona Gessinger, Bernd Möbius, Bistra Andreeva, Eran Raveh, Ingmar Steiner: Phonetic Accommodation in a Wizard-of-Oz Experiment: Intonation and Segments. 301-305
- Oliver Niebuhr, Jan Michalsky: PASCAL and DPA: A Pilot Study on Using Prosodic Competence Scores to Predict Communicative Skills for Team Working and Public Speaking. 306-310
- Jan Michalsky, Heike Schoormann, Thomas Schultze: Towards the Prosody of Persuasion in Competitive Negotiation. The Relationship Between f0 and Negotiation Success in Same Sex Sales Tasks. 311-315
Resources – Annotation – Evaluation
- Jacob Sager, Ravi Shankar, Jacob Reinhold, Archana Venkataraman: VESUS: A Crowd-Annotated Database to Study Emotion Production and Perception in Spoken English. 316-320
- Jia Xin Koh, Aqilah Mislan, Kevin Khoo, Brian Ang, Wilson Ang, Charmaine Ng, Ying-Ying Tan: Building the Singapore English National Speech Corpus. 321-325
- Michael Picheny, Zoltán Tüske, Brian Kingsbury, Kartik Audhkhasi, Xiaodong Cui, George Saon: Challenging the Boundaries of Speech Recognition: The MALACH Corpus. 326-330
- Pravin Bhaskar Ramteke, Sujata Supanekar, Pradyoth Hegde, Hanna Nelson, Venkataraja Aithal, Shashidhar G. Koolagudi: NITK Kids' Speech Corpus. 331-335
- Ahmed Ali, Salam Khalifa, Nizar Habash: Towards Variability Resistant Dialectal Speech Evaluation. 336-340
- Per Fallgren, Zofia Malisz, Jens Edlund: How to Annotate 100 Hours in 45 Minutes. 341-345
Speaker Recognition and Diarization
- Mireia Díez, Lukás Burget, Shuai Wang, Johan Rohdin, Jan Cernocký: Bayesian HMM Based x-Vector Clustering for Speaker Diarization. 346-350
- Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen, Takafumi Koshinaka: Unleashing the Unused Potential of i-Vectors Enabled by GPU Acceleration. 351-355
- Suwon Shon, Najim Dehak, Douglas A. Reynolds, James R. Glass: MCE 2018: The 1st Multi-Target Speaker Detection and Identification Challenge Evaluation. 356-360
- Zhifu Gao, Yan Song, Ian McLoughlin, Pengcheng Li, Yiheng Jiang, Li-Rong Dai: Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System. 361-365
- Qingjian Lin, Ruiqing Yin, Ming Li, Hervé Bredin, Claude Barras: LSTM Based Similarity Measurement with Spectral Clustering for Speaker Diarization. 366-370
- Joon Son Chung, Bong-Jin Lee, Icksang Han: Who Said That?: Audio-Visual Speaker Diarisation of Real-World Meetings. 371-375
- Jiamin Xie, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur: Multi-PLDA Diarization on Children's Speech. 376-380
- Alan McCree, Gregory Sell, Daniel Garcia-Romero: Speaker Diarization Using Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings. 381-385
- Omid Ghahabi, Volker Fischer: Speaker-Corrupted Embeddings for Online Speaker Diarization. 386-390
- Tae Jin Park, Kyu Jeong Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis G. Georgiou, Shrikanth Narayanan: Speaker Diarization with Lexical Information. 391-395
- Laurent El Shafey, Hagen Soltau, Izhak Shafran: Joint Speech Recognition and Speaker Diarization via Sequence Transduction. 396-400
- Sandro Cumani: Normal Variance-Mean Mixtures for Unsupervised Score Calibration. 401-405
- Hitoshi Yamamoto, Kong Aik Lee, Koji Okabe, Takafumi Koshinaka: Speaker Augmentation and Bandwidth Extension for Deep Speaker Embedding. 406-410
- Emre Yilmaz, Adem Derinel, Kun Zhou, Henk van den Heuvel, Niko Brummer, Haizhou Li, David A. van Leeuwen: Large-Scale Speaker Diarization of Radio Broadcast Archives. 411-415
- Harishchandra Dubey, Abhijeet Sangwan, John H. L. Hansen: Toeplitz Inverse Covariance Based Robust Speaker Clustering for Naturalistic Audio Streams. 416-420
ASR for Noisy and Far-Field Speech
- György Kovács, László Tóth, Dirk Van Compernolle, Marcus Liwicki: Examining the Combination of Multi-Band Processing and Channel Dropout for Robust Speech Recognition. 421-425
- Meet H. Soni, Ashish Panda: Label Driven Time-Frequency Masking for Robust Continuous Speech Recognition. 426-430
- Long Wu, Hangting Chen, Li Wang, Pengyuan Zhang, Yonghong Yan: Speaker-Invariant Feature-Mapping for Distant Speech Recognition via Adversarial Teacher-Student Learning. 431-435
- Ji Ming, Danny Crookes: Full-Sentence Correlation: A Method to Handle Unpredictable Noise for Robust Speech Recognition. 436-440
- Meet H. Soni, Sonal Joshi, Ashish Panda: Generative Noise Modeling and Channel Simulation for Robust Speech Recognition in Unseen Conditions. 441-445
- Shashi Kumar, Shakti P. Rath: Far-Field Speech Enhancement Using Heteroscedastic Autoencoder for Improved Speech Recognition. 446-450
- Marc Delcroix, Shinji Watanabe, Tsubasa Ochiai, Keisuke Kinoshita, Shigeki Karita, Atsunori Ogawa, Tomohiro Nakatani: End-to-End SpeakerBeam for Single Channel Target Speech Recognition. 451-455
- I-Hung Hsu, Ayush Jaiswal, Premkumar Natarajan: NIESR: Nuisance Invariant End-to-End Speech Recognition. 456-460
- Takahito Suzuki, Jun Ogata, Takashi Tsunakawa, Masafumi Nishida, Masafumi Nishimura: Knowledge Distillation for Throat Microphone Speech Recognition. 461-465
- Jian Wu, Yong Xu, Shi-Xiong Zhang, Lianwu Chen, Meng Yu, Lei Xie, Dong Yu: Improved Speaker-Dependent Separation for CHiME-5 Challenge. 466-470
- Peidong Wang, Ke Tan, DeLiang Wang: Bridging the Gap Between Monaural Speech Enhancement and Recognition with Distortion-Independent Acoustic Modeling. 471-475
- Peidong Wang, DeLiang Wang: Enhanced Spectral Features for Distortion-Independent Acoustic Modeling. 476-480
- Paarth Neekhara, Shehzeen Hussain, Prakhar Pandey, Shlomo Dubnov, Julian J. McAuley, Farinaz Koushanfar: Universal Adversarial Perturbations for Speech Recognition Systems. 481-485
- Masakiyo Fujimoto, Hisashi Kawai: One-Pass Single-Channel Noisy Speech Recognition Using a Combination of Noisy and Enhanced Features. 486-490
- Bin Liu, Shuai Nie, Shan Liang, Wenju Liu, Meng Yu, Lianwu Chen, Shouye Peng, Changliang Li: Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition. 491-495
Social Signals Detection and Speaker Traits Analysis
- Zixiaofan Yang, Bingyan Hu, Julia Hirschberg: Predicting Humor by Learning from Time-Aligned Comments. 496-500
- Yoan Dinkov, Ahmed Ali, Ivan Koychev, Preslav Nakov: Predicting the Leading Political Ideology of YouTube Channels Using Acoustic, Textual, and Metadata Information. 501-505
- Guozhen An, Rivka Levitan: Mitigating Gender and L1 Differences to Improve State and Trait Recognition. 506-509
- Felix Weninger, Yang Sun, Junho Park, Daniel Willett, Puming Zhan: Deep Learning Based Mandarin Accent Identification for Accent Robust ASR. 510-514
- Gábor Gosztolya, László Tóth: Calibrating DNN Posterior Probability Estimates of HMM/DNN Models to Improve Social Signal Detection from Audio Data. 515-519
- Hiroki Mori, Tomohiro Nagata, Yoshiko Arimoto: Conversational and Social Laughter Synthesis with WaveNet. 520-523
- Bogdan Ludusan, Petra Wagner: Laughter Dynamics in Dyadic Conversations. 524-528
- Khiet P. Truong, Jürgen Trouvain, Michel-Pierre Jansen: Towards an Annotation Scheme for Complex Laughter in Speech Corpora. 529-533
- Alice Baird, Shahin Amiriparian, Nicholas Cummins, Sarah Sturmbauer, Johanna Janson, Eva-Maria Meßner, Harald Baumeister, Nicolas Rohleder, Björn W. Schuller: Using Speech to Predict Sequentially Measured Cortisol Levels During a Trier Social Stress Test. 534-538
- Alice Baird, Eduardo Coutinho, Julia Hirschberg, Björn W. Schuller: Sincerity in Acted Speech: Presenting the Sincere Apology Corpus and Results. 539-543
- Oliver Niebuhr, Kerstin Fischer: Do not Hesitate! - Unless You Do it Shortly or Nasally: How the Phonetics of Filled Pauses Determine Their Subjective Frequency and Perceived Speaker Performance. 544-548
- Juan Camilo Vásquez-Correa, Philipp Klumpp, Juan Rafael Orozco-Arroyave, Elmar Nöth: Phonet: A Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech. 549-553
Applications of Language Technologies
- Ching-Ting Chang, Shun-Po Chuang, Hung-yi Lee: Code-Switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation. 554-558
- Moritz Meier, Celeste Mason, Felix Putze, Tanja Schultz: Comparative Analysis of Think-Aloud Methods for Everyday Activities in the Context of Cognitive Robotics. 559-563
- Doug Beeferman, William Brannon, Deb Roy: RadioTalk: A Large-Scale Corpus of Talk Radio Transcripts. 564-568
- Salima Mdhaffar, Yannick Estève, Nicolas Hernandez, Antoine Laurent, Richard Dufour, Solen Quiniou: Qualitative Evaluation of ASR Adaptation in a Lecture Context: Application to the PASTEL Corpus. 569-573
- Federico Marinelli, Alessandra Cervone, Giuliano Tortoreto, Evgeny A. Stepanov, Giuseppe Di Fabbrizio, Giuseppe Riccardi: Active Annotation: Bootstrapping Annotation Lexicon and Guidelines for Supervised NLU Learning. 574-578
- Gerardo Roa Dabike, Jon Barker: Automatic Lyric Transcription from Karaoke Vocal Tracks: Resources and a Baseline System. 579-583
- Qiang Huang, Thomas Hain: Detecting Mismatch Between Speech and Transcription Using Cross-Modal Attention. 584-588
- Jazmín Vidal, Luciana Ferrer, Leonardo Brambilla: EpaDB: A Database for Development of Pronunciation Assessment Systems. 589-593
- Katrin Angerbauer, Heike Adel, Ngoc Thang Vu: Automatic Compression of Subtitles with Neural Networks and its Effect on User Experience. 594-598
- Hongyin Luo, Mitra Mohtarami, James R. Glass, Karthik Krishnamurthy, Brigitte Richardson: Integrating Video Retrieval and Moment Detection in a Unified Corpus for Video Question Answering. 599-603
Speech and Audio Characterization and Segmentation
- Sarah E. Gutz, Jun Wang, Yana Yunusova, Jordan R. Green: Early Identification of Speech Changes Due to Amyotrophic Lateral Sclerosis Using Machine Classification. 604-608
- Mohamed Ismail Yasar Arafath K, Aurobinda Routray: Automatic Detection of Breath Using Voice Activity Detection and SVM Classifier with Application on News Reports. 609-613
- Hee-Soo Heo, Jee-weon Jung, Hye-jin Shim, Ha-Jin Yu: Acoustic Scene Classification Using Teacher-Student Learning with Soft-Labels. 614-618
- Yanping Chen, Hongxia Jin: Rare Sound Event Detection Using Deep Learning and Data Augmentation. 619-623
- Bidisha Sharma, Haizhou Li: A Combination of Model-Based and Feature-Based Strategy for Speech-to-Singing Alignment. 624-628
- Yosi Shrem, Matthew Goldrick, Joseph Keshet: Dr.VOT: Measuring Positive and Negative Voice Onset Time in the Wild. 629-633
- Jun Hui, Yue Wei, Shutao Chen, Richard Hau Yue So: Effects of Base-Frequency and Spectral Envelope on Deep-Learning Speech Separation and Recognition Models. 634-638
- Nirmesh J. Shah, Hemant A. Patil: Phone Aware Nearest Neighbor Technique Using Spectral Transition Measure for Non-Parallel Voice Conversion. 639-643
- Ravi Shankar, Archana Venkataraman: Weakly Supervised Syllable Segmentation by Vowel-Consonant Peak Classification. 644-648
- Lukás Mateju, Petr Cerva, Jindrich Zdánský: An Approach to Online Speaker Change Point Detection Using DNNs and WFSTs. 649-653
- Zhenyu Tang, John D. Kanu, Kevin Hogan, Dinesh Manocha: Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks. 654-658
Neural Techniques for Voice Conversion and Waveform Generation
- Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou: Non-Parallel Voice Conversion Using Weighted Generative Adversarial Networks. 659-663
- Ju-Chieh Chou, Hung-yi Lee: One-Shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization. 664-668
- Hui Lu, Zhiyong Wu, Dongyang Dai, Runnan Li, Shiyin Kang, Jia Jia, Helen Meng: One-Shot Voice Conversion with Global Speaker Embeddings. 669-673
- Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda: Non-Parallel Voice Conversion with Cyclic Variational Autoencoder. 674-678
- Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo: StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion. 679-683
- Yusuke Kurita, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda: Robustness of Statistical Voice Conversion Based on Direct Waveform Modification Against Background Sounds. 684-688
- Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma: Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks. 689-693
- Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku: GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-Spectrogram. 694-698
- Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim: Probability Density Distillation with Generative Adversarial Networks for High-Quality Parallel Waveform Generation. 699-703
- Seyed Hamidreza Mohammadi, Taehwan Kim: One-Shot Voice Conversion with Disentangled Representations by Leveraging Phonetic Posteriorgrams. 704-708
- Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang: Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion. 709-713
- Songxiang Liu, Yuewen Cao, Xixin Wu, Lifa Sun, Xunying Liu, Helen Meng: Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams. 714-718
- Li-Wei Chen, Hung-yi Lee, Yu Tsao: Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech. 719-723
- Shaojin Ding, Ricardo Gutierrez-Osuna: Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion. 724-728
- Cory Stephenson, Gokce Keskin, Anil Thomas, Oguz H. Elibol: Semi-Supervised Voice Conversion with Amortized Variational Inference. 729-733
Model Adaptation for ASR
- Subhadeep Dey, Petr Motlícek, Trung Bui, Franck Dernoncourt: Exploiting Semi-Supervised Training Through a Dropout Regularization in End-to-End Speech Recognition. 734-738
- Chanwoo Kim, Minkyu Shin, Abhinav Garg, Dhananjaya Gowda: Improved Vocal Tract Length Perturbation for a State-of-the-Art End-to-End Speech Recognition System. 739-743
- Han Zhu, Li Wang, Pengyuan Zhang, Yonghong Yan: Multi-Accent Adaptation Based on Gate Mechanism. 744-748
- Pengcheng Guo, Sining Sun, Lei Xie: Unsupervised Adaptation with Adversarial Dropout Regularization for Robust Speech Recognition. 749-753
- Markus Kitza, Pavel Golik, Ralf Schlüter, Hermann Ney: Cumulative Adaptation for BLSTM Acoustic Models. 754-758
- Xurong Xie, Xunying Liu, Tan Lee, Lan Wang: Fast DNN Acoustic Model Speaker Adaptation by Learning Hidden Unit Contribution Features. 759-763
- Emiru Tsunoo, Yosuke Kashiwagi, Satoshi Asakawa, Toshiyuki Kumakura: End-to-End Adaptation with Backpropagation Through WFST for On-Device Speech Recognition System. 764-768
- Leda Sari, Samuel Thomas, Mark A. Hasegawa-Johnson: Learning Speaker Aware Offsets for Speaker Adaptation of Neural Networks. 769-773
- Khe Chai Sim, Petr Zadrazil, Françoise Beaufays: An Investigation into On-Device Personalization of End-to-End Automatic Speech Recognition Models. 774-778

