


ASRU 2017: Okinawa, Japan
2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017, Okinawa, Japan, December 16-20, 2017. IEEE 2017, ISBN 978-1-5090-4788-8

- Emre Yilmaz, Julien van Hout, Horacio Franco: Noise-robust exemplar matching for rescoring query-by-example search. 1-7
- Katerina Zmolíková, Marc Delcroix, Keisuke Kinoshita, Takuya Higuchi, Atsunori Ogawa, Tomohiro Nakatani: Learning speaker representation for neural network based multichannel speaker extraction. 8-15
- Wei-Ning Hsu, Yu Zhang, James R. Glass: Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation. 16-23
- Anjali Menon, Chanwoo Kim, Umpei Kurokawa, Richard M. Stern: Binaural processing for robust recognition of degraded speech. 24-31
- Shoko Araki, Nobutaka Ono, Keisuke Kinoshita, Marc Delcroix: Meeting recognition with asynchronous distributed microphone array. 32-39
- Takuya Higuchi, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani: Adversarial training for data-driven speech enhancement without parallel corpus. 40-47
- Julien van Hout, Vikramjit Mitra, Horacio Franco, Chris Bartels, Dimitra Vergyri: Tackling unseen acoustic conditions in query-by-example search using time and frequency convolution for multilingual deep bottleneck features. 48-54
- Keisuke Nakamura, Randy Gomez: Improving separation of overlapped speech for meeting conversations using uncalibrated microphone array. 55-62
- Hagen Soltau, Hank Liao, Hasim Sak: Reducing the computational complexity for whole word models. 63-68
- Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu: Investigation of lattice-free maximum mutual information-based acoustic models with sequence-level Kullback-Leibler divergence. 69-76
- Matthew Gibson, Gary Cook, Puming Zhan: Semi-supervised training strategies for deep neural networks. 77-83
- Jeremy Heng Meng Wong, Mark J. F. Gales: Multi-task ensembles with teacher-student training. 84-90
- Emre Yilmaz, Mitchell McLaren, Henk van den Heuvel, David A. van Leeuwen: Language diarization for semi-supervised bilingual acoustic model training. 91-96
- Xie Chen, X. Liu, Anton Ragni, Y. Wang, Mark J. F. Gales: Future word contexts in neural network language models. 97-103
- Qi Liu, Yanmin Qian, Kai Yu: Future vector enhanced LSTM language model for LVCSR. 104-110
- Jinyu Li, Guoli Ye, Rui Zhao, Jasha Droppo, Yifan Gong: Acoustic-to-word model without OOV. 111-117
- Timo Lohrenz, Tim Fingscheidt: Turbo fusion of magnitude and phase information for DNN-based phoneme recognition. 118-125
- Takashi Masuko: Computational cost reduction of long short-term memory based on simultaneous compression of input and hidden state. 126-133
- Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara: Cross-domain speech recognition using nonparallel corpora with cycle-consistent adversarial networks. 134-140
- Ahmed Ali, Preslav Nakov, Peter Bell, Steve Renals: WERD: Using social text spelling variants for evaluating dialectal speech recognition. 141-148
- Peter Smit, Siva Reddy Gangireddy, Seppo Enarvi, Sami Virpioja, Mikko Kurimo: Character-based units for unlimited vocabulary continuous speech recognition. 149-156
- Jian Kang, Wei-Qiang Zhang, Jia Liu: Gated convolutional networks based hybrid acoustic models for low resource speech recognition. 157-164
- Shankar Kumar, Michael Nirschl, Daniel Niels Holtmann-Rice, Hank Liao, Ananda Theertha Suresh, Felix X. Yu: Lattice rescoring strategies for long short term memory language models in speech recognition. 165-172
- Zhongdi Qu, Parisa Haghani, Eugene Weinstein, Pedro J. Moreno: Syllable-based acoustic modeling with CTC-SMBR-LSTM. 173-177
- Adnan Haider, Philip C. Woodland: Sequence training of DNN acoustic models with natural gradient. 178-184
- Karan Nathwani, Emmanuel Vincent, Irina Illina: Consistent DNN uncertainty training and decoding for robust ASR. 185-192
- Kanishka Rao, Hasim Sak, Rohit Prabhavalkar: Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer. 193-199
- Lahiru Samarakoon, Brian Mak: Unsupervised adaptation of student DNNs learned from teacher RNNs for improved ASR performance. 200-205
- Eric Battenberg, Jitong Chen, Rewon Child, Adam Coates, Yashesh Gaur, Yi Li, Hairong Liu, Sanjeev Satheesh, Anuroop Sriram, Zhenyao Zhu: Exploring neural transducers for end-to-end speech recognition. 206-213
- Zhong Meng, Zhuo Chen, Vadim Mazalov, Jinyu Li, Yifan Gong: Unsupervised adaptation with domain separation networks for robust speech recognition. 214-221
- Sheng Li, Xugang Lu, Peng Shen, Ryoichi Takashima, Tatsuya Kawahara, Hisashi Kawai: Incremental training and constructing the very deep convolutional residual network acoustic models. 222-227
- David Rybach, Michael Riley, Johan Schalkwyk: On lattice generation for large vocabulary speech recognition. 228-235
- Joanna Rownicka, Steve Renals, Peter Bell: Simplifying very deep convolutional neural network architectures for robust speech recognition. 236-243
- Gakuto Kurata, Bhuvana Ramabhadran, George Saon, Abhinav Sethy: Language modeling with highway LSTM. 244-251
- Ken'ichi Kumatani, Sankaran Panchapagesan, Minhua Wu, Minjae Kim, Nikko Strom, Gautam Tiwari, Arindam Mandal: Direct modeling of raw audio with DNNs for wake word detection. 252-257
- Khe Chai Sim, Arun Narayanan, Tom Bagby, Tara N. Sainath, Michiel Bacchiani: Improving the efficiency of forward-backward algorithm using batched computation in TensorFlow. 258-264
- Shinji Watanabe, Takaaki Hori, John R. Hershey: Language independent end-to-end architecture for joint language identification and speech recognition. 265-271
- Assaf Hurwitz Michaely, Xuedong Zhang, Gabor Simko, Carolina Parada, Petar S. Aleksic: Keyword spotting for Google Assistant using contextual speech recognition. 272-278
- Pegah Ghahremani, Vimal Manohar, Hossein Hadian, Daniel Povey, Sanjeev Khudanpur: Investigation of transfer learning for ASR using LF-MMI trained neural networks. 279-286
- Takaaki Hori, Shinji Watanabe, John R. Hershey: Multi-level language modeling and decoding for open vocabulary end-to-end speech recognition. 287-293
- Bin Wang, Zhijian Ou: Language modeling with neural trans-dimensional random fields. 294-300
- Andros Tjandra, Sakriani Sakti, Satoshi Nakamura: Listening while speaking: Speech chain by deep learning. 301-308
- Andros Tjandra, Sakriani Sakti, Satoshi Nakamura: Attention-based Wav2Text with feature transfer learning. 309-315
- Ahmed Ali, Stephan Vogel, Steve Renals: Speech recognition challenge in the wild: Arabic MGB-3. 316-322
- Ewan Dunbar, Xuan-Nga Cao, Juan Benjumea, Julien Karadayi, Mathieu Bernard, Laurent Besacier, Xavier Anguera, Emmanuel Dupoux: The zero resource speech challenge 2017. 323-330
- Kei Sawada, Keiichi Tokuda, Simon King, Alan W. Black: The Blizzard machine learning challenge 2017. 331-337
- Peter Smit, Siva Reddy Gangireddy, Seppo Enarvi, Sami Virpioja, Mikko Kurimo: Aalto system for the 2017 Arabic multi-genre broadcast challenge. 338-345
- Vimal Manohar, Daniel Povey, Sanjeev Khudanpur: JHU Kaldi system for Arabic MGB-3 ASR challenge using diarization, audio-transcript alignment and transfer learning. 346-352
- Maryam Najafian, Wei-Ning Hsu, Ahmed Ali, James R. Glass: Automatic speech recognition of Arabic multi-genre broadcast media. 353-359
- Ahmet Emin Bulut, Qian Zhang, Chunlei Zhang, Fahimeh Bahmaninezhad, John H. L. Hansen: UTD-CRSS submission for MGB-3 Arabic dialect identification: Front-end and back-end advancements on broadcast speech. 360-367
- Karel Veselý, Murali Karthick Baskar, Mireia Díez, Karel Benes: MGB-3 BUT system: Low-resource ASR on Egyptian YouTube data. 368-373
- Suwon Shon, Ahmed Ali, James R. Glass: MIT-QCRI Arabic dialect identification system for the 2017 multi-genre broadcast challenge. 374-380
- Shun-Po Chuang, Chia-Hung Wan, Pang-Chi Huang, Chi-Yu Yang, Hung-yi Lee: Seeing and hearing too: Audio representation for video captioning. 381-388
- Bowen Shi, Karen Livescu: Multitask training with unlabeled data for end-to-end sign language fingerspelling recognition. 389-396
- Andrey Malinin, Kate M. Knill, Mark J. F. Gales: A hierarchical attention based model for off-topic spontaneous spoken response detection. 397-403
- Youssef Oualil, Dietrich Klakow, György Szaszák, Ajay Srinivasamurthy, Hartmut Helmke, Petr Motlícek: A context-aware speech recognition and understanding system for air traffic control domain. 404-408
- Tuka Alhanai, Rhoda Au, James R. Glass: Spoken language biomarkers for detecting cognitive impairment. 409-416
- Markus Müller, Sebastian Stüker, Alex Waibel: DBLSTM based multilingual articulatory feature extraction for language documentation. 417-423
- Kenneth Leidal, David Harwath, James R. Glass: Learning modality-invariant representations for speech and images. 424-429
- Chiori Hori, Takaaki Hori, Tim K. Marks, John R. Hershey: Early and late integration of audio features for automatic video description. 430-436
- Zhuo Chen, Jinyu Li, Xiong Xiao, Takuya Yoshioka, Huaming Wang, Zhenghao Wang, Yifan Gong: Cracking the cocktail party problem by multi-beam deep attractor network. 437-444
- Hoon Chung, Yun-Kyung Lee, Jeon Gue Park: Ground truth estimation of spoken English fluency score using decorrelation penalized low-rank matrix factorization. 445-449
- Salil Deena, Raymond W. M. Ng, Pranava Swaroop Madhyastha, Lucia Specia, Thomas Hain: Exploring the use of acoustic embeddings in neural machine translation. 450-457
- Marcely Zanon Boito, Alexandre Berard, Aline Villavicencio, Laurent Besacier: Unwritten languages demand attention too! Word discovery with encoder-decoder models. 458-465
- Tien-Hong Lo, Ying-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen: Neural relevance-aware query modeling for spoken document retrieval. 466-473
- Yanzhang He, Rohit Prabhavalkar, Kanishka Rao, Wei Li, Anton Bakhtin, Ian McGraw: Streaming small-footprint keyword spotting using sequence-to-sequence models. 474-481
- Bing Liu, Ian R. Lane: Iterative policy learning in end-to-end trainable task-oriented neural dialog models. 482-489
- Miroslav Vodolán, Filip Jurcícek: Denotation extraction for interactive learning in dialogue systems. 490-496
- Pin-Jung Chen, I-Hung Hsu, Yi Yao Huang, Hung-yi Lee: Mitigating the impact of speech recognition errors on chatbot using sequence-to-sequence model. 497-503
- Titouan Parcollet, Mohamed Morchid, Georges Linarès: Deep quaternion neural networks for spoken language understanding. 504-511
- Imran A. Sheikh, Dominique Fohr, Irina Illina: Topic segmentation in ASR transcripts using bidirectional RNNs for change detection. 512-518
- Komei Sugiura, Hisashi Kawai: Grounded language understanding for manipulation instructions using GAN-based classification. 519-524
- Emiru Tsunoo, Ondrej Klejch, Peter Bell, Steve Renals: Hierarchical recurrent neural network for story segmentation using fusion of lexical and acoustic features. 525-532
- Zih-Wei Lin, Tzu-Wei Sung, Hung-yi Lee, Lin-Shan Lee: Personalized word representations carrying personalized semantics learned from social network posts. 533-540
- Young-Bum Kim, Sungjin Lee, Ruhi Sarikaya: Speaker-sensitive dual memory networks for multi-turn slot tagging. 541-546
- Young-Bum Kim, Sungjin Lee, Karl Stratos: ONENET: Joint domain, intent, slot prediction for spoken language understanding. 547-553
- Po-Chun Chen, Ta-Chung Chi, Shang-Yu Su, Yun-Nung Chen: Dynamic time-aware attention to speaker roles and contexts for spoken language understanding. 554-560
- Abhinav Rastogi, Dilek Hakkani-Tür, Larry P. Heck: Scalable multi-domain dialogue state tracking. 561-568
- Yao Qian, Rutuja Ubale, Vikram Ramanarayanan, Patrick L. Lange, David Suendermann-Oeft, Keelan Evanini, Eugene Tsuprun: Exploring ASR-free end-to-end modeling to improve spoken language understanding in a cloud-based dialog system. 569-576
- Ning Gao, Gregory Sell, Douglas W. Oard, Mark Dredze: Leveraging side information for speaker identification with the Enron conversational telephone speech collection. 577-583
- Chunlei Zhang, Kazuhito Koishida: End-to-end text-independent speaker verification with flexibility in utterance duration. 584-590
- Lea Schönherr, Steffen Zeiler, Dorothea Kolossa: Spoofing detection via simultaneous verification of audio-visual synchronicity and transcription. 591-598
- Jen-Tzung Chien, Kang-Ting Peng: Adversarial manifold learning for speaker recognition. 599-605
- Yao Qian, Keelan Evanini, Patrick L. Lange, Robert A. Pugh, Rutuja Ubale, Frank K. Soong: Improving native language (L1) identification with better VAD and TDNN trained separately on native and non-native English corpora. 606-613
- Ziqiang Shi, Liu Liu, Mengjiao Wang, Rujie Liu: Multi-view (joint) probability linear discrimination analysis for J-vector based text dependent speaker verification. 614-620
- Aditya Siddhant, Preethi Jyothi, Sriram Ganapathy: Leveraging native language speech for accent identification using deep Siamese networks. 621-628
- Yi Liu, Liang He, Yao Tian, Zhuzi Chen, Jia Liu, Michael T. Johnson: Comparison of multiple features and modeling methods for text-dependent speaker verification. 629-636
- Rachel Rakov, Andrew Rosenberg: Investigating native and non-native English classification and transfer effects using Legendre polynomial coefficient clustering. 637-643
- Pallavi Baljekar, Sai Krishna Rallabandi, Alan W. Black: The CMU entry to Blizzard machine learning challenge. 644-649
- Ya-Jun Hu, Li-Juan Liu, Chuang Ding, Zhen-Hua Ling, Li-Rong Dai: The USTC system for Blizzard machine learning challenge 2017-ES2. 650-656
- Li-Juan Liu, Chuang Ding, Ya-Jun Hu, Zhen-Hua Ling, Yuan Jiang, Ming Zhou, Si Wei: The iFLYTEK system for Blizzard machine learning challenge 2017-ES1. 657-664
- Axel H. Ng, Kyle Gorman, Richard Sproat: Minimally supervised written-to-spoken text normalization. 665-670
- Eunwoo Song, Frank K. Soong, Hong-Goo Kang: Perceptual quality and modeling accuracy of excitation parameters in DLSTM-based speech synthesis systems. 671-676
- Berrak Sisman, Haizhou Li, Kay Chen Tan: Sparse representation of phonetic features for voice conversion with and without parallel data. 677-684
- Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dongyan Huang, Haizhou Li: Statistical parametric speech synthesis using generative adversarial networks under a multi-task learning framework. 685-691
- Kévin Vythelingum, Yannick Estève, Olivier Rosec: Error detection of grapheme-to-phoneme conversion in text-to-speech synthesis using speech signal and lexical context. 692-697
- Takuma Okamoto, Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai: Subband WaveNet with overlapped single-sideband filterbanks. 698-704
- Moquan Wan, Gilles Degottex, Mark J. F. Gales: Integrated speaker-adaptive speech synthesis. 705-711
- Tomoki Hayashi, Akira Tamamori, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda: An investigation of multi-speaker training for WaveNet vocoder. 712-718
- Herman Kamper, Karen Livescu, Sharon Goldwater: An embedded segmental K-means model for unsupervised segmentation and clustering of speech. 719-726
- Hongjie Chen, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li: Multilingual bottle-neck feature learning from untranscribed speech. 727-733
- Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li: Extracting bottleneck features and word-like pairs from untranscribed speech for feature representation. 734-739
- Michael Heck, Sakriani Sakti, Satoshi Nakamura: Feature optimized DPGMM clustering for unsupervised subword modeling: A contribution to ZeroSpeech 2017. 740-746
- Hayato Shibata, Taku Kato, Takahiro Shinozaki, Shinji Watanabe: Composite embedding systems for ZeroSpeech 2017 Track 1. 747-753
- T. K. Ansari, Rajath Kumar, Sonali Singh, Sriram Ganapathy: Deep learning methods for unsupervised acoustic modeling - LEAP submission to ZeroSpeech challenge 2017. 754-761
- T. K. Ansari, Rajath Kumar, Sonali Singh, Sriram Ganapathy, V. Susheela Devi: Unsupervised HMM posteriograms for language independent acoustic modeling in zero resource conditions. 762-768
