2000, The Journal of the Acoustical Society of America
This paper presents a novel approach for speaker-independent recognition of isolated words utilizing a new distance measure and nearest-neighbor decision rule. The study highlights the challenges posed by individual speaker characteristics on recognition accuracy and demonstrates through experiments that employing multiple reference patterns significantly improves recognition rates. Findings indicate that approximately 10 to 14 reference utterances are necessary to achieve effective recognition in varied speaker conditions, offering insights into optimizing speech recognition systems.
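The multiple-reference-pattern nearest-neighbor rule described above can be sketched as follows. This is a generic illustration, not the paper's actual distance measure: the Euclidean metric, the fixed-length feature vectors, and the word labels are all assumptions made for the example (real systems first time-align variable-length utterances).

```python
import numpy as np

def classify(test, references):
    """Nearest-neighbor decision rule: pick the word whose closest
    reference pattern minimizes the distance to the test pattern.
    references: dict mapping word -> list of reference feature vectors."""
    best_word, best_dist = None, np.inf
    for word, patterns in references.items():
        for ref in patterns:
            d = np.linalg.norm(test - ref)  # Euclidean distance measure
            if d < best_dist:
                best_word, best_dist = word, d
    return best_word

# Toy vocabulary with two reference utterances per word, mirroring the
# paper's finding that several references per word improve robustness.
refs = {
    "yes": [np.array([1.0, 0.0]), np.array([0.9, 0.2])],
    "no":  [np.array([0.0, 1.0]), np.array([0.1, 0.8])],
}
print(classify(np.array([0.95, 0.1]), refs))  # prints "yes"
```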
2005
Research in automatic speech and speaker recognition has now spanned five decades. This paper surveys the major themes and advances made in the past fifty years of research so as to provide a technological perspective and an appreciation of the fundamental progress that has been accomplished in this important area of speech communication. Although many techniques have been developed, many challenges have yet to be overcome before we can achieve the ultimate goal of creating machines that can communicate naturally with people. Such a machine needs to be able to deliver a satisfactory performance under a broad range of operating conditions. A much greater understanding of the human speech process is required before automatic speech and speaker recognition systems can approach human performance.
IEEE Signal Processing Magazine, 1996
The future commercialization of speaker- and speech-recognition technology is impeded by the large degradation in system performance due to environmental differences between training and testing conditions. This is known as the "mismatched condition." Studies have shown [1] that most contemporary systems achieve good recognition performance if the conditions during training are similar to those during operation (matched conditions). Frequently, mismatched conditions are present in which the performance is dramatically degraded as compared to the ideal matched conditions. A common example of this mismatch is when training is done on clean speech and testing is performed on noise- or channel-corrupted speech. Robust speech techniques [2] attempt to maintain the performance of a speech processing system under such diverse conditions of operation.
This paper summarizes my 40 years of research on speech and speaker recognition, focusing on selected topics that I have investigated at NTT Laboratories, Bell Laboratories and Tokyo Institute of Technology with my colleagues and students. These topics include: the importance of spectral dynamics in speech perception; speaker recognition methods using statistical features, cepstral features, and HMM/GMM; text-prompted speaker recognition; speech recognition using dynamic features; Japanese LVCSR; robust speech recognition; spontaneous speech corpus construction and analysis; spontaneous speech recognition; automatic speech summarization; and WFST-based decoder development and its applications.
IEEE Transactions on Acoustics, Speech, and Signal Processing, 1986
Underlying speech data are speaker features that are useful in speech recognition, speech processing, speech coding, and speech clustering. We give a brief overview of the area of speaker recognition, speech applications, and their underlying techniques. The review of automatic speech recognition (ASR) discusses some of the positive and negative aspects of speaker recognition technologies and also outlines potential trends in research, development, and applications.
Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.
2017
Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person's voice, or it can be used to authenticate or verify the identity of a speaker as part of a security process. This work discusses the implementation of an enhanced speaker recognition system using MFCC and the LBG algorithm. MFCC has been used extensively for speaker recognition. This work augments the existing work by using vector quantization and classification with the Linde-Buzo-Gray algorithm. A complete test system has been developed in MATLAB that can be used for real-time testing, as it can take input directly from the microphone. The design can therefore be translated into hardware with the necessary real-time processing prerequisites. The system has been tested using the VidTIMIT database and the performance metrics of False Acceptance Rate (FAR), True Acceptance Rate (TAR), and False Rejection Rate (FRR). The system has been...
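The Linde-Buzo-Gray codebook training used above can be sketched in Python (the original work is in MATLAB). The splitting perturbation, iteration count, and toy data below are illustrative assumptions; the sketch assumes the target codebook size is a power of two, as in the classic splitting variant of LBG.

```python
import numpy as np

def lbg(data, codebook_size, eps=0.01, n_iter=20):
    """Linde-Buzo-Gray vector quantization: grow a codebook by
    repeatedly splitting centroids and refining them with k-means."""
    codebook = data.mean(axis=0, keepdims=True)  # start with global centroid
    while len(codebook) < codebook_size:
        # split each centroid into a slightly perturbed pair
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(n_iter):  # k-means refinement of the split codebook
            d = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
            assign = d.argmin(axis=1)
            for k in range(len(codebook)):
                pts = data[assign == k]
                if len(pts):
                    codebook[k] = pts.mean(axis=0)
    return codebook

# Toy training vectors drawn from two clusters (stand-ins for MFCC frames)
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(1, 0.1, (50, 2))])
codebook = lbg(data, 4)
print(codebook.shape)  # prints "(4, 2)"
```

In a speaker recognition setting, one codebook would be trained per enrolled speaker, and a test utterance scored by its average quantization distortion against each codebook.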
2014
Speech recognition is a natural means of interaction for a human with a smart assistive environment. For this interaction to be effective, such a system should attain a high recognition rate even under adverse conditions. In speech recognition, speech signals are automatically converted into the corresponding sequence of words in text. When the training and testing conditions are not similar, statistical speech recognition algorithms suffer from severe degradation in recognition accuracy. In this paper, we first give a brief overview of speech recognition and then describe some feature extraction and classifier techniques. We have compared the MFCC, LPC, and PLP feature extraction techniques. Our tests show that MFCC is more efficient and accurate than the LPC and PLP feature extraction techniques in voice recognition and is thus more suitable for practical applications.
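A minimal MFCC pipeline of the kind compared here can be sketched from scratch: pre-emphasis, framing, windowing, power spectrum, mel filterbank, log, and DCT-II. All parameter values (frame length, hop, filter and coefficient counts) are common defaults assumed for illustration, not the settings of any of the compared papers.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13,
         frame_len=400, hop=160):
    """Minimal MFCC sketch; returns (n_frames, n_ceps) coefficients."""
    # pre-emphasis boosts high frequencies
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # frame the signal with overlap and apply a Hamming window
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # triangular mel-spaced filterbank
    def hz2mel(f): return 2595 * np.log10(1 + f / 700)
    def mel2hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return log_energy @ dct.T

# One second of a 440 Hz tone as a stand-in for speech
sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # prints "(98, 13)"
```

With 25 ms frames (400 samples at 16 kHz) and a 10 ms hop, one second of audio yields 98 frames of 13 coefficients each.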
Inżynieria Bezpieczeństwa Obiektów Antropogenicznych, 2023
The current reality is saturated with intelligent telecommunications solutions, and automatic speaker recognition systems are an integral part of many of them. They are widely used in sectors such as banking, telecommunications, and forensics. The ease of performing automatic analysis and efficient extraction of the distinctive characteristics of the human voice makes it possible to identify, verify, and authorize the speaker under investigation. Currently, the vast majority of solutions in the field of speaker recognition systems are based on the distinctive features resulting from the structure of the speaker's vocal tract (laryngeal sound analysis), called physical features of the voice. Despite the high efficiency of such systems (more than 95%), their further development is already very difficult, because the possibilities of distinctive physical features have been exhausted. Further opportunities to increase the effectiveness of ASR systems based on physical features appear after additionally considering the behavioral features of the speech signal in the system, which is the subject of this article. This article was funded by the Military University of Technology as part of the UGB 866 project.
International Journal of Engineering and Technology Innovation, 2017
Speech recognition is about what is being said, irrespective of who is saying it. Speech recognition is a growing field, and major progress is taking place in the technology of automatic speech recognition (ASR). Still, there are many barriers in this field in terms of recognition rate, background noise, speaker variability, speaking rate, accent, etc. The speech recognition rate mainly depends on the selection of features and feature extraction methods. This paper outlines feature extraction techniques for speaker-dependent speech recognition of isolated words. A brief survey of different feature extraction techniques, such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding Coefficients (LPCC), Perceptual Linear Prediction (PLP), and Relative Spectra Perceptual Linear Prediction (RASTA-PLP) analysis, is presented and evaluated. Speech recognition has various applications from daily use to commercial use. We have made a speaker dependent system and this system can ...
… , Department of computer science, University of …, 2003
The front-end, or feature extractor, is the first component in an automatic speaker recognition system. Feature extraction transforms the raw speech signal into a compact but effective representation that is more stable and discriminative than the original signal. Since the front-end is the first component in the chain, the quality of the later components (speaker modeling and pattern matching) is strongly determined by the quality of the front-end. In other words, classification can be at most as accurate as the features. Several feature extraction methods have been proposed and successfully exploited in the speaker recognition task. However, almost exclusively, the methods are adopted directly from the speech recognition task. This is somewhat ironic, considering the opposite nature of the two tasks. In speech recognition, speaker variability is one of the major error sources, whereas in speaker recognition it is the information that we wish to extract. The mel-frequency cepstral coefficients (MFCC) are the most evident example of a feature set that is extensively used in speaker recognition but was originally developed for speech recognition purposes. When an MFCC front-end is used in a speaker recognition system, one makes the implicit assumption that the human hearing mechanism is the optimal speaker recognizer. However, this has not been confirmed, and in fact opposite results exist. Although several methods adopted from speech recognition have been shown to work well in practice, they are often used as "black boxes" with fixed parameters. It is not understood what kind of information the features capture from the speech signal. Understanding the features at some level requires experience from specific areas such as speech physiology, acoustic phonetics, digital signal processing, and statistical pattern recognition.
According to the author's general impression of the literature, it increasingly seems that, at best, we are currently guessing what the code in the signal is that carries our individuality. This thesis has two main purposes. On the one hand, we attempt to see feature extraction as a whole, starting from understanding the speech production process and what is known about speaker individuality, and then going i
International Journal of Engineering Research and Technology (IJERT), 2016
https://www.ijert.org/a-study-of-various-speech-features-and-classifiers-used-in-speaker-identification
https://www.ijert.org/research/a-study-of-various-speech-features-and-classifiers-used-in-speaker-identification-IJERTV5IS020637.pdf
Speech processing consists of the analysis/synthesis, recognition, and coding of speech signals. The recognition field branches further into speech recognition, speaker recognition, and speaker identification. A speaker identification system is used to identify a speaker among many speakers. A good identification rate is a prerequisite for any speaker identification system, and it can be achieved by making an optimal choice among the available techniques. In this paper, different speech features and extraction techniques, such as MFCC, LPCC, LPC, GLFCC, and PLPC, and different feature classification models, such as VQ, GMM, DTW, HMM, and ANN, for speaker identification systems are discussed. Keywords: Linear Predictive Cepstral Coefficients (LPCC), Mel Frequency Cepstral Coefficients (MFCC), Gaussian Mixture Model (GMM), Vector Quantization (VQ), Hidden Markov Model (HMM), Artificial Neural Network (ANN)
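Of the classification models listed, GMM-based speaker identification scores a test utterance by its average log-likelihood under each enrolled speaker's mixture model. The sketch below shows the scoring side only (training via EM is omitted); the diagonal-covariance form, the toy parameters, and the speaker labels are assumptions made for illustration.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of feature frames under a
    diagonal-covariance Gaussian mixture model."""
    # squared deviation of every frame from every component mean
    diff2 = (frames[:, None, :] - means[None, :, :]) ** 2
    # log N(x; mu, diag(var)) for each frame/component pair
    log_norm = -0.5 * np.sum(diff2 / variances
                             + np.log(2 * np.pi * variances), axis=2)
    # weighted log-sum-exp over mixture components (numerically stable)
    log_w = np.log(weights)[None, :]
    m = (log_norm + log_w).max(axis=1, keepdims=True)
    ll = m[:, 0] + np.log(np.exp(log_norm + log_w - m).sum(axis=1))
    return ll.mean()

def identify(frames, speaker_models):
    """Pick the speaker whose GMM gives the highest average log-likelihood."""
    return max(speaker_models,
               key=lambda s: gmm_loglik(frames, *speaker_models[s]))

# Toy one-component models for two hypothetical speakers "A" and "B"
models = {
    "A": (np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2))),
    "B": (np.array([1.0]), np.full((1, 2), 5.0), np.ones((1, 2))),
}
frames = np.full((10, 2), 0.1)  # test frames near speaker A's mean
print(identify(frames, models))  # prints "A"
```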
IEEE Transactions on Audio, Speech and Language Processing, 2000
This paper discusses in detail different configurations for comparing the performances of different combinations of commonly used techniques for speech recognition and speaker identification. The Feature extraction techniques considered were MFCC, LPC and Autocorrelation. The feature matching techniques considered were VQ, HMM, DTW and ANN.
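Among the feature matching techniques compared, dynamic time warping (DTW) aligns two feature sequences of different lengths by finding the minimum-cost monotonic alignment. A minimal sketch, assuming Euclidean frame distances and the standard three-way recurrence:

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two feature sequences.
    a, b: arrays of shape (time, dims); returns accumulated alignment cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)  # accumulated cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            # extend the cheapest of the three predecessor paths
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A sequence and a tempo-stretched copy align at zero cost
a = np.array([[0.0], [1.0], [2.0]])
stretched = np.array([[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]])
print(dtw(a, stretched))  # prints "0.0"
```

This insensitivity to speaking rate is why DTW was a standard matcher for isolated-word recognition before HMMs became dominant.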
Firstly, I would like to express my gratitude to Prof. Nikos Fakotakis, who served as the supervisor of my Ph.D. study. I enjoyed his comprehensive support from the very first day of my work at the Wire Communications Laboratory, and it was his encouragement that made the successful completion of my study possible. I would like to express gratitude to Prof. George Kokkinakis, whose profound analysis of my prospective submissions helped me to improve both the presentation style and the overall quality of the manuscripts. Subsequently, I would like to thank Assoc. Prof. John Mourjopoulos, who directed my attention to the psychoacoustic aspects of speech perception; Assist. Prof. Evangelos Dermatas, who inspired my interest in recurrent neural networks; and Prof. Michael Vrahatis, who initiated me into evolutionary computation techniques. Further, I would like to express thanks to Dr. Anastasious Tsopanoglou, with whom I had regular discussions during the first year of my study, and to Dr. Ilyas Potamitis, whose insights helped me to avoid many snares during all these years. In addition, I would like to thank all colleagues who contributed to the comfortable collaborative atmosphere at the Wire Communications Laboratory. Finally, I would like to acknowledge the State Scholarship Foundation of Greece (IKY), which financially supported my Ph.D. study during the years 2002/2003/2004/2005. The IKY scholarship gave me the peace of mind to focus on my study, which I enjoyed very much. I highly appreciate this support.
PART I. INTRODUCTION TO THE SPEAKER RECOGNITION TECHNOLOGY AND OVERVIEW OF THE STATE-OF-ART
2017
Speech is an intuitive interface for man-machine interaction. Minimizing the word error rate is a unique challenge in developing an Automatic Speech Recognition (ASR) system, and the performance of such systems is far from perfect. Acoustic models and language models are fundamental to building a robust ASR engine. This paper presents a stochastic procedure for developing phoneme- and word-level acoustic models. Acoustic features are estimated by Mel Frequency Cepstral Coefficients (MFCC) with 35% frame overlap for every 25 milliseconds of signal. The paper compares and highlights the word- and phoneme-level acoustic model performances for a Kannada-language vocabulary. The performance of the system is recorded for different vocabulary sizes, and the word error rate (WER) is computed for the phoneme and word acoustic models. The system achieves accuracies of 94.78046% and 97.6% for the word and phoneme acoustic models, respectively, for a vocabulary of 90 words. In addition, a recognition rate of 98.08% for the vocabulary ...
IJEER , 2022
This research article presents and focuses on recognizing speakers in multi-speaker speech. Every conference, talk, or discussion involves the participation of several speakers. This type of speech poses distinct problems as well as stages of processing. Challenges include impurities unique to the surroundings, the number of speakers involved, speaker distance, microphone equipment, etc. In addition to addressing these hurdles in real time, there are also problems in the treatment of multi-speaker speech. Identifying speech segments, separating the speaking segments, constructing clusters of similar segments, and finally recognizing the speaker using these segments are the common sequential operations in multi-speaker speech recognition. All linked phases of the speech recognition process are discussed with relevant methodologies in this article, which examines the common metrics, methods, and conduct. The paper examines the algorithms of the speech recognition system at different stages. Such voice recognition systems are built through many phases, such as voice filtering, speaker segmentation, speaker diarization, and recognition of the speaker, evaluated here with 20 speakers.
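The first stage of that pipeline, identifying speech segments, can be illustrated with a toy energy-based detector. The frame length and threshold below are arbitrary assumptions; production systems use more robust voice activity detection.

```python
import numpy as np

def speech_segments(signal, frame_len=160, threshold=0.01):
    """Toy energy-based segmentation: mark frames whose mean-square energy
    exceeds a threshold, then merge runs into (start, end) sample ranges."""
    n_frames = len(signal) // frame_len
    energy = (signal[:n_frames * frame_len]
              .reshape(n_frames, frame_len) ** 2).mean(axis=1)
    active = energy > threshold
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:          # speech onset
            start = i * frame_len
        elif not a and start is not None:  # speech offset
            segments.append((start, i * frame_len))
            start = None
    if start is not None:                # speech runs to end of signal
        segments.append((start, n_frames * frame_len))
    return segments

# Silence with one burst of activity in the middle
signal = np.zeros(1600)
signal[480:960] = 0.5
print(speech_segments(signal))  # prints "[(480, 960)]"
```

The resulting segments would then feed the separation, clustering, and recognition stages described above.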
IEEE International Conference on Acoustics Speech and Signal Processing, 2002
Coping with inter-speaker variability (i.e., differences in the vocal tract characteristics of speakers) is still a major challenge for automatic speech recognizers. In this paper, we discuss a method that compensates for differences in speaker characteristics. In particular, we demonstrate that when a continuous-density hidden Markov model based system is used as the back-end, a Knowledge-Based Front End (KBFE) can outperform the traditional Mel-Frequency Cepstral Coefficients (MFCCs), particularly when there is a mismatch in the gender and ages of the subjects used to train and test the recognizer.