Papers by Abderrahmane Amrouche
Consolidating Product Spectrum and Gammatone filterbank for robust speaker verification under noisy conditions
2015 15th International Conference on Intelligent Systems Design and Applications (ISDA), 2015

This paper provides an overview of low-level features for speaker recognition, with an emphasis on the recently proposed MFCC variant based on asymmetric tapers (MFCC-asymmetric from now on), which has shown high noise robustness in the context of speaker verification. Using the TIMIT corpus, the performance of MFCC-asymmetric is compared with the standard Mel-Frequency Cepstral Coefficients (MFCC) and the Linear Frequency Cepstral Coefficients (LFCC) under clean and noisy environments. To simulate real-world conditions, the verification phase was tested with two noises (babble and factory) from the NOISEX-92 database at different Signal-to-Noise Ratios (SNR). The experimental results showed that MFCCs with asymmetric tapers (k=4) outperform the other features in noisy conditions. Finally, we investigated the impact of consolidating evidence from different features by score-level fusion. Preliminary results show a promising improvement in verification rate with score fusion.
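As a rough illustration of score-level fusion, the sketch below z-normalizes the scores of each feature stream and combines them with a weighted sum. The abstract does not state which fusion rule or weights were used, so the function names, the normalization, and the equal weights are assumptions made for the example only.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale one system's scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: no ranking information to preserve.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(score_lists, weights):
    """Weighted sum of z-normalized scores, giving one fused score per trial.

    score_lists[k][i] is the score of system k on trial i (hypothetical layout).
    """
    if len(score_lists) != len(weights):
        raise ValueError("one weight per system is required")
    normalized = [z_normalize(scores) for scores in score_lists]
    n_trials = len(score_lists[0])
    return [
        sum(w * norm[i] for w, norm in zip(weights, normalized))
        for i in range(n_trials)
    ]


# Example: two systems scoring the same three trials on different scales.
fused = fuse_scores([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]], [0.5, 0.5])
```

Normalizing before summing keeps a system with a wider score range from dominating the fused decision.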
Algerian Modern Colloquial Arabic Speech Corpus (AMCASC): regional accents recognition within complex socio-linguistic environments
Language Resources and Evaluation, 2016
Regional accents recognition based on i-vectors approach: The case of the Algerian linguistic environment
2015 4th International Conference on Electrical Engineering (ICEE), 2015
Improving Network Echo Cancellation in VoIP Using Packet Loss Concealment
Proceedings of the International Conference on Intelligent Information Processing, Security and Advanced Communication - IPAC '15, 2015

Contribution of prosodic and cepstral features in improvment of a synthesized arabic speaker recognition task performance
2013 IEEE Student Conference on Research and Development, Dec 1, 2013
Biometric Speaker Verification (SV) and Identification (SI) systems are increasingly needed for wireless remote-access security, and they must be made less vulnerable to the distortion introduced by speech coding. This paper presents results for a recognition system operating on speech decoded by the G.729 codec. To show the performance loss due to distortion in the decoding step, we exploit the information carried by the source and the vocal tract. For this purpose, an SVM-based text-independent speaker classifier was designed to combine Mel Frequency Cepstral Coefficients (MFCC), energy, and pitch frequency. Experiments were performed on the ARADIGIT database of spoken Arabic digits. The results show that the best speaker recognition performance on the G.729-decoded database is obtained by combining the prosodic features, with an Equal Error Rate (EER) of 4.22%.
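For readers unfamiliar with the EER, the sketch below estimates it from two lists of verification scores by sweeping a decision threshold and picking the point where the false acceptance and false rejection rates are closest. The function name and the toy scores are illustrative; they are not data from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER from genuine (target) and impostor scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where FAR and FRR are closest, as their average.
    """
    best_gap = None
    eer = None
    for threshold in sorted(set(genuine) | set(impostor)):
        # False acceptance rate: impostors wrongly accepted.
        far = sum(1 for s in impostor if s >= threshold) / len(impostor)
        # False rejection rate: genuine speakers wrongly rejected.
        frr = sum(1 for s in genuine if s < threshold) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Toy example with overlapping score distributions.
rate = equal_error_rate([0.6, 0.4], [0.5, 0.3])
```

With only a handful of trials the estimate is coarse; real evaluations use thousands of trials and often interpolate between thresholds.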

2014 22nd International Conference on Software Telecommunications and Computer Networks, Sep 1, 2014
This paper presents a new Double Talk Detection (DTD) method for acoustic echo cancellation. The main goal is to remove the undesirable acoustic echoes produced by the coupling between the loudspeaker and the microphone of the mobile station. An Acoustic Echo Canceller (AEC) based on adaptive filtering is an attractive solution. In this work, DTD is performed using discriminative speech features extracted from the near-end and microphone speech signals. The main purpose is to discriminate between these signals in order to detect Double Talk (DT) periods. To evaluate the performance, we use the NLMS algorithm to update the filter coefficients. Results obtained on the TIMIT database show that the proposed method performs significantly better than the Normalized Cross Correlation (NCC) and Geigel methods.
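The following sketch shows the two standard building blocks named in the abstract: an NLMS update for the echo-path filter and the classical Geigel detector used as a baseline. It does not implement the feature-based detector proposed in the paper, whose details are not given here. The step size, regularization constant, and threshold of 0.5 are common textbook choices, assumed for illustration.

```python
def nlms(far_end, mic, num_taps, mu=0.5, eps=1e-8):
    """Adapt an FIR echo-path estimate with the normalized LMS rule.

    Returns the final weights and the error (echo-cancelled) signal.
    """
    weights = [0.0] * num_taps
    buffer = [0.0] * num_taps  # most recent far-end samples, newest first
    errors = []
    for x, d in zip(far_end, mic):
        buffer = [x] + buffer[:-1]
        echo_estimate = sum(w * b for w, b in zip(weights, buffer))
        e = d - echo_estimate
        # Normalize the step by the input energy in the buffer.
        energy = sum(b * b for b in buffer) + eps
        weights = [w + mu * e * b / energy for w, b in zip(weights, buffer)]
        errors.append(e)
    return weights, errors


def geigel_double_talk(recent_far_end, mic_sample, threshold=0.5):
    """Geigel test: flag double talk when the microphone sample is large
    compared with the recent far-end peak."""
    peak = max(abs(v) for v in recent_far_end)
    return abs(mic_sample) > threshold * peak
```

In a full canceller, the detector output would typically freeze the NLMS update during double talk so that near-end speech does not corrupt the echo-path estimate.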
Improved variable step-size NLMS adaptive filtering algorithm for acoustic echo cancellation
Digital Signal Processing, 2015
The aim of this study is to build an Arabic word recognition system focused on a small vocabulary. Various models using the neural network approach have been used in ASR. To increase the efficiency of the classification task, we propose the use of a nonparametric density estimator. In this paper we present an adaptation scheme for speaker-independent Arabic speech recognition based on the General Regression Neural Network (GRNN). We have also implemented a left-right Hidden Markov Model (DHMM) with five states, and the relative performance of the two proposed systems is compared with that of the well-known MLP. Experimental results obtained with large corpora show that using a nonparametric density estimator with an appropriate smoothing factor improves the generalization power of the neural network.
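A GRNN prediction is a kernel-weighted average of the training targets, with the smoothing factor controlling how far each training pattern's influence reaches. The sketch below shows that computation in its standard form; the function name, the Gaussian kernel width parameter `sigma`, and the toy data are assumptions for illustration rather than the paper's configuration.

```python
import math


def grnn_predict(train_x, train_y, query, sigma):
    """General Regression Neural Network output for one query vector.

    Each training pattern contributes its target, weighted by a Gaussian
    kernel of its squared distance to the query.
    """
    weights = []
    for x in train_x:
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-dist_sq / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("sigma is too small for this query; all weights underflowed")
    return sum(w * y for w, y in zip(weights, train_y)) / total


# A small sigma follows the nearest pattern; a large sigma averages all targets.
sharp = grnn_predict([[0.0], [1.0]], [0.0, 1.0], [0.0], sigma=0.1)
smooth = grnn_predict([[0.0], [1.0]], [0.0, 1.0], [0.0], sigma=100.0)
```

For word classification, one such output per class (with one-hot targets) would be computed and the largest taken as the decision.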

An important step in speaker verification is extracting features that best characterize the speaker's voice. This paper investigates front-end processing that aims to improve the performance of SVM-based speaker verification in text-independent mode. The approach combines conventional Mel-Frequency Cepstral Coefficients (MFCCs) with Line Spectral Frequencies (LSFs) to form robust multivariate feature vectors. To reduce the high dimensionality of these feature vectors for training, we use a dimension reduction method called principal component analysis (PCA). To evaluate the robustness of these systems, different noisy environments were used. Results on the TIMIT database show that combining these spectral cues leads to a significant improvement in verification accuracy, especially with PCA reduction in noisy environments with a low signal-to-noise ratio.
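To make the PCA step concrete, the sketch below finds the first principal component of a set of feature vectors by power iteration on the sample covariance matrix and projects the data onto it. A real front end would keep several components and use a numerical library; this single-component, pure-Python version and its function names are simplifications assumed for the example.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (mean, unit vector) for the direction of largest variance."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    # Sample covariance matrix (dim x dim).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dim)]
        for a in range(dim)
    ]
    # Power iteration from a uniform start vector (assumed not orthogonal
    # to the leading eigenvector).
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, component):
    """Reduce each vector to its coordinate along the given component."""
    return [
        sum((r[j] - mean[j]) * component[j] for j in range(len(component)))
        for r in rows
    ]


# Toy 2-D "features" lying on a line: one component captures all variance.
data = [[float(x), 2.0 * x] for x in (-2, -1, 0, 1, 2)]
mean, pc = first_principal_component(data)
reduced = project(data, mean, pc)
```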
This paper deals with the use of Automatic Speaker Recognition (ASR) over a Local Area Network (LAN) in the presence of noise. In this work, focused on Distributed Speaker Recognition (DSR), we introduce a client/server architecture in which the client is the front-end of the ETSI Aurora standard and the recognition system is located on a remote server. For the speaker recognition task, carried out in text-independent mode, Gaussian Mixture Models (GMM) were used with the ARADIGIT corpus. Experimental results show that a client/server architecture using the User Datagram Protocol (UDP) is an appropriate way to realise DSR.
This paper presents a new acoustic echo suppressor structure for acoustic echo cancellation in mobile communication. The near-end speech is corrupted by acoustic echo from the far-end speaker (double-talk), and a classical Acoustic Echo Canceller (AEC) is not sufficient. The performance of a classical AEC is improved by Double Talk Detection (DTD) and Noise Reduction (NR). The proposed acoustic echo suppressor structure performs better than the AEC controlled by DTD.
This paper deals with the effect of speech transcoded over GSM (Global System for Mobile) on an Acoustic Echo Cancellation (AEC) system. Echo cancellation techniques have become very helpful in mobile communication for reducing unwanted acoustic echo. Acoustic echo is mainly due to the coupling between the loudspeaker and the microphone of the MS (Mobile Station). The AEC system is based on adaptive filtering. The AMR-WB (Adaptive Multi-Rate Wide Band) speech codec is used to encode and decode the speech. It is standardized in the second generation (2G) and third generation (3G) cellular systems. In our work, the coded speech is passed through a transmission channel modeled as a BSC (Binary Symmetric Channel). The simulation results show the degradation of AEC system performance introduced by the AMR-WB speech codec and the transmission channel.
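A binary symmetric channel flips each transmitted bit independently with a fixed probability. The sketch below simulates one and measures the resulting bit error rate; the function names, the seed parameter, and the example values are illustrative assumptions, not settings from the paper.

```python
import random


def binary_symmetric_channel(bits, flip_probability, seed=None):
    """Pass a list of 0/1 bits through a BSC with the given crossover probability."""
    if not 0.0 <= flip_probability <= 1.0:
        raise ValueError("flip_probability must be in [0, 1]")
    rng = random.Random(seed)  # local generator so runs are reproducible
    return [b ^ 1 if rng.random() < flip_probability else b for b in bits]


def bit_error_rate(sent, received):
    """Fraction of positions where the received bit differs from the sent bit."""
    errors = sum(1 for s, r in zip(sent, received) if s != r)
    return errors / len(sent)


# Example: corrupt a short encoded frame with a 1% crossover probability.
frame = [1, 0, 1, 1, 0, 0, 1, 0] * 100
noisy = binary_symmetric_channel(frame, 0.01, seed=42)
ber = bit_error_rate(frame, noisy)
```

In a simulation like the one described, the corrupted bitstream would then be decoded and fed to the echo canceller to observe the degradation.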

Speech recognition systems are gaining increasing importance with the widespread use of mobile and portable devices and other interactive voice response systems. Because of the resource constraints on such devices and the requirements of specific applications, performing speech recognition over a data network becomes inevitable. The requirements of such a system, with a human at one end and a machine at the other, are clearly asymmetric. In this paper we therefore investigate the use of Perceptual Linear Predictive (PLP) features for speaker recognition over an Internet Protocol (IP) network. To this end, we have implemented a client-server architecture in which the front-end is located on the client side and the recognition system on the server side, for text-independent speaker recognition based on Gaussian Mixture Models (GMM). The ARADIGIT corpus was used in the experiments, and results based on 60 speakers were promising.

Fusion strategies for distributed speaker recognition using residual signal based G729 resynthesized speech
With the development of VoIP (Voice over IP) services, there is an emerging need for speech compression, particularly for digital speech communication and biometric speaker recognition (SR) systems. This paper presents results from an SR system based on a Gaussian Mixture Model with Universal Background Model (GMM-UBM), trained and tested on clean and G729 resynthesized speech. To overcome the performance loss due to the G729 codec, the residual signal extracted from the clean and G729 resynthesized databases is used. To improve performance further, we investigated score fusion strategies based on Logistic Regression (LR). The first fusion is based on GMM-UBM scores using LFCC (Linear Frequency Cepstrum Coefficients) and LFCC extracted from the LP (Linear Prediction) residual signal. The second uses the LFCC extracted from G729 resynthesized speech and its LP residual signal. The best performance is obtained by LR fusion. The correct rate in the first case is 95% based baseline system...

Gaussian mixture models (GMMs) have proven extremely successful for text-independent speaker verification. The standard training method for GMM models is to use MAP adaptation of the means of the mixture components based on speech from a target speaker. In this work we look into two models (GMM-UBM and GMM-SVM) and their application to speaker verification. Feature vectors made up of the Mel Frequency Cepstral Coefficients (MFCC) extracted from the speech signal are used to train the Gaussian mixture model (GMM), and the mean vectors obtained from the GMM-UBM are used to train the SVM. To center the data around their mean, cepstral mean subtraction (CMS) is applied to the MFCC. For both the GMM-UBM and GMM-SVM systems, a 2048-mixture UBM is used. The verification phase was tested with the Aurora database at different Signal-to-Noise Ratios (SNR) and under three noisy conditions. The experimental results showed that GMM-SVM outperforms GMM-UBM in speaker verification espe...
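Cepstral mean subtraction removes the per-coefficient average over an utterance, which cancels slowly varying channel effects. A minimal sketch, assuming the features are stored as a list of frames with one list of coefficients per frame (a hypothetical layout chosen for the example):

```python
def cepstral_mean_subtraction(frames):
    """Subtract each coefficient's utterance-level mean from every frame."""
    n_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[j] for f in frames) / n_frames for j in range(dim)]
    return [[f[j] - means[j] for j in range(dim)] for f in frames]


# Two frames of two coefficients each; the result has zero mean per column.
normalized = cepstral_mean_subtraction([[1.0, 10.0], [3.0, 20.0]])
```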
In this paper, subband speech techniques are proposed for robust speaker verification, where full-band power spectra are divided into 7 subbands. The cepstral vectors of each subband, represented by MFCC, Delta, and Delta-Delta coefficients plus an energy parameter extracted from the TIMIT corpus, are then merged according to their reliability using a majority vote approach. Specifically, we investigate the performance of subband-based speaker verification in noisy conditions using a GMM/SVM model. The results show that subband processing fusion outperforms traditional wideband techniques in both clean and noisy environments.
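A plain majority vote over per-subband accept/reject decisions can be sketched as follows. The abstract does not detail how subband reliability enters the vote, so this unweighted version and its function name are assumptions for illustration.

```python
def majority_vote(subband_decisions):
    """Accept the claimed identity when more than half of the subbands accept."""
    accepts = sum(1 for decision in subband_decisions if decision)
    return accepts > len(subband_decisions) / 2


# Seven subbands, four of which accept the claim.
accepted = majority_vote([True, True, False, True, False, True, False])
```

Using an odd number of subbands, as with the 7 used here, avoids ties.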
This paper presents an evaluation of speaker verification (SV) in mobile communication, where SV becomes a challenging task for high-security purposes. The coupling between the loudspeaker and the microphone of mobile devices produces an acoustic echo of the far-end speaker, so an acoustic echo canceller (AEC) must be added to reduce this echo. Furthermore, in the double-talk scenario, when far-end speech is corrupted by near-end speech, the performance of an AEC based on adaptive filtering is degraded. In this work, various measurements are made to demonstrate the impact of an AEC with and without a double-talk detector (DTD) on the SV task using the ARADIGIT corpus.

Robust PCA-GMM-SVM System for Speaker Verification Task
2012 Eighth International Conference on Signal Image Technology and Internet Based Systems, 2012
This paper presents an automatic speaker verification system based on the hybrid GMM-SVM model working in a real environment. An important step in speaker verification is extracting features that best characterize the speaker. Mel-Frequency Cepstral Coefficients (MFCC) and their first and second derivatives are commonly used as acoustic features for speaker verification. To reduce the high dimensionality of the feature vectors for training, we use a dimension reduction method called Principal Component Analysis (PCA) in the front-end step. Performance evaluations are conducted using the AURORA database, and the robustness of the systems was evaluated under different noisy environments. The experimental results show that PCA dimensionality reduction significantly improves recognition accuracy in the speaker verification task, especially in noisy environments.
Automatic speaker recognition for mobile communications using AMR-WB speech coding
2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA), 2012
The aim of this paper is to investigate the influence of Adaptive Multi-Rate Wideband (AMR-WB) speech coding on Distributed Speaker Recognition (DSR). The main goal is to improve speaker recognition performance without resynthesizing the speech waveform in mobile communications. For this purpose, we have implemented a method to extract the acoustic features in the compressed domain, directly from the encoded bitstream. Results obtained with the ARADIGIT database show that the proposed approach, with GMM and SVM models using ISF (Immittance Spectral Frequency) parameters over a noisy channel (AWGN and Rayleigh), is promising.