Hynek Hermansky

Papers by Hynek Hermansky

Segmentation of speech for speaker and language recognition

8th European Conference on Speech Communication and Technology (Eurospeech 2003)

... timer derivative estimation (such as the 10-frame interval applied in language identification...

Down-sampling speech representation in ASR

6th European Conference on Speech Communication and Technology (Eurospeech 1999)

Features for automatic speech recognition (ASR) are typically sampled at about 100 Hz (10 ms anal...

Data-driven design of RASTA-like filters

5th European Conference on Speech Communication and Technology (Eurospeech 1997)

We describe the use of Linear Discriminant Analysis (LDA) for data-driven automatic design of RASTA-like filters. LDA applied to rather long segments of time trajectories of critical-band energies yields FIR filters to be applied to these time trajectories in the feature extraction module. Frequency responses of the first three discriminant vectors are in principle consistent with the ad hoc designed RASTA, delta and double-delta filters. On a connected digit task the new features outperform the original RASTA processing.
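The idea in this abstract can be illustrated with a minimal sketch: LDA is run on labelled segments of a single band's time trajectory, and each leading discriminant vector is then read as the taps of an FIR filter. This is not the authors' implementation. The function names, the ridge term, the segment length of 21 frames, and the synthetic four-class data are all assumptions made for illustration.

```python
import numpy as np


def lda_fir_filters(segments, labels, n_filters=3, ridge=1e-6):
    """Leading LDA discriminant vectors of labelled trajectory segments.

    segments: (n_examples, seg_len) array, each row a window of one band's
              energy trajectory centred on a labelled frame.
    labels:   (n_examples,) class label of the centre frame.
    Returns an (n_filters, seg_len) array; each row can be read as FIR taps.
    """
    segments = np.asarray(segments, dtype=float)
    labels = np.asarray(labels)
    seg_len = segments.shape[1]
    grand_mean = segments.mean(axis=0)
    s_within = np.zeros((seg_len, seg_len))
    s_between = np.zeros((seg_len, seg_len))
    for cls in np.unique(labels):
        rows = segments[labels == cls]
        mean = rows.mean(axis=0)
        centred = rows - mean
        s_within += centred.T @ centred
        gap = (mean - grand_mean)[:, None]
        s_between += rows.shape[0] * (gap @ gap.T)
    # Ridge term (an assumption of this sketch) keeps s_within invertible.
    s_within += ridge * np.trace(s_within) / seg_len * np.eye(seg_len)
    # Discriminant vectors are eigenvectors of inv(S_w) @ S_b.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_within, s_between))
    order = np.argsort(eigvals.real)[::-1][:n_filters]
    return eigvecs[:, order].real.T


def apply_fir(trajectory, taps):
    """Project the window centred on every frame onto `taps`."""
    # Reversing the taps turns convolution into a sliding projection.
    return np.convolve(trajectory, taps[::-1], mode="same")


# Synthetic stand-in data: 4 classes, 21-frame segments (all hypothetical).
rng = np.random.default_rng(0)
seg_len, per_class = 21, 200
t = np.arange(seg_len)
prototypes = np.stack([np.sin(2 * np.pi * k * t / seg_len) for k in range(1, 5)])
labels = np.repeat(np.arange(4), per_class)
segments = prototypes[labels] + 0.5 * rng.standard_normal((labels.size, seg_len))

filters = lda_fir_filters(segments, labels)
filtered = apply_fir(rng.standard_normal(300), filters[0])
```

Four classes are used so that the between-class scatter has rank three, which is the minimum needed for three meaningful discriminant vectors. With real data, the segments would be cut from critical-band energy trajectories and the labels would come from a phonetic alignment.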

Qualcomm-ICSI-OGI features for ASR

7th International Conference on Spoken Language Processing (ICSLP 2002)

Our feature extraction module for the Aurora task is based on a combination of a conventional noise suppression technique (Wiener filtering) with our temporal processing techniques (linear discriminant RASTA filtering and nonlinear TempoRAl Pattern (TRAP) classifier). We observe better than 58% relative error improvement on the prescribed Aurora Digit Task, a performance level that is somewhat better than the new ETSI Advanced Feature standard. Furthermore, to test generalization of our approach to an independent test set not available during development, we evaluate performance on American English SpeechDatCar digits and show 10.54% relative improvement over the new ETSI standard.

Beyond a single critical-band in TRAP based ASR

8th European Conference on Speech Communication and Technology (Eurospeech 2003)

TRAP based ASR attempts to extract information from rather long (as long as 1 s) and narrow (one ...

Beyond NYQUIST: towards the recovery of broad-bandwidth speech from narrow-bandwidth speech

4th European Conference on Speech Communication and Technology (Eurospeech 1995)

A new technique is presented which improves the subjective quality of band-limited speech. The approach is based on a linear model of speech production, in which we independently estimate the spectral envelope and excitation function for a broad-bandwidth speech signal to reconstruct missing frequency components in narrow-bandwidth speech.

Spectral basis functions from discriminant analysis

5th International Conference on Spoken Language Processing (ICSLP 1998)

The work examines the Karhunen-Loeve Transform and Linear Discriminant Analysis as means for designing optimized spectral bases for the projection of the critical-band auditory-like spectrum. 1. INTRODUCTION. 1.1. The state-of-art. Typical large vocabulary automatic recognition of speech (ASR) consists of three main components: feature extraction, pattern classification, and language modeling. The feature extraction attempts to reduce the information rate of raw speech data by alleviating...

On the importance of components of the modulation spectrum for speaker verification

5th International Conference on Spoken Language Processing (ICSLP 1998)

We provide an analysis of the relative importance of components of the modulation spectrum for speaker verification. The aim is to remove less relevant components and reduce system sensitivity to acoustic disturbances while improving verification accuracy. Spectral components between 0.1 Hz and 10 Hz are found to contain the most useful speaker information. We discuss this result in the context of RASTA processing and cepstral mean subtraction. When ...
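As a rough illustration of what keeping only the 0.1 Hz to 10 Hz modulation components means, the sketch below band-limits a feature trajectory with an FFT mask. It is not the filter used in the paper. The function name, the 100 Hz frame rate, and the synthetic test signal are assumptions. The constant offset in the test signal stands in for a fixed channel, which is additive in the log-spectral domain and which mean subtraction would also remove.

```python
import numpy as np


def bandpass_modulation(trajectory, frame_rate_hz=100.0, low_hz=0.1, high_hz=10.0):
    """Keep only modulation-frequency components in [low_hz, high_hz]."""
    trajectory = np.asarray(trajectory, dtype=float)
    spectrum = np.fft.rfft(trajectory)
    freqs = np.fft.rfftfreq(trajectory.size, d=1.0 / frame_rate_hz)
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * keep, n=trajectory.size)


# Hypothetical 20 s trajectory sampled at an assumed 100 frames per second.
frame_rate_hz = 100.0
t = np.arange(2000) / frame_rate_hz
channel_offset = 3.0                              # 0 Hz, below the pass band
in_band = np.sin(2 * np.pi * 4.0 * t)             # 4 Hz, inside the pass band
out_of_band = 0.5 * np.sin(2 * np.pi * 30.0 * t)  # 30 Hz, above the pass band

filtered = bandpass_modulation(channel_offset + in_band + out_of_band, frame_rate_hz)
```

The test frequencies fall exactly on FFT bins for this signal length, so `filtered` matches `in_band` up to rounding. A causal FIR or IIR band-pass filter would be the more usual choice in a running system; the FFT mask is used here only because it makes the pass band explicit.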

Bark resolution from speech data

7th International Conference on Spoken Language Processing (ICSLP 2002)

This paper discusses the relevance of non-uniform frequency resolution used by current speech analysis methods like Mel frequency analysis and perceptual linear predictive (PLP) analysis. It is shown that linear discriminant analysis of the short-time Fourier spectrum of speech yields spectral basis functions which provide comparatively lower resolution to the high frequency region of the spectrum. This is consistent with critical-band resolution and is shown to be caused by the spectral properties of vowel sounds. Further, we show that this non-uniform resolution can be traced to the physiology of the speech production mechanism. In ASR experiments, features extracted by the discriminant functions are shown to outperform the conventional features derived by cosine basis functions.

TRAPS - classifiers of temporal patterns

5th International Conference on Spoken Language Processing (ICSLP 1998)

The work proposes a radically different set of features for ASR where TempoRAl Patterns of spectral energies are used in place of the conventional spectral patterns. The approach has several inherent advantages, among them robustness to stationary or slowly varying disturbances. 1. INTRODUCTION. 1.1. Spectral features. In 1665 Isaac Newton made the following observation: "The filling of a very deepe flaggon with a constant streame of beere or water ...

Temporal patterns of critical-band spectrum for text-to-speech

6th International Conference on Spoken Language Processing (ICSLP 2000)

The means of the long temporal trajectories of logarithmic critical band energies in a vicinity ...

Novel approaches for one- and two-speaker detection

8th European Conference on Speech Communication and Technology (Eurospeech 2003)

The paper reviews the OGI submission for the NIST 2002 speaker recognition evaluation. It describes the systems submitted for the one- and two-speaker detection tasks and the post-evaluation improvements. In the one-speaker detection system, we present a new design of a data-driven temporal filter. We show that using a few broad phonetic categories improves the performance of the speaker recognition system. In post-evaluation experiments, we show that combinations with complementary features and modeling techniques significantly improve the performance of the GMM-based system. In the two-speaker detection system, we present a structured approach to detecting the speaker in conversations.

Multi-channel noise reduction using wavelet filter bank

5th European Conference on Speech Communication and Technology (Eurospeech 1997)

Towards handling the acoustic environment in spoken language processing

2nd International Conference on Spoken Language Processing (ICSLP 1992)

Compensation for the effect of the communication channel in auditory-like analysis of speech (RASTA-PLP)

2nd European Conference on Speech Communication and Technology (Eurospeech 1991)

... EFFECT OF THE COMMUNICATION CHANNEL IN AUDITORY-LIKE ANALYSIS OF SPEECH (RASTA-PLP) Hynek Her...

Discriminative MLPs in HMM-based recognition of speech in cellular telephony

6th International Conference on Spoken Language Processing (ICSLP 2000)

Deviating from the conventional Hidden Markov Model-Multi-Layer Perceptron (HMM-MLP) hybrid parad...

Speech variability in the modulation spectral domain - SANOVA technique

6th European Conference on Speech Communication and Technology (Eurospeech 1999)

This paper examines sources of variability in the speech signal using a new technique that is based on a nested spectral analysis of variance (SANOVA). By constructing an ANOVA in the modulation spectral domain, the technique allows a characterization of unwanted variability in the time sequences of logarithmic energy caused by extraneous sources of variability such as additive noise, convolutional noise, and the telephone handset transducer. Very low and moderate to high modulation frequencies are shown to be particularly affected by these sources. Verification results for 500 speakers on Switchboard data from the 1998 NIST speaker recognition evaluation are presented to confirm the conclusions. It is shown that bandpass filtering and down-sampling of the time sequences of logarithmic energy, compared to conventional highpass filtering, leads to a 13% relative reduction of the EER in mismatched conditions.

Analysis of sources of variability in speech

6th European Conference on Speech Communication and Technology (Eurospeech 1999)

... Sachin Kajarekar (1), Narendranath Malayath (1) and Hynek Hermansky (1,2); (1) Oregon Graduate Institute o...

Local averaging and differentiating of spectral plane for TRAP-based ASR

8th European Conference on Speech Communication and Technology (Eurospeech 2003)

Local frequency and time averaging and differentiating operators, using three neighboring points of the critical-band time-frequency plane, are used to process the plane prior to its use in TRAP-based ASR. In that way, five alternative TRAP-based ASR systems (the original one and the time/frequency integrated/differentiated ones) are created. We show that the frequency differentiating operator improves performance of the TRAP-based ASR. 1. Introduction. Unlike features which are based on the full short-term spectrum with its short time context, temporal pattern (TRAP) features are based on a narrow-band spectrum with long time context. By breaking the spectrum into individual critical bands and using each critical band independently in the initial stage of the feature extraction, the TRAP-based features can be inherently less sensitive to changes in relative levels of the individual critical bands. Further, by using longer temporal context, all information about underlying linguistic events, which is spread in time due to coarticulation, may be utilized. Initially, single time trajectories of critical band spectral densities in each critical band were used as input vectors in the frequency-localized TRAP probability estimators (3). Thus, the burden of exploiting the useful information in the temporal pattern and alleviating the irrelevant one was fully left on the estimator. Later, attempts at parametrizing the trajectory vectors were made and the critical band spectral density vectors were projected on bases obtained by Principal Component Analysis (PCA) (6) or Linear Discriminant Analysis (LDA) (5), with the resulting reduction of the size of the input vector to the frequency-localized probability estimator.
Recent studies indicate that information extracted from several (up to three) neighboring bands improves performance of the TRAP system (7). Since these studies use PCA of the input vector space, it is possible to investigate the resulting projection basis. Such an inspection reveals that the PCA rotation resembles frequency averaging and frequency differentiating of the neighboring bands with the subsequent projection on cosine transform bases. This observation suggests that a simple pre-processing of a critical-band spectrogram (CRBS) prior to the cosine transformation and the TRAP classification may be beneficial. The current work investigates such modifications of CRBS in the TRAP system and evaluates their individual efficiency as well as their effect in conjunction with the original (i.e. unprocessed) CRBS.
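The five input variants described above (the original CRBS plus its frequency- and time-averaged and differentiated versions) can be sketched with three-point operators. The text does not give the operator coefficients, so the weights below, the edge handling, the function name, and the toy spectrogram are all assumptions for illustration.

```python
import numpy as np

AVERAGE = np.array([1.0, 1.0, 1.0]) / 3.0  # assumed 3-point averaging weights
DIFFERENCE = np.array([-1.0, 0.0, 1.0])    # assumed 3-point differentiating weights


def local_operator(crbs, weights, axis):
    """Apply a 3-point operator along `axis` of a (bands, frames) array.

    Borders are handled by repeating the edge value (an assumption).
    """
    crbs = np.asarray(crbs, dtype=float)
    pad = [(0, 0)] * crbs.ndim
    pad[axis] = (1, 1)
    padded = np.pad(crbs, pad, mode="edge")
    n = crbs.shape[axis]
    out = np.zeros_like(crbs)
    for offset, weight in enumerate(weights):
        # offset 0, 1, 2 selects the previous, current, and next neighbor.
        out += weight * np.take(padded, np.arange(offset, offset + n), axis=axis)
    return out


# Toy critical-band spectrogram: 15 bands by 50 frames, linear in both axes.
bands = np.arange(15)[:, None]
frames = np.arange(50)[None, :]
crbs = 2.0 * bands + frames

streams = {
    "original": crbs,
    "freq_avg": local_operator(crbs, AVERAGE, axis=0),
    "freq_diff": local_operator(crbs, DIFFERENCE, axis=0),
    "time_avg": local_operator(crbs, AVERAGE, axis=1),
    "time_diff": local_operator(crbs, DIFFERENCE, axis=1),
}
```

Each entry of `streams` would then feed its own TRAP-based system in place of the unprocessed spectrogram. On the linear toy input, the interior of `freq_diff` is a constant 4 and the interior of `time_diff` is a constant 2, which is a quick way to check the operator orientation.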

Band-independent speech-event categories for TRAP based ASR

8th European Conference on Speech Communication and Technology (Eurospeech 2003)

Band-independent categories are investigated for feature estimation in ASR. These categories rep...
