
Audio Feature Extraction Techniques

The CMU Pronouncing Dictionary is an open-source dictionary developed by Carnegie Mellon University that maps English words to their North American pronunciations. It is commonly used in speech recognition and synthesis applications. Key features extracted from signals in the time domain include mean, variance, standard deviation, kurtosis, and waveform length. Frequency domain features extracted using power spectral density estimation include mean frequency, median frequency, maximum to minimum drop in power density ratio, and signal to noise ratio. These features are used as inputs for machine learning models in speech and audio applications.

Uploaded by

Rowa salman

University of Technology – Department of Computer Science

Final Examination Report for the Semester (Second Course)

Academic Year 2019-2020

Report Title
(Speech recognition)
Speech recognition

Q/ What is the CMU Pronouncing Dictionary? Where can it be used in our subjects? Which organization developed it?
Solution:
The CMU Pronouncing Dictionary (also known as CMUdict) is an open-source pronouncing
dictionary originally created by the Speech Group at Carnegie Mellon University (CMU) for use in
speech recognition research.

CMUdict provides an orthographic-to-phonetic mapping for English words in their North American
pronunciations. It is commonly used to generate representations for speech recognition (ASR), e.g.
the CMU Sphinx system, and speech synthesis (TTS), e.g. the Festival system. CMUdict can also be used
as a training corpus for building statistical grapheme-to-phoneme (g2p) models that generate
pronunciations for words not yet included in the dictionary. The most recent release is 0.7b; it
contains over 134,000 entries. An interactive lookup version is available.
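
As a rough illustration of the orthographic-to-phonetic mapping described above, the sketch below parses a few lines written in CMUdict's plain-text layout (a word, then its phonemes, with stress digits on vowels and `(1)`, `(2)`, ... marking alternate pronunciations). The sample lines and the helper names `parse_cmudict` and `strip_stress` are assumptions made for this example, not part of any CMU tool.

```python
import re
from collections import defaultdict

# Illustrative lines in CMUdict's plain-text layout (not a verbatim excerpt of a release).
SAMPLE = """\
;;; comment lines start with three semicolons
HELLO  HH AH0 L OW1
HELLO(1)  HH EH0 L OW1
WORLD  W ER1 L D
"""


def parse_cmudict(text):
    """Map each lowercase word to a list of pronunciations (lists of phonemes)."""
    entries = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        head, *phones = line.split()
        # Alternate pronunciations are written as WORD(1), WORD(2), ...
        word = re.sub(r"\(\d+\)$", "", head).lower()
        entries[word].append(phones)
    return dict(entries)


def strip_stress(phones):
    """Drop the stress digits (0, 1, 2) attached to vowel phonemes."""
    return [p.rstrip("012") for p in phones]


lexicon = parse_cmudict(SAMPLE)
print(lexicon["hello"])                   # both pronunciations of "hello"
print(strip_stress(lexicon["world"][0]))  # phonemes without stress marks
```

A lexicon built this way could serve as the lookup table of a recognizer or as training pairs for a g2p model; words missing from it would still need a fallback.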

Applications
• The Unifon converter is based on the CMU Pronouncing Dictionary.
• The Natural Language Toolkit contains an interface to the CMU Pronouncing Dictionary.
• The Carnegie Mellon Logios tool incorporates the CMU Pronouncing Dictionary.
• PronunDict, a pronunciation dictionary of American English, uses the CMU Pronouncing
Dictionary as its data source. Pronunciation is transcribed in IPA symbols. This dictionary also
supports searching by pronunciation.

Are there any other dictionaries that have been used for the same purpose?
Yes, there are, for example the LOGIOS Lexicon Tool.

Q/ What are the feature extraction methods in the time domain and in the frequency domain?
Solution:
Domain-specific feature extraction :-

• Failure Mode: depending on the failure type, certain ratios, differences, DFEs, etc. are extracted for tracking over time.

• Operating Mode: specific sensors can be more or less critical in different operating conditions of machines…
- raw sensors to be used for feature extraction…
- variances under different conditions can themselves form the basis for further feature extraction

• Component Function: features are extracted on the basis of knowledge about the specific components for which PHM is desired…

• Known Relations: certain relation types can be assumed between variables of interest… this can affect the features calculated for those relations…

Time Domain Features :-

Time domain features are extracted directly from the signal, so they are easy to implement. This ease of
implementation is their main advantage. Their major disadvantage comes from the non-stationary nature of
the signal: its statistical properties change over time, whereas time domain features treat the data as a
stationary signal. In addition, time domain features are calculated from signal amplitude values, so the
interference acquired during recording is another disadvantage of these features.
a. Mean
The mean is the most common and most easily implemented time domain feature. It is simply the mean
of the EMG amplitude values over the sample length of the signal.

$$\mu = \frac{1}{N}\sum_{n=1}^{N} x_n$$

b. Variance
Variance is another common statistical measure used for time domain feature extraction.

$$\mathrm{var} = \frac{1}{N-1}\sum_{n=1}^{N}(x_n-\mu)^2$$

c. Standard Deviation

$$\sigma = \sqrt{\frac{1}{N-1}\sum_{n=1}^{N}(x_n-\mu)^2}$$

e. Kurtosis

Kurtosis is a measure of the peakedness of a probability distribution, based on its fourth-order central moment.

$$\mathrm{kurt} = \frac{\frac{1}{N}\sum_{n=1}^{N}(x_n-\mu)^4}{\sigma^4}$$

f. Mean Absolute Deviation

The mean absolute deviation is the average of the absolute deviations of the data points from their mean.

$$\mathrm{MAD} = \frac{1}{N}\sum_{n=1}^{N}\lvert x_n-\mu\rvert$$
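
The statistical features defined so far can be computed in a few lines. The following is a minimal sketch using only the Python standard library; the function name `time_domain_stats` is an assumption for this example, and the kurtosis follows the formulas in this report literally (a 1/N fourth moment divided by the fourth power of the N-1 standard deviation), which may differ slightly from what statistics libraries return.

```python
import math


def time_domain_stats(x):
    """Return mean, variance, standard deviation, kurtosis and MAD of a sequence."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two samples are needed for the N-1 variance")
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / (n - 1)        # unbiased (N-1) variance
    std = math.sqrt(var)
    if std == 0.0:
        raise ValueError("kurtosis is undefined for a constant signal")
    kurt = (sum((v - mu) ** 4 for v in x) / n) / std ** 4  # 4th moment over sigma^4
    mad = sum(abs(v - mu) for v in x) / n                 # mean absolute deviation
    return {"mean": mu, "var": var, "std": std, "kurt": kurt, "mad": mad}


features = time_domain_stats([1.0, 2.0, 3.0, 4.0, 5.0])
print(features)
```

In practice these values would be computed per analysis window (frame) of the signal rather than over the whole recording, which also softens the stationarity assumption.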
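g. AR Coefficients

Autoregressive (AR) coefficients are a popular feature extraction method for biological signals. AR modeling
fits an equation to the signal: each sample is modeled as a linear combination of the previous data points of
the signal plus an error term.

$$x[n] = -\sum_{k=1}^{P} a_k\, x[n-k] + e[n]$$

Here $a_k$ are the AR coefficients, $P$ is the model order, and $e[n]$ is the prediction error.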
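
One common way to estimate the coefficients of this model is the autocorrelation method solved with the Levinson-Durbin recursion. The report does not say which estimator was used, so the sketch below is only an assumption, with hypothetical function names; it follows the sign convention of the equation above, so a signal generated as x[n] = 0.5 x[n-1] + e[n] should give a first coefficient near -0.5.

```python
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of a sequence."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) / n
            for k in range(max_lag + 1)]


def ar_coefficients(x, order):
    """Estimate a_1..a_P in x[n] = -sum_k a_k x[n-k] + e[n] (Levinson-Durbin).

    Returns (coefficients, prediction_error_power). Assumes x is not all zeros.
    """
    r = autocorrelation(x, order)
    a = [1.0]      # a[0] is fixed to 1 in this convention
    err = r[0]     # prediction error power of the order-0 model
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        reflection = -acc / err
        # Update the lower-order coefficients and append the new one
        a = [1.0] + [a[k] + reflection * a[m - k] for k in range(1, m)] + [reflection]
        err *= 1.0 - reflection ** 2
    return a[1:], err


# Synthetic AR(1) test signal: x[n] = 0.5 * x[n-1] + noise
rng = random.Random(0)
x = [0.0]
for _ in range(2000):
    x.append(0.5 * x[-1] + rng.gauss(0.0, 1.0))

coeffs, err = ar_coefficients(x, order=2)
print(coeffs, err)
```

The coefficients (often of order 4 to 6 for short frames) are then used directly as the feature vector.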

h. Waveform Length

Waveform length is a measure of the complexity of the EMG signal. It is defined as the cumulative length of the
EMG waveform over the time segment.

$$WL = \sum_{n=1}^{N-1}\lvert x_{n+1}-x_n\rvert$$
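
Waveform length reduces to summing the absolute differences between consecutive samples. A minimal sketch (the function name is chosen for this example):

```python
def waveform_length(x):
    """Cumulative length of the waveform: sum of |x[n+1] - x[n]| over the segment."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


segment = [0.0, 1.0, -1.0, 2.0]
print(waveform_length(segment))
```

A flat segment gives zero, while a rapidly oscillating segment of the same amplitude gives a large value, which is why the feature is read as a complexity measure.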

Frequency Domain Features :-

Frequency domain features are widely extracted using the Power Spectral Density (PSD). In this work, the
periodogram is used to estimate the PSD. Six frequency domain features are
extracted from the PSD; their definitions are given below.
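
To make the periodogram concrete, here is a minimal sketch of a one-sided periodogram computed straight from the DFT definition with the standard library. It is written for clarity rather than speed (an FFT routine would normally be used), and the scaling by fs·N and the doubling of non-DC bins are one common convention assumed here, not something the report specifies.

```python
import cmath
import math


def periodogram(x, fs):
    """One-sided periodogram PSD estimate of a real signal x sampled at fs Hz."""
    n = len(x)
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        # k-th DFT coefficient, computed directly from the definition
        xk = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
        p = abs(xk) ** 2 / (fs * n)
        # Fold the negative frequencies in, except at DC and (for even n) Nyquist
        if k != 0 and not (n % 2 == 0 and k == n // 2):
            p *= 2.0
        freqs.append(k * fs / n)
        psd.append(p)
    return freqs, psd


fs = 400.0
signal = [math.sin(2.0 * math.pi * 50.0 * i / fs) for i in range(400)]
freqs, psd = periodogram(signal, fs)
peak = freqs[psd.index(max(psd))]
print(peak)  # frequency of the strongest spectral component
```

With this scaling, summing the PSD values and multiplying by the bin width fs/N recovers the mean square value of the signal.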

a. Mean Frequency
Mean frequency is the average frequency, calculated as the sum of the products of the EMG power
spectrum and the frequency, divided by the total sum of the spectrum intensity.

b. Median Frequency

Median frequency is a frequency at which the spectrum is divided into two regions with equal
amplitude.
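
Given a PSD estimate as two parallel lists (frequencies and power values), the mean and median frequency can be sketched as below. The function names are assumptions for this example, and the median is taken here as the first frequency at which the cumulative power reaches half of the total, which is one common reading of "two regions with equal amplitude".

```python
def mean_frequency(freqs, psd):
    """Power-weighted average frequency: sum(f * P) / sum(P)."""
    total = sum(psd)
    if total <= 0:
        raise ValueError("spectrum has no power")
    return sum(f * p for f, p in zip(freqs, psd)) / total


def median_frequency(freqs, psd):
    """First frequency at which the cumulative power reaches half the total."""
    total = sum(psd)
    if total <= 0:
        raise ValueError("spectrum has no power")
    cumulative = 0.0
    for f, p in zip(freqs, psd):
        cumulative += p
        if cumulative >= total / 2.0:
            return f
    return freqs[-1]


# Small hand-made spectrum, symmetric around 20 Hz
freqs = [0.0, 10.0, 20.0, 30.0, 40.0]
psd = [0.0, 1.0, 2.0, 1.0, 0.0]
mnf = mean_frequency(freqs, psd)
mdf = median_frequency(freqs, psd)
print(mnf, mdf)
```

For a symmetric spectrum the two values coincide; they drift apart as the spectrum becomes skewed.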

c. Maximum to Minimum Drop in Power Density Ratio

The Maximum to Minimum Drop in Power Density Ratio is the ratio of the highest mean power density
value to the lowest mean power density value within a user-defined frequency band.
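
A possible sketch of this ratio is shown below. The report does not say how the "mean power density" values are formed, so this example assumes they are moving averages of the PSD over `window` consecutive bins inside the band; the function name, the `window` parameter and its default are all hypothetical.

```python
def drop_in_power_density_ratio(freqs, psd, f_lo, f_hi, window=3):
    """Ratio of the highest to the lowest mean power density in [f_lo, f_hi].

    Mean power densities are assumed to be moving averages over `window` bins.
    """
    band = [p for f, p in zip(freqs, psd) if f_lo <= f <= f_hi]
    if len(band) < window:
        raise ValueError("frequency band is narrower than the averaging window")
    means = [sum(band[i:i + window]) / window
             for i in range(len(band) - window + 1)]
    lowest = min(means)
    if lowest <= 0:
        raise ValueError("lowest mean power density must be positive")
    return max(means) / lowest


freqs = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
psd = [9.0, 1.0, 2.0, 6.0, 4.0, 2.0]
dpr = drop_in_power_density_ratio(freqs, psd, f_lo=10.0, f_hi=50.0)
print(dpr)
```

The ratio is often reported in decibels, i.e. 10·log10 of the value returned here.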

d. Signal to Noise Ratio

Signal to Noise Ratio is a ratio of the signal power and noise power[10]. The signal power and noise
power are estimated separately.

e. Power Spectrum Deformation

The Power Spectrum Deformation ratio is sensitive to changes in spectral symmetry and provides an
indication of spectral deformation.

f. Signal to Motion Artifact Ratio

Motion artifacts are low-frequency artifacts of EMG signals; they lie below 20 Hz. The
signal to motion artifact ratio is computed as the ratio of the sum of all power densities for frequencies
below 600 Hz to the sum of all power densities that exceed a straight line between the axis origin and
the highest mean power density value with a frequency above 35 Hz.

References

- Cemil Altın, Orhan Er, [Link], Bozok University, Electrical-Electronics Engineering, 66200, Yozgat, Turkey
- [Link]usphinx/trunk/logios/
- [Link]
- [Link]cirrusUserTesting=glent_m0&search=feature+extraction+andtime+domain+&title=Special%3ASearch&go=Go&ns0=1
