
International Journal of Science and Engineering Applications
Volume 7, Issue 08, pp. 175-176, 2018, ISSN: 2319-7560

Speech to Text Conversion System for Myanmar Alphabet

Zaw Win Aung


Technological University (Loikaw)
Loikaw, Myanmar

Abstract: This paper aims to implement a speech to text conversion system for the Myanmar alphabet. The Myanmar alphabet consists of 33 characters, from ‘ka’ to ‘ah’. The proposed system is a software architecture that allows the user to speak to the computer in the Myanmar language, and the corresponding character is printed on the screen in Microsoft Office Word document format. The system focuses on speaker-independent isolated word recognition. The proposed system directly acquires speech and converts it to text. The system contains two main modules: feature extraction and feature matching. Mel Frequency Cepstrum Coefficients (MFCC) are used for feature extraction, which extracts a small amount of data from the voice signal that can later be used to represent each character. Feature matching involves the actual procedure of identifying the unknown character by comparing its extracted voice features with those of a set of known characters. In this system, a Vector Quantization (VQ) approach using the Linde, Buzo and Gray (LBG) clustering algorithm, which reduces the amount of data and complexity, is applied for feature matching. The MATLAB programming language is used to implement the system.

Keywords: Speech to text; Myanmar alphabet; isolated word recognition; Myanmar character; Myanmar language

1. INTRODUCTION
There is a widespread need for transcription services that convert audio files into written text for various purposes: meeting minutes, court reports, medical records, interviews, videos, speeches, and so on. Written text is easier to analyze and store than audio files, and apart from this, there are many circumstances in which one might need to transcribe human speech: those who are deaf still need access to the content of certain audio files; people with a limited ability to type, such as those who are paralyzed or suffer from carpal tunnel syndrome, still need to draft documents; and so on. A Speech-to-Text (STT) system converts speech into text. It takes speech as input and divides it into small segments. These small segments are sounds known as monophones. The system extracts the feature vectors of the monophones, matches them against stored feature vectors, and the most likely (highest-matching) character is returned to the editor for printing.

A System-on-Programmable-Chip (SOPC) based speech-to-text architecture has been proposed by Murugan and Balaji [1]. This speech-to-text system uses isolated word recognition with a vocabulary of ten words (the digits 0 to 9) and statistical modeling (HMM) for machine speech recognition. They used the MATLAB tool for recording speech in this process. The training steps were performed using PC-based C programs. The resulting HMM models are loaded onto a Field Programmable Gate Array (FPGA) for the recognition phase. The uttered word is recognized based on maximum likelihood estimation.

An architecture for a Hindi speech recognition system using the Hidden Markov Model Toolkit (HTK) has been proposed by Kumar and Aggarwal [2]. The proposed system was built as a speech recognition system for the Hindi language, and HTK was used to develop it. The proposed architecture has four phases, namely, preprocessing, feature extraction, model generation and pattern classification. The system recognizes isolated words using an acoustic word model. The system was trained for 30 Hindi words, with training data collected from eight speakers, and the developers reported an accuracy of 94.63%.

Phonetic speech analysis for speech to text conversion has been presented by Bapat and Nagalkar [3]. Their work aimed at generating phonetic codes of the uttered speech in a training-less, human-independent manner. The proposed system has four phases, namely, end point detection, segmentation of speech into phonemes, phoneme class identification, and identification of the phoneme variant within the identified class. The proposed system uses differentiation, zero-crossing calculation and FFT operations.

2. IMPLEMENTATION
The proposed speech to text conversion system is simulated in MATLAB with a speech signal as input and produces the corresponding text as output. The database consists of 165 speech samples which were collected from the same speaker. Each speech sample is about 1 second long. The speaker is asked to utter the Myanmar characters from ‘ka’ to ‘ah’ five times in a training session and one time in a testing session later on. The same microphone is used for all recordings. Speech signals are sampled at 8000 Hz.

In the training phase, feature vectors are calculated from the input speech signal by the MFCC feature extraction algorithm. The codebook, or reference model, for each speech signal is then constructed from the MFCC feature vectors using the LBG clustering algorithm and stored in the database. In the identification phase, the input speech signal is compared with the stored reference models in the database, and the distance between them is calculated using the Euclidean distance. The system then outputs the speech ID with the minimum distance as the identification result, and the corresponding character is printed on the screen in Microsoft Office Word document format. Figure 1 and Figure 2 show the training and testing phases of the speech to text conversion system.
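As an illustration of the training-phase feature extraction described above, the following MATLAB sketch computes MFCC vectors from a speech signal (framing, Hamming windowing, FFT magnitude, mel filterbank, log, DCT). It is a minimal sketch, not the author's code: the frame length, hop size, number of mel filters and number of cepstral coefficients are assumed values that the paper does not report.

    function mfcc = extract_mfcc(x, fs)
    % Minimal illustrative MFCC extraction (assumed parameters, not from the paper).
    % x : speech samples as a column vector (e.g. from audioread), fs : sampling rate (8000 Hz here)
    frameLen = round(0.025*fs);                  % 25 ms frames (assumed)
    hop      = round(0.010*fs);                  % 10 ms hop (assumed)
    numFilt  = 20;                               % mel filterbank channels (assumed)
    numCoef  = 12;                               % cepstral coefficients kept (assumed)
    nfft     = 2^nextpow2(frameLen);
    win      = 0.54 - 0.46*cos(2*pi*(0:frameLen-1)'/(frameLen-1));   % Hamming window

    % Triangular mel filterbank spanning 0 .. fs/2
    mel   = @(f) 2595*log10(1 + f/700);
    imel  = @(m) 700*(10.^(m/2595) - 1);
    edges = imel(linspace(mel(0), mel(fs/2), numFilt+2));
    bins  = floor(edges/fs*nfft) + 1;
    H = zeros(numFilt, nfft/2+1);
    for k = 1:numFilt
        H(k, bins(k):bins(k+1))   = linspace(0, 1, bins(k+1)-bins(k)+1);
        H(k, bins(k+1):bins(k+2)) = linspace(1, 0, bins(k+2)-bins(k+1)+1);
    end

    % Unnormalised DCT-II matrix (avoids any toolbox dependency)
    D = cos(pi/numFilt * (0:numFilt-1)' * ((0:numFilt-1) + 0.5));

    numFrames = floor((length(x) - frameLen)/hop) + 1;
    mfcc = zeros(numCoef, numFrames);
    for i = 1:numFrames
        frame = x((i-1)*hop+1 : (i-1)*hop+frameLen) .* win;
        spec  = abs(fft(frame, nfft));
        spec  = spec(1:nfft/2+1);                % one-sided magnitude spectrum
        fbank = log(H*(spec.^2) + eps);          % log mel-filterbank energies
        c     = D*fbank;                         % cepstrum via DCT
        mfcc(:, i) = c(2:numCoef+1);             % drop the 0th (energy) coefficient
    end
    end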

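The codebook construction can be illustrated with the classical LBG split-and-refine procedure. The sketch below is an assumption about how such a codebook might be built from the MFCC vectors of one character; the codebook size, splitting factor and number of refinement passes are illustrative choices, not values from the paper.

    function codebook = lbg_codebook(feat, M)
    % Illustrative LBG vector quantization training (not the author's exact code).
    % feat : numCoef x numVectors matrix of MFCC vectors for one character
    % M    : target codebook size, a power of two, e.g. 16 (assumed value)
    epsSplit = 0.01;                             % perturbation used when splitting centroids
    codebook = mean(feat, 2);                    % start from the global centroid
    while size(codebook, 2) < M
        % split each centroid into two slightly perturbed copies
        codebook = [codebook*(1+epsSplit), codebook*(1-epsSplit)];
        for iter = 1:20                          % fixed number of refinement passes (assumed)
            % assign every vector to its nearest centroid (squared Euclidean distance)
            d = zeros(size(codebook, 2), size(feat, 2));
            for k = 1:size(codebook, 2)
                diff   = feat - codebook(:, k);  % implicit expansion (MATLAB R2016b or later)
                d(k,:) = sum(diff.^2, 1);
            end
            [~, idx] = min(d, [], 1);
            % move each centroid to the mean of the vectors assigned to it
            for k = 1:size(codebook, 2)
                if any(idx == k)
                    codebook(:, k) = mean(feat(:, idx == k), 2);
                end
            end
        end
    end
    end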
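For the identification phase, a minimal sketch of the minimum-distance decision is given below. The variable names (codebooks, labels, testSignal) and the helper extract_mfcc are hypothetical and only serve to connect the steps described in this section: the test utterance is scored against every stored codebook by its average nearest-codeword Euclidean distance, and the character with the smallest distance is reported.

    % Illustrative identification step (assumed variable names, not from the paper):
    % codebooks : cell array of the 33 trained codebooks, labels : the matching Myanmar characters
    testFeat  = extract_mfcc(testSignal, 8000);          % MFCC vectors of the unknown utterance
    numModels = numel(codebooks);
    dist = zeros(1, numModels);
    for m = 1:numModels
        cb    = codebooks{m};                            % numCoef x M codebook
        total = 0;
        for j = 1:size(testFeat, 2)
            diff  = cb - testFeat(:, j);                 % distances to every codeword (implicit expansion)
            total = total + sqrt(min(sum(diff.^2, 1)));  % nearest-codeword Euclidean distance
        end
        dist(m) = total / size(testFeat, 2);             % average distortion for this model
    end
    [~, best]  = min(dist);
    recognized = labels{best};                           % character with the minimum distance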

Figure 1. Training phase of the speech to text system (block diagram: speech -> feature extraction -> features -> speech modeling -> speech model stored in the speech database)

Figure 2. Testing phase of the speech to text system (block diagram: speech -> feature extraction -> features -> comparison with the speech database -> character printing)

3. EXPERIMENTAL RESULT
This section describes the results of experiments carried out with different database sizes. In order to show the effectiveness of the proposed system, the computation time as well as the accuracy of the system is measured. The training times taken by the system are shown in Table 1. The computation times and accuracy of the system are shown in Table 2.

Table 1. Computation time taken by the system in the training phase

No | No. of Trained Samples | Time Taken (seconds)
1  | 33 samples             | 1.67
2  | 66 samples             | 3.16
3  | 99 samples             | 4.66
4  | 132 samples            | 6.14
5  | 165 samples            | 7.67

Table 2. Computation time and accuracy of the system in the testing phase

No | No. of Test Samples | No. of Samples in the Database | Time Taken (seconds) | Accuracy (percent)
1  | 33 samples          | 33                             | 0.73                 | 91%
2  | 33 samples          | 66                             | 1.89                 | 97%
3  | 33 samples          | 99                             | 2.78                 | 100%
4  | 33 samples          | 132                            | 5.52                 | 100%
5  | 33 samples          | 165                            | 6.67                 | 100%

4. RESULT ANALYSIS
In the training phase, including feature extraction and codebook construction, the total training time is about 7.67 seconds for all 165 speech samples. The system is also tested with 33, 66, 99 and 132 speech samples in the database, and training takes 1.67 seconds, 3.16 seconds, 4.66 seconds and 6.14 seconds respectively.

In the testing phase, when the system is tested with 165 speech samples in the database, the computation time taken by the system is 6.67 seconds for testing all 33 characters, and the accuracy of the system is exactly 100 percent. In the experiments with 33, 66, 99 and 132 speech samples in the database, the computation times are 0.73 seconds, 1.89 seconds, 2.78 seconds and 5.52 seconds respectively for testing all 33 characters. In terms of accuracy, the system achieves 91 percent, 97 percent, 100 percent and 100 percent respectively.

According to the experiments, it was found that most of the errors occurred among 'Ka Gyi', 'Gha Gyi', 'Na Gyi' and 'La Gyi', because these characters produce quite similar sounds in the Myanmar language. Errors also occurred between 'Ah' and 'Ha'. With regard to accuracy, the larger the size of the database, the higher the accuracy of the system.

5. CONCLUSION
From this work it can be concluded that the system is reliable enough to use in real-world applications and is reasonably fast for working in real time.

6. ACKNOWLEDGMENTS
The author would like to take this opportunity to thank all of his colleagues who have given him support and encouragement during the period of the research. The author would also like to express his indebtedness and deep gratitude to his beloved parents, wife and son for their kindness, support and understanding during the whole course of this work, and for their encouragement to attain his ambition without any trouble.

7. REFERENCES
[1] Bala Murugan M. T. and Balaji M., "SOPC-Based Speech-to-Text Conversion", Nios II Embedded Processor Design Contest—Outstanding Designs 2006, Second Prize, National Institute of Technology, Trichy, 2006.
[2] Kumar Kuldeep and Aggarwal R. K., "Hindi Speech Recognition System using HTK", International Journal of Computing and Business Research, vol. 2, pp. 3-7, 2011.
[3] Bapat Abhijit V. and Nagalkar Lalit K., "Phonetic Speech Analysis for Speech to Text Conversion", in IEEE Region 10 Colloquium and the Third International Conference on Industrial and Information Systems, Kharagpur, India, 2008, pp. 1-4.

