SPEECH RECOGNITION USING SOFT COMPUTING
Article in International Journal of Research in Computer Science · March 2022
3 authors:
R.K Srivastava, Dr. Shakuntala Misra Rehabilitation University, Lucknow
Digesh Pandey
Raj Shree Pandey, Babasaheb Bhimrao Ambedkar University
All content following this page was uploaded by R.K Srivastava on 02 March 2022.
DOI: [Link]
Volume 13, No. 1, January-February 2022 ISSN No. 0976-5697
International Journal of Advanced Research in Computer Science
RESEARCH PAPER
Available Online at [Link]
SPEECH RECOGNITION USING SOFT COMPUTING
Dr. R.K Srivastava
CST-UP, Dr. Shakuntala Misra National Rehabilitation University
Lucknow, India

Digesh Pandey
CST-UP, Dr. Shakuntala Misra National Rehabilitation University
Lucknow, India

Raj Shree
CST-UP, Babasaheb Bhimrao Ambedkar University
Lucknow, India
Abstract: Speech recognition is one of the most interesting areas of speech signal processing. Achieving accuracy and robustness is difficult because of many environmental factors. Recent work and reviews in speech recognition applications have used soft computing as one way to improve recognition accuracy. This paper reviews the main ideas of soft computing and its applications to speech signal processing. Because the speech signal is uncertain in nature, it does not behave consistently over regular intervals. To deal with this irregularity and uncertainty, several researchers have proposed soft computing as one of the better techniques for analysing speech signals. This paper presents the available literature related to speech recognition using soft computing methods.
Keywords: Speech, soft computing, accuracy, consistency, Hidden Markov Model.
I. INTRODUCTION
Speech is the primary, most effective, reliable and natural medium of communication in real-time systems. Owing to advances in technology, many devices based on speech communication applications are now on the market, and they are inexpensive and readily available. However, unwanted noise in the environment causes undesired effects in real-time speech processing systems. Human communication and intelligent machines both suffer degraded performance, because they make decisions based on the speech they receive. Many researchers have previously investigated and developed different methods for noise reduction and speech enhancement. Speech enhancement benefits the storage and transmission of speech data and improves the performance of speech recognition-based systems, in which accurate identification of words and sentences can provide automation in most human-machine or machine-based interfaces. By keeping the word error rate (WER) low, speech enhancement can help improve the performance of speech recognition systems. A variety of speech recognition systems are available, some of which are integrated into task-specific applications. A robust Mandarin speech recognition system using neural networks, applied to multimedia interfaces, performs better in real-world applications [5]. Speech recognition is also used in a multimedia speech therapy system for a range of disorders and age groups. There it identifies speech defects and tracks the patient's progress using time-frequency analysis and neural network methods, in addition to recording the voice and analysing the recorded speech signal [6].

In this paper, Section I contains the introduction, Section II describes the speech recognition system, Section III covers speech recognition techniques, Section IV describes soft computing techniques, and Section V concludes the paper.

II. SPEECH RECOGNITION SYSTEM
Speech recognition was first attempted as early as 1952 at Bell Labs, where Davis, Biddulph, and Balashek developed an isolated digit recognition system for a single speaker [1]. There are two types of speech identification tasks: closed set and open set. In closed-set identification, the speech to be identified already exists in the data set; otherwise, it is an open-set identification task. Isolated word recognition requires silence on both sides of the word, whereas continuous word recognition makes speech harder to recognise [2]. Speech recognition is used in speech communication applications and also in banking systems [3, 4]. The structure of the basic speech recognition system was presented by Rabiner, Juang and Yegnanarayana [7]. It consists of four basic building blocks: speech analysis, feature extraction, language translation, and message understanding. Noise removal, silence removal, and end point detection are all part of the speech analysis stage. End point detection and noise removal are required to improve the performance of the speech recognition system. Noisy speech is measured along the basilar membrane in the inner ear, which provides a spectrum analysis of the noisy speech. The speech analysis stage also selects a suitable frame size for segmenting speech signals for further analysis using segmental, sub-segmental, and supra-segmental analysis techniques [8]. The feature extraction and coding stage reduces the dimensionality of the input vector while maintaining the signal's discriminating power. Because the number of training and test vectors required for the classification problem grows with the dimension of the given input, feature extraction is needed. The most commonly used feature extraction techniques are Linear
© 2020-2022, IJARCS All Rights Reserved 11
Yosef Berhanu Buladie et al, International Journal of Advanced Research in Computer Science, 13 (1), Jan-Feb 2022,11-15
Predictive Coding (LPC) and Mel Frequency Cepstral Coefficients (MFCC). Because it is less sensitive to noise, MFCC is preferred over LPC. In the neural transduction stage, the spectral output of speech analysis is converted into activity signals on the auditory nerve. The activity signal is then converted into a language code within the brain, and finally message comprehension is achieved.

III. SPEECH RECOGNITION TECHNIQUES
Speech recognition techniques fall into three basic categories: temporal, artificial neural network, and stochastic. Dynamic Time Warping (DTW) and Vector Quantization (VQ) are used for temporal speech recognition, Hidden Markov Model (HMM) and Gaussian Mixture Model (GMM) are used for stochastic speech recognition, and the Multilayer Perceptron (MLP) is used for artificial neural network-based speech recognition. A computer can use DTW to find the best match between two speech sequences under certain constraints. The decision is based on the global distance measure between the two speech patterns [9]. In DTW there is a trade-off between recognition accuracy and computational efficiency. Dynamic programming is used to carry out the optimisation in DTW. VQ is useful for speech coders and is commonly used in Automatic Speech Recognition (ASR). It uses compact codebooks for reference models. When VQ is used with DTW/HMM, storage and computation time are reduced [10]. MLP is a neural network technique based on the back propagation (BP) algorithm that is used as a classifier, with nodes connected to adjacent layers by weights. The performance of MLP degrades in the presence of noise. Stochastic modelling is a probabilistic modelling approach for uncertain data and is better suited to speech recognition. HMM [11] is a well-known stochastic approach that is characterised by a finite-state Markov model and a set of output distributions. GMM is a modelling tool for text-based speech recognition. Every speaker in GMM has an independent GMM model, and the output of GMM is determined using a maximum likelihood classifier.

IV. SOFT COMPUTING TECHNIQUES
Soft computing is a collection of computational techniques used in engineering disciplines to study, model, and analyse extremely complex problems for which conventional approaches fail to provide cost-effective solutions. Neural Networks, Fuzzy Logic, and Evolutionary Computation (Genetic Algorithm) are important components of soft computing. The Artificial Neural Network (ANN) is an information processing paradigm inspired by the way biological nervous systems work. It is made up of a large number of highly interconnected processing elements (neurons) that work together to solve specific problems. ANN is often used for real-time operation because ANN computations can be carried out in parallel. Lotfi Zadeh devised fuzzy logic (FL), a problem-solving control system methodology. It handles imprecise data, which is represented as fuzzy sets. FL is used in many control system applications because it mimics human control logic. Genetic Algorithms (GAs) are adaptive computational procedures based on the mechanics of natural genetic systems. They apply selection/reproduction, crossover, and mutation operators to a set of coded solutions (population) [16].

The implementation and testing of HMM and ANN techniques for speech recognition on a Field Programmable Gate Array (FPGA) device was described in [17]. GA was used to train the ANN in order to obtain a more accurate and optimal solution. The results show that HMM has a slightly higher recognition rate than ANN, but ANN's recognition speed is much faster than HMM's. For the codebook design of vector quantization, the LBG algorithm is commonly used. One interesting paper [18] proposed a GA-L (GA and LBG) algorithm-based approach for vector quantization in speech recognition systems, which improves the quality of the codebook. It is more effective than the standard LBG algorithm. A fuzzy logic recognition method based on the energy distribution pattern of a segment of speech in real-time systems was introduced in [19]. For continuous speech processing, that paper relies on pattern generation and measure-based pattern matching. For the advanced PDA application, delay-and-sum and adaptive beamforming algorithms [20] were used in a noisy automobile environment. For the processed speech, performance metrics such as signal-to-noise ratio and speech recognition error rate were analysed in that article, and the results show that a microphone array works better than a single-microphone system. [21] demonstrates that a beamforming-based speech enhancement approach improves speech recognition in a multi-microphone environment. Speech recognition performance was modelled against the filter bank parameters: filter length and number of subbands were analysed by evaluating the percentage recognition accuracy, and the results demonstrated the speech enhancement capability of the beamforming approach in multi-microphone techniques.

Table 1. Comparison between different algorithms
| Metric | DBN | FDBN | AFDBN | RL |
|---|---|---|---|---|
| Accuracy | 75% | 79.4% | 88.5% | 94% |
| Precision | 74% | 79.4% | 88.5% | 93% |
| Recall | 78% | 81.4% | 91% | 97% |
| Avg. Processing Time | 1145 ms | 1100 ms | 890 ms | 820 ms |
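
As an illustration of the DTW matching described in Section III, the following is a minimal sketch in Python. It is not the implementation used in [9]. It assumes one-dimensional feature sequences with the absolute difference as local distance; the function names `dtw_distance` and `classify`, the optional `window` constraint, and the toy templates are hypothetical choices made for this example. In a real recogniser each frame would be a feature vector such as MFCCs.

```python
from math import inf


def dtw_distance(a, b, window=None):
    """Global DTW distance between two 1-D sequences, computed by dynamic programming."""
    n, m = len(a), len(b)
    # Optional band constraint: the warping path may not stray more than
    # `w` steps from the diagonal (assumed here; other constraints exist).
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b holds
                                     cost[i][j - 1],      # b advances, a holds
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def classify(query, templates, window=None):
    """Return the label of the template with the smallest global DTW distance."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label], window))


# Toy reference patterns (hypothetical data, not from the paper).
templates = {"rise": [0, 1, 2, 3], "fall": [3, 2, 1, 0]}
label = classify([0, 0, 1, 2, 2, 3], templates)
```

A smaller `window` reduces the number of cells that are filled in, which is one way to see the trade-off between recognition accuracy and computational efficiency mentioned above.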
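
The GA operators named in Section IV (selection/reproduction, crossover, and mutation applied to a population of coded solutions) can be sketched as below. This is a generic toy example under stated assumptions, not the GA used in [17] or [18]: solutions are coded as bit strings, selection is a binary tournament, one elite individual is copied unchanged each generation, and the fitness function `one_max` is a stand-in for a real objective such as recognition accuracy or codebook distortion. All names and parameter values are hypothetical.

```python
import random


def tournament(pop, fitness, rng):
    """Selection: pick two distinct individuals at random and keep the fitter one."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def run_ga(fitness, n_bits=16, pop_size=20, generations=40,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Evolve bit-string solutions; returns the best individual and the best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the current best over unchanged
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each bit independently with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history


def one_max(chromosome):
    """Stand-in fitness: the number of 1 bits in the coded solution."""
    return sum(chromosome)


best, history = run_ga(one_max)
```

Because the elite individual is preserved, the best fitness recorded in `history` never decreases from one generation to the next. In the works surveyed above, the coded solution would instead represent ANN weights [17] or a VQ codebook [18].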
| Sl. No. | Authors | Title | Purpose | Algorithm |
|---|---|---|---|---|
| 1 | F. Beritelli | A robust voice activity detector for wireless communications using soft computing | It has been demonstrated that the SNR and the threshold that minimises the total error have a linear connection. | Speech compression, activity detection |
| 2 | Mohit Dua, [Link]arwal, Virender Kadyan, Shelza Dua | Punjabi Automatic Speech Recognition Using HTK | For regional languages like Punjabi, an efficient, abstract, and rapid ASR system is critical. | Mel Frequency Cepstral Coefficient (MFCC), Hidden Markov Model, Dynamic Time Warp |
| 3 | Wiqas Ghai, Navdeep Singh | Analysis of Automatic Speech Recognition Systems for Indo-Aryan Languages: Punjabi A Case Study | Application of methods: to improve voice recognition performance, researchers are adopting Cooperative Heterogeneous ANN Architecture, Maximum Likelihood Linear Regression, Extended MFCC, and Learning Vector Quantization. | Multilayer perceptron, Cooperative heterogeneous artificial neural network |
| 4 | Milind U. Nemade, Prof. Satish K. Shah | Survey of Soft Computing based Speech Recognition Techniques for Speech Enhancement in Multimedia Applications | The beamforming technique is commonly utilised to improve voice recognition performance in multimedia applications. | Genetic Algorithm, Artificial Neural Network, Mel Frequency Cepstral Coefficient (MFCC) |
| 5 | Cini Kurian | A Review on Technological Development of Automatic Speech Recognition | It is a foregone conclusion that this technology will go from machines that can somewhat duplicate human speaking skills to the building of a machine that can act like an intelligent person. | Hidden Markov Model, Discrete Hidden Markov Model (DHMM), Semi Continuous HMM (SCHMM) |
| 6 | Nidhi Desai, [Link]al Dhameliya, [Link]yendra Desai | Feature Extraction and Classification Techniques for Speech Recognition: A Review | Presenting a comprehensive appraisal of speech recognition and providing some year-by-year developments up to the present day is a difficult and fascinating endeavour in itself. | Acoustic Phonetic Approach, Artificial Intelligence, Feature extraction, LPC, MFCC |
| 7 | Gautam Krishna, Co Tran, Jianguo Yu, Ahmed H Tewfik | Speech Recognition with No Speech or with Noisy Speech | Created a deep learning model capable of learning EEG properties and performing speech recognition without any voice input. | Electroencephalography (EEG), Speech Recognition, Distillation, Deep Learning |
| 8 | Anupam Choudhary, Ravi Kshirsagar | Artificial Intelligence Techniques to Process Speech Recognition System | Speech recognition will be prevalent in telephone networks throughout the world over the next few years, necessitating an entirely different acoustic model. Because there is no GUI, it must be able to connect with telephony systems and manage a spoken dialogue with the user. | NLP, GUI, IP, channel model |
| 9 | Dr. Uma Kumari | Soft Computing Applications: A Perspective View | The goal of this work is to give a general understanding of soft computing, as well as its relevance, applications, and strengths. | Soft computing, fuzzy logic, artificial neural network, genetic algorithm |

V. CONCLUSION
Speech is the primary, most effective, reliable and natural medium of communication in real-time systems. Many applications of speech are still far from reality because of the lack of efficient and reliable noise removal mechanisms and of techniques for preserving or improving the intelligibility of speech signals. The purpose of this paper was to survey soft computing-based speech recognition techniques for speech enhancement in multimedia applications. The review showed that the beamforming method is commonly employed in multimedia applications to improve speech recognition performance. To further improve the performance of beamforming-based speech recognition systems, evolutionary computation (GA) techniques can be expected to be applied in multimedia applications. We focused on the most commonly used performance measurement parameters for speech recognition.

VI. REFERENCES
[1] K. H. Davis, R. Biddulph, and S. Balashek, “Automatic Recognition of Spoken Digits”, J. Acoust. Soc. Am., 24 (6): 637-642, 1952.
[2] M.G. Sumithra, M.S. Ramya, K. Thanuskodi, “Noise robust isolated word recognition”, International Conference on Communication and Computational Intelligence (INCOCCI), pp. 362-367, 2010.
[3] A. Burstein, A. Stolzle, and R. W. Brodersen, “Using speech recognition in a personal communications system”, IEEE International Conference on Communication, vol. 3, pp. 1717-1721, 1992.
[4] T. Isobe, M. Morishima, F. Yoshitani, N. Koizumi and K. Murakami, “Voice-activated home banking system and its field trial”, International Conference on Spoken Language, vol. 3, pp. 1688-1691, 1996.
[5] Sheu B., Ismail M., Wang M., Tsai R., “Speech Recognition in multimedia human machine interfaces using neural networks”, Wiley-IEEE Press, pp. 463-489, 1998.
[6] V. C. Georgopoulos, “An investigation of audio-visual speech recognition as applied to multimedia speech therapy applications”, IEEE International Conference on multimedia computing and system, vol. 1, pp. 481-486, 1999.
[7] L. Rabiner, B.H. Juang and B. Yegnanarayana, “Fundamentals of Speech Recognition”, Pearson Education, first edition, ISBN 978-81-7758-560-5, 2009.
[8] H.S. Jayanna, S.R. Mahadeva, “Analysis, Feature Extraction, Modelling and Testing Techniques for Speaker recognition”, IETE Tech. Rev., 26:181-90, 2009.
[9] Bin Amin T. and Mahmood I., “Speech Recognition using Dynamic Time Warping”, Second International Conference on Advances in Space Technologies, pp. 74-79, 2008.
[10] S. Furui, “Vector quantization based speech recognition and speaker recognition techniques”, Twenty-Fifth Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 954-958, 1991.
[11] A. P. Varga and R.K. Moore, “Hidden Markov Model Decomposition of Speech and Noise”, Proc. ICASSP, pp. 845-848, 1990.
[12] O. L. Frost, III, “An algorithm for linearly constrained adaptive array processing”, Proc. IEEE, vol. 60, pp. 926-935, Jan. 1972.
[13] Griffiths, L.; Jim, C., “An alternative approach to linearly constrained adaptive beamforming”, IEEE Transactions on Antennas and Propagation, vol. 30, no. 1, pp. 27-34, Jan 1982.
[14] Seltzer, M.L.; Raj, B.; Stern, R.M., “Likelihood-maximizing beamforming for robust hands-free speech recognition”, IEEE Transactions on Speech and Audio Processing, vol. 12, no. 5, pp. 489-498, Sept. 2004.
[15] W. T. Hong, “Residual Noise Removal on Beamforming for robust Hands-free Speech Recognition”, International Computer Symposium (ICS), pp. 270-273, 2010.
[16] S.N. Shivnandam, S.N. Deepa, “Principles of Soft Computing”, Wiley India Pvt Ltd, Reprint: 2010.
[17] Shing T. Pan, Ching F. Chen, Jian H. Zeng, “Speech Recognition via Hidden Markov Model and Neural Network Trained by Genetic Algorithm”, Proc. of 9th International Conference on Machine Learning and Cybernetics, Qingdao, pp. 2950-2955, 2010.
[18] Y. Yujin, Z. Qun, Z. Peihua, “Vector quantization Codebook Design Method for Speech Recognition Based on Genetic Algorithm”, Second International Conference on Information Engineering and Computer Science, pp. 1-4, 2010.
[19] Tong Zhao, Peng-Yung Woo, “Fuzzy Speech Recognition”, International Joint Conference on Neural Networks, vol. 5, pp. 2959-2961, 1999.
[20] Stephen Oh, Vishu V., Panos P., “Hands-Free Voice Communication in an Automobile With a Microphone Array”, IEEE International Conference on ASSP, vol. 1, pp. 281-284, 1992.
[21] Milind U. Nemade, Satish K. Shah, “Improvement in Speech Recognition Performance using Beamforming based Speech Enhancement”, International Journal of Electronics Communication and Computer Engineering, pp. 745-751, vol. 3, Issue-4, 2012.
[22] F. Beritelli, “A robust voice activity detector for wireless communications using soft computing”, IEEE Journal on Selected Areas in Communications, Vol. 16, No. 9, December 1998.
[23] Mohit Dua, [Link], Virender Kadyan, Shelza Dua, “Punjabi Automatic Speech Recognition Using HTK”, IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 4, No 1, July 2012.
[24] Wiqas Ghai, Navdeep Singh, “Analysis of Automatic Speech Recognition Systems for Indo-Aryan Languages: Punjabi A Case Study”, International Journal of Soft Computing and Engineering (IJSCE), ISSN: 2231-2307, Volume-2, Issue-1, March 2012.
[25] Milind U. Nemade, Prof. Satish K. Shah, “Survey of Soft Computing based Speech Recognition Techniques for Speech Enhancement in Multimedia Applications”, International Journal of Advanced Research in Computer and Communication Engineering, ISSN (Print): 2319-5940, ISSN (Online): 2278-1021, Vol. 2, Issue 5, May 2013.
[26] Cini Kurian, “A Review on Technological Development of Automatic Speech Recognition”, International Journal of Soft Computing and Engineering (IJSCE), ISSN: 2231-2307, Volume-4, Issue-4, September 2014.
[27] Nidhi Desai, [Link], [Link] Desai, “Feature Extraction and Classification Techniques for Speech Recognition: A Review”, International Journal of Emerging Technology and Advanced Engineering (ISSN 2250-2459, ISO 9001:2008 Certified Journal), Volume 3, Issue 12, December 2013.
[28] Gautam Krishna, Co Tran, Jianguo Yu, Ahmed H Tewfik, “Speech Recognition with No Speech or with Noisy Speech”, 2019, arXiv:1903.00739v1 [[Link]], 2 Mar 2019.
[29] Anupam Choudhary, Ravi Kshirsagar, “Process Speech Recognition System using Artificial Intelligence Technique”, International Journal of Soft Computing and Engineering (IJSCE), ISSN: 2231-2307, Volume-2, Issue-5, November 2012.
[30] Dr. Uma Kumari, “Soft Computing Applications: A Perspective View”, 2017, Proceedings of the 2nd International Conference on Communication and Electronics Systems (ICCES 2017), IEEE Xplore Compliant, Part Number: CFP17AWO-ART, ISBN: 978-1-5090-5013-0.