Journal of Theoretical and Applied Information Technology, 2020
In the past few years, emotion recognition from speech has been one of the hottest research topics in the field of Human Computer Interaction. Much research has addressed various languages, but work on the Bengali language is still in its infancy. In this work, four emotional states, i.e. happy, sad, angry and neutral, are recognized from a Bengali speech dataset. The proposed approach uses pitch and Mel-Frequency Cepstral Coefficient (MFCC) feature vectors to train a k-Nearest Neighbor classifier. A self-built Bengali emotional speech dataset was used for both training and testing. The dataset consists of 400 isolated emotional sentences from 50 people. Using this dataset and the above technique, we achieved an 87.50% average accuracy rate, with per-emotion detection accuracies (happy, sad, angry, neutral) of 80.00%, 75.00%, 85.00% and 75.00%, respectively.
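The pitch + MFCC + k-NN pipeline this abstract describes can be sketched roughly as follows. This is a minimal illustration assuming the librosa and scikit-learn libraries; the sample rate, hyperparameters and the `train_paths`/`train_labels` names are hypothetical placeholders, not the authors' actual setup.

```python
# Minimal sketch, assuming librosa + scikit-learn; paths, labels and
# hyperparameters are illustrative, not the paper's actual setup.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def utterance_features(path):
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, n_frames)
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)        # frame-wise pitch (Hz)
    # Collapse frame-wise features to one fixed-length vector per utterance.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           [np.nanmean(f0), np.nanstd(f0)]])

# train_paths / train_labels are hypothetical placeholders for the corpus.
X = np.array([utterance_features(p) for p in train_paths])
clf = KNeighborsClassifier(n_neighbors=5).fit(X, train_labels)
```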
2014
Speech Emotion Recognition (SER) is a current research topic in the field of Human Computer Interaction (HCI) with a wide range of applications. The purpose of a speech emotion recognition system is to automatically classify a speaker's utterances into four emotional states: anger, sadness, neutral, and happiness. The speech samples are from the Berlin emotional database, and the features extracted from these utterances are energy, pitch, zero-crossing count (ZCC), entropy, and Mel Frequency cepstrum coefficients (MFCC). K-Nearest Neighbor (KNN) is used as the classifier for the different emotional states. The system gives 86.02% classification accuracy using the energy, entropy, MFCC, ZCC and pitch features. Keywords: Speech Emotion; Automatic Emotion Recognition; KNN; Energy; Pitch; MFCC; ZCC.
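A minimal sketch of how the frame-level descriptors named in this abstract (short-time energy, zero-crossing count, energy entropy) are commonly computed; the frame and hop lengths below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def frame_descriptors(y, frame_len=400, hop=160):
    """Short-time energy, zero-crossing count and energy entropy per frame."""
    feats = []
    for start in range(0, len(y) - frame_len + 1, hop):
        frame = y[start:start + frame_len]
        energy = float(np.sum(frame ** 2))
        zcc = int(np.sum(np.abs(np.diff(np.sign(frame)))) // 2)
        # Energy entropy: how evenly energy spreads across 10 sub-blocks.
        p = np.sum(frame.reshape(10, -1) ** 2, axis=1) / (energy + 1e-12)
        entropy = float(-np.sum(p * np.log2(p + 1e-12)))
        feats.append((energy, zcc, entropy))
    return np.array(feats)
```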
2013
Abstract — This paper presents the results of investigations in speech emotion recognition in Hindi, using only the first four formants and their bandwidths. The research was done on a female speech database of nearly 1600 utterances comprising neutral, happiness, surprise, anger, sadness, fear and disgust as the elicited emotions. The best of the statistically preprocessed formant and bandwidth features were first identified by K-Means, K-Nearest Neighbour and Naive Bayes classification of individual features. This was followed by artificial neural network classification based on the combination of the best formants and bandwidths. The highest overall emotion recognition accuracy obtained by the ANN method was 97.14%, based on the first four values of formants and bandwidths. A striking increase in recognition accuracy was observed when the number of emotion classes was reduced from seven. The results presented in this paper have not been reported so far for...
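Formants and bandwidths are typically estimated from the roots of a linear-prediction (LPC) polynomial; the sketch below illustrates that standard procedure, assuming librosa for the LPC fit, and is not necessarily the authors' exact implementation.

```python
import numpy as np
import librosa

def formants_and_bandwidths(frame, sr=16000, order=10, n_formants=4):
    """Estimate formant frequencies and bandwidths (Hz) from LPC roots."""
    a = librosa.lpc(frame.astype(np.float64), order=order)
    roots = np.array([r for r in np.roots(a) if np.imag(r) > 0])
    freqs = np.angle(roots) * sr / (2 * np.pi)    # root angle -> frequency
    bws = -np.log(np.abs(roots)) * sr / np.pi     # root radius -> bandwidth
    idx = np.argsort(freqs)
    return freqs[idx][:n_formants], bws[idx][:n_formants]
```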
This paper compares two extracted features, pitch and formants, for emotion recognition from speech. Prior research shows that various prosodic and spectral features have been used for emotion recognition from speech. The database used for recognition was developed in the Marathi language with 100 speakers. We extracted pitch and formant features. The angry, stress, admiration, teasing and shocking emotions were recognized on the basis of the energy and formant features. The classification technique used is K-Nearest Neighbor (KNN). The accuracy for formants was about 100%, comparatively better than that for energy, which was 80%.
International Journal of Computer Applications, 2018
Speech Emotion Recognition (SER) is a modern development in technology. SER, in partnership with Human-Machine Interaction (HMI), has advanced machine intelligence. An emotion-precise HMI is designed by integrating speech processing and a machine learning algorithm, sculpted to formulate an automated, smart and secure application for detecting emotions in household as well as commercial applications. This project presents a study of distinguishing emotions by acoustic speech recognition (ASR) using K-nearest neighbor (K-NN), a machine learning (ML) technique. The most significant paralinguistic information obtained from spectral features, i.e. Mel frequency cepstrum coefficients (MFCC), is used by the ASR. The most important processing steps include feature extraction, feature selection, and classification of emotions. A customized dataset consisting of a speech corpus of simulated emotion samples in the Sanskrit language is used to classify emotions into different emotional classes, i.e. happy, sad, excitement, fear, anger and disgust. The emotions are classified using a K-NN algorithm over 2 separate models, based on soft and high pitch voice. Models 1 and 2 achieved about 72.95% and 76.96% recognition accuracy, respectively.
2013
An emotion-based speaker identification system automatically identifies a speaker's emotion based on features extracted from speech waves. This paper presents experiments in building and testing a speaker emotion identification system for Hindi speech using Mel Frequency Cepstral Coefficients and Vector Quantization techniques. We collected voice samples of Hindi sentences spoken in four basic emotions to study speaker emotion identification, and found that the proposed emo-voice model achieves 73% emotion identification accuracy on the 93% of the total speech samples accepted by the system.
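A minimal sketch of MFCC-based Vector Quantization as it is commonly applied to this task: one K-Means codebook per emotion, with classification by the lowest average quantization distortion. The `frames_by_emotion` name and the codebook size are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_codebooks(frames_by_emotion, codebook_size=32):
    """Fit one VQ codebook per emotion from pooled MFCC frames."""
    return {emo: KMeans(n_clusters=codebook_size, n_init=5).fit(frames)
            for emo, frames in frames_by_emotion.items()}

def classify(codebooks, test_frames):
    """Pick the emotion whose codebook quantizes the frames most cheaply."""
    def distortion(km):
        return km.transform(test_frames).min(axis=1).mean()
    return min(codebooks, key=lambda emo: distortion(codebooks[emo]))
```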
Journal of Interdisciplinary Mathematics, 2020
Nowadays, speech is the fastest medium for giving instructions to machines. When a person utters a word, the machine can understand the semantics of the utterance but not the emotion associated with it. This study focuses on combining different types of speech features, applying various statistical techniques to reduce the dimensionality of the data, and then training a machine learning algorithm to predict the emotional state of the person, so that while receiving instructions from humans, machines can respond better by recognizing emotion.
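The abstract does not name the statistical dimensionality-reduction technique, so the sketch below uses PCA purely as a plausible stand-in, chained with a k-NN classifier via scikit-learn; `X_train`/`y_train` and the variance threshold are placeholders.

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize the combined feature set, project onto the leading principal
# components, and train a classifier on the reduced representation.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=0.95),   # keep 95% of the variance
                      KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)                     # placeholders for the corpus
print(model.score(X_test, y_test))
```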
2015
Emotions play an important role in expressing feelings, as they tend to make people act differently. Determining the emotions of a speaker is less complicated when facing him/her than from voice alone, as in a telephone conversation. However, it would be a great achievement if we were able to detect the emotion with which a speaker is speaking just by listening to the voice. This project is a small step towards that goal, focusing on determining emotions from recorded speech and developing a prototype system. The ability to detect human emotion from speech will be a great addition to the field of human-robot interaction. The aim of this work is to build an emotion recognition system using Mel-frequency cepstral coefficients (MFCC) and a Gaussian mixture model (GMM). Four emotional states, happy, sad, angry and neutral, are taken for classification. We considered only 10 speakers, 7 male and 3 female, all belonging to the upper Assam region and all speaking with the same accent. The experiments are performed for the speaker-dependent, text-independent case only.
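The MFCC + GMM approach described here is commonly implemented by fitting one mixture model per emotion and scoring a test utterance under each; a minimal sketch assuming scikit-learn, with an illustrative component count.

```python
from sklearn.mixture import GaussianMixture

def train_gmms(frames_by_emotion, n_components=8):
    """Fit one GMM per emotion on that emotion's pooled MFCC frames."""
    return {emo: GaussianMixture(n_components=n_components,
                                 covariance_type='diag').fit(frames)
            for emo, frames in frames_by_emotion.items()}

def classify(gmms, test_frames):
    """Pick the emotion with the highest average log-likelihood."""
    return max(gmms, key=lambda emo: gmms[emo].score(test_frames))
```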
2013
An emotion is a mental and physiological state associated with a wide variety of feelings, thoughts, and behavior. Emotions are subjective experiences, experienced from an individual point of view, and are often associated with mood, temperament, personality, and disposition. In this paper, a method for detecting human emotions is discussed based on acoustic features like pitch and energy. The proposed system uses the traditional MFCC approach [2] and then a nearest neighbor algorithm for classification. Emotions are classified separately for males and females, based on the fact that male and female voices have altogether different ranges [1][4], so the MFCCs vary considerably between the two. Keywords — Emotion Recognition from Speech, Fourier Transform, Mel Filter Bank, MFCC, Modern MFCC Approach, Nearest Neighbor Algorithm
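The "traditional MFCC approach" mentioned above follows the classic chain of windowing, FFT, mel filterbank, log compression and DCT. A minimal single-frame sketch, assuming librosa for the filterbank only; frame length and coefficient counts are illustrative.

```python
import numpy as np
import librosa
from scipy.fftpack import dct

def mfcc_frame(frame, sr=16000, n_mels=26, n_mfcc=13):
    """One frame through the classic chain:
    window -> |FFT|^2 -> mel filterbank -> log -> DCT."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2           # power spectrum
    fb = librosa.filters.mel(sr=sr, n_fft=len(frame), n_mels=n_mels)
    log_mel = np.log(fb @ power + 1e-12)                 # log mel energies
    return dct(log_mel, norm='ortho')[:n_mfcc]           # cepstral coefficients
```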
2022
Emotion recognition from acoustic signals plays a vital role in the field of audio and speech processing. Speech interfaces offer humans an informal and comfortable means to communicate with machines. Emotion recognition from speech signals has a variety of applications in the areas of human computer interaction (HCI) and human behavior analysis. In this work, we develop the first emotional speech database of the Urdu language. We also develop a system to classify five different emotions, sadness, happiness, neutral, disgust, and anger, using different machine learning algorithms. The Mel Frequency Cepstrum Coefficient (MFCC), Linear Prediction Coefficient (LPC), energy, spectral flux, spectral centroid, spectral roll-off, and zero-crossing rate were used as speech descriptors. The classification tests were performed on an emotional speech corpus collected from 20 different subjects. To evaluate the quality of the speech emotions, subjective listening tests were conducted. The rate of correctly classified emotions on the complete Urdu emotional speech corpus was 66.5% with K-nearest neighbors. It was found that the disgust emotion has a lower recognition rate than the other emotions; removing it significantly improves the performance of the classifier, to 76.5%.
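A sketch of how the spectral descriptors listed above might be computed, assuming librosa; spectral flux is not a built-in librosa feature, so it is derived directly from frame-to-frame differences of the normalized spectrum.

```python
import numpy as np
import librosa

def spectral_descriptors(y, sr):
    """Per-frame spectral descriptors of the kind listed in the abstract."""
    S = np.abs(librosa.stft(y))
    centroid = librosa.feature.spectral_centroid(S=S, sr=sr)
    rolloff = librosa.feature.spectral_rolloff(S=S, sr=sr)
    zcr = librosa.feature.zero_crossing_rate(y)
    # Spectral flux: change of the normalized spectrum between frames.
    Sn = S / (S.sum(axis=0, keepdims=True) + 1e-12)
    flux = np.sqrt(np.sum(np.diff(Sn, axis=1) ** 2, axis=0))
    return centroid, rolloff, zcr, flux
```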
TJPRC, 2013
The man-machine relationship demands smart systems in which machines react after considering human emotion levels. Advances in technology have improved machine intelligence enough to identify human emotions at the expected level. Harnessing speech processing and pattern recognition algorithms, a smart, emotion-oriented man-machine interaction can be achieved, with tremendous scope in automated home as well as commercial applications. This paper deals with pitch, Mel Frequency Cepstrum Coefficient based speech features and the wavelet domain in speech emotion recognition. The impact of different classifiers, Gaussian Mixture Model (GMM), K-Nearest Neighbour (K-NN) and Hidden Markov Model (HMM), on the recognition rate is emphasized in the identification of six emotional categories, namely happy, angry, neutral, surprised, fearful and sad, from the Berlin Emotional Speech Database (BES), with the intent of a comparative performance analysis. In the experiments, the speech features used are based on pitch, MFCCs and discrete-wavelet-domain 'db1' family vectors. The features were the same for all three classifiers (GMM, K-NN and HMM) in order to compare their performance on the merits of recognition accuracy, confusion matrix, precision rate and F-measure. The highest recognition accuracy for the GMM classifier was 92% for the 'angry' emotion, the K-NN classifier gave 90% correct recognition for the 'happy' class, while the highest recognition score for the HMM classifier was 78% for the 'angry' emotion. The confusion matrix statistics depict confusion between the 'happy' and 'neutral' emotions; moreover, all three classifiers confused each of the remaining emotions with the 'angry' emotion at least once. The precision rate and F-measure results convey the superiority of the GMM classifier in the emotion recognition system, while the K-NN and HMM were average in overall performance.
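The wavelet-domain 'db1' features mentioned above can be obtained from a discrete wavelet decomposition; a minimal sketch assuming the PyWavelets package, with illustrative sub-band statistics (the paper does not specify which statistics were used).

```python
import numpy as np
import pywt

def dwt_features(y, wavelet='db1', level=4):
    """(mean, std, energy) of each sub-band of a 'db1' wavelet decomposition."""
    coeffs = pywt.wavedec(y, wavelet, level=level)
    return np.array([(c.mean(), c.std(), np.sum(c ** 2))
                     for c in coeffs]).ravel()
```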
This paper proposes the use of a minimum number of formant and bandwidth features for efficient classification of the neutral and six basic emotions in two languages. Such a minimal feature set facilitates fast and real-time recognition of emotions, which is the ultimate goal of any speech emotion recognition system. The investigations were done on emotional speech databases developed by the authors in English as well as Malayalam, a popular Indian language. For each language, the best features were identified by the K-Means, K-Nearest Neighbor and Naive Bayes classification of individual formants and bandwidths, followed by artificial neural network classification of the combination of the best formants and bandwidths. Whereas an overall emotion recognition accuracy of 85.28% was obtained for Malayalam, based on the values of the first four formants and bandwidths, the recognition accuracy obtained for English was 86.15%, based on a feature set of the four formants and the first and fourth bandwidths, both of which are unprecedented. These results were obtained for elicited emotional speech of females and with statistically preprocessed formant and bandwidth values. Reduction in the number of emotion classes resulted in a striking increase in the recognition accuracy.
IJRET, 2013
Speech processing is the study of speech signals and the methods used to process them. It is employed in applications such as speech coding, speech synthesis, speech recognition and speaker recognition. In speech classification, the computation of prosodic effects from speech signals plays a major role. In emotional speech signals, pitch and frequency are the most important parameters. Normally, the pitch values of sad and happy speech differ greatly, and the frequency of happy speech is higher than that of sad speech. But in some cases the frequency of happy speech is nearly similar to that of sad speech, or vice versa, and in such situations it is difficult to recognize the emotion of the speech signal. To reduce such drawbacks, in this paper we propose a Telugu speech emotion classification system with three features, Energy Entropy, Short Time Energy and Zero Crossing Rate, and a K-NN classifier for classification. Features are extracted from the speech signals and given to the K-NN. The implementation results show the effectiveness of the proposed speech emotion classification system in classifying Telugu speech signals based on their prosodic effects. The performance of the proposed system is evaluated by cross validation on the Telugu speech database.
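The cross-validation described at the end of this abstract might look as follows with scikit-learn; `X` and `y` stand in for the pooled (Energy Entropy, Short Time Energy, Zero Crossing Rate) feature vectors and the emotion labels, and the fold count is an assumption.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Cross-validated accuracy of the k-NN classifier on the feature vectors.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                         X, y, cv=5, scoring='accuracy')
print(scores.mean(), scores.std())
```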
International Journal of Speech Technology, 2012
Emotion recognition from speech has emerged as an important research area in the recent past. In this regard, a review of existing work on emotional speech processing is useful for carrying out further research. In this paper, the recent literature on speech emotion recognition is presented, considering issues related to emotional speech corpora, the different types of speech features, and the models used for recognizing emotions from speech. Thirty-two representative speech databases are reviewed from the point of view of their language, number of speakers, number of emotions, and purpose of collection. The issues related to emotional speech databases used in emotional speech recognition are also briefly discussed. Literature on the different features used in the task of emotion recognition from speech is presented. The importance of choosing different classification models is discussed along with the review. The important issues to be considered for further emotion recognition research, in general and specific to the Indian context, are highlighted wherever necessary.
2010
In this paper we present a comparative analysis of four classifiers for speech-signal emotion recognition. Recognition was performed on the emotional Berlin Database. This work addresses both speaker- and utterance (phrase)-dependent and independent frameworks. One hundred thirty-three (133) sound/speech features were extracted from pitch, Mel Frequency Cepstral Coefficients, energy and formants. These features were evaluated in order to create a set of 26 features sufficient to discriminate between seven emotions in acted speech. Multilayer Perceptron, Random Forest, Probabilistic Neural Network and Support Vector Machine classifiers were used for emotion classification into seven classes, namely anger, happiness, anxiety/fear, sadness, boredom, disgust and neutral. In the speaker-dependent framework, the Probabilistic Neural Network reaches very high accuracy (94%), while in the speaker-independent framework the classification rate of the Support Vector Machine reaches 80%. The results of numerical experiments are given and discussed in the paper.
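The abstract does not say how the 133 features were reduced to 26, so the sketch below uses a univariate ANOVA F-test ranking (scikit-learn's SelectKBest) purely as an illustrative stand-in for whatever evaluation the authors applied; `X` and `y` are placeholders for the feature matrix and labels.

```python
from sklearn.feature_selection import SelectKBest, f_classif

# Rank the 133 descriptors by a univariate F-test and keep the top 26,
# mirroring the 133 -> 26 reduction reported above.
selector = SelectKBest(score_func=f_classif, k=26).fit(X, y)
X_reduced = selector.transform(X)
```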
Machine interaction with human beings remains a challenging task: a machine should be able to identify and react to human non-verbal communication, such as emotions, to make human computer interaction more natural. Automatic emotion recognition from speech is an essential current research task that has received close attention. The speech signal is a rich source of information and an attractive and efficient medium, and its numerous expressive features make extracting emotions through speech possible. In this paper, emotions are recognized from speech using spectral features such as Mel frequency cepstrum coefficients and prosodic features like pitch and energy; the study is carried out using K-Nearest Neighbor and Gaussian mixture model classifiers for the detection of six basic emotional states of a speaker, anger, happiness, sadness, fear, disgust and neutral, using the Berlin emotional speech database.
Advances in Intelligent Systems and Computing, 2015
This paper explores emotion recognition from Marathi speech signals using feature extraction techniques and a classifier to classify Marathi speech utterances according to their emotional content. Different types of speech feature vectors capture different emotions, owing to their corresponding natures. We categorized the emotions as Anger, Happy, Sad, Fear, Neutral and Surprise. Mel Frequency Cepstral Coefficient (MFCC) feature parameters extracted from Marathi speech signals depend on the speaker, the spoken word, and the emotion. A Gaussian Mixture Model (GMM) is used to develop the emotion classification model. A recently proposed feature extraction technique and classifier are used for Marathi spoken words. Each subject/speaker spoke 7 Marathi words with 6 different emotions; the 7 Marathi words are Aathawan, Aayusha, Chamakdar, Iishara, Manav, Namaskar, and Uupay. For the experimental work we created a database of 924 Marathi speech utterances in total, and from this the overall emotion recognition accuracy obtained using MFCC and GMM is 84.61% for our Emotion Recognition for Marathi Spoken Words (ERFMSW) system. The average accuracy for males and females is 86.20% and 83.03%, respectively.
In the present work, two new features, based on normal and Teager-Energy operated Wavelet Packet Cepstral Coefficients (WPCC2 and tfWPCC2) computed by method 2, have been proposed, and their performance has been compared with the existing features based on normal and Teager-Energy operated Wavelet Packet Cepstral Coefficients (WPCC and tfWPCC) computed by method 1, Mel Frequency Cepstral Coefficients (MFCC) and Log Frequency Power Coefficients (LFPC), for emotion recognition from speech in five native languages of Assam (Assamese, Bodo, Dimasa, Karbi and Mishing). The data consisted of 20 short emotionally biased sentences for each (full-blown) emotion, spoken by 20 speakers and recorded in a small quiet room. A total of seven GMMs were trained, one for each emotion. The feature set based on tfWPCC2 exhibited good performance in terms of both computational efficiency and classification accuracy.
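The Teager energy operator underlying the tfWPCC features is a simple nonlinear operator; in discrete time it is psi[n] = x[n]^2 - x[n-1]*x[n+1], as in this minimal sketch:

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]
```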
Journal of Advances in Computer Networks, 2014
In recent years, work requiring human-machine interaction, such as speech recognition and emotion recognition from speech, has been increasing. Beyond recognizing the speech itself, features of the conversation such as melody, emotion, pitch and emphasis are also studied. Research has proven that meaningful results can be reached using the prosodic features of speech. In this paper we perform the pre-processing necessary for emotion recognition from speech data and extract features from the speech signal. To recognize emotion, Mel Frequency Cepstral Coefficients (MFCC) are extracted from the signals and classified with the k-NN algorithm.
2016
Emotion recognition from audio signals is a recent research topic in Human Computer Interaction. Demand has risen for an increased communication interface between humans and digital media. Many researchers are working to improve recognition accuracy, but there is still no complete system that can recognize emotions from speech. In order to make human and digital machine interaction more natural, the computer should be able to recognize emotional states in the same way a human does. The efficiency of an emotion recognition system depends on the type of features extracted and the classifier used for detecting emotions. There are some fundamental emotions, such as Happy, Angry, Sad, Depressed, Bored, Anxiety, Fear and Nervous. The signals are preprocessed and analyzed using various techniques. In feature extraction, various parameters used to form a feature vector include fundamental frequency, pitch contour, formants, and duration (pause length ratio). These features are...
Machine interaction with human beings remains a challenging task: a machine should be able to identify and react to human non-verbal communication, such as emotions, to make human computer interaction more natural. Automatic emotion recognition from speech is an essential current research task that has received close attention. The speech signal is a rich source of information and an attractive and efficient medium, and its numerous expressive features make extracting emotions through speech possible. In this paper, emotions are recognized from speech using spectral features such as Mel frequency cepstrum coefficients and prosodic features like pitch and energy; the study is carried out using K-Nearest Neighbor, Support Vector Machine and Gaussian mixture model classifiers for the detection of six basic emotional states of a speaker, anger, happiness, sadness, fear, disgust and neutral, using the Berlin emotional speech database.