Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2014
…
Humans are considered to reason and act rationally, and this is believed to be what fundamentally distinguishes them from other living entities. Furthermore, modern approaches in psychology underline that humans, besides being thinking creatures, are also sentimental and emotional organisms. There are fifteen universal extended emotions plus a neutral one: hot anger, cold anger, panic, fear, anxiety, despair, sadness, elation, happiness, interest, boredom, shame, pride, disgust, contempt, and the neutral position. The scope of the current research is to recognize the emotional state of a human being from the speech utterances produced during ordinary conversation. It is shown that, given enough acoustic evidence, the emotional state of a person can be classified by a set of majority-voting classifiers. The proposed set is built on three base classifiers: kNN, C4.5, and SVM with an RBF kernel, and it achieves better performance than each base classifier taken separately. It is compared with two other classifier sets: a one-against-all (OAA) multiclass SVM with hybrid kernels, and a set consisting of two base classifiers, C5.0 and a neural network. The proposed variant outperforms both. The paper thus deals with emotion classification by a set of majority-voting classifiers that combines three particular types of base classifiers with low computational complexity. The base classifiers stem from different theoretical backgrounds in order to avoid bias and redundancy, which gives the proposed set of classifiers the ability to generalize in the emotion domain space.
Humans are considered to reason and act rationally, and this is believed to be the fundamental factor that differentiates them from the rest of living entities. Furthermore, modern approaches in psychology underline that humans, besides being thinking creatures, are also sentimental and emotional organisms. There are six universal basic emotions plus a neutral one: happiness, surprise, fear, sadness, anger, disgust, and neutral. The scope of the current research is to recognize the emotional state of a human being from the speech utterances used during ordinary conversation. It is shown that, given enough acoustic evidence, the emotional state of a person can be classified by an ensemble majority-voting classifier. The proposed ensemble is constructed over three base classifiers: kNN, C4.5, and SVM with a polynomial kernel, and it achieves better performance than each base classifier. It is compared with two other ensemble classifiers: a one-against-all (OAA) multiclass SVM with Radial Basis Function (RBF) kernels and an OAA multiclass SVM with hybrid kernels, and it outperforms both. The paper thus performs emotion classification with an ensemble majority-voting classifier that combines three particular types of base classifiers of low computational complexity. The base classifiers stem from different theoretical backgrounds in order to avoid bias and redundancy, which gives the proposed ensemble classifier the ability to generalize in the emotion domain space.
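The majority-voting construction described in the two abstracts above can be sketched with scikit-learn. This is a hedged illustration, not the authors' implementation: C4.5 is approximated by scikit-learn's CART-based DecisionTreeClassifier, the RBF-kernel variant is shown, and the hyperparameters are placeholders.

```python
# Minimal sketch of a majority-voting ensemble over kNN, a decision tree,
# and an RBF-kernel SVM, assuming acoustic features are already extracted
# into a matrix X with emotion labels y.
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

ensemble = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("tree", DecisionTreeClassifier()),  # stand-in for C4.5
        ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
    ],
    voting="hard",  # each base classifier casts one vote per sample
)
# ensemble.fit(X_train, y_train); ensemble.predict(X_test)
```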
2010
In this paper we present a comparative analysis of four classifiers for speech-signal emotion recognition. Recognition was performed on the emotional Berlin Database. This work covers both speaker- and utterance- (phrase-) dependent and independent frameworks. One hundred thirty-three (133) sound/speech features were extracted from pitch, Mel Frequency Cepstral Coefficients, energy, and formants. These features were evaluated in order to create a set of 26 features sufficient to discriminate between seven emotions in acted speech. Multilayer Perceptron, Random Forest, Probabilistic Neural Network, and Support Vector Machine classifiers were used for emotion classification into seven classes, namely anger, happiness, anxiety/fear, sadness, boredom, disgust, and neutral. In the speaker-dependent framework, the Probabilistic Neural Network reaches very high accuracy (94%), while in the speaker-independent framework the classification rate of the Support Vector Machine reaches 80%. The results of numerical experiments are given and discussed in the paper.
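As an illustration of the kind of feature extraction described above, the following Python sketch computes utterance-level statistics from MFCCs, pitch, and energy with librosa (formants omitted); the file name and the particular statistics are assumptions, not the paper's exact 133-feature set.

```python
# Illustrative utterance-level feature vector from MFCC, pitch and energy.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)            # hypothetical file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)         # frame-level MFCCs
f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)  # pitch (NaN when unvoiced)
rms = librosa.feature.rms(y=y)[0]                          # frame energy

features = np.concatenate([
    mfcc.mean(axis=1), mfcc.std(axis=1),  # 26 MFCC statistics
    [np.nanmean(f0), np.nanstd(f0)],      # pitch statistics
    [rms.mean(), rms.std()],              # energy statistics
])
```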
2012 IEEE Spoken Language Technology Workshop (SLT), 2012
Emotion classification is essential for understanding human interactions and hence is a vital component of behavioral studies. Although numerous algorithms have been developed, the emotion classification accuracy is still short of what is desired for the algorithms to be used in real systems. In this paper, we evaluate an approach where basic acoustic features are extracted from speech samples, and the One-Against-All (OAA) Support Vector Machine (SVM) learning algorithm is used. We use a novel hybrid kernel, where we choose the optimal kernel functions for the individual OAA classifiers. Outputs from the OAA classifiers are normalized and combined using a thresholding fusion mechanism to finally classify the emotion. Samples with low 'relative confidence' are left as 'unclassified' to further improve the classification accuracy. Results show that the decision-level recall of our approach for six-class emotion classification is 80.5%, outperforming a state-of-the-art approach that uses the same dataset.
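The OAA-with-thresholding idea can be sketched as follows. This mirrors the description loosely and is not the paper's implementation: `classifiers` and `scalers` are hypothetical pre-trained objects, and the margin value is an assumption.

```python
# One binary SVM per emotion, scores normalized to [0, 1]; a sample is left
# unclassified when the gap between the top two scores (the 'relative
# confidence') is small.
import numpy as np

def oaa_predict(classifiers, scalers, X, margin=0.1):
    # classifiers[k]: binary SVC trained on "emotion k vs. rest";
    # scalers[k]: fitted MinMaxScaler mapping its decision scores to [0, 1].
    scores = np.column_stack([
        s.transform(c.decision_function(X).reshape(-1, 1)).ravel()
        for c, s in zip(classifiers, scalers)
    ])
    best_two = np.sort(scores, axis=1)[:, -2:]  # second-best, best
    labels = scores.argmax(axis=1)
    labels[best_two[:, 1] - best_two[:, 0] < margin] = -1  # -1 = unclassified
    return labels
```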
TJPRC, 2013
The man-machine relationship now demands that machines react after considering human emotion levels, and advances in technology have improved machine intelligence enough to identify human emotions at the expected level. By harnessing speech processing and pattern recognition algorithms, smart, emotion-oriented man-machine interaction can be achieved, with tremendous scope in automated home as well as commercial applications. This paper deals with pitch-based and Mel Frequency Cepstrum Coefficient (MFCC) based speech features, together with the wavelet domain, in speech emotion recognition. The impact of different classifiers, namely the Gaussian Mixture Model (GMM), K-Nearest Neighbour (K-NN), and Hidden Markov Model (HMM), on the recognition rate is examined for six emotional categories (happy, angry, neutral, surprised, fearful, and sad) from the Berlin Emotional Speech Database (BES), with the intent of a comparative performance analysis. In the experiments the speech features are based on pitch, MFCCs, and discrete-wavelet-domain 'db1' family vectors. The same features were used for all three classifiers (GMM, K-NN, and HMM) in order to compare their performance on the merits of recognition accuracy, confusion matrix, precision rate, and F-measure. The highest recognition accuracy for the GMM classifier was 92% for the 'angry' emotion, the K-NN classifier gave 90% correct recognition for the 'happy' class, and the highest recognition score for the HMM classifier was 78% for the 'angry' emotion. The confusion-matrix statistics depict confusion between the 'happy' and 'neutral' emotions; moreover, in detecting each of the remaining emotions, all three classifiers confused it at least once with the 'angry' emotion. The results for precision rate and F-measure convey the superiority of the GMM classifier in the emotion recognition system, while the K-NN and HMM were average in overall performance.
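GMM-based classification of the kind evaluated above is commonly implemented by fitting one mixture model per emotion and labeling a test utterance with the best-scoring model; a minimal sketch follows, where the component count is an assumption, not taken from the paper.

```python
# One GaussianMixture per emotion; classify by maximum log-likelihood.
from sklearn.mixture import GaussianMixture

def train_gmms(frames_by_emotion, n_components=8):
    # frames_by_emotion: dict mapping emotion name -> (frames x dims) array
    return {emo: GaussianMixture(n_components).fit(X)
            for emo, X in frames_by_emotion.items()}

def classify(gmms, utterance_frames):
    # score() is the mean per-frame log-likelihood under each emotion model
    return max(gmms, key=lambda emo: gmms[emo].score(utterance_frames))
```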
Speech emotion recognition has become one of the active research areas in speech processing and in applications based on human-computer interaction. An experimental study is conducted in this paper with two spectral features (RASTA-PLP and MFCC), four emotions (angry, happy, neutral, and sad), and five learning classifiers (K-star, LogitBoost, J48, LAD Tree, and Random Forest). The focus of this work is to evaluate the performance of the learning classifiers in terms of classification accuracy and to analyze the effect of emotion and of the spectral features (MFCC and RASTA-PLP) on spoken utterances in the Urdu language across the four emotions. The experiments were conducted in two phases. Results of the first phase show that the combination of spectral features yields significantly better classification accuracy than either feature set considered individually, while two-way ANOVA testing in the second phase shows that both factors, emotions and features, as well as their interaction, have a significant effect on speech, the p-value being below the significance level.
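The second-phase two-way ANOVA can be sketched with statsmodels, testing the main effects of emotion and feature type plus their interaction; the data-frame column names and file name here are hypothetical.

```python
# Two-way ANOVA on per-utterance accuracy with emotion and feature type
# as factors.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.read_csv("results.csv")  # rows: accuracy, emotion, feature_set
model = ols("accuracy ~ C(emotion) * C(feature_set)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # F statistic and p-value per factor
```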
The article presents an analysis of the possibility of recognizing emotions from the speech signal in the Polish language. In order to perform the experiments, a database containing speech recordings with emotional content was created. On its basis, features were extracted from the speech signals. The most important step was to determine which of the extracted features were the most suitable for distinguishing emotions, and with what accuracy the emotions could be classified. Two feature selection methods, Sequential Forward Search (SFS) and t-statistics, were examined. Emotion classification was implemented using k-Nearest Neighbor (k-NN), Linear Discriminant Analysis (LDA), and Support Vector Machine (SVM) classifiers. Classification was carried out for pairs of emotions. The best results were obtained for classifying neutral versus fear (91.9%) and neutral versus joy (89.6%).
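Sequential forward selection of the kind examined above is available in scikit-learn; a minimal sketch for one emotion pair follows, where the k-NN settings and the number of selected features are assumptions.

```python
# Greedy forward feature selection wrapped around a k-NN classifier.
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)
sfs = SequentialFeatureSelector(knn, n_features_to_select=10,
                                direction="forward", cv=5)
# X_pair, y_pair hold features/labels for one pair, e.g. neutral vs. fear
# sfs.fit(X_pair, y_pair); X_reduced = sfs.transform(X_pair)
```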
2019
Speech is the most natural form of communication between human beings, and speech emotion recognition systems are used in the field of Human-Computer Interaction (HCI); researchers have been trying to develop systems that behave more like humans, emotion-recognizing robots being one example. Emotions are recognized from speech signals by means of speech features such as MFCCs (Mel Frequency Cepstral Coefficients), pitch, energy, intensity, speaking rate, and voice quality. Speech has many parameters that carry great weight in recognizing emotion, namely prosodic and acoustic features. An SVM (Support Vector Machine) is used as the classifier to separate emotions such as happy, angry, sad, fear, and neutral; with a multiclass SVM, more than two emotions can be distinguished.
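For reference, scikit-learn's SVC already extends the binary SVM to several classes (internally via one-vs-one), so a multiclass emotion classifier needs no extra wrapping; in this sketch the feature matrix and hyperparameters are assumptions.

```python
# Multiclass emotion SVM; scaling matters for SVMs, hence the pipeline.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10))
# clf.fit(X_train, y_train)   # y_train: "happy", "angry", "sad", ...
# clf.predict(X_test)
```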
International Journal for Research in Applied Science and Engineering Technology, 2019
This paper presents a methodology for recognizing human emotion from the speech signal. The speaker-based emotion recognition system recognizes four emotions, namely happiness, sadness, fear, and anger. The aim of the system is to recognize the emotions by estimating various features, namely formant frequencies, energy, pitch, and MFCCs, from the speech signal. The accuracy of a speech-based emotion detection system depends on the types of features used to capture the characteristics unique to each emotion. Since emotions produce marked changes in specific features, an SVM (Support Vector Machine) classifier can perform well. Individual emotion classification shows good performance in our work, but the average accuracy is considerably lower, i.e., 68%.
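One common way to estimate the formant frequencies mentioned above is from the roots of a linear-prediction (LPC) polynomial; a hedged sketch follows, where the frame length, the LPC-order rule of thumb, and the file name are assumptions, not the paper's settings.

```python
# Rough formant estimation from LPC roots of a pre-emphasized frame.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)            # hypothetical file
frame = librosa.effects.preemphasis(y[: int(0.025 * sr)])  # one 25 ms frame
a = librosa.lpc(frame, order=2 + sr // 1000)               # LPC coefficients
roots = [r for r in np.roots(a) if r.imag > 0]             # upper half-plane only
formants = sorted(np.angle(roots) * sr / (2 * np.pi))      # Hz, ascending
print(formants[:3])                                        # rough F1, F2, F3
```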
Pakistan Journal of Engineering Technology & Science, 2023
Recognition of emotion from speech is a challenging area of research, and it is hard for a single classifier to provide good classification accuracy. For this reason, in recent years researchers have focused on ensemble techniques that combine the best results of individual classifiers in an effective way to achieve higher overall classification performance. This paper presents a novel approach to combining classifier outputs for audio emotion recognition: the best results obtained for different emotion classes from various classifiers are combined to create a combined confusion matrix. This is because some classifiers with lower overall performance have better accuracy for a specific class than others with higher overall accuracy. The performance of this approach was analyzed on three emotional speech databases in different languages, i.e., the Berlin emotional speech database (EMO-DB), the Italian emotional speech database (EMOVO-DB), and the Surrey audio-visual expressed emotion database (SAVEE-DB). The openSMILE toolkit was used to extract a total of 8543 audio features, including pitch, energy, intensity, jitter, shimmer, formants, zero crossing rate (ZCR), Mel-frequency cepstral coefficients (MFCCs), Mel-frequency bands (MFBs), line spectral pairs (LSPs), and spectral features. These features were normalized using the min-max technique, while correlation-based feature selection (CFS) with a best-first search was used for feature reduction. Classification was performed with five base classifiers, i.e., support vector machine (SVM), multi-layer perceptron (MLP), instance-based learner (IBK), adaptive boosting (AdaBoost), and Random Forest. The experimental results showed better performance for the proposed technique compared to other state-of-the-art methods: the classification accuracies obtained for the seven emotion classes were 91.8%, 83.7%, and 80.5% for EMO-DB, EMOVO-DB, and SAVEE-DB, respectively.
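The per-class combination idea can be sketched as follows: for each emotion, keep the confusion-matrix row of whichever base classifier achieves the best recall on that class. This mirrors the description above, not the authors' exact procedure.

```python
# Assemble a combined confusion matrix from per-class best rows.
import numpy as np

def combine_confusions(confusions):
    # confusions: list of (n_classes x n_classes) matrices, rows = true class
    combined = np.zeros_like(confusions[0])
    for k in range(combined.shape[0]):
        recalls = [cm[k, k] / cm[k].sum() for cm in confusions]
        combined[k] = confusions[int(np.argmax(recalls))][k]
    return combined
```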
This paper presents a machine learning approach to automatic recognition of human emotions from speech. The approach consists of three steps. First, numerical features are extracted from the sound database using an audio feature extractor. Then, a feature selection method is used to select the most relevant features. Finally, a machine learning model is trained to recognize seven universal emotions: anger, fear, sadness, happiness, boredom, disgust, and neutral. A thorough ML experimental analysis is performed for each step. The results showed that 300 (out of 1582) features, as ranked by the gain ratio, are sufficient for achieving 86% accuracy when evaluated with 10-fold cross-validation. SVM achieved the highest accuracy when compared to KNN and Naive Bayes. We additionally compared the accuracy of the standard SVM (with default parameters) and one enhanced by Auto-WEKA (optimized algorithm parameters) using the leave-one-speaker-out technique. The results showed that the SVM enhanced with Auto-WEKA achieved significantly better accuracy than the standard SVM, i.e., 77% versus 73%. Finally, the results achieved with 10-fold cross-validation are comparable to those achieved by a human, i.e., 86% accuracy in both cases. Moreover, low-energy emotions (boredom, sadness, and disgust) are better recognized by our machine learning approach than by humans. (Slovenian abstract: Emotion recognition from speech using machine learning.)
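The ranking-and-truncation step can be sketched with scikit-learn; note that scikit-learn has no built-in gain ratio, so mutual information (a close relative of information gain) stands in here for the paper's ranking criterion.

```python
# Keep the top-ranked features, then evaluate an SVM with 10-fold CV.
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = make_pipeline(
    SelectKBest(mutual_info_classif, k=300),  # keep 300 top-ranked features
    StandardScaler(),
    SVC(kernel="rbf"),
)
# scores = cross_val_score(pipe, X, y, cv=10)  # 10-fold cross-validation
```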