This paper proposes the use of a minimum number of formant and bandwidth features for efficient classification of the neutral state and six basic emotions in two languages. Such a minimal feature set facilitates the fast, real-time recognition of emotions, which is the ultimate goal of any speech emotion recognition system. The investigations were conducted on emotional speech databases developed by the authors in English and in Malayalam, a popular Indian language. For each language, the best features were identified by K-Means, K-nearest neighbor and Naive Bayes classification of the individual formants and bandwidths, followed by artificial neural network classification of the combination of the best formants and bandwidths. An overall emotion recognition accuracy of 85.28% was obtained for Malayalam, based on the values of the first four formants and bandwidths, while the accuracy obtained for English was 86.15%, based on a feature set of the four formants and the first and fourth bandwidths; both results are unprecedented. These results were obtained for elicited emotional speech of female speakers, with statistically preprocessed formant and bandwidth values. Reducing the number of emotion classes resulted in a striking increase in recognition accuracy.
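As an illustration of the feature-extraction step described above, the following sketch estimates formant frequencies and bandwidths from the roots of an LPC polynomial, a standard technique; the LPC order, window, frame position and file name are assumptions, not the authors' configuration.

```python
# Hypothetical sketch: first four formants and bandwidths of a voiced frame
# via LPC root-solving (not the authors' exact pipeline).
import numpy as np
import librosa

def formants_and_bandwidths(frame, sr, order=10, n_formants=4):
    # Fit an LPC model to the windowed frame.
    a = librosa.lpc(frame * np.hamming(len(frame)), order=order)
    roots = np.roots(a)
    # Keep one root per complex-conjugate pair (positive imaginary part).
    roots = roots[np.imag(roots) > 0]
    freqs = np.angle(roots) * sr / (2 * np.pi)      # formant frequency (Hz)
    bws = -(sr / np.pi) * np.log(np.abs(roots))     # bandwidth B = -(sr/pi) ln|r|
    idx = np.argsort(freqs)
    return freqs[idx][:n_formants], bws[idx][:n_formants]

y, sr = librosa.load("utterance.wav", sr=16000)     # assumed file name
frame = y[4000:4400]                                # one 25 ms voiced frame
F, B = formants_and_bandwidths(frame, sr)
print("Formants (Hz):", F.round(1), "Bandwidths (Hz):", B.round(1))
```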
2013
Abstract— This paper presents the results of investigations in speech emotion recognition in Hindi, using only the first four formants and their bandwidths. The research was done on a female speech database of nearly 1600 utterances comprising neutral, happiness, surprise, anger, sadness, fear and disgust as the elicited emotions. The best of the statistically preprocessed formant and bandwidth features were first identified by K-Means, K-nearest neighbor and Naive Bayes classification of the individual features. This was followed by artificial neural network classification based on the combination of the best formants and bandwidths. The highest overall emotion recognition accuracy obtained by the ANN method was 97.14%, based on the first four formants and bandwidths. A striking increase in recognition accuracy was observed when the number of emotion classes was reduced from seven. The results presented in this paper have not been reported so far for...
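A minimal sketch of the ANN classification stage is shown below using scikit-learn's MLPClassifier; the 8-dimensional feature vectors (F1-F4, B1-B4), the network size and the file names are assumptions, not the authors' exact setup.

```python
# Hedged sketch: ANN over four formants + four bandwidths per utterance.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

X = np.load("formant_bandwidth_features.npy")  # assumed: (n_utterances, 8)
y = np.load("emotion_labels.npy")              # assumed: 7 classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(16,),
                                  max_iter=2000, random_state=0))
clf.fit(X_tr, y_tr)
print("Test accuracy:", clf.score(X_te, y_te))
```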
This paper presents a novel approach to automatic emotion classification from human utterances. The Discrete Wavelet Transform (DWT) is used for feature extraction from the speech signals. Malayalam (one of the South Indian languages) is used for the experiment. We used an elicited dataset of 500 utterances recorded from 10 male and 8 female speakers. Using an artificial neural network, we classified four emotional classes: neutral, happy, sad and anger. A classification accuracy of 70% was obtained in this work.
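One way to realize the DWT feature extraction described here is with PyWavelets, as in the sketch below; the wavelet family, decomposition level and per-band statistics are illustrative choices only.

```python
# Hedged sketch: DWT-based feature vector per utterance (assumed design).
import numpy as np
import pywt
import librosa

def dwt_features(y, wavelet="db4", level=4):
    coeffs = pywt.wavedec(y, wavelet, level=level)  # approximation + details
    feats = []
    for c in coeffs:
        feats += [np.mean(np.abs(c)), np.std(c), np.sum(c ** 2)]
    return np.array(feats)

y, sr = librosa.load("utterance.wav", sr=16000)     # assumed file name
print(dwt_features(y).shape)  # 5 bands x 3 statistics = (15,)
```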
2010
In this paper we present a comparative analysis of four classifiers for speech signal emotion recognition. Recognition was performed on the emotional Berlin Database. The work covers both speaker- and utterance (phrase)-dependent and -independent frameworks. One hundred thirty-three (133) sound/speech features were extracted from pitch, Mel frequency cepstral coefficients, energy and formants, and were evaluated in order to create a set of 26 features sufficient to discriminate between seven emotions in acted speech. Multilayer Perceptron, Random Forest, Probabilistic Neural Network and Support Vector Machine classifiers were used for emotion classification into seven classes: anger, happiness, anxiety/fear, sadness, boredom, disgust and neutral. In the speaker-dependent framework, the Probabilistic Neural Network reaches very high accuracy (94%), while in the speaker-independent framework the classification rate of the Support Vector Machine reaches 80%. The results of the numerical experiments are given and discussed in the paper.
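The reduction from 133 features to a 26-feature subset could be done, for example, with a univariate ANOVA F-test as sketched below; the paper does not specify its selection method, so SelectKBest and the placeholder data are assumptions.

```python
# Illustrative sketch: pruning a large feature set to 26 dimensions.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

X = np.random.randn(535, 133)     # placeholder for the 133-dim feature matrix
y = np.random.randint(0, 7, 535)  # placeholder 7-class emotion labels

selector = SelectKBest(f_classif, k=26).fit(X, y)
X_26 = selector.transform(X)      # reduced (n_samples, 26) matrix
print(X_26.shape, selector.get_support(indices=True))
```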
Journal of Theoretical and Applied Information Technology, 2020
In recent years, emotion recognition from speech has been one of the hottest research topics in the field of Human Computer Interaction. Much research has been done on various languages, but for Bengali it is still at an early stage. In this work, four emotional states, happy, sad, angry and neutral, are recognized from a Bengali speech dataset. The proposed approach uses pitch and Mel-frequency cepstral coefficient (MFCC) feature vectors to train a k-Nearest Neighbor classifier. A self-built Bengali emotional speech dataset was used for both training and testing. The dataset consists of 50 people and 400 isolated emotional sentences. Using this dataset and the above technique, we achieved an 87.50% average accuracy rate, with detection accuracies for the individual emotions (happy, sad, angry, neutral) of 80.00%, 75.00%, 85.00% and 75.00%, respectively.
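A rough sketch of this pitch + MFCC front end with a k-NN back end follows; the mean/std pooling over frames, the pitch search range and k=5 are assumptions.

```python
# Hedged sketch: pitch + MFCC utterance vectors for k-NN classification.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def utterance_vector(path, sr=16000):
    y, _ = librosa.load(path, sr=sr)
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)       # frame-level pitch track
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # 13 MFCCs per frame
    return np.concatenate([[np.mean(f0), np.std(f0)], mfcc.mean(axis=1)])

# Assumed usage, given lists of training paths and labels:
# X = np.stack([utterance_vector(p) for p in wav_paths])
# knn = KNeighborsClassifier(n_neighbors=5).fit(X, labels)
```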
International Journal of Speech Technology, 2015
Emotions are broad aspects and are expressed in a similar way by every human being; however, they are affected by culture. This poses a major threat to the universality of speech emotion detection systems. The cultural behaviour of a society affects the way emotions are expressed and perceived. Hence, an emotion recognition system customized for the languages within a cultural group is feasible. In this work, speaker-dependent and speaker-independent emotion recognition systems are proposed for two different dialects of Odisha: Sambalpuri and Cuttacki. Spectral speech features, such as log power, Mel-frequency cepstral coefficients (MFCC), delta MFCC, double-delta MFCC, log frequency power coefficients, and linear predictive cepstral coefficients, are used with hidden Markov model and support vector machine (SVM) classifiers to classify a speech sample into one of seven discrete emotion classes: anger, happiness, disgust, fear, sadness, surprise, and neutral. For a better comparative study of the system's accuracy, features are taken individually as well as in combinations, varying the sampling frequency, frame length and frame overlap. The best average recognition accuracy obtained for the speaker-independent system is 82.14%, for the SVM classifier using only MFCC as the feature vector. For the speaker-dependent system, an increase in accuracy of more than 10% is observed. It is also revealed that the use of MFCC with the SVM classifier not only gives the best overall performance at an 8 kHz sampling frequency, but also shows consistent performance across all emotion classes compared to the other classifiers and feature combinations, with less computational complexity. Hence, it can be applied efficiently to emotion recognition over the telephone in call centre applications.
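The best-performing configuration reported here (MFCC with an SVM at 8 kHz) might look like the sketch below; the delta stacking, mean pooling and RBF kernel are assumptions rather than the paper's exact setup.

```python
# Hedged sketch: MFCC (+ deltas) at 8 kHz feeding an SVM classifier.
import numpy as np
import librosa
from sklearn.svm import SVC

def mfcc_stack(path):
    y, sr = librosa.load(path, sr=8000)           # 8 kHz per the paper
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    d1 = librosa.feature.delta(m)                 # delta MFCC
    d2 = librosa.feature.delta(m, order=2)        # double-delta MFCC
    return np.concatenate([m, d1, d2]).mean(axis=1)

# Assumed usage:
# X = np.stack([mfcc_stack(p) for p in wav_paths])
# svm = SVC(kernel="rbf").fit(X, labels)
```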
2018 1st Annual International Conference on Information and Sciences (AiCIS), 2018
Recognizing speech emotions is an important subject in pattern recognition. This work studies the effect of extracting the minimum possible number of features on a speech emotion recognition (SER) system. In this paper, three experiments were performed to find the approach that gives the best accuracy. The first extracts only three features from the emotional speech samples: zero crossing rate (ZCR), mean, and standard deviation (SD); the second extracts only the first 12 Mel frequency cepstral coefficient (MFCC) features; and the last applies feature fusion between the aforementioned features. In all experiments, the features are classified using five classification techniques: Random Forest (RF), k-Nearest Neighbor (k-NN), Sequential Minimal Optimization (SMO), Naïve Bayes (NB), and Decision Tree (DT). The performance of the system was validated on the Surrey Audio-Visual Expressed Emotion (SAVEE) dataset for seven emotions. The results showed good accuracy compared with previous studies when using a fusion of a small number of features with the RF classifier.
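The three feature sets described above (three features, 12 MFCCs, and their fusion) could be built as sketched below; pooling frame-level values to utterance-level means and the forest size are assumptions.

```python
# Hedged sketch: ZCR/mean/SD set, 12-MFCC set, and their fusion for a
# Random Forest classifier (assumed pooling and hyperparameters).
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def feature_sets(path, sr=16000):
    y, _ = librosa.load(path, sr=sr)
    small = np.array([librosa.feature.zero_crossing_rate(y).mean(),  # ZCR
                      y.mean(),                                      # mean
                      y.std()])                                      # SD
    mfcc12 = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12).mean(axis=1)
    return small, mfcc12, np.concatenate([small, mfcc12])            # fusion

# Assumed usage with the fused vectors:
# rf = RandomForestClassifier(n_estimators=200).fit(X_fused, labels)
```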
2014
Speech Emotion Recognition (SER) is a current research topic in the field of Human Computer Interaction (HCI) with a wide range of applications. The purpose of the speech emotion recognition system is to automatically classify a speaker's utterances into four emotional states: anger, sadness, neutral, and happiness. The speech samples are from the Berlin emotional database, and the features extracted from these utterances are energy, pitch, ZCC, entropy, and Mel frequency cepstrum coefficients (MFCC). K-Nearest Neighbor (KNN) is used as the classifier for the different emotional states. The system gives 86.02% classification accuracy using the energy, entropy, MFCC, ZCC and pitch features. Keywords: Speech Emotion; Automatic Emotion Recognition; KNN; Energy; Pitch; MFCC; ZCC.
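Two of the listed descriptors, short-time energy and entropy, can be computed per frame as in this sketch; the frame and hop sizes, and the use of spectral entropy specifically, are assumptions.

```python
# Hedged sketch: frame-level short-time energy and spectral entropy.
import numpy as np

def frame_energy_entropy(y, frame_len=400, hop=160):
    energies, entropies = [], []
    for start in range(0, len(y) - frame_len, hop):
        frame = y[start:start + frame_len]
        energies.append(np.sum(frame ** 2))                  # short-time energy
        spec = np.abs(np.fft.rfft(frame)) ** 2
        p = spec / (spec.sum() + 1e-12)                      # normalized spectrum
        entropies.append(-np.sum(p * np.log2(p + 1e-12)))    # spectral entropy
    return np.array(energies), np.array(entropies)
```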
Journal of Interdisciplinary Mathematics, 2020
Nowadays, speech is the fastest medium for giving instructions to machines. When a person utters a word, the machine can understand the semantics of the utterance but not the emotion associated with it. This study focuses on combining different types of speech features, applying various statistical techniques to reduce the dimensionality of the data, and then applying machine learning algorithms to train on the dataset and predict the emotional state of the speaker, so that machines can respond better to instructions received from humans by recognizing their emotion.
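One common way to realize the dimensionality-reduction step is PCA, sketched below; the paper does not name a specific technique, so PCA, the retained-variance threshold and the placeholder data are assumptions.

```python
# Illustrative sketch: PCA-reduced features feeding a classifier.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

X = np.random.randn(400, 60)       # placeholder combined feature matrix
y = np.random.randint(0, 4, 400)   # placeholder 4-class emotion labels

model = make_pipeline(PCA(n_components=0.95),  # keep 95% of the variance
                      SVC()).fit(X, y)
print("Components kept:", model.named_steps["pca"].n_components_)
```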
IJRET, 2013
Speech processing is the study of speech signals and the methods used to process them. It is employed in applications such as speech coding, speech synthesis, speech recognition and speaker recognition. In speech classification, the computation of prosodic effects from speech signals plays a major role. In emotional speech signals, pitch and frequency are the most important parameters. Normally, the pitch values of sad and happy speech differ greatly, and the frequency of happy speech is higher than that of sad speech. In some cases, however, the frequency of happy speech is nearly the same as that of sad speech, or vice versa, and it is then difficult to recognize the emotion correctly. To reduce such drawbacks, in this paper we propose a Telugu speech emotion classification system with three features, energy entropy, short-time energy and zero crossing rate, and a K-NN classifier for the classification. The features are extracted from the speech signals and given to the K-NN classifier. The implementation results show the effectiveness of the proposed system in classifying Telugu speech signals based on their prosodic effects. The performance of the proposed speech emotion classification system is evaluated by conducting cross validation on the Telugu speech database.
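The cross-validated K-NN evaluation described here might be run as in the following sketch; the fold count, k, and the placeholder data are assumptions.

```python
# Hedged sketch: cross-validated K-NN over three prosodic features.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X = np.random.randn(300, 3)        # placeholder: energy entropy, STE, ZCR
y = np.random.randint(0, 2, 300)   # placeholder labels (e.g., happy vs. sad)

scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print("Mean CV accuracy:", scores.mean())
```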
Communications in Computer and Information Science, 2009
In this paper, we introduce a speech database for analyzing the emotions present in speech signals. The proposed database was recorded in the Telugu language using professional artists from All India Radio (AIR), Vijayawada, India. The speech corpus was collected by simulating eight different emotions using neutral (emotion-free) statements. The database is named the Indian Institute of Technology Kharagpur Simulated Emotion Speech Corpus (IITKGP-SESC). The proposed database will be useful for characterizing the emotions present in speech. Further, the emotion-specific knowledge present in speech at different levels can be acquired by developing emotion-specific models using features from the vocal tract system, the excitation source and prosody. This paper describes the design, acquisition, post-processing and evaluation of the proposed speech database (IITKGP-SESC). The quality of the emotions present in the database is evaluated using subjective listening tests. Finally, statistical models are developed using prosodic features, and the discrimination of the emotions is carried out by classifying them with the developed statistical models.
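One common form of "statistical model" for such prosodic features is a per-emotion Gaussian mixture scored by maximum likelihood, sketched below; the paper does not specify GMMs, so this choice and the component count are assumptions.

```python
# Hedged sketch: one Gaussian mixture per emotion, maximum-likelihood decision.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_models(X, y, n_components=4):
    # One GMM trained on the prosodic vectors of each emotion class.
    return {c: GaussianMixture(n_components, random_state=0).fit(X[y == c])
            for c in np.unique(y)}

def predict(models, X):
    # Classify each vector by the model with the highest log-likelihood.
    scores = np.column_stack([m.score_samples(X) for m in models.values()])
    labels = list(models.keys())
    return np.array([labels[i] for i in scores.argmax(axis=1)])
```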