2013
Abstract— This paper presents the results of investigations into speech emotion recognition in Hindi using only the first four formants and their bandwidths. The work was carried out on a female speech database of nearly 1600 utterances comprising neutral, happiness, surprise, anger, sadness, fear and disgust as the elicited emotions. The best of the statistically preprocessed formant and bandwidth features were first identified by K-Means, K-Nearest Neighbour and Naive Bayes classification of the individual features. This was followed by artificial neural network (ANN) classification based on the combination of the best formants and bandwidths. The highest overall emotion recognition accuracy obtained by the ANN method was 97.14%, based on the first four formants and their bandwidths. A striking increase in recognition accuracy was observed when the number of emotion classes was reduced from seven. The results presented in this paper have not been reported so far for...
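The abstract does not spell out how the formants and bandwidths are computed; a common estimation route is via the roots of a linear prediction (LPC) polynomial. Below is a minimal sketch, assuming librosa and a 16 kHz signal; the LPC order, root-pruning threshold and file name are illustrative assumptions, not the paper's settings.

```python
import numpy as np
import librosa

def formants_and_bandwidths(y, sr, order=12, n_formants=4):
    """Estimate formant frequencies (Hz) and bandwidths (Hz) from LPC roots."""
    a = librosa.lpc(y, order=order)           # LPC polynomial coefficients
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]         # keep one root per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * sr / np.pi
    idx = np.argsort(freqs)
    freqs, bws = freqs[idx], bws[idx]
    keep = freqs > 90                         # discard near-DC roots
    return freqs[keep][:n_formants], bws[keep][:n_formants]

# Illustrative usage on a hypothetical utterance file.
y, sr = librosa.load("utterance.wav", sr=16000)
f, b = formants_and_bandwidths(y, sr)
print("Formants (Hz):", f.round(1), "Bandwidths (Hz):", b.round(1))
```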
Journal of Interdisciplinary Mathematics, 2020
Nowadays, speech is the fastest medium for giving instructions to machines. When a person utters a word, the machine can understand the semantics of the utterance but not the emotion associated with it. This study focuses on combining different types of speech features, applying statistical techniques to reduce the dimensionality of the data, and then training a machine learning algorithm on the dataset to predict the emotional state of the speaker, so that machines can respond better to instructions received from humans by recognizing their emotion.
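The abstract names the pipeline (feature fusion, statistical dimensionality reduction, machine learning classifier) without naming tools. A minimal scikit-learn sketch of such a pipeline follows; the PCA dimensionality, the SVM classifier and the placeholder feature matrix are assumptions for illustration, not the paper's exact choices.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: one row per utterance, columns = concatenated speech features
# (e.g. MFCC, pitch and energy statistics); y: emotion labels.
X = np.random.rand(200, 60)              # placeholder feature matrix
y = np.random.randint(0, 4, size=200)    # placeholder emotion labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardize, reduce dimensionality statistically, then classify.
model = make_pipeline(StandardScaler(), PCA(n_components=20), SVC())
model.fit(X_tr, y_tr)
print("Held-out accuracy:", model.score(X_te, y_te))
```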
Speech emotion recognition is one of the research topics attracting strong interest today, and many attempts at emotion recognition have already been made. Speech features such as energy and formants are extracted from the speech signal, and the emotional states of anger, stress, admiration, teasing and shock are recognized from these features using K-Nearest Neighbor (K-NN) as the classification technique.
Advances in Intelligent Systems and Computing, 2015
This paper explores emotion recognition from Marathi speech signals, using feature extraction techniques and a classifier to categorize Marathi speech utterances according to their emotional content. Different types of speech feature vectors capture different emotions, owing to their corresponding natures. The emotions considered are Anger, Happy, Sad, Fear, Neutral and Surprise. Mel Frequency Cepstral Coefficient (MFCC) feature parameters extracted from Marathi speech signals depend on the speaker, the spoken word and the emotion. Gaussian Mixture Models (GMM) are used to build the emotion classification model; a recently proposed feature extraction technique and classifier are thus applied to Marathi spoken words. Each speaker spoke 7 Marathi words (Aathawan, Aayusha, Chamakdar, Iishara, Manav, Namaskar and Uupay) with 6 different emotions, giving a database of 924 Marathi speech utterances in total. Using MFCC and GMM, the overall emotion recognition accuracy of the resulting Emotion Recognition for Marathi Spoken Words (ERFMSW) system is 84.61%, with average accuracies of 86.20% for male and 83.03% for female speakers.
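A standard realization of the MFCC-plus-GMM scheme described above trains one GMM per emotion class and classifies a test utterance by maximum average log-likelihood. The sketch below, assuming librosa and scikit-learn, illustrates this; the mixture size, MFCC settings and data layout are assumptions, not the ERFMSW system's actual configuration.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=16000, n_mfcc=13):
    """Frame-level MFCC matrix (n_frames x n_mfcc) for one utterance."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_gmms(train_files_by_emotion, n_components=8):
    """Fit one GMM on the pooled MFCC frames of each emotion class."""
    gmms = {}
    for emotion, paths in train_files_by_emotion.items():
        frames = np.vstack([mfcc_frames(p) for p in paths])
        gmms[emotion] = GaussianMixture(n_components=n_components,
                                        covariance_type="diag").fit(frames)
    return gmms

def classify(path, gmms):
    """Pick the emotion whose GMM gives the highest average log-likelihood."""
    frames = mfcc_frames(path)
    return max(gmms, key=lambda e: gmms[e].score(frames))
```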
2013
An emotion-based speaker identification system automatically identifies a speaker's emotion from features extracted from speech waves. This paper presents experiments on building and testing a speaker emotion identification system for Hindi speech using Mel Frequency Cepstral Coefficients and Vector Quantization techniques. We collected voice samples of Hindi sentences in four basic emotions to study speaker emotion identification, and found that the proposed emo-voice model achieves 73% accuracy in speaker emotion identification on the 93% of the total speech samples accepted by the system.
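MFCC-plus-Vector-Quantization systems of this kind typically train one K-Means codebook per emotion class and pick the class whose codebook gives the lowest quantization distortion on the test utterance. A minimal sketch using librosa and scikit-learn follows; the codebook size and MFCC settings are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans

def mfcc_frames(path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_codebooks(files_by_emotion, codebook_size=32):
    """Build one VQ codebook (K-Means centroids) per emotion from MFCC frames."""
    codebooks = {}
    for emotion, paths in files_by_emotion.items():
        frames = np.vstack([mfcc_frames(p) for p in paths])
        km = KMeans(n_clusters=codebook_size, n_init=5).fit(frames)
        codebooks[emotion] = km.cluster_centers_
    return codebooks

def classify(path, codebooks):
    """Assign the emotion whose codebook has the lowest quantization distortion."""
    frames = mfcc_frames(path)
    def distortion(cb):
        d = np.linalg.norm(frames[:, None, :] - cb[None, :, :], axis=2)
        return d.min(axis=1).mean()
    return min(codebooks, key=lambda e: distortion(codebooks[e]))
```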
Journal of theoretical and applied information technology, 2020
In the past few years, emotion recognition from speech has been one of the most active research topics in the field of Human-Computer Interaction. Much research addresses various languages, but for Bengali the field is still in its infancy. In this work, four emotional states, happy, sad, angry and neutral, are recognized from a Bengali speech dataset. The proposed approach uses pitch and Mel-Frequency Cepstral Coefficient (MFCC) feature vectors to train a k-Nearest Neighbor classifier. A self-built Bengali emotional speech dataset, consisting of 400 isolated emotional sentences from 50 people, is used for both training and testing. With this dataset and technique, we achieved an 87.50% average accuracy rate, with per-emotion detection accuracies (happy, sad, angry, neutral) of 80.00%, 75.00%, 85.00% and 75.00% respectively.
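A minimal version of the pitch-plus-MFCC k-NN approach can be sketched as follows, assuming librosa's YIN pitch tracker and scikit-learn; the pitch search range, the feature statistics and the corpus variables (train_paths, train_labels) are assumptions, since the self-built Bengali dataset is not distributed with the paper.

```python
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def pitch_mfcc_vector(path, sr=16000):
    """Utterance-level vector: mean/std of YIN pitch plus MFCC statistics."""
    y, sr = librosa.load(path, sr=sr)
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.hstack([f0.mean(), f0.std(),
                      mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical usage on a labeled emotional speech corpus:
# X = np.array([pitch_mfcc_vector(p) for p in train_paths])
# knn = KNeighborsClassifier(n_neighbors=5).fit(X, train_labels)
# print(knn.predict([pitch_mfcc_vector("test_utterance.wav")]))
```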
This paper presents a novel feature extraction method based on Linear Predictive Coefficients (LPC) and Mel Frequency Cepstral Coefficients (MFCC) for emotion recognition from speech. Classification and recognition of the features is done using an Artificial Neural Network. Malayalam (one of the south Indian languages) words were used for the experiment. One hundred and twenty samples were collected, categorized, labeled and stored in a database. By analyzing the results of the experiment, the system can distinguish the different emotions. A recognition accuracy of 79% is achieved.
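A generic sketch of combining LPC coefficients with MFCCs and feeding them to a neural network follows, assuming librosa and scikit-learn's MLPClassifier in place of the paper's unspecified ANN; the LPC order, network size and corpus variables (paths, labels) are illustrative.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def lpc_mfcc_vector(path, sr=16000, lpc_order=12, n_mfcc=13):
    """Concatenate LPC coefficients with utterance-level MFCC means."""
    y, sr = librosa.load(path, sr=sr)
    lpc = librosa.lpc(y, order=lpc_order)[1:]   # drop the leading 1.0
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
    return np.hstack([lpc, mfcc])

# Hypothetical usage on the labeled Malayalam corpus (not distributed):
# X = np.array([lpc_mfcc_vector(p) for p in paths])
# ann = make_pipeline(StandardScaler(),
#                     MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000))
# ann.fit(X, labels)
```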
IJRET, 2013
Speech processing is the study of speech signals and the methods used to process them. It is employed in applications such as speech coding, speech synthesis, speech recognition and speaker recognition. In speech classification, the computation of prosodic effects from speech signals plays a major role. In emotional speech signals, pitch and frequency are the most important parameters. Normally, the pitch values of sad and happy speech differ greatly, and the frequency of happy speech is higher than that of sad speech. In some cases, however, the frequency of happy speech is nearly the same as that of sad speech, or vice versa, and it then becomes difficult to recognize the emotion correctly. To reduce such drawbacks, this paper proposes a Telugu speech emotion classification system based on three features, Energy Entropy, Short-Time Energy and Zero Crossing Rate, with a K-NN classifier. The features are extracted from the speech signals and given to the K-NN. The implementation results show the effectiveness of the proposed system in classifying Telugu speech signals based on their prosodic effects. The performance of the proposed speech emotion classification system is evaluated by cross validation on the Telugu speech database.
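The three features can be computed per frame with plain NumPy. The sketch below assumes 25 ms frames with a 10 ms hop at 16 kHz and ten sub-frames for the energy entropy; these parameters are illustrative, not the paper's.

```python
import numpy as np

def frame_signal(y, frame_len=400, hop=160):
    """Split a signal (assumed longer than one frame) into overlapping frames."""
    n = (len(y) - frame_len) // hop + 1
    return np.stack([y[i * hop : i * hop + frame_len] for i in range(n)])

def short_time_energy(frames):
    """Mean squared amplitude of each frame."""
    return (frames ** 2).mean(axis=1)

def zero_crossing_rate(frames):
    """Fraction of sample pairs in each frame where the sign changes."""
    return (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)

def energy_entropy(frames, n_sub=10):
    """Entropy of the energy distribution across sub-frames of each frame."""
    sub = frames.reshape(frames.shape[0], n_sub, -1)
    e = (sub ** 2).sum(axis=2)
    p = e / (e.sum(axis=1, keepdims=True) + 1e-12)
    return -(p * np.log2(p + 1e-12)).sum(axis=1)
```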
In the present work, two new features based on normal and Teager-Energy-operated Wavelet Packet Cepstral Coefficients computed by method 2 (WPCC2 and tfWPCC2) are proposed, and their performance is compared with that of existing features based on normal and Teager-Energy-operated Wavelet Packet Cepstral Coefficients computed by method 1 (WPCC and tfWPCC), Mel Frequency Cepstral Coefficients (MFCC) and Log Frequency Power Coefficients (LFPC), for emotion recognition from speech in five native languages of Assam (Assamese, Bodo, Dimasa, Karbi and Mishing). The data consisted of 20 short, emotionally biased sentences per (full-blown) emotion, spoken by 20 speakers and recorded in a small quiet room. A total of seven GMMs are trained, one for each emotion. The feature set based on tfWPCC2 exhibited good performance in terms of both computational efficiency and classification accuracy.
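The paper's method-1/method-2 distinction is not reproduced here, but a generic wavelet-packet cepstral feature with an optional Teager-Kaiser energy operator can be sketched with PyWavelets and SciPy; the wavelet, decomposition level and coefficient count below are assumptions, not the paper's definitions.

```python
import numpy as np
import pywt
from scipy.fft import dct

def teager(x):
    """Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def wp_cepstral(y, wavelet="db4", level=4, n_ceps=12, use_teager=True):
    """Log subband energies from a wavelet packet tree, decorrelated by a DCT."""
    wp = pywt.WaveletPacket(data=y, wavelet=wavelet, maxlevel=level)
    energies = []
    for node in wp.get_level(level, order="freq"):   # 2**level subbands
        d = node.data
        e = np.abs(teager(d)).mean() if use_teager else (d ** 2).mean()
        energies.append(np.log(e + 1e-12))
    return dct(np.array(energies), norm="ortho")[:n_ceps]
```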
2016
Emotion recognition from audio signals is a recent research topic in Human-Computer Interaction, driven by rising demand for richer communication interfaces between humans and digital media. Many researchers are working to improve recognition accuracy, but a complete system that can recognize emotions from speech is still lacking. To make human-machine interaction more natural, the computer should be able to recognize emotional states in the same way humans do. The efficiency of an emotion recognition system depends on the type of features extracted and the classifier used to detect the emotions. Fundamental emotions include Happy, Angry, Sad, Depressed, Bored, Anxious, Fearful and Nervous. The signals are preprocessed and analyzed using various techniques. In feature extraction, the parameters used to form a feature vector include fundamental frequency, pitch contour, formants and duration (pause length ratio). These features are...
A variety of temporal and spectral features can be extracted from human speech. These features, related to the pitch, Mel Frequency Cepstral Coefficients (MFCCs) and formants of speech, can be classified using various algorithms. This study explores statistical MFCC features, classified with Linear Discriminant Analysis (LDA). The article also describes a database of artificial emotional Marathi speech. The samples were collected from 5 Marathi movies, in which the actors and actresses simulated the emotions while producing Marathi utterances that could be used in everyday communication and are interpretable in all of the applied emotions. The speech samples were distinguished by the various situations in the movies and categorized into 5 basic classes: Happy, Sad, Anger, Afraid and Surprise.
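A hedged sketch of the MFCC-statistics-plus-LDA setup, using librosa and scikit-learn; the MFCC settings, the mean/std summary and the cross-validation scheme are illustrative assumptions, since the Marathi movie corpus is not public.

```python
import numpy as np
import librosa
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def mfcc_stats(path, sr=16000, n_mfcc=13):
    """Utterance-level statistics (mean, std) of frame-wise MFCCs."""
    y, sr = librosa.load(path, sr=sr)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.hstack([m.mean(axis=1), m.std(axis=1)])

# Hypothetical usage on the 5-class corpus described above:
# X = np.array([mfcc_stats(p) for p in paths])
# lda = LinearDiscriminantAnalysis()
# print(cross_val_score(lda, X, labels, cv=5).mean())
```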
This paper presents a novel approach to automatic emotion classification from human utterances. The Discrete Wavelet Transform (DWT) is used for feature extraction from the speech signals. Malayalam (one of the south Indian languages) is used for the experiment, with an elicited dataset of 500 utterances recorded from 10 male and 8 female speakers. Using an Artificial Neural Network, we classified four emotional classes: neutral, happy, sad and anger. A classification accuracy of 70% is obtained in this work.
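DWT-based feature vectors of this kind are often built from the log energy of each wavelet subband. A minimal sketch with PyWavelets and scikit-learn follows; the wavelet, decomposition level and network size are assumptions rather than the paper's configuration.

```python
import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier

def dwt_energy_features(y, wavelet="db4", level=4):
    """Log energy of each DWT subband (approximation + detail bands)."""
    coeffs = pywt.wavedec(y, wavelet, level=level)
    return np.array([np.log((c ** 2).mean() + 1e-12) for c in coeffs])

# Hypothetical usage on the elicited Malayalam corpus:
# X = np.array([dwt_energy_features(load_signal(p)) for p in paths])
# ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(X, labels)
```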
International Journal of Computer Applications, 2018
Speech Emotion Recognition (SER) is a modern development in technology. SER, in partnership with Human-Machine Interaction (HMI), has advanced machine intelligence. An emotion-aware HMI is designed by integrating speech processing with a machine learning algorithm, shaped into an automated, smart and secure application for detecting emotions in household as well as commercial settings. This project presents a study of distinguishing emotions in acoustic speech using K-Nearest Neighbor (K-NN), a machine learning (ML) technique. The most significant paralinguistic information is obtained from spectral features, i.e. Mel Frequency Cepstrum Coefficients (MFCC). The main processing steps are feature extraction, feature selection and classification of emotions. A customized dataset consisting of a speech corpus of simulated emotion samples in the Sanskrit language is used to classify emotions into the classes happy, sad, excitement, fear, anger and disgust. The emotions are classified using a K-NN algorithm over two separate models, based on soft and high-pitched voices. Models 1 and 2 achieved about 72.95% and 76.96% recognition
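The two-model idea, one classifier for soft voices and one for high-pitched voices, can be sketched by splitting the corpus on a pitch statistic and training a K-NN per group. The median-pitch threshold and the scikit-learn K-NN below are illustrative assumptions, not the project's actual design.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def train_two_pitch_models(features, labels, pitches):
    """Route utterances by mean pitch and train a separate K-NN per group."""
    threshold = np.median(pitches)          # illustrative split point
    low = pitches <= threshold
    knn_soft = KNeighborsClassifier().fit(features[low], labels[low])
    knn_high = KNeighborsClassifier().fit(features[~low], labels[~low])
    return threshold, knn_soft, knn_high

def predict(x, pitch, threshold, knn_soft, knn_high):
    """Classify with the model matching the utterance's pitch group."""
    model = knn_soft if pitch <= threshold else knn_high
    return model.predict(x.reshape(1, -1))[0]
```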
International Journal of Electrical and Computer Engineering (IJECE), 2020
In the last couple of years, emotion recognition has proven its significance in the areas of artificial intelligence and man-machine communication. Emotion recognition can be done using speech or images (facial expressions); this paper deals with speech emotion recognition (SER) only. An emotional speech database is essential for emotion recognition. In this paper we propose an emotional database developed in Gujarati, one of the official languages of India. The proposed speech corpus covers six emotional states: sadness, surprise, anger, disgust, fear and happiness. To observe the effect of the different emotions, the proposed Gujarati speech database is analyzed using effective speech parameters, pitch, energy and MFCC, in MATLAB.
International Journal of Computer Applications, 2015
Recognizing emotions from speech is a tough task, since it is not obvious which features will classify the emotions most accurately. This paper investigates which speech feature classifies the emotions more accurately. The features compared here are pitch and formants, while the classifier used is Linear Discriminant Analysis (LDA). The database used in this experiment was developed with 50 male and 50 female native Marathi speakers. The emotions considered are Neutral, Happy, Sad, Surprise and Boredom. At the end of the experiment, it was observed that the formant features recognized the emotions more efficiently and accurately than energy.
National Academy Science Letters, 2020
Speech emotion analysis needs a good emotion corpus. The construction and evaluation of two Tamil emotion corpora, one for children and one for adults, are described here. The children's emotional speech samples are collected from 30 Tamil movies, with utterance lengths varying from 5 to 40 s; Tamil audio plays are the resource for the adult corpus. The emotional prosodies are collected, segmented and annotated for the categories of anger, happy, sad and neutral. An observers' perception test has been used to evaluate the emotion annotation. Automatic emotion classification systems have been built using Gaussian Mixture Models and Support Vector Machines. The database was created with the objectives of acoustically investigating emotion expression in Tamil, analyzing the influence of the speaker's culture and age on emotion expression, and investigating the need for features unique to Tamil speech in various automatic speech analysis tasks such as emotion recognition and speaker recognition.
International Journal of Information Technology, 2018
This paper presents a study of the perceptual evaluation of emotions expressed in Hindi speech and their acoustic-prosodic correlates. Six emotions, Neutral, Happiness, Sad, Fear, Anger and Surprise, were selected for the study. For this purpose, a database was created of fifteen continuous sentences and isolated words, the Hindi digits 'शून्य, एक, दो, तीन, चार, पाँच, छः, सात, आठ, नौ', spoken by five males and five females and repeated three times by each speaker. Understanding how acoustic features differ between emotions is important for computer recognition and classification. The acoustic features that change with emotion were analyzed using the Praat speech processing software tool. A human perception experiment shows that the overall recognition of the emotions is about 60% for continuous sentences and 53% for isolated words. It is found that anger has the highest intensity, followed by neutral, happiness, surprise, sad and fear, with some differences observed in the case of continuous sentences. The dynamic changes in pitch and intensity in these utterances have also been analyzed.
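The paper's measurements were made interactively in Praat; equivalent pitch and intensity contours can be obtained programmatically through the praat-parselmouth Python bindings, as in the sketch below (the file name is a placeholder).

```python
import parselmouth

snd = parselmouth.Sound("hindi_digit.wav")   # placeholder file name

pitch = snd.to_pitch()                       # F0 contour (Praat's algorithm)
intensity = snd.to_intensity()               # intensity contour in dB

f0 = pitch.selected_array["frequency"]       # 0.0 where unvoiced
print("Mean F0 (voiced frames, Hz):", f0[f0 > 0].mean())
print("Mean intensity (dB):", intensity.values.mean())
```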
This paper compares two extracted features, pitch and formants, for emotion recognition from speech. The literature shows that various prosodic and spectral features have been used for emotion recognition from speech. The database used for recognition was developed in Marathi with 100 speakers. Pitch and formant features were extracted, and the emotions anger, stress, admiration, teasing and shock were recognized on the basis of the energy and formant features. The classification technique used is K-Nearest Neighbor (KNN). The accuracy for formants was about 100%, comparatively better than that for energy, which was 80%.
International Journal of Speech Technology, 2012
Emotion recognition from speech has emerged as an important research area in the recent past, and a review of existing work on emotional speech processing is useful for carrying out further research. This paper reviews the recent literature on speech emotion recognition, considering the issues related to emotional speech corpora, the different types of speech features, and the models used for recognizing emotions from speech. Thirty-two representative speech databases are reviewed from the point of view of their language, number of speakers, number of emotions and purpose of collection, and the issues around emotional speech databases used in emotional speech recognition are briefly discussed. The literature on the different features used for emotion recognition from speech is presented, and the importance of choosing suitable classification models is discussed along the way. The important issues to be considered for further emotion recognition research, in general and specifically in the Indian context, are highlighted wherever necessary.
2010
In this paper we present a comparative analysis of four classifiers for emotion recognition from speech signals. Recognition was performed on the Berlin emotional database. The work covers both speaker- and utterance (phrase)-dependent and -independent frameworks. One hundred and thirty-three (133) sound/speech features were extracted from pitch, Mel Frequency Cepstral Coefficients, energy and formants, and evaluated in order to create a set of 26 features sufficient to discriminate between seven emotions in acted speech. A Multilayer Perceptron, Random Forest, Probabilistic Neural Networks and a Support Vector Machine were used for classification into seven classes: anger, happiness, anxiety/fear, sadness, boredom, disgust and neutral. In the speaker-dependent framework, the Probabilistic Neural Network reaches very high accuracy (94%), while in the speaker-independent framework the classification rate of the Support Vector Machine reaches 80%. The results of the numerical experiments are given and discussed in the paper.
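The classifier comparison can be reproduced in outline with scikit-learn, which provides the Multilayer Perceptron, Random Forest and SVM but not a Probabilistic Neural Network; the sketch below therefore compares only the other three, uses an ANOVA F-test to reduce 133 features to 26 as a stand-in for the paper's unspecified selection method, and substitutes placeholder data for the Berlin database features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: 133 extracted features per utterance; y: seven emotion labels (placeholders).
X = np.random.rand(300, 133)
y = np.random.randint(0, 7, size=300)

classifiers = {
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000),
    "Random Forest": RandomForestClassifier(n_estimators=200),
    "SVM": SVC(),
}
for name, clf in classifiers.items():
    # Standardize, select 26 of 133 features, then classify.
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=26), clf)
    print(name, cross_val_score(pipe, X, y, cv=5).mean().round(3))
```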
2014
Speech emotion recognition (SER) plays an increasingly significant role in interactions among human beings, as well as between humans and computers. Emotions are an inherent part of even rational decision making; correctly recognizing the emotional content of an utterance is as significant as properly understanding its semantic content, and is an essential element of professional success. Prevalent speech emotion recognition methods generally use a large number of features and considerable signal processing effort. This work, by contrast, presents an approach to SER using minimal features extracted from appropriate, sociolinguistically designed and developed emotional speech databases. Whereas most reported SER work is based on acted speech with its exaggerated display of emotions, this work focuses on elicited emotional speech, in which emotions are induced. Since female speech is more expressive of emotions, this research inve...