In this study, we investigate several methods on the Interspeech 2013 Paralinguistic Challenge - Social Signals Sub-Challenge dataset. The task of this sub-challenge is to detect laughter and fillers per frame. We apply Random Forests with varying numbers of trees and randomly selected features, and then proceed with minimum Redundancy Maximum Relevance (mRMR) ranking of features. We employ an SVM with a linear kernel as a reference system comparable to the baseline provided in the challenge paper. The results indicate the relative superiority of Random Forests over SVMs in terms of the sub-challenge performance measure, namely UAAUC. We also observe that, using mRMR-based feature selection, it is possible to reduce the number of features by half with negligible loss of performance. Furthermore, the performance loss due to feature reduction is smaller for Random Forests than for SVMs. We also make use of neighboring frames to smooth the posteriors. Overall, we attain an increase of 5.1% (absolute) in UAAUC on the challenge test set.
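As a rough Python sketch of the two ingredients named above: UAAUC is the unweighted average of the per-class AUCs (laughter and filler), and smoothing averages each frame's posterior over its neighboring frames. The class names, window size, and random data below are illustrative assumptions, not taken from the paper.

```python
# Sketch of the sub-challenge metric (UAAUC) and simple posterior smoothing,
# assuming per-frame posteriors and binary labels are already available.
import numpy as np
from sklearn.metrics import roc_auc_score

def smooth_posteriors(post, win=5):
    """Average each frame's posterior over `win` neighboring frames."""
    kernel = np.ones(win) / win
    return np.convolve(post, kernel, mode="same")

def uaauc(labels, posteriors, classes=("laughter", "filler")):
    """Unweighted average of per-class AUCs (one-vs-rest)."""
    aucs = [roc_auc_score(labels[c], posteriors[c]) for c in classes]
    return float(np.mean(aucs))

# Toy example with random data (hypothetical, for illustration only):
rng = np.random.default_rng(0)
labels = {c: rng.integers(0, 2, 1000) for c in ("laughter", "filler")}
posts = {c: smooth_posteriors(rng.random(1000)) for c in ("laughter", "filler")}
print("UAAUC:", uaauc(labels, posts))
```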
2005
In the context of detecting 'paralinguistic events' with the aim of making classification of the speaker's emotional state possible, a detector was developed for one of the most obvious paralinguistic events, namely laughter. Gaussian Mixture Models were trained with Perceptual Linear Prediction features, pitch & energy, pitch & voicing, and modulation spectrum features to model laughter and speech. Data from the ICSI Meeting Corpus and the Dutch CGN corpus were used for our classification experiments. The results showed that Gaussian Mixture Models trained with Perceptual Linear Prediction features performed best, with Equal Error Rates ranging from 7.1% to 20.0%.
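The discrimination scheme described above can be sketched as two class-conditional GMMs scored with a log-likelihood ratio, from which the Equal Error Rate is read off the ROC curve. The sketch below uses random stand-ins for the PLP features and an arbitrary component count; it only illustrates the mechanics, not the paper's setup.

```python
# Minimal sketch of GMM-based laughter/speech discrimination with EER evaluation.
# Feature matrices are synthetic placeholders for PLP features extracted elsewhere.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
laugh_feats = rng.normal(0.5, 1.0, size=(500, 13))    # stand-in for PLP features
speech_feats = rng.normal(-0.5, 1.0, size=(500, 13))

gmm_laugh = GaussianMixture(n_components=8, covariance_type="diag").fit(laugh_feats)
gmm_speech = GaussianMixture(n_components=8, covariance_type="diag").fit(speech_feats)

# Score test frames with a log-likelihood ratio (laughter vs. speech).
test = np.vstack([rng.normal(0.5, 1.0, (100, 13)), rng.normal(-0.5, 1.0, (100, 13))])
y_true = np.array([1] * 100 + [0] * 100)
llr = gmm_laugh.score_samples(test) - gmm_speech.score_samples(test)

# Equal Error Rate: operating point where false-positive rate equals false-negative rate.
fpr, tpr, _ = roc_curve(y_true, llr)
fnr = 1 - tpr
eer = fpr[np.nanargmin(np.abs(fpr - fnr))]
print(f"EER: {eer:.3f}")
```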
Cognitive Systems Monographs, 2009
Speech Communication, 2007
Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues, which can consist of laughter, a trembling voice, coughs, changes in the intonation contour, etc., information about the speaker's state and emotion can be revealed. This paper describes the development of a gender-independent laugh detector with the aim of enabling automatic emotion recognition. Different types of features (spectral, prosodic) for laughter detection were investigated using different classification techniques (Gaussian Mixture Models, Support Vector Machines, Multi-Layer Perceptron) often used in language and speaker recognition. Classification experiments were carried out with short pre-segmented speech and laughter segments extracted from the ICSI Meeting Recorder Corpus (with a mean duration of approximately 2 s). Equal error rates of around 3% were obtained when tested on speaker-independent speech data. We found that a fusion between classifiers based on Gaussian Mixture Models and classifiers based on Support Vector Machines increases discriminative power. We also found that a fusion between classifiers that use spectral features and classifiers that use prosodic information usually increases the performance for discriminating between laughter and speech. Our acoustic measurements showed differences between laughter and speech in mean pitch and in the ratio of the durations of unvoiced to voiced portions, which indicates that these prosodic features are indeed useful for discriminating between laughter and speech.
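The score-level fusion reported above, for instance between a spectral-feature classifier and a prosodic-feature classifier, can be sketched as a weighted average of posterior scores. The features, the specific classifiers picked here (an SVM and an MLP, both mentioned in the abstract), and the fusion weights are illustrative placeholders.

```python
# Toy sketch of score-level fusion between a spectral-stream and a prosodic-stream classifier.
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 400)                               # 1 = laughter, 0 = speech
spectral = rng.normal(size=(400, 20)) + y[:, None] * 0.8  # stand-in spectral features
prosodic = rng.normal(size=(400, 4)) + y[:, None] * 0.5   # stand-in prosodic features

# One classifier per feature stream.
clf_spec = SVC(probability=True).fit(spectral[:300], y[:300])
clf_pros = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500).fit(prosodic[:300], y[:300])

# Fuse by a weighted average of posterior scores on held-out frames.
p_spec = clf_spec.predict_proba(spectral[300:])[:, 1]
p_pros = clf_pros.predict_proba(prosodic[300:])[:, 1]
fused = 0.6 * p_spec + 0.4 * p_pros
print("fused accuracy on toy data:", np.mean((fused > 0.5) == y[300:]))
```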
Archives of Acoustics, 2016
Laughter is one of the most important paralinguistic events, and it has specific roles in human conversation. The automatic detection of laughter occurrences in human speech can aid automatic speech recognition systems as well as some paralinguistic tasks such as emotion detection. In this study we apply Deep Neural Networks (DNN) for laughter detection, as this technology is currently considered state-of-the-art in similar tasks such as phoneme identification. We carry out our experiments using two corpora containing spontaneous speech in two languages (Hungarian and English). Also, since it is reasonable to assume that not all frequency regions are required for efficient laughter detection, we perform feature selection to find a sufficient feature subset.
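A minimal sketch of the frame-level approach described above: a small feed-forward network on filter-bank-style features, preceded by a filter-based feature selection step that keeps only the most relevant frequency channels. The data, feature dimensionality, and the specific selection criterion (an ANOVA F-test) are assumptions for illustration only.

```python
# Sketch of frame-level laughter detection with feature selection over frequency channels.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 40))      # e.g., 40 filter-bank channels per frame (synthetic)
y = rng.integers(0, 2, 2000)         # 1 = laughter frame, 0 = other
X[y == 1, :10] += 0.7                # pretend the low-frequency channels are informative

# Keep only the most relevant frequency channels before training.
selector = SelectKBest(f_classif, k=15).fit(X[:1500], y[:1500])
X_sel = selector.transform(X)

dnn = MLPClassifier(hidden_layer_sizes=(100, 100), max_iter=300)
dnn.fit(X_sel[:1500], y[:1500])
print("frame accuracy:", dnn.score(X_sel[1500:], y[1500:]))
```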
Acoustics Speech and Signal …, 2010
In this study, a system that discriminates laughter from speech by modelling the relationship between audio and visual features is presented. The underlying assumption is that this relationship differs between speech and laughter. Neural networks are trained to learn the audio-to-visual and visual-to-audio feature mappings for both classes. Classification of a new frame is performed via prediction. All the networks produce a prediction of the expected audio/visual features, and the network with the best prediction, ...
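The prediction-based idea above can be sketched as one audio-to-visual regressor per class, with a new frame assigned to the class whose regressor reconstructs the observed visual features with the smallest error. Feature dimensions, network sizes, and data in the sketch are hypothetical.

```python
# Sketch of classification via cross-modal prediction: per-class audio-to-visual mappings.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)

def make_class_data(shift, n=300):
    """Synthetic audio/visual feature pairs with a class-specific relationship."""
    audio = rng.normal(size=(n, 12))
    visual = audio[:, :5] * shift + rng.normal(scale=0.1, size=(n, 5))
    return audio, visual

audio_laugh, vis_laugh = make_class_data(1.0)
audio_speech, vis_speech = make_class_data(-1.0)

# One audio-to-visual mapping per class.
map_laugh = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500).fit(audio_laugh, vis_laugh)
map_speech = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500).fit(audio_speech, vis_speech)

def classify(audio_frame, visual_frame):
    """Pick the class whose mapping reproduces the observed visual features best."""
    err_l = np.linalg.norm(map_laugh.predict(audio_frame[None]) - visual_frame)
    err_s = np.linalg.norm(map_speech.predict(audio_frame[None]) - visual_frame)
    return "laughter" if err_l < err_s else "speech"

print(classify(audio_laugh[0], vis_laugh[0]))
```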
2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009
Laughter detection is an important area of interest in the Affective Computing and Human-Computer Interaction fields. In this paper, we propose a multi-modal methodology based on the fusion of audio and visual cues to deal with the laughter recognition problem in face-to-face conversations. The audio features are extracted from the spectrogram, and the video features are obtained by estimating the degree of mouth movement and using a smile and laughter classifier. Finally, the multi-modal cues are included in a sequential classifier. Results on videos from the public discussion blog of the New York Times show that both types of features perform better when considered together by the classifier. Moreover, the sequential methodology is shown to significantly outperform the results obtained by an AdaBoost classifier.
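As a minimal sketch of including multi-modal cues in a single classifier, the snippet below concatenates audio and visual features per frame (feature-level fusion). AdaBoost is used only because the abstract reports it as the comparison baseline; the sequential classifier itself is not reproduced, and all features are synthetic stand-ins.

```python
# Feature-level fusion of audio and visual cues fed to an AdaBoost classifier (toy data).
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(5)
y = rng.integers(0, 2, 600)                              # 1 = laughter, 0 = other
audio = rng.normal(size=(600, 16)) + y[:, None] * 0.6    # e.g., spectrogram statistics
visual = rng.normal(size=(600, 3)) + y[:, None] * 0.4    # e.g., mouth movement degree

X = np.hstack([audio, visual])                           # concatenate the two modalities
clf = AdaBoostClassifier(n_estimators=100).fit(X[:450], y[:450])
print("fused-feature accuracy:", clf.score(X[450:], y[450:]))
```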
Laughter is clearly an audiovisual event, consisting of the laughter vocalization and of facial activity, mainly around the mouth and sometimes in the upper face. However, past research on laughter recognition has focused mainly on the information available in the audio channel, largely due to the lack of suitable audiovisual data. Only recently have a few works been published which combine audio and visual information, and most of them deal with the problem of discriminating laughter from speech or other nonlinguistic vocalisations using pre-segmented data. There are very few works on audiovisual laughter detection from unsegmented audiovisual streams, and these have either been tested on small datasets or use coarse visual features. As a consequence, results are mixed and it is not clear to what extent the addition of visual information to audio is beneficial for laughter detection. In this work, we attempt to overcome the limitations of previous studies and investigate the performance of audiovisual fusion for laughter detection using continuous audiovisual streams from the SEMAINE database. Our results suggest that there is indeed an improvement in laughter detection with the addition of visual information, which is dependent on the performance of the voice activity detector.
IEEE Reviews in Biomedical Engineering, 2016
The study of human nonverbal social behaviors has taken a more quantitative and computational approach in recent years due to the development of smart interfaces and virtual agents or robots able to interact socially. One of the most interesting nonverbal social behaviors, producing a characteristic vocal signal, is laughing. Laughter is produced in several different situations: in response to external physical, cognitive, or emotional stimuli; to negotiate social interactions; and also, pathologically, as a consequence of neural damage. For this reason, laughter has attracted researchers from many disciplines. A consequence of this multidisciplinarity is the absence of a holistic vision of this complex behavior: the methods of analysis and classification of laughter, as well as the terminology used, are heterogeneous, and the findings are sometimes contradictory and poorly documented. This survey aims at collecting and presenting objective measurement methods and results from a variety of studies in different fields, in order to contribute to building a unified model and taxonomy of laughter. This could be used successfully for advances in several fields, from artificial intelligence and human-robot interaction to medicine and psychiatry.
2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, 2013
Within the EU ILHAIRE Project, researchers from several disciplines (e.g., computer science, psychology) collaborate to investigate the psychological foundations of laughter and to bring this knowledge into shape for use in new technologies (i.e., affective computing). Within this framework, in order to endow machines with laughter capabilities (encoding as well as decoding), one crucial task is an adequate description of laughter in terms of morphology. In this paper we present a work methodology towards automated full-body laughter detection: starting from expert annotations of laughter videos, we aim to identify the body features that characterize laughter.
2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, 2013
Despite the importance of laughter in social interactions, it remains little studied in affective computing. Respiratory, auditory, and facial laughter signals have been investigated, but laughter-related body movements have received almost no attention. The aim of this study is twofold: first, an investigation into observers' perception of laughter states (hilarious, social, awkward, fake, and non-laughter) based on body movements alone, through their categorization of avatars animated with natural and acted motion capture data. Significant differences in torso and limb movements were found between animations perceived as containing laughter and those perceived as non-laughter. Hilarious laughter also differed from social laughter in the amount of bending of the spine, the amount of shoulder rotation, and the amount of hand movement. The body movement features indicative of laughter differed between sitting and standing avatar postures. Based on the positive findings of this perceptual study, the second aim is to investigate the possibility of automatically predicting the distributions of observers' ratings for the laughter states. The findings show that automated laughter recognition rates approach human rating levels, with the Random Forest method yielding the best performance.
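The second aim above, predicting the distribution of observer ratings over the five laughter states from body-movement features, can be sketched as multi-output regression with a Random Forest. The feature set, state labels, and data below are illustrative assumptions rather than the study's actual setup.

```python
# Toy sketch: predicting rating distributions over laughter states from body-movement features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

states = ["hilarious", "social", "awkward", "fake", "non-laughter"]
rng = np.random.default_rng(4)

X = rng.normal(size=(200, 8))                           # e.g., spine bend, shoulder rotation, ...
raw = rng.random((200, len(states))) + np.abs(X[:, :1]) # fabricate a weak link to the features
Y = raw / raw.sum(axis=1, keepdims=True)                # normalize to rating distributions

rf = RandomForestRegressor(n_estimators=200).fit(X[:150], Y[:150])
pred = rf.predict(X[150:])
pred = pred / pred.sum(axis=1, keepdims=True)           # renormalize predicted distributions
print(dict(zip(states, pred[0].round(2))))
```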