This work proposes a novel system for Violent Scenes Detection based on the combination of visual and audio features with machine learning at the segment level. Multiple Kernel Learning is applied so that the multimodality of videos can be fully exploited. In particular, Mid-level Violence Clustering is proposed so that mid-level concepts are learned implicitly, without using manually tagged annotations. Finally, a violence score is calculated for each shot. The whole system is trained on a dataset from the MediaEval 2013 Affect Task and evaluated by its official metric; the obtained results outperform the task's best score.
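A minimal sketch of this kind of kernel-level audio-visual fusion, assuming per-segment feature matrices: it combines per-modality RBF kernels with fixed, hand-set weights as a simplified stand-in for Multiple Kernel Learning (which would learn those weights), using scikit-learn's precomputed-kernel SVM. All data below is placeholder.

# Sketch: fusing audio and visual features through a weighted sum of RBF
# kernels, a simplified stand-in for Multiple Kernel Learning (the paper
# learns the kernel weights; here they are fixed by hand).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def combined_kernel(Xa, Xv, Ya, Yv, w_audio=0.5, w_visual=0.5):
    """Weighted sum of per-modality RBF kernels."""
    return w_audio * rbf_kernel(Xa, Ya) + w_visual * rbf_kernel(Xv, Yv)

# X_audio, X_visual: per-segment feature matrices; y: violence labels
# (all hypothetical random data).
rng = np.random.default_rng(0)
X_audio, X_visual = rng.normal(size=(100, 32)), rng.normal(size=(100, 64))
y = rng.integers(0, 2, size=100)

K_train = combined_kernel(X_audio, X_visual, X_audio, X_visual)
clf = SVC(kernel="precomputed", probability=True).fit(K_train, y)

# Violence score per segment = posterior probability of the violent class.
scores = clf.predict_proba(K_train)[:, 1]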
2012
This paper presents the work done in Technicolor, INRIA and Imperial College London regarding the Affect Task at MediaEval 2012. This task aims at detecting violent shots in movies. Four different systems and a fusion of three of them are proposed in this paper.
2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI), 2014
This paper addresses the issue of detecting violent scenes in Hollywood movies. In this context, we describe the MediaEval 2013 Violent Scene Detection task, which provides a consistent evaluation framework to the research community. Nine participating teams proposed systems for evaluation in 2013, which reflects increasing interest in the task. In this paper, the 2013 dataset, the annotation process, and the task's rules are detailed. The submitted systems are thoroughly analysed and compared through several metrics to draw conclusions on the most promising techniques, among which are multimodal systems and mid-level concept detection. Some further late fusions of the systems are investigated and show promising performance.
PloS one, 2013
Without doubt, general video and sound, as found in large multimedia archives, carry emotional information. Thus, audio and video retrieval by certain emotional categories or dimensions could play a central role for tomorrow's intelligent systems, enabling search for movies with a particular mood, computer-aided scene and sound design in order to elicit certain emotions in the audience, etc. Yet, the lion's share of research in affective computing focuses exclusively on signals conveyed by humans, such as affective speech. Uniting the fields of multimedia retrieval and affective computing is believed to lead to a multiplicity of interesting retrieval applications, and at the same time to benefit affective computing research by moving its methodology "out of the lab" to real-world, diverse data. In this contribution, we address the problem of finding "disturbing" scenes in movies, a scenario that is highly relevant for computer-aided parental guidance. We apply large-scale segmental feature extraction combined with audio-visual classification to the particular task of detecting violence. Our system performs fully data-driven analysis, including automatic segmentation. We evaluate the system in terms of mean average precision (MAP) on the official data set of the MediaEval 2012 evaluation campaign's Affect Task, which consists of 18 original Hollywood movies, achieving up to .398 MAP on unseen test data in full realism. An in-depth analysis of the worth of individual features with respect to the target class and of the system errors reveals the importance of peak-related audio feature extraction and low-level histogram-based video analysis.
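For reference, a short sketch of the MAP evaluation described above, computed per test movie and then averaged; labels and scores are random placeholders.

# Sketch: mean average precision (MAP) over a set of movies, the metric
# used in the MediaEval 2012 Affect Task evaluation.
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(per_movie):
    """per_movie: list of (labels, scores) pairs, one per test movie."""
    return np.mean([average_precision_score(y, s) for y, s in per_movie])

rng = np.random.default_rng(1)
per_movie = [(rng.integers(0, 2, 200), rng.random(200)) for _ in range(3)]
print(f"MAP = {mean_average_precision(per_movie):.3f}")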
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012
This paper presents a violent shot detection system that studies several methods for introducing temporal and multimodal information into the framework. It also investigates different kinds of Bayesian network structure learning algorithms for modelling these problems. The system is trained and tested using the MediaEval 2011 Affect Task corpus, which comprises 15 Hollywood movies. It is experimentally shown that both multimodality and temporality add useful information to the system. Moreover, the analysis of the links between the variables of the resulting graphs yields important observations about the quality of the structure learning algorithms. Overall, our best system achieved 50% false alarms and 3% missed detections, which is among the best submissions in the MediaEval campaign.
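A rough illustration of the temporal side of this idea, assuming per-shot fused audio-visual features: each shot is stacked with its neighbours and classified with a fixed-structure naive Bayes model. The paper's learned Bayesian network structures go well beyond this sketch, which only shows the temporal windowing.

# Sketch: adding temporal context by stacking each shot's fused
# audio-visual features with its neighbours', then classifying with a
# naive Bayes model (structure fixed, not learned as in the paper).
import numpy as np
from sklearn.naive_bayes import GaussianNB

def add_temporal_context(X, radius=1):
    """Concatenate each row with its +/- radius neighbours (edge-padded)."""
    padded = np.pad(X, ((radius, radius), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(X)] for i in range(2 * radius + 1)])

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 20))      # fused audio-visual shot features (stub)
y = rng.integers(0, 2, size=500)    # violent / non-violent labels (stub)

clf = GaussianNB().fit(add_temporal_context(X), y)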
2010 23rd SIBGRAPI Conference on Graphics, Patterns and Images, 2010
In this paper we present a violence detector built on the concept of visual codebooks using linear support vector machines. It differs from existing work on violence detection in its data representation, as none has considered local spatio-temporal features with bags of visual words. An evaluation of the importance of local spatio-temporal features for characterizing the multimedia content is conducted through cross-validation. The results confirm that motion patterns are crucial for distinguishing violence from regular activities, compared with visual descriptors that rely solely on the spatial domain.
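A compact sketch of such a bag-of-visual-words pipeline: local spatio-temporal descriptors (stubbed here with random arrays in place of real STIP-style features) are quantized against a k-means codebook and the resulting histograms feed a linear SVM.

# Sketch: bag-of-visual-words over local spatio-temporal descriptors,
# with a k-means codebook and a linear SVM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def encode(descriptors, codebook):
    """Histogram of codeword assignments, L1-normalised."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

rng = np.random.default_rng(3)
# One descriptor set per video clip (placeholder for real local features).
clips = [rng.normal(size=(rng.integers(50, 150), 72)) for _ in range(40)]
labels = rng.integers(0, 2, size=40)

codebook = KMeans(n_clusters=64, n_init=10).fit(np.vstack(clips))
X = np.array([encode(c, codebook) for c in clips])
clf = LinearSVC().fit(X, labels)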
In this paper we approach the issue of violence detection in typical Hollywood productions. Given the high variability in the appearance of violent scenes in movies, training a classifier to predict violent frames directly from visual and/or auditory features seems rather difficult. Instead, we propose a different perspective that relies on fusing mid-level concept predictions inferred from low-level features. This is achieved by employing a bank of multi-layer perceptron classifiers featuring a dropout training scheme. Experimental validation conducted in the context of the Violent Scenes Detection task of the MediaEval 2012 Multimedia Benchmark Evaluation shows the potential of this approach, which ranked first among 34 other submissions in terms of precision and F1-score.
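A minimal sketch of one member of such a bank of dropout-trained multi-layer perceptrons, in PyTorch. The layer sizes, dropout rate, and concept names below are illustrative assumptions, not the paper's configuration.

# Sketch: a bank of MLPs with dropout, each scoring one mid-level
# concept from low-level features; the concept scores would then be
# fused into a violence prediction.
import torch
import torch.nn as nn

class ConceptMLP(nn.Module):
    def __init__(self, in_dim, hidden=256, p_drop=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return torch.sigmoid(self.net(x))

# One MLP per mid-level concept (names hypothetical).
concepts = ["blood", "firearms", "screams", "fights"]
bank = {c: ConceptMLP(in_dim=128) for c in concepts}

x = torch.randn(8, 128)                                      # low-level features
mid_level = torch.cat([m(x) for m in bank.values()], dim=1)  # (8, 4) concept scores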
2011
In this paper we present our research results on the detection of violent scenes in movies, employing advanced fusion methodologies based on learning, knowledge representation and reasoning. Towards this goal, a multi-step approach is followed: initially, automated audio and visual analysis is performed to extract audio and visual cues. Then, two different fusion approaches are deployed: (i) a multimodal one that provides binary decisions on the existence of violence, employing machine learning techniques, and (ii) an ontological and reasoning one, which combines the audio-visual cues with violence and multimedia ontologies. The latter infers not only whether violence exists in a video scene, but also the type of violence (fight, screams, gunshots). Both approaches are experimentally tested, validated and compared on the binary decision problem of violence detection. Finally, results for violence type identification are presented for the ontological fusion approach. For evaluation purposes, a large dataset of real movie data has been assembled.
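As a toy illustration of the two fusion strategies, the sketch below combines per-modality scores by weighted late fusion and maps detected audio cues to a coarse violence type with a hand-written rule. Weights, thresholds, and cue names are invented for the example; the paper's ontological reasoning is far richer.

# Sketch: (i) weighted late fusion of per-modality violence scores,
# (ii) a toy ontological-style rule mapping detected cues to a type.
def fuse(audio_score, visual_score, w_audio=0.4, w_visual=0.6, thr=0.5):
    """Weighted late fusion -> binary violence decision."""
    return w_audio * audio_score + w_visual * visual_score >= thr

def violence_type(cues):
    """Toy rule over detected audio cues (names hypothetical)."""
    if "gunshot" in cues:
        return "gunshots"
    if "scream" in cues:
        return "screams"
    return "fight" if cues else "none"

print(fuse(0.8, 0.3), violence_type({"scream"}))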
International Journal for Research in Applied Science and Engineering Technology IJRASET, 2020
Internet video and movies have grown rapidly in recent years with the success of multimedia social networks and the low cost of smart devices, and video now accounts for 90% of Internet traffic. Videos with harmful content, such as horror and violent videos, are flooding the Internet. The increasing use of the associated technology by sensitive social groups creates the need for protection from harmful content to preserve the Internet video ecosystem, especially as the number of young Internet users grows rapidly. Violent scene detection in videos has practical significance in a number of applications, such as intelligent surveillance, video retrieval, Internet filtering, film rating, and child protection against violent behaviour. This paper details the different methods and techniques that are being used for audio-based classification and detection of violent scenes in videos. It also contains our proposed method for audio-based violent scene detection using the extreme learning machine algorithm.
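A bare-bones extreme learning machine of the kind the proposed method builds on: a random, untrained hidden layer whose output weights are solved in closed form by least squares. The audio feature vectors are stubbed with random data; extraction itself is out of scope here.

# Sketch: extreme learning machine -- random hidden layer, output
# weights fitted in one least-squares solve (no iterative training).
import numpy as np

class ELM:
    def __init__(self, in_dim, hidden=200, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(in_dim, hidden))
        self.b = rng.normal(size=hidden)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # Closed-form least-squares solve for the output weights.
        self.beta, *_ = np.linalg.lstsq(self._hidden(X), y, rcond=None)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

rng = np.random.default_rng(4)
X, y = rng.normal(size=(300, 40)), rng.integers(0, 2, size=300).astype(float)
scores = ELM(in_dim=40).fit(X, y).predict(X)   # threshold for a decision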
2006
This work studies the problem of violence detection in audio data, which can be used for automated content rating. We employ several popular frame-level audio features from both the time and frequency domains. Several statistics of the calculated feature sequences are then fed as input to a Support Vector Machine classifier, which decides whether the segment content is violent.
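A short sketch in this spirit, assuming librosa for frame-level MFCC and zero-crossing-rate extraction, per-segment mean/std statistics, and a scikit-learn SVM; the file path and labels are placeholders, and the feature set is a small illustrative subset.

# Sketch: frame-level audio features summarised by per-segment
# statistics, then classified with an SVM.
import numpy as np
import librosa
from sklearn.svm import SVC

def segment_features(path):
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, n_frames)
    zcr = librosa.feature.zero_crossing_rate(y)          # (1, n_frames)
    frames = np.vstack([mfcc, zcr])
    # Statistics of the frame-level sequences: mean and std per feature.
    return np.concatenate([frames.mean(axis=1), frames.std(axis=1)])

# X = np.array([segment_features(p) for p in segment_paths])  # per segment
# clf = SVC(kernel="rbf").fit(X, labels)                      # violent / not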