Semester Project Report by Qaiser

The document discusses hate speech detection using natural language processing techniques. It outlines different features that can be extracted from text like simple surface features, word generalization, sentiment analysis, lexical resources, linguistic features, knowledge-based features, meta-information, and multimodal information to aid in detecting hate speech.

Semester Project Report

Hate Speech Detection Using Natural Language Processing

By Qaiser Hassan and Rafiqa Zainab

The idea for this hate speech detector is taken from the article "A Survey on Hate Speech Detection using Natural Language Processing" by Anna Schmidt and Michael Wiegand (both Spoken Language Systems, Saarland University, D-66123 Saarbrücken, Germany).

1 Introduction
Hate speech is commonly defined as any communication that disparages a person or a group on the basis of some characteristic such as race, color, ethnicity, gender, sexual orientation, nationality, religion, or other characteristic (Nockleby, 2000). Examples are (1)-(3).

(1) Go fucking kill yourself and die already useless ugly pile of shit scumbag.
(2) The Jew Faggot behind the Financial Collapse
(3) Hope one of those bitches falls over and breaks her leg

Due to the massive rise of user-generated web content, in particular on social media networks, the amount of hate speech is also steadily increasing. Over the past years, interest in online hate speech detection, and particularly in the automation of this task, has continuously grown, along with the societal impact of the phenomenon.

Natural language processing focusing specifically on this phenomenon is required because basic word filters do not provide a sufficient remedy. What is considered a hate speech message might be influenced by aspects such as the domain of an utterance, its discourse context, context consisting of co-occurring media objects (e.g. images, videos, audio), the exact time of posting and world events at that moment, and the identity of the author and the targeted recipient.

This paper provides a short, comprehensive and structured overview of automatic hate speech detection, and outlines the existing approaches in a systematic manner, focusing on feature extraction in particular. It is mainly aimed at NLP researchers who are new to the field of hate speech detection and want to inform themselves about the state of the art.

2 Terminology
In this paper, the authors use the term hate speech since it can be considered a broad umbrella term for numerous kinds of insulting user-created content addressed in the individual works they summarize. Hate speech is also the most frequently used expression for this phenomenon, and is even a legal term in several countries.

3 Features for Hate Speech Detection

3.1 Simple Surface Features


For any text classification task, the most obvious information to utilize is surface-level features, such as bag of words. Indeed, unigrams and larger n-grams are included in the feature sets used by most authors.
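
As an illustration of such surface features, the following is a minimal Python sketch of bag-of-n-grams extraction. It is not the project's own code: the function name `ngram_features`, the whitespace tokenization and the example sentence are assumptions made for this example.

```python
from collections import Counter

def ngram_features(text, max_n=2):
    """Count word n-grams (n = 1..max_n) in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n tokens over the text
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Illustrative input only
features = ngram_features("you are so so stupid")
print(features["so"], features["so so"])
```

Each distinct n-gram becomes one feature dimension, so the counts can be passed to any standard classifier once they are mapped to a fixed vocabulary.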

3.2 Word Generalization


Since hate speech detection is usually applied to small pieces of text (e.g. passages or even individual sentences), one may face a data sparsity problem. Several works address this issue by applying some form of word generalization. This can be achieved by carrying out word clustering and then using the induced cluster IDs, which represent sets of words, as additional (generalized) features. A standard algorithm for this is Brown clustering.
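
The sketch below shows, in Python, how cluster IDs could be used as generalized features once a clustering is available. It does not implement Brown clustering itself: the mapping `WORD_TO_CLUSTER`, its cluster labels and the function `cluster_features` are hypothetical placeholders standing in for clusters induced from a large unlabeled corpus.

```python
# Hypothetical word -> cluster ID mapping (assumption for illustration);
# in practice it would be induced from unlabeled text, e.g. by Brown clustering.
WORD_TO_CLUSTER = {
    "stupid": "C1", "dumb": "C1", "idiot": "C1",
    "hate": "C2", "despise": "C2",
}

def cluster_features(text, word_to_cluster=WORD_TO_CLUSTER):
    """Return the sorted list of cluster IDs triggered by the words of a text."""
    tokens = text.lower().split()
    return sorted({word_to_cluster[t] for t in tokens if t in word_to_cluster})

a = cluster_features("you are dumb")
b = cluster_features("you are stupid")
print(a, b)
```

The two messages share no insulting word, yet they receive the same cluster feature, which is how generalization reduces sparsity on short texts.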

3.3 Sentiment Analysis


Hate speech and sentiment analysis are closely related, and it is safe to assume that usually negative
sentiment pertains to a hate speech message. Because of this, several approaches acknowledge the
relatedness of hate speech and sentiment analysis by incorporating the latter as an auxiliary
classification.
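
One possible reading of sentiment as an auxiliary classification is a two-step setup in which only messages with negative polarity are passed on to the hate speech classifier. The Python sketch below assumes this setup. The word lists `NEGATIVE` and `POSITIVE` are toy placeholders, not a real sentiment lexicon, and the function names are hypothetical.

```python
# Toy polarity word lists (assumptions for illustration only)
NEGATIVE = {"hate", "ugly", "useless", "stupid"}
POSITIVE = {"love", "great", "nice", "kind"}

def polarity_score(text):
    """Number of positive words minus number of negative words."""
    tokens = text.lower().split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def is_candidate(text):
    """Only negatively scored messages are handed to the hate speech classifier."""
    return polarity_score(text) < 0

print(is_candidate("useless stupid post"), is_candidate("I love it"))
```

The polarity score could equally be added to the feature vector instead of being used as a filter.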

3.4 Lexical Resources


Trying to make use of the general assumption that hateful messages contain specific negative words (such as slurs, insults, etc.), many authors utilize the presence of such words as a feature. To obtain this type of information, lexical resources are required that contain such predictive expressions.
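
A lexicon lookup of this kind can be sketched in a few lines of Python. The set `INSULT_LEXICON` is a hypothetical three-word placeholder for a real lexical resource, and the feature names are assumptions made for this example.

```python
import re

# Hypothetical placeholder for a real lexicon of insults and slurs
INSULT_LEXICON = {"scumbag", "idiot", "moron"}

def lexicon_features(text, lexicon=INSULT_LEXICON):
    """Count lexicon words in a text after lowercasing and stripping punctuation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [t for t in tokens if t in lexicon]
    return {"lexicon_hits": len(hits), "has_lexicon_word": bool(hits)}

print(lexicon_features("You IDIOT, you moron!"))
```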

3.5 Linguistic Features


Linguistic aspects also play an important role for hate speech detection. Linguistic features are either
employed in a more generic fashion or are specifically tailored to the task.

3.6 Knowledge-Based Features


Hate speech detection is a task that cannot be solved by simply looking at keywords. Even if one tries to model larger textual units, as researchers attempt to do by means of linguistic features, it remains difficult to decide whether some utterance represents hate speech or not. For instance, (5) may not be regarded as some form of hate speech when only read in isolation.

(5) Put on a wig and lipstick and be who you really are.

However, when the context information is given that this utterance has been directed towards a boy on a social media site for adolescents, one could infer that this is a remark to malign the sexuality or gender identity of the boy being addressed.

3.7 Meta-Information
Meta-information (i.e. information about an utterance) is also a valuable source for hate speech detection. Since the text commonly used as data for this task almost exclusively comes from social media platforms, a variety of such meta-information is usually offered and can be easily accessed via the APIs those platforms provide.

3.8 Multimodal Information


Modern social media do not only consist of text but also include images, video and audio content. Such
non-textual content is also regularly commented on, and therefore becomes part of the discourse of a
hate speech utterance. This context outside a written user comment can be used as a predictive feature.

4 Anticipating Alarming Societal Changes


Apart from detecting individual, isolated hateful comments and classifying the types of users involved,
the overall proportion of extreme negative posts over a certain time-span also allows for interesting
avenues of research. Insights into changes in public or personal mood can be gained. Information on
notable increases in the number of hateful posts within a short time span might indicate suspicious
developments in a community. Such information could be utilized to circumvent incidents such as racial
violence, terrorist attacks, or other crimes before they happen, thus providing steps in the direction of
anticipatory governance.
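
A notable increase within a short time span can be flagged with a simple moving baseline. The Python sketch below is one way to do this under stated assumptions: the daily counts are made up, and the function `find_spikes`, the window of 3 days and the factor of 2.0 are arbitrary illustrative choices.

```python
def find_spikes(daily_counts, window=3, factor=2.0):
    """Return indices of days whose count exceeds `factor` times
    the mean of the previous `window` days."""
    spikes = []
    for i in range(window, len(daily_counts)):
        baseline = sum(daily_counts[i - window:i]) / window
        if baseline > 0 and daily_counts[i] > factor * baseline:
            spikes.append(i)
    return spikes

# Made-up number of hateful posts per day
counts = [4, 5, 6, 5, 20, 6, 5]
print(find_spikes(counts))  # prints [4]
```

Here only day 4 is flagged, because its 20 posts are more than twice the average of the three preceding days.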

We did all of the coding in MATLAB. Here are some screenshots of the main program.

This code is taken from the "train file", which is used to train the system on the data. It stores the trained data as input.

This is part of the file above. We train on our data using the MFCC algorithm, which is applied to audio recordings. This code contains the trained data.

var cv = require('opencv');

var color = [0, 255, 0];   // rectangle color (green)
var thickness = 2;         // rectangle line thickness in pixels
var cascadeFile = './my_cascade.xml';

var inputFiles = [
  './recognize_this_1.jpg', './recognize_this_2.jpg', './recognize_this_3.jpg',
  './recognize_this_4.jpg', './recognize_this_5.jpg'
];

inputFiles.forEach(function(fileName) {
  cv.readImage(fileName, function(err, im) {
    if (err) throw err;
    // Run the cascade classifier on the image
    im.detectObject(cascadeFile, {neighbors: 2, scale: 2}, function(err, objects) {
      if (err) throw err;
      console.log(objects);
      // Draw a rectangle around each detected object
      for (var k = 0; k < objects.length; k++) {
        var object = objects[k];
        im.rectangle(
          [object.x, object.y],
          [object.x + object.width, object.y + object.height],
          color,
          thickness
        );
      }
      // Save the annotated copy, e.g. ./recognize_this_1processed.jpg
      im.save(fileName.replace(/\.jpg/, 'processed.jpg'));
    });
  });
});
