CHAPTER 1
INTRODUCTION
In recent years, the exponential growth of social media platforms such as Facebook, Twitter, and YouTube has revolutionized communication and content publishing, but these platforms are also increasingly exploited for the propagation of Hate, Offensive and Profane speech. The anonymity and mobility afforded by such media have made the breeding and spread of hate speech, eventually leading to hate crime, effortless in a virtual landscape beyond the reach of traditional law enforcement.
The term ‘hate speech’ is formally defined as ‘any communication that disparages a person or a group on the basis of some characteristic (to be referred to as a type of hate, or hate class) such as race, colour, ethnicity, gender, sexual orientation, nationality, religion, or other characteristics’.
The term ‘offensive language’ refers to an offence charged when someone uses foul or abusive language. It is most commonly charged either where a person has verbally abused police, or alongside other, more serious charges. The offence of offensive language is contained in Section 4A of the Summary Offences Act 1988, which states: “A person must not use offensive language in or near, or within hearing from, a public place or a school.”
The term ‘profanity’ denotes socially offensive language, which may also be called ‘cursing’ or ‘swearing’ (British English), ‘cuss words’ (American English vernacular), ‘swear words’, ‘bad words’, or ‘expletives’. Used in this sense, profanity is language generally considered by certain parts of a culture to be strongly impolite, rude, or offensive. It can show a debasement of someone or something, or be considered an expression of strong feeling towards something.
1.1 MOTIVATION
Building effective countermeasures against online Hate, Offensive and Profane speech requires, as a first step, identifying and tracking such speech online. For years, social media companies such as Twitter, Facebook, and YouTube have been investing hundreds of millions of rupees every year in this task, but are still criticised for not doing enough. This is largely because such efforts are primarily based on manual moderation to identify and delete offensive material. The process is labour-intensive, time-consuming, and not sustainable or scalable in reality.
A large body of research has been conducted in recent years to develop automatic methods for detecting Hate, Offensive and Profane speech in the social media domain. These typically employ semantic content analysis techniques built on Natural Language Processing (NLP) and Machine Learning (ML) methods, both of which are core pillars of Semantic Web research. The task typically involves classifying textual content as non-hate or hateful, in which case it may also identify the type of Hate, Offensive or Profane speech. Although current methods have reported promising results, we notice that their evaluations are largely biased towards detecting non-hate content, as opposed to detecting and classifying real hateful content. A limited number of studies have shown, for example, that state-of-the-art methods for detecting sexist messages obtain an F1 score between 15 and 60 percentage points lower than when detecting non-hate messages. These results suggest that it is much harder to detect hateful content and its types than non-hate content. However, from a practical point of view, we argue that the ability to correctly (precision) and thoroughly (recall) detect and identify specific types of hate speech is more desirable. For example, social media companies need to flag up hateful content for moderation, while law enforcement agencies need to identify hateful messages and their nature as forensic evidence.
This work is concerned with the task of detecting, identifying, and analyzing the spread of Hate, Offensive and Profane speech sentiments in social media.
To address concerns about children’s access to offensive content over the Internet, administrators of social media often manually review online content to detect and delete offensive material. However, the manual task of identifying offensive content is labour-intensive, time-consuming, and thus not sustainable or scalable in reality. Some automatic content filtering software packages, such as Appen and Internet Security Suite, have been developed to detect and filter offensive content online. Most of them simply block webpages and paragraphs that contain dirty words. These word-based approaches not only affect the readability and usability of websites, but also fail to identify subtle offensive messages. For example, under these conventional approaches, the sentence “you are such a crying baby” will not be identified as offensive content, because none of its words is included in general offensive lexicons. In addition, the false positive rate of these word-based detection approaches is often high, due to the word ambiguity problem, i.e., the same word can have very different meanings in different contexts.
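The minimal sketch below illustrates both weaknesses of purely word-based filtering; the tiny word list is an invented stand-in for a real offensive-word lexicon.

# A minimal sketch of word-based filtering, assuming an invented toy lexicon;
# real filters use much larger offensive-word lists.
OFFENSIVE_LEXICON = {"idiot", "moron", "bloody"}

def is_offensive_by_lexicon(sentence: str) -> bool:
    """Flag a sentence if any of its words appears in the lexicon."""
    words = sentence.lower().split()
    return any(word.strip(".,!?") in OFFENSIVE_LEXICON for word in words)

# The subtle insult from the text slips through (false negative):
print(is_offensive_by_lexicon("you are such a crying baby"))             # False
# An innocent literal use of a listed word is flagged (false positive,
# caused by word ambiguity):
print(is_offensive_by_lexicon("the boxer ended up with a bloody nose"))  # True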
Pornographic language refers to the portrayal of explicit sexual subject matter for the purposes of sexual arousal and erotic satisfaction. Offensive language includes any communication outside the law that disparages a person or a group on the basis of some characteristic such as race, colour, ethnicity, gender, sexual orientation, nationality, or religion. All of these are generally immoral and harmful to adolescents’ mental health.
1.2 STATEMENT OF THE PROBLEM
In HASOC, we break the given content down into four classes (HATE, OFFENSE, PROFANE, and NONE), taking into account the type and target of the statements.
Offensive language identification: Here we are interested in identifying offensive posts and posts containing any form of (untargeted) profanity. A given statement can be classified into one of four categories.
Hate: the statement contains hate words that disparage a person or a group on the basis of some characteristic such as race, colour, gender, nationality, religion, or other characteristics.
Offensive: the statement contains offensive language or a targeted (veiled or direct) offence. In sum, this category includes insults and threats.
Profane: the statement contains words that are strongly impolite, rude, or offensive, or that can be considered an expression of strong feeling towards something.
None: the statement contains none of the above.
1.3 FLOW OF WORK
Classification of statements into the target classes is carried out in the following steps (illustrative sketches of the steps, under assumed file and column names, follow this list):
1) Data Extraction: In this step we extract the data from the datasets into data frames.
2) Data Cleaning: In this step we bring the required data into the required format from the data frames through the following processes:
Removing special symbols
Removing stop words
Converting the acquired data into tokens
3) Sentiment Analysis: In this step we classify whether the data is a positive-context statement or a negative-context statement.
4) Label Encoding: In this step we encode the class labels of the statements in the given data as the numeric labels that machine learning algorithms require.
5) Machine Learning Classifier: In this step we use a classifier to classify the data into the required target classes:
Splitting the data
Training the model
Predicting the target classes with the model
6) Result Analysis: In this step we analyse the results obtained by predicting the target classes with the machine learning classifier:
Building a confusion matrix
Computing precision and recall
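The sketch below illustrates steps 1 and 2 under assumed names: the file name hasoc_train.tsv and the column name text are placeholders for the actual dataset layout, and NLTK is used here as one possible tokenization and stop-word toolkit.

import re

import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("stopwords")  # one-time download of the stop-word list
nltk.download("punkt")      # one-time download of the tokenizer model

STOP_WORDS = set(stopwords.words("english"))

def clean_text(text: str) -> list[str]:
    """Remove special symbols, tokenize, and drop stop words."""
    text = re.sub(r"[^a-zA-Z\s]", " ", text.lower())   # remove special symbols
    tokens = word_tokenize(text)                        # convert to tokens
    return [t for t in tokens if t not in STOP_WORDS]   # remove stop words

df = pd.read_csv("hasoc_train.tsv", sep="\t")  # step 1: extract into a data frame
df["tokens"] = df["text"].apply(clean_text)    # step 2: clean every statement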
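Step 3 can be sketched with NLTK’s VADER sentiment analyzer, continuing from the data frame above; VADER is one possible choice here, not necessarily the tool used in later chapters.

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the VADER lexicon
sia = SentimentIntensityAnalyzer()

def sentiment_context(text: str) -> str:
    """Tag a statement as positive- or negative-context via the compound score."""
    return "positive" if sia.polarity_scores(text)["compound"] >= 0 else "negative"

df["sentiment"] = df["text"].apply(sentiment_context)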
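Steps 4 and 5 are sketched next. The label column task_1, the TF-IDF features, and the linear SVM are assumptions chosen for illustration; any scikit-learn classifier could be substituted.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import LinearSVC

encoder = LabelEncoder()                 # step 4: map class names to integers
y = encoder.fit_transform(df["task_1"])  # e.g. HATE / OFFENSE / PROFANE / NONE

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df["tokens"].apply(" ".join))

# Step 5: split the data, train the model, and predict the target classes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
model = LinearSVC()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)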
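Finally, step 6 checks the predictions. For each class, precision is TP / (TP + FP) and recall is TP / (TP + FN); scikit-learn reports both alongside the confusion matrix.

from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test, y_pred))  # build the confusion matrix
print(classification_report(             # per-class precision and recall
    y_test, y_pred, target_names=encoder.classes_
))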
1.4 ORGANIZATION OF THE REPORT
This report is structured as follows:
Chapter 2: reviews the literature we referred to in arriving at the flow of work.
Chapter 3: describes our approach to solving the problem.
Chapter 4: describes the given dataset, the language used, and the implementation of the methods used to address the problem.
Chapter 5: discusses whether the model can be accepted or rejected by considering different parameters.
Chapter 6: concludes this work and discusses future directions.