0% found this document useful (0 votes)

22 views1 page

Fake Job Detection

The document discusses the importing of various Python libraries and modules for data analysis and natural language processing. It loads a CSV dataset on fake job postings and displays the header of the data. Finally, it imports common libraries for text preprocessing, modeling, visualization, and NLP including NumPy, Pandas, Matplotlib, Seaborn, scikit-learn, wordcloud, and spacy.

Uploaded by

Shruti Saxena

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

22 views1 page

Fake Job Detection

Uploaded by

Shruti Saxena

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

In [1]:

!pip install wordcloud

Requirement already satisfied: wordcloud in c:\users\hp\anaconda3\lib\site-packages (1.8.1)

Requirement already satisfied: pillow in c:\users\hp\anaconda3\lib\site-packages (from wordcloud) (8.2.0)
Requirement already satisfied: matplotlib in c:\users\hp\anaconda3\lib\site-packages (from wordcloud) (3.3.4)
Requirement already satisfied: numpy>=1.6.1 in c:\users\hp\anaconda3\lib\site-packages (from wordcloud) (1.20.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in c:\users\hp\anaconda3\lib\site-packages (from matplotlib->wordcloud) (2.4.7)
Requirement already satisfied: cycler>=0.10 in c:\users\hp\anaconda3\lib\site-packages (from matplotlib->wordcloud) (0.10.0)
Requirement already satisfied: python-dateutil>=2.1 in c:\users\hp\anaconda3\lib\site-packages (from matplotlib->wordcloud) (2.8.1)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\hp\anaconda3\lib\site-packages (from matplotlib->wordcloud) (1.3.1)
Requirement already satisfied: six in c:\users\hp\anaconda3\lib\site-packages (from cycler>=0.10->matplotlib->wordcloud) (1.15.0)

In [2]:
!pip install -U spacy

Requirement already satisfied: spacy in c:\users\hp\anaconda3\lib\site-packages (3.2.4)

Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.8 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (3.0.9)
Requirement already satisfied: blis<0.8.0,>=0.4.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (0.7.7)
Requirement already satisfied: packaging>=20.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (20.9)
Requirement already satisfied: numpy>=1.15.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (1.20.1)
Requirement already satisfied: pathy>=0.3.5 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (0.6.1)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (2.25.1)
Requirement already satisfied: wasabi<1.1.0,>=0.8.1 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (0.9.1)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (2.0.6)
Requirement already satisfied: jinja2 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (2.11.3)
Requirement already satisfied: srsly<3.0.0,>=2.4.1 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (2.4.2)
Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (3.3.0)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (1.0.6)
Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (1.0.2)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (3.0.6)
Requirement already satisfied: thinc<8.1.0,>=8.0.12 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (8.0.15)
Requirement already satisfied: typer<0.5.0,>=0.3.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (0.4.1)
Requirement already satisfied: click<8.1.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (7.1.2)
Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (2.0.7)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (1.8.2)
Requirement already satisfied: setuptools in c:\users\hp\anaconda3\lib\site-packages (from spacy) (52.0.0.post20210125)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (4.59.0)
Requirement already satisfied: pyparsing>=2.0.2 in c:\users\hp\anaconda3\lib\site-packages (from packaging>=20.0->spacy) (2.4.7)
Requirement already satisfied: smart-open<6.0.0,>=5.0.0 in c:\users\hp\anaconda3\lib\site-packages (from pathy>=0.3.5->spacy) (5.2.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in c:\users\hp\anaconda3\lib\site-packages (from pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4->spacy) (3.7.4.3)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\hp\anaconda3\lib\site-packages (from requests<3.0.0,>=2.13.0->spacy) (1.26.4)
Requirement already satisfied: idna<3,>=2.5 in c:\users\hp\anaconda3\lib\site-packages (from requests<3.0.0,>=2.13.0->spacy) (2.10)
Requirement already satisfied: chardet<5,>=3.0.2 in c:\users\hp\anaconda3\lib\site-packages (from requests<3.0.0,>=2.13.0->spacy) (4.0.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\hp\anaconda3\lib\site-packages (from requests<3.0.0,>=2.13.0->spacy) (2020.12.5)
Requirement already satisfied: MarkupSafe>=0.23 in c:\users\hp\anaconda3\lib\site-packages (from jinja2->spacy) (1.1.1)

In [3]:
import re
import string
import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.base import TransformerMixin
from sklearn.metrics import accuracy_score, plot_confusion_matrix, classification_report, confusion_matrix
from wordcloud import WordCloud
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from spacy.lang.en import English

In [4]:
df=pd.read_csv('fake_job_postings.csv')

In [5]:
df.head()

Out[5]: job_id title location department salary_range company_profile description requirements benefits telecommuting has_company_logo has_questions employment_type required_experien

We're Food52,
Food52, a fast- Experience with
Marketing US, NY, and we've
0 1 Marketing NaN growing, James content management NaN 0 1 0 Other Internsh
Intern New York created a
Beard Award-winn... systems a m...
groundbreaki...

What you
Customer 90 Seconds, the will get
Organised - Focused What we expect from
Service - Cloud NZ, , worlds Cloud from
1 2 Success NaN - Vibrant - you:Your key 0 1 0 Full-time Not Applicab
Video Auckland Video Production usThrough
Awesome!Do you... responsibilit...
Production ... being part
of...

Commissioning Valor Services

Our client, located in Implement pre-
Machinery US, IA, provides
2 3 NaN NaN Houston, is actively commissioning and NaN 0 1 0 NaN Na
Assistant Wever Workforce
se... commissioning ...
(CMA) Solutions th...

Our
culture is
Account
Our passion for THE COMPANY: ESRI EDUCATION: Bachelor’s anything
Executive - US, DC,
3 4 Sales NaN improving quality – Environmental or Master’s in GIS, but 0 1 0 Full-time Mid-Senior lev
Washington Washington
of life thro... Systems Rese... busi... corporate
DC
—we have
...

SpotSource
JOB TITLE: QUALIFICATIONS:RN Full
Bill Review US, FL, Fort Solutions LLC is a
4 5 NaN NaN Itemization Review license in the State of Benefits 0 1 1 Full-time Mid-Senior lev
Manager Worth Global Human
ManagerLOCATION:... Texa... Offered
Cap...

In [6]:
df.shape

Out[6]: (17880, 18)

In [7]:
df.isnull().sum()

Out[7]: job_id 0
title 0
location 346
department 11547
salary_range 15012
company_profile 3308
description 1
requirements 2695
benefits 7210
telecommuting 0
has_company_logo 0
has_questions 0
employment_type 3471
required_experience 7050
required_education 8105
industry 4903
function 6455
fraudulent 0
dtype: int64

In [8]:
columns = ['job_id', 'telecommuting', 'has_company_logo', 'has_questions', 'salary_range', 'employment_type']
for colu in columns:
del df[colu]

In [9]:
df.head()

Out[9]: title location department company_profile description requirements benefits required_experience required_education industry function fraudulent

We're Food52, and

US, NY, New Food52, a fast-growing, Experience with content
0 Marketing Intern Marketing we've created a NaN Internship NaN NaN Marketing 0
York James Beard Award-winn... management systems a m...
groundbreaki...

Customer Service - 90 Seconds, the worlds Organised - Focused - What you will get Marketing
NZ, , What we expect from Customer
1 Cloud Video Success Cloud Video Vibrant - Awesome!Do from usThrough Not Applicable NaN and 0
Auckland you:Your key responsibilit... Service
Production Production ... you... being part of... Advertising

Commissioning Valor Services provides Implement pre-

Our client, located in
2 Machinery Assistant US, IA, Wever NaN Workforce Solutions commissioning and NaN NaN NaN NaN NaN 0
Houston, is actively se...
(CMA) th... commissioning ...

Our culture is
Our passion for THE COMPANY: ESRI –
Account Executive - US, DC, EDUCATION: Bachelor’s or anything but Computer
3 Sales improving quality of Environmental Systems Mid-Senior level Bachelor's Degree Sales 0
Washington DC Washington Master’s in GIS, busi... corporate—we Software
life thro... Rese...
have ...

SpotSource Solutions JOB TITLE: Itemization

US, FL, Fort QUALIFICATIONS:RN license Full Benefits Hospital & Health Care
4 Bill Review Manager NaN LLC is a Global Human Review Mid-Senior level Bachelor's Degree 0
Worth in the State of Texa... Offered Health Care Provider
Cap... ManagerLOCATION:...

In [10]:
df.fillna('', inplace=True)

In [11]:
plt.figure(figsize=(15,5))
sns.countplot(y='fraudulent', data=df)
plt.show()

In [12]:
df.groupby('fraudulent')['fraudulent'].count()

Out[12]: fraudulent
0 17014
1 866
Name: fraudulent, dtype: int64

In [13]:
exp = dict(df.required_experience.value_counts())
del exp['']

In [14]:
exp

Out[14]: {'Mid-Senior level': 3809,

'Entry level': 2697,
'Associate': 2297,
'Not Applicable': 1116,
'Director': 389,
'Internship': 381,
'Executive': 141}

In [15]:
plt.figure(figsize=(10,5))
sns.set_theme(style='whitegrid')
plt.bar(exp.keys(), exp.values())
plt.title('No. of jobs with Experience', size=20)
plt.xlabel('Experience', size=10)
plt.ylabel('No. of jobs', size=10)
plt.xticks(rotation=30)
plt.show()

In [16]:
def split(location):
l = location.split(',')
return l[0]
df['country'] = df.location.apply(split)

In [17]:
df.head()

Out[17]: title location department company_profile description requirements benefits required_experience required_education industry function fraudulent country

We're Food52, and Food52, a fast-growing, Experience with content

US, NY, New
0 Marketing Intern Marketing we've created a James Beard Award- management systems a Internship Marketing 0 US
York
groundbreaki... winn... m...

What you will

Customer Service 90 Seconds, the Organised - Focused - Marketing
NZ, , What we expect from get from Customer
1 - Cloud Video Success worlds Cloud Video Vibrant - Awesome!Do Not Applicable and 0 NZ
Auckland you:Your key responsibilit... usThrough Service
Production Production ... you... Advertising
being part of...

Commissioning Valor Services Implement pre-

US, IA, Our client, located in
2 Machinery provides Workforce commissioning and 0 US
Wever Houston, is actively se...
Assistant (CMA) Solutions th... commissioning ...

Our culture is
Our passion for THE COMPANY: ESRI –
Account Executive US, DC, EDUCATION: Bachelor’s or anything but Computer
3 Sales improving quality of Environmental Systems Mid-Senior level Bachelor's Degree Sales 0 US
- Washington DC Washington Master’s in GIS, busi... corporate—we Software
life thro... Rese...
have ...

SpotSource Solutions JOB TITLE: Itemization QUALIFICATIONS:RN Health

Bill Review US, FL, Fort Full Benefits Hospital &
4 LLC is a Global Review license in the State of Mid-Senior level Bachelor's Degree Care 0 US
Manager Worth Offered Health Care
Human Cap... ManagerLOCATION:... Texa... Provider

In [18]:
countr = dict(df.country.value_counts()[:14])
del countr['']
countr

Out[18]: {'US': 10656,

'GB': 2384,
'GR': 940,
'CA': 457,
'DE': 383,
'NZ': 333,
'IN': 276,
'AU': 214,
'PH': 132,
'NL': 127,
'BE': 117,
'IE': 114,
'SG': 80}

In [19]:
plt.figure(figsize=(8,6))
plt.title('Country-wise Job Posting',size=20)
plt.bar(countr.keys(), countr.values())
plt.ylabel('No. of jobs', size=10)
plt.xlabel('Countries', size=10)

Out[19]: Text(0.5, 0, 'Countries')

In [20]:
edu = dict(df.required_education.value_counts()[:7])
del edu['']
edu

Out[20]: {"Bachelor's Degree": 5145,

'High School or equivalent': 2080,
'Unspecified': 1397,
"Master's Degree": 416,
'Associate Degree': 274,
'Certification': 170}

In [21]:
plt.figure(figsize=(15,6))
plt.title('Job postings based on Education', size=20)
plt.bar(edu.keys(), edu.values())
plt.ylabel('No. of Jobs', size=10)
plt.xlabel('Education', size=10)

Out[21]: Text(0.5, 0, 'Education')

In [22]:
print(df[df.fraudulent==0].title.value_counts()[:10])

English Teacher Abroad 311

Customer Service Associate 146
Graduates: English Teacher Abroad (Conversational) 144
English Teacher Abroad 95
Software Engineer 86
English Teacher Abroad (Conversational) 83
Customer Service Associate - Part Time 76
Account Manager 73
Web Developer 66
Project Manager 62
Name: title, dtype: int64

In [23]:
print(df[df.fraudulent==1].title.value_counts()[:10])

Cruise Staff Wanted URGENT 21

Data Entry Admin/Clerical Positions - Work From Home 21
Home Based Payroll Typist/Data Entry Clerks Positions Available 21
Customer Service Representative 17
Administrative Assistant 16
Home Based Payroll Data Entry Clerk Position - Earn $100-$200 Daily 12
Account Sales Managers $80-$130,000/yr 10
Network Marketing 10
Payroll Data Coordinator Positions - Earn $100-$200 Daily 10
Payroll Clerk 10
Name: title, dtype: int64

In [24]:
df['text']=df['title']+' '+df['company_profile']+' '+df['description']+' '+df['requirements']+' '+df['benefits']
del df['title']
del df['location']
del df['department']
del df['company_profile']
del df['description']
del df['requirements']
del df['benefits']
del df['required_experience']
del df['required_education']
del df['industry']
del df['function']
del df['country']

In [25]:
df.head()

Out[25]: fraudulent text

0 0 Marketing Intern We're Food52, and we've creat...

1 0 Customer Service - Cloud Video Production 90 S...

2 0 Commissioning Machinery Assistant (CMA) Valor ...

3 0 Account Executive - Washington DC Our passion ...

4 0 Bill Review Manager SpotSource Solutions LLC i...

In [26]:
fraudjobs_text = df[df.fraudulent==1].text
realjobs_text = df[df.fraudulent==0].text

In [27]:
STOPWORDS = spacy.lang.en.stop_words.STOP_WORDS
plt.figure(figsize=(16,14))
wc = WordCloud(min_font_size = 3, max_words = 3000, width = 1500, height = 800, stopwords= STOPWORDS).generate(str(" ".join(fraudjobs_text)))
plt.imshow(wc, interpolation = 'bilinear')

Out[27]: <matplotlib.image.AxesImage at 0x157155b68b0>

In [28]:
STOPWORDS = spacy.lang.en.stop_words.STOP_WORDS
plt.figure(figsize=(16,14))
wc = WordCloud(min_font_size = 3, max_words = 3000, width = 1500, height = 800, stopwords= STOPWORDS).generate(str(" ".join(realjobs_text)))
plt.imshow(wc, interpolation = 'bilinear')

Out[28]: <matplotlib.image.AxesImage at 0x157148d7040>

In [29]:
!pip install spacy && python -m spacy download en

Requirement already satisfied: spacy in c:\users\hp\anaconda3\lib\site-packages (3.2.4)

Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.8 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (3.0.9)
Requirement already satisfied: setuptools in c:\users\hp\anaconda3\lib\site-packages (from spacy) (52.0.0.post20210125)
Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (3.3.0)
Requirement already satisfied: thinc<8.1.0,>=8.0.12 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (8.0.15)
Requirement already satisfied: jinja2 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (2.11.3)
Requirement already satisfied: click<8.1.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (7.1.2)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (3.0.6)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (1.8.2)
Requirement already satisfied: packaging>=20.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (20.9)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (2.25.1)
Requirement already satisfied: numpy>=1.15.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (1.20.1)
Requirement already satisfied: blis<0.8.0,>=0.4.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (0.7.7)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (2.0.6)
Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (1.0.2)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (4.59.0)
Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (2.0.7)
Requirement already satisfied: typer<0.5.0,>=0.3.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (0.4.1)
Requirement already satisfied: srsly<3.0.0,>=2.4.1 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (2.4.2)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (1.0.6)
Requirement already satisfied: wasabi<1.1.0,>=0.8.1 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (0.9.1)
Requirement already satisfied: pathy>=0.3.5 in c:\users\hp\anaconda3\lib\site-packages (from spacy) (0.6.1)
Requirement already satisfied: pyparsing>=2.0.2 in c:\users\hp\anaconda3\lib\site-packages (from packaging>=20.0->spacy) (2.4.7)
Requirement already satisfied: smart-open<6.0.0,>=5.0.0 in c:\users\hp\anaconda3\lib\site-packages (from pathy>=0.3.5->spacy) (5.2.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in c:\users\hp\anaconda3\lib\site-packages (from pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4->spacy) (3.7.4.3)
Requirement already satisfied: chardet<5,>=3.0.2 in c:\users\hp\anaconda3\lib\site-packages (from requests<3.0.0,>=2.13.0->spacy) (4.0.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\hp\anaconda3\lib\site-packages (from requests<3.0.0,>=2.13.0->spacy) (1.26.4)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\hp\anaconda3\lib\site-packages (from requests<3.0.0,>=2.13.0->spacy) (2020.12.5)
Requirement already satisfied: idna<3,>=2.5 in c:\users\hp\anaconda3\lib\site-packages (from requests<3.0.0,>=2.13.0->spacy) (2.10)
Requirement already satisfied: MarkupSafe>=0.23 in c:\users\hp\anaconda3\lib\site-packages (from jinja2->spacy) (1.1.1)
Collecting en-core-web-sm==3.2.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.2.0/en_core_web_sm-3.2.0-py3-none-any.whl (13.9 MB)
Requirement already satisfied: spacy<3.3.0,>=3.2.0 in c:\users\hp\anaconda3\lib\site-packages (from en-core-web-sm==3.2.0) (3.2.4)
Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (3.3.0)
Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (1.0.2)
Requirement already satisfied: packaging>=20.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (20.9)
Requirement already satisfied: wasabi<1.1.0,>=0.8.1 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (0.9.1)
Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (2.0.7)
Requirement already satisfied: pathy>=0.3.5 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (0.6.1)
Requirement already satisfied: blis<0.8.0,>=0.4.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (0.7.7)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (3.0.6)
Requirement already satisfied: click<8.1.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (7.1.2)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (2.0.6)
Requirement already satisfied: thinc<8.1.0,>=8.0.12 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (8.0.15)
Requirement already satisfied: typer<0.5.0,>=0.3.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (0.4.1)
Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.8 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (3.0.9)
Requirement already satisfied: numpy>=1.15.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (1.20.1)
Requirement already satisfied: setuptools in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (52.0.0.post20210125)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (1.8.2)
Requirement already satisfied: jinja2 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (2.11.3)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (2.25.1)
Requirement already satisfied: srsly<3.0.0,>=2.4.1 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (2.4.2)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (4.59.0)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in c:\users\hp\anaconda3\lib\site-packages (from spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (1.0.6)
Requirement already satisfied: pyparsing>=2.0.2 in c:\users\hp\anaconda3\lib\site-packages (from packaging>=20.0->spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (2.4.7)
Requirement already satisfied: smart-open<6.0.0,>=5.0.0 in c:\users\hp\anaconda3\lib\site-packages (from pathy>=0.3.5->spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (5.2.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in c:\users\hp\anaconda3\lib\site-packages (from pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4->spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0)
(3.7.4.3)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\hp\anaconda3\lib\site-packages (from requests<3.0.0,>=2.13.0->spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (2020.12.5)
Requirement already satisfied: chardet<5,>=3.0.2 in c:\users\hp\anaconda3\lib\site-packages (from requests<3.0.0,>=2.13.0->spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (4.0.0)
Requirement already satisfied: idna<3,>=2.5 in c:\users\hp\anaconda3\lib\site-packages (from requests<3.0.0,>=2.13.0->spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (2.10)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\hp\anaconda3\lib\site-packages (from requests<3.0.0,>=2.13.0->spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (1.26.4)
Requirement already satisfied: MarkupSafe>=0.23 in c:\users\hp\anaconda3\lib\site-packages (from jinja2->spacy<3.3.0,>=3.2.0->en-core-web-sm==3.2.0) (1.1.1)
[!] As of spaCy v3.0, shortcuts like 'en' are deprecated. Please use the full
pipeline package name 'en_core_web_sm' instead.
[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')

In [30]:
punctuations = string.punctuation

nlp = spacy.load("en_core_web_sm")
stop_words = spacy.lang.en.stop_words.STOP_WORDS

parser = English()

def spacy_tokenizer(sentence):
mytokens = parser(sentence)

mytokens = [word.lemma_.lower().strip() if word.lemma_ != "-PRON" else word.lower_ for word in mytokens]

mytokens = [ word for word in mytokens if word not in stop_words and word not in punctuations ]

return mytokens

# Custom transformer using spacy

class predictors(TransformerMixin):
def transform(self, X, **transform_params):
#Cleaning text
return [clean_text(text) for text in X]

def fit(self, X, y=None, **fit_params):

return self

def get_params(self, deep=True):

return {}

# Basic function to clean the text

def clean_text(text):
# Removing spaces and converting text to lowercase
return text.strip().lower()

In [31]:
df['text'] = df['text'].apply(clean_text)

In [32]:
cv = TfidfVectorizer(max_features = 100)
x = cv.fit_transform(df['text'])
df1 = pd.DataFrame(x.toarray(), columns = cv.get_feature_names())
df.drop(['text'], axis=1, inplace=True)
main_df = pd.concat([df1,df], axis=1)

In [33]:
main_df.head()

Out[33]: ability about all also amp an and are as at ... who will with work working world years you your fraudulent

0 0.000000 0.041120 0.000000 0.042424 0.036488 0.000000 0.755238 0.000000 0.078653 0.000000 ... 0.000000 0.000000 0.186067 0.051026 0.068029 0.000000 0.000000 0.000000 0.000000 0

1 0.021895 0.094183 0.035394 0.024292 0.041787 0.029771 0.490896 0.056626 0.060050 0.052431 ... 0.000000 0.078004 0.165735 0.043827 0.116862 0.099327 0.000000 0.204854 0.130452 0

2 0.000000 0.000000 0.176807 0.000000 0.041749 0.089231 0.397029 0.113149 0.000000 0.000000 ... 0.000000 0.062346 0.307512 0.058383 0.000000 0.000000 0.000000 0.094462 0.074476 0

3 0.023267 0.000000 0.018806 0.000000 0.000000 0.094909 0.695542 0.000000 0.031906 0.037144 ... 0.023132 0.049735 0.075480 0.046573 0.000000 0.105551 0.019806 0.050236 0.059411 0

4 0.000000 0.000000 0.068009 0.000000 0.040147 0.028602 0.606379 0.081605 0.115386 0.000000 ... 0.000000 0.000000 0.159230 0.028071 0.037425 0.000000 0.035814 0.030279 0.107427 0

5 rows × 101 columns

In [34]:
Y = main_df.iloc[:, -1]
X = main_df.iloc[:, :-1]

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.3)

print(X_train.shape)
print(Y_train.shape)
print(X_test.shape)
print(Y_test.shape)

(12516, 100)
(12516,)
(5364, 100)
(5364,)

In [35]:
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(n_jobs=3,oob_score=True,n_estimators=100,criterion="entropy")
model = rfc.fit(X_train,Y_train)

In [36]:
print(X_test)

ability about all also amp an and \

4319 0.000000 0.000000 0.000000 0.082903 0.000000 0.050800 0.358994
8997 0.000000 0.000000 0.000000 0.000000 0.052054 0.000000 0.436794
4062 0.141852 0.101699 0.038218 0.000000 0.000000 0.128586 0.429105
9844 0.000000 0.000000 0.102455 0.000000 0.000000 0.086178 0.304500
1854 0.042142 0.090640 0.068125 0.000000 0.120646 0.057302 0.584916
... ... ... ... ... ... ... ...
14190 0.000000 0.000000 0.132536 0.000000 0.000000 0.037160 0.583560
2631 0.000000 0.000000 0.000000 0.000000 0.000000 0.123859 0.615942
11486 0.000000 0.000000 0.032025 0.000000 0.340289 0.080812 0.465325
13669 0.000000 0.000000 0.056661 0.000000 0.200687 0.047659 0.523907
2763 0.000000 0.070304 0.000000 0.072533 0.000000 0.044446 0.558382

are as at ... well who will \

4319 0.144939 0.000000 0.059645 ... 0.000000 0.074289 0.212966
8997 0.000000 0.074804 0.000000 ... 0.000000 0.000000 0.038868
4062 0.122290 0.000000 0.000000 ... 0.049710 0.000000 0.000000
9844 0.081958 0.086913 0.050591 ... 0.000000 0.000000 0.135479
1854 0.081745 0.028896 0.000000 ... 0.000000 0.000000 0.030028
... ... ... ... ... ... ... ...
14190 0.000000 0.074954 0.000000 ... 0.000000 0.000000 0.077892
2631 0.078530 0.041639 0.000000 ... 0.063844 0.000000 0.173082
11486 0.000000 0.054335 0.000000 ... 0.000000 0.000000 0.028232
13669 0.090651 0.000000 0.000000 ... 0.000000 0.000000 0.049950
2763 0.000000 0.000000 0.000000 ... 0.000000 0.064997 0.000000

with work working world years you your

4319 0.121203 0.099714 0.199411 0.000000 0.000000 0.161335 0.063600
8997 0.117975 0.036397 0.000000 0.000000 0.092872 0.039260 0.000000
4062 0.153395 0.063099 0.000000 0.053627 0.040252 0.000000 0.000000
9844 0.034268 0.000000 0.000000 0.000000 0.000000 0.182460 0.000000
1854 0.159501 0.168714 0.149956 0.000000 0.000000 0.030331 0.107610
... ... ... ... ... ... ... ...
14190 0.059106 0.109410 0.000000 0.061990 0.046529 0.000000 0.000000
2631 0.197008 0.121559 0.108044 0.000000 0.000000 0.087413 0.051689
11486 0.085692 0.026437 0.000000 0.000000 0.000000 0.199615 0.067449
13669 0.000000 0.000000 0.000000 0.000000 0.059675 0.100906 0.000000
2763 0.106042 0.043621 0.000000 0.000000 0.000000 0.282309 0.111289

[5364 rows x 100 columns]

In [37]:
pred = rfc.predict(X_test)
score = accuracy_score(Y_test, pred)
score

Out[37]: 0.9737136465324385

In [38]:
print('Classification_Report\n')
print(classification_report(Y_test, pred))
print('Confusion Matrix\n')
print(confusion_matrix(Y_test, pred))

Classification_Report

precision recall f1-score support

0 0.97 1.00 0.99 5117

1 1.00 0.43 0.60 247

accuracy 0.97 5364

macro avg 0.99 0.71 0.79 5364
weighted avg 0.97 0.97 0.97 5364

Confusion Matrix

[[5117 0]
[ 141 106]]

In [ ]:

Bigmartsalesprediction
No ratings yet
Bigmartsalesprediction
27 pages
Clickbait Classifier Modified
No ratings yet
Clickbait Classifier Modified
21 pages
Kaggle GPU Setup for ComfyUI
No ratings yet
Kaggle GPU Setup for ComfyUI
10 pages
ComfyUI Installation Guide for Python
No ratings yet
ComfyUI Installation Guide for Python
24 pages
Caso 2 Lau
No ratings yet
Caso 2 Lau
27 pages
Data Science Package Setup
No ratings yet
Data Science Package Setup
28 pages
1Speech&LangProc - Ipynb - Colab
No ratings yet
1Speech&LangProc - Ipynb - Colab
3 pages
Assign 9-20U00323 Sec C.ipynb - Colaboratory
No ratings yet
Assign 9-20U00323 Sec C.ipynb - Colaboratory
5 pages
DL - Libraries - Installation - Jupyter Notebook
No ratings yet
DL - Libraries - Installation - Jupyter Notebook
2 pages
Data Import and Visualization Setup
No ratings yet
Data Import and Visualization Setup
4 pages
Data Science
No ratings yet
Data Science
1 page
Python Module Installation Log
No ratings yet
Python Module Installation Log
5 pages
Error
No ratings yet
Error
22 pages
ChatGPT Data Science Cheat Sheet
100% (5)
ChatGPT Data Science Cheat Sheet
78 pages
ChatGPT Cheat Sheet - DataCamp PDF
91% (11)
ChatGPT Cheat Sheet - DataCamp PDF
78 pages
Assignment 2
No ratings yet
Assignment 2
17 pages
Mini Projects 3-6-Satyaki Mitra
No ratings yet
Mini Projects 3-6-Satyaki Mitra
60 pages
Named - Entity - Recognition (LAbsheet-07) .Ipynb (20221CSE0413) - Colab
No ratings yet
Named - Entity - Recognition (LAbsheet-07) .Ipynb (20221CSE0413) - Colab
2 pages
AI's Impact on Industries and Ethics
No ratings yet
AI's Impact on Industries and Ethics
14 pages
IQBAL Fresher 19
No ratings yet
IQBAL Fresher 19
3 pages
Python Advanced Exercises - Google Search
No ratings yet
Python Advanced Exercises - Google Search
2 pages
Sayan Sarkar: Education
No ratings yet
Sayan Sarkar: Education
1 page
A145286344 23681 24 2018 Tensorflow
No ratings yet
A145286344 23681 24 2018 Tensorflow
15 pages
Ccs339 Text and Speech Analysis Lab Manual
No ratings yet
Ccs339 Text and Speech Analysis Lab Manual
51 pages
TP Final Grupo 2 V7.ipynb - Colab
No ratings yet
TP Final Grupo 2 V7.ipynb - Colab
40 pages
Introduction To Python 3: Chang Y. Chung
No ratings yet
Introduction To Python 3: Chang Y. Chung
25 pages
Convert IPYNB to PDF with Vertopal
No ratings yet
Convert IPYNB to PDF with Vertopal
5 pages
AnuragRoy CV
No ratings yet
AnuragRoy CV
1 page
Python
No ratings yet
Python
8 pages
BCA ML Lab Manual: Setup & Examples
No ratings yet
BCA ML Lab Manual: Setup & Examples
23 pages
Pip Install Jupyterthemes
No ratings yet
Pip Install Jupyterthemes
14 pages
Instal Modules
No ratings yet
Instal Modules
6 pages
AI Phase4
No ratings yet
AI Phase4
26 pages
Microsoft Windows (Versión 10.0.261
No ratings yet
Microsoft Windows (Versión 10.0.261
87 pages
ABC
No ratings yet
ABC
26 pages
Unit-2 - Jupyter Notebook
No ratings yet
Unit-2 - Jupyter Notebook
17 pages
Face Mask Detection
No ratings yet
Face Mask Detection
32 pages
Intro To Python
No ratings yet
Intro To Python
11 pages
Underwater Object Detection With YOLO v8
No ratings yet
Underwater Object Detection With YOLO v8
47 pages
JRF Interview Preparation Checklist
No ratings yet
JRF Interview Preparation Checklist
4 pages
Python For Web Development Pre
No ratings yet
Python For Web Development Pre
15 pages
Python Package Installation Guide
No ratings yet
Python Package Installation Guide
519 pages
Python for Scientific Computing
No ratings yet
Python for Scientific Computing
77 pages
Apply SVM To Amazon Reviews Data Set Avg W2vec (M)
No ratings yet
Apply SVM To Amazon Reviews Data Set Avg W2vec (M)
8 pages
Major Project
No ratings yet
Major Project
144 pages
2025 Quiz 375
No ratings yet
2025 Quiz 375
5 pages
Requirements Dev
No ratings yet
Requirements Dev
7 pages
PY0101 - Python For Data Science, AI, & Development Cheat Sheet
No ratings yet
PY0101 - Python For Data Science, AI, & Development Cheat Sheet
2 pages
3 Prerequisites Software
No ratings yet
3 Prerequisites Software
5 pages
4aeee7-Ba25-Ff2e-30d7-63d306a7270 Open Ai Playground Example Prompts - Google Sheets
No ratings yet
4aeee7-Ba25-Ff2e-30d7-63d306a7270 Open Ai Playground Example Prompts - Google Sheets
8 pages
Introduction to Python Programming
No ratings yet
Introduction to Python Programming
53 pages
YOLOv5 Setup for Aircraft Detection
No ratings yet
YOLOv5 Setup for Aircraft Detection
5 pages
ENGG1003 Lab08 PythonBasics 2425T2
No ratings yet
ENGG1003 Lab08 PythonBasics 2425T2
31 pages
Report Intership Chapters
No ratings yet
Report Intership Chapters
39 pages
Statistics Machine Learning Python
No ratings yet
Statistics Machine Learning Python
356 pages
Troubleshooting Conda Environment Issues
No ratings yet
Troubleshooting Conda Environment Issues
5 pages
Web Server Log Analysis
No ratings yet
Web Server Log Analysis
6 pages
Compensation Basics for Employees
50% (2)
Compensation Basics for Employees
30 pages
Establishing Landlord Relationships
100% (1)
Establishing Landlord Relationships
58 pages
Accenture - How To Out-Collaborate The Crisis 2020
No ratings yet
Accenture - How To Out-Collaborate The Crisis 2020
32 pages
The Digital Evolution in Hospitality A Global Revi
No ratings yet
The Digital Evolution in Hospitality A Global Revi
23 pages
Project Report of SLIC
100% (1)
Project Report of SLIC
87 pages
SM Process
No ratings yet
SM Process
26 pages
Job Description For College Placement Drive - SAP & EPM
No ratings yet
Job Description For College Placement Drive - SAP & EPM
2 pages
Social Audit On GSWS
No ratings yet
Social Audit On GSWS
13 pages
America's Most Successful Startups (Samwer, 1999)
93% (30)
America's Most Successful Startups (Samwer, 1999)
180 pages
Bangalore Real Estate Market Outlook 2009
No ratings yet
Bangalore Real Estate Market Outlook 2009
53 pages
Bachetle (Inv. F2025151 (MJF828) 1-180
No ratings yet
Bachetle (Inv. F2025151 (MJF828) 1-180
1 page
Mighty Corp: History & Tax Evasion Case
No ratings yet
Mighty Corp: History & Tax Evasion Case
18 pages
Practice Ch4
No ratings yet
Practice Ch4
35 pages
Planit Tosca Overview
No ratings yet
Planit Tosca Overview
16 pages
Indian Accounting Standards (Ind AS) : Basic Concepts
No ratings yet
Indian Accounting Standards (Ind AS) : Basic Concepts
18 pages
Module 1 TSA - Aguilar
No ratings yet
Module 1 TSA - Aguilar
4 pages
Assignment Sales of Goods
No ratings yet
Assignment Sales of Goods
6 pages
Electronic Payment System
No ratings yet
Electronic Payment System
6 pages
The Middle East GC Summit - 2024 - V1
No ratings yet
The Middle East GC Summit - 2024 - V1
2 pages
Exide Life Golden Years Policy Details
No ratings yet
Exide Life Golden Years Policy Details
38 pages
Secure Your Salesforce Ecosystem Identity and Access Management
No ratings yet
Secure Your Salesforce Ecosystem Identity and Access Management
10 pages
Homework # 2 - 700
No ratings yet
Homework # 2 - 700
3 pages
Financial Calculations Guide
100% (2)
Financial Calculations Guide
20 pages
How To Use Volume Oscillator To Boost Your Profits
100% (3)
How To Use Volume Oscillator To Boost Your Profits
15 pages
Negotiations Assignment
No ratings yet
Negotiations Assignment
6 pages
Ruban Raj IDM End Term
No ratings yet
Ruban Raj IDM End Term
35 pages
Sales Memory Aid
No ratings yet
Sales Memory Aid
37 pages
Create a Winning Resume with Rbk Resume
100% (1)
Create a Winning Resume with Rbk Resume
8 pages
Guideline Effective Control and Management of Railway Operations PDF
No ratings yet
Guideline Effective Control and Management of Railway Operations PDF
22 pages
Nivea Skincare Operations Strategy Overview
No ratings yet
Nivea Skincare Operations Strategy Overview
17 pages

Fake Job Detection

Uploaded by

Fake Job Detection

Uploaded by

In [1]:

!pip install wordcloud

Requirement already satisfied: wordcloud in c:\users\hp\anaconda3\lib\site-packages (1.8.1)

Requirement already satisfied: spacy in c:\users\hp\anaconda3\lib\site-packages (3.2.4)

Commissioning Valor Services

Out[6]: (17880, 18)

We're Food52, and

Commissioning Valor Services provides Implement pre-

SpotSource Solutions JOB TITLE: Itemization

Out[14]: {'Mid-Senior level': 3809,

We're Food52, and Food52, a fast-growing, Experience with content

What you will

Commissioning Valor Services Implement pre-

SpotSource Solutions JOB TITLE: Itemization QUALIFICATIONS:RN Health

Out[18]: {'US': 10656,

Out[19]: Text(0.5, 0, 'Countries')

Out[20]: {"Bachelor's Degree": 5145,

Out[21]: Text(0.5, 0, 'Education')

English Teacher Abroad 311

Cruise Staff Wanted *URGENT* 21

Out[25]: fraudulent text

0 0 Marketing Intern We're Food52, and we've creat...

1 0 Customer Service - Cloud Video Production 90 S...

2 0 Commissioning Machinery Assistant (CMA) Valor ...

3 0 Account Executive - Washington DC Our passion ...

4 0 Bill Review Manager SpotSource Solutions LLC i...

Out[27]: <matplotlib.image.AxesImage at 0x157155b68b0>

Out[28]: <matplotlib.image.AxesImage at 0x157148d7040>

Requirement already satisfied: spacy in c:\users\hp\anaconda3\lib\site-packages (3.2.4)

mytokens = [word.lemma_.lower().strip() if word.lemma_ != "-PRON" else word.lower_ for word in mytokens]

# Custom transformer using spacy

def fit(self, X, y=None, **fit_params):

def get_params(self, deep=True):

# Basic function to clean the text

5 rows × 101 columns

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.3)

ability about all also amp an and \

are as at ... well who will \

with work working world years you your

[5364 rows x 100 columns]

precision recall f1-score support

0 0.97 1.00 0.99 5117

accuracy 0.97 5364

You might also like

Cruise Staff Wanted URGENT 21