Task: MINI PROJECT ON MACHINE LEARNING
Name: Prof Shantanu Chakraborty
Reg.No: GO_STP_7458
Date: 08-06-2021
Logistic Regression Model on Why Employees Leave | Predicting Employee Attrition Using Machine Learning
Predict the retention of an employee within an organization, i.e., whether the employee will leave the company or continue with it. An
organization is only as good as its employees, and these people are the true source of its competitive advantage.
Kaggle Link: https://www.kaggle.com/giripujar/hr-analytics
First perform data exploration and visualization, then build a logistic regression model to predict employee attrition
using machine learning and Python.
Import Packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as matplot
import seaborn as sns
%matplotlib inline
df=pd.read_csv("/content/HR_comma_sep.csv")
df.head()
satisfaction_level last_evaluation number_project average_montly_hours time_spend_company Work_accident left
0 0.38 0.53 2 157 3 0 1
1 0.80 0.86 5 262 6 0 1
2 0.11 0.88 7 272 4 0 1
3 0.72 0.87 5 223 5 0 1
4 0.37 0.52 2 159 3 0 1
df.shape
(14999, 10)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14999 entries, 0 to 14998
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 satisfaction_level 14999 non-null float64
1 last_evaluation 14999 non-null float64
2 number_project 14999 non-null int64
3 average_montly_hours 14999 non-null int64
4 time_spend_company 14999 non-null int64
5 Work_accident 14999 non-null int64
6 left 14999 non-null int64
7 promotion_last_5years 14999 non-null int64
8 Department 14999 non-null object
9 salary 14999 non-null object
dtypes: float64(2), int64(6), object(2)
memory usage: 1.1+ MB
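The info() output shows 14999 non-null entries in every column. As an optional sanity check (a small sketch, not part of the original run), missing values and duplicate rows can be confirmed explicitly:
# Optional data-quality checks on the loaded dataframe
df.isnull().sum()      # per-column count of missing values (expected: all zeros)
df.duplicated().sum()  # number of exact duplicate rows, if any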
df.dtypes
satisfaction_level float64
last_evaluation float64
number_project int64
average_montly_hours int64
time_spend_company int64
Work_accident int64
left int64
promotion_last_5years int64
Department object
salary object
dtype: object
df.describe()
satisfaction_level last_evaluation number_project average_montly_hours time_spend_company Work_accident
count 14999.000000 14999.000000 14999.000000 14999.000000 14999.000000 14999.000000
mean 0.612834 0.716102 3.803054 201.050337 3.498233 0.144610
std 0.248631 0.171169 1.232592 49.943099 1.460136 0.351719
min 0.090000 0.360000 2.000000 96.000000 2.000000 0.000000
25% 0.440000 0.560000 3.000000 156.000000 3.000000 0.000000
50% 0.640000 0.720000 4.000000 200.000000 3.000000 0.000000
75% 0.820000 0.870000 5.000000 245.000000 4.000000 0.000000
max 1.000000 1.000000 7.000000 310.000000 10.000000 1.000000
df.columns
Index(['satisfaction_level', 'last_evaluation', 'number_project',
'average_montly_hours', 'time_spend_company', 'Work_accident', 'left',
'promotion_last_5years', 'Department', 'salary'],
dtype='object')
df.groupby('left').count()
satisfaction_level last_evaluation number_project average_montly_hours time_spend_company Work_accident
left
0 11428 11428 11428 11428 11428 11428
1 3571 3571 3571 3571 3571 3571
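The counts above show the classes are imbalanced: 11428 employees stayed and 3571 left. A one-line sketch (not from the original notebook) expresses this as an attrition rate:
# Overall attrition rate: roughly 3571 / 14999 ≈ 0.24
df['left'].value_counts(normalize=True)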
sns.set(rc={'figure.figsize':(9,7)})
correlation_matrix = df.corr().round(2)
sns.heatmap(data=correlation_matrix, annot=True ,cmap="YlGnBu")
<matplotlib.axes._subplots.AxesSubplot at 0x7ffaf6b30790>
corr = df.corr()
sns.heatmap(corr,
            xticklabels=corr.columns.values,
            yticklabels=corr.columns.values, cmap='PuRd')
plt.title('Heatmap of Correlation Matrix')
corr
satisfaction_level last_evaluation number_project average_montly_hours time_spend_company
satisfaction_level 1.000000 0.105021 -0.142970 -0.020048 -0.100866
last_evaluation 0.105021 1.000000 0.349333 0.339742 0.131591
number_project -0.142970 0.349333 1.000000 0.417211 0.196786
average_montly_hours -0.020048 0.339742 0.417211 1.000000 0.127755
time_spend_company -0.100866 0.131591 0.196786 0.127755 1.000000
Work_accident 0.058697 -0.007104 -0.004741 -0.010143 0.002120
left -0.388375 0.006567 0.023787 0.071287 0.144822
promotion_last_5years 0.025605 -0.008684 -0.006064 -0.003544 0.067433
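In the matrix above, satisfaction_level has the strongest relationship with left (about -0.39). A short sketch (reusing the corr frame from the previous cell, not part of the original run) ranks the target correlations directly:
# Correlation of each numeric feature with the target 'left'
corr['left'].sort_values()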
df.groupby('salary').mean()
satisfaction_level last_evaluation number_project average_montly_hours time_spend_company Work_accident
salary
high 0.637470 0.704325 3.767179 199.867421 3.692805 0.155214
low 0.600753 0.717017 3.799891 200.996583 3.438218 0.142154
medium 0.621817 0.717322 3.813528 201.338349 3.529010 0.145361
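The group means hint that low-salary employees are slightly less satisfied. A complementary hedged sketch (not from the original run) shows the attrition proportion within each salary band:
# Share of leavers (left = 1) within each salary level
pd.crosstab(df.salary, df.left, normalize='index')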
pd.crosstab(df.Department, df.left)
left 0 1
Department
IT 954 273
RandD 666 121
accounting 563 204
hr 524 215
management 539 91
marketing 655 203
product_mng 704 198
sales 3126 1014
support 1674 555
technical 2023 697
emp_population_satisfaction = df['satisfaction_level'].mean()
emp_turnover_satisfaction = df[df['left'] == 1]['satisfaction_level'].mean()
print('The mean satisfaction for the employee population is: ' + str(emp_population_satisfaction))
print('The mean satisfaction for the employees who left is: ' + str(emp_turnover_satisfaction))
The mean satisfaction for the employee population is: 0.6128335222348166
The mean satisfaction for the employees who left is: 0.44009801176140917
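Leavers report a noticeably lower mean satisfaction (about 0.44 vs. 0.61 for the whole population). The same comparison extends to every numeric column with a one-line sketch (not part of the original run; pandas silently skips the text columns here):
# Mean of each numeric feature, split by whether the employee left
df.groupby('left').mean()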
f, axes = plt.subplots(ncols=3, figsize=(15, 6))
sns.distplot(df.satisfaction_level, kde=False, color="y", ax=axes[0]).set_title('Employee Satisfaction Distribution')
axes[0].set_ylabel('Employee Count')
sns.distplot(df.last_evaluation, kde=False, color="b", ax=axes[1]).set_title('Employee Evaluation Distribution')
axes[1].set_ylabel('Employee Count')
sns.distplot(df.average_montly_hours, kde=False, color="r", ax=axes[2]).set_title('Employee Average Monthly Hours Distribution')
axes[2].set_ylabel('Employee Count')
/usr/local/lib/python3.7/dist-packages/seaborn/distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version.
  warnings.warn(msg, FutureWarning)
Text(0, 0.5, 'Employee Count')
color_types = ['#78c850', '#F08030', '#6890F0', '#ABB820', '#A8A878', '#A040A0', '#F8D030',
               '#E0C068', '#EE99AC', '#C03028', '#F85888', '#B8A038', '#705898', '#98D8D8', '#7038F8']
sns.countplot(x='Department', data=df, palette=color_types).set_title('Employee Department Distribution');
%matplotlib inline
# Bar chart of the department each employee works for and the frequency of turnover
pd.crosstab(df.Department, df.left).plot(kind='bar')
plt.title('Turnover Frequency for Department')
plt.xlabel('Department')
plt.ylabel('Frequency of Turnover')
plt.savefig('department_bar_chart')
#Bar chart for employee salary level and the frequency of turnover
table=pd.crosstab(df.salary, df.left)
table.div(table.sum(1).astype(float), axis=0).plot(kind='bar', stacked=True)
plt.title('Stacked Bar Chart of Salary Level vs Turnover')
plt.xlabel('Salary Level')
plt.ylabel('Proportion of Employees')
plt.savefig('salary_bar_chart')
fig = plt.figure(figsize=(15, 5))
ax = sns.kdeplot(df.loc[(df['left'] == 0), 'last_evaluation'], color='blue', shade=True)
ax = sns.kdeplot(df.loc[(df['left'] == 1), 'last_evaluation'], color='black', shade=True)
plt.title('Employee Evaluation Distribution Left vs retained')
Text(0.5, 1.0, 'Employee Evaluation Distribution Left vs retained')
fig = plt.figure(figsize=(15, 5))
ax = sns.kdeplot(df.loc[(df['left'] == 0), 'average_montly_hours'], color='green', shade=True)
ax = sns.kdeplot(df.loc[(df['left'] == 1), 'average_montly_hours'], color='red', shade=True)
plt.title('Employee Average Monthly Hours Distribution Left vs retained')
Text(0.5, 1.0, 'Employee Average Monthly Hours Distribution Left vs retained')
fig = plt.figure(figsize=(15, 5))
ax = sns.kdeplot(df.loc[(df['left'] == 0), 'satisfaction_level'], color='red', shade=True)
ax = sns.kdeplot(df.loc[(df['left'] == 1), 'satisfaction_level'], color='black', shade=True)
plt.title('Employee Satisfaction Distribution Left vs retained')
Text(0.5, 1.0, 'Employee Satisfaction Distribution Left vs retained')
data = df[['satisfaction_level', 'average_montly_hours', 'promotion_last_5years', 'salary']]
data.head()
satisfaction_level average_montly_hours promotion_last_5years salary
0 0.38 157 0 low
1 0.80 262 0 medium
2 0.11 272 0 medium
3 0.72 223 0 low
4 0.37 159 0 low
salary = pd.get_dummies(data['salary'], prefix='salary')
salary
salary_high salary_low salary_medium
0 0 1 0
1 0 0 1
2 0 0 1
3 0 1 0
4 0 1 0
... ... ... ...
14994 0 1 0
14995 0 1 0
14996 0 1 0
14997 0 1 0
14998 0 1 0
14999 rows × 3 columns
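The notebook drops the original salary column and salary_high in a later cell to avoid the dummy-variable trap. An equivalent, slightly more compact alternative (a sketch, not what was run here) lets pandas drop one level at encoding time:
# Alternative: drop one dummy level directly; drops salary_high here
# because 'high' is first alphabetically
salary_alt = pd.get_dummies(data['salary'], prefix='salary', drop_first=True)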
new_df = pd.concat([data,salary],axis=1)
new_df
satisfaction_level average_montly_hours promotion_last_5years salary
0 0.38 157 0 low
1 0.80 262 0 medium
2 0.11 272 0 medium
3 0.72 223 0 low
4 0.37 159 0 low
... ... ... ... ...
14994 0.40 151 0 low
14995 0.37 160 0 low
14996 0.37 143 0 low
new_df.drop(['salary','salary_high'], axis=1, inplace=True)
new_df
satisfaction_level average_montly_hours promotion_last_5years salary_low
0 0.38 157 0 1
1 0.80 262 0 0
2 0.11 272 0 0
3 0.72 223 0 1
4 0.37 159 0 1
... ... ... ... ...
14994 0.40 151 0 1
14995 0.37 160 0 1
14996 0.37 143 0 1
X = new_df.copy()
X
satisfaction_level average_montly_hours promotion_last_5years salary_low
0 0.38 157 0 1
1 0.80 262 0 0
2 0.11 272 0 0
3 0.72 223 0 1
4 0.37 159 0 1
... ... ... ... ...
14994 0.40 151 0 1
14995 0.37 160 0 1
14996 0.37 143 0 1
y = df['left']
y
0 1
1 1
2 1
3 1
4 1
..
14994 1
14995 1
14996 1
14997 1
14998 1
Name: left, Length: 14999, dtype: int64
from sklearn.model_selection import train_test_split
train_x, test_x, train_y, test_y = train_test_split(X, y,
test_size=0.30, random_state=99)
train_x.shape, test_x.shape, train_y.shape, test_y.shape
((10499, 5), (4500, 5), (10499,), (4500,))
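Because only about 24% of employees left, a stratified split keeps that class ratio identical in the train and test partitions. A hedged variant of the split above (different variable names, not what was run for the results below):
# Variant of the split that preserves the left/stayed ratio in both partitions
train_x_s, test_x_s, train_y_s, test_y_s = train_test_split(
    X, y, test_size=0.30, random_state=99, stratify=y)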
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(solver='liblinear')
lr.fit(train_x,train_y)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='auto', n_jobs=None, penalty='l2',
random_state=None, solver='liblinear', tol=0.0001, verbose=0,
warm_start=False)
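One advantage of logistic regression is interpretability. A short sketch (not part of the original notebook) pairs each fitted coefficient with its feature name using the fitted lr and train_x from above:
# Fitted coefficients per feature (log-odds scale) and the intercept
print(pd.Series(lr.coef_[0], index=train_x.columns))
print('intercept:', lr.intercept_)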
y_pred = lr.predict(test_x)
y_pred
array([0, 0, 0, ..., 0, 0, 0])
lr.score(test_x,test_y)
0.7724444444444445
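An accuracy of about 0.77 should be read against the majority-class baseline: always predicting "stays" would already be right roughly 76% of the time (11428 / 14999). A quick check, added here as a sketch:
# Baseline: accuracy of always predicting the majority class (0 = stays)
(test_y == 0).mean()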
from sklearn.metrics import accuracy_score, confusion_matrix, plot_confusion_matrix
plot_confusion_matrix
<function sklearn.metrics._plot.confusion_matrix.plot_confusion_matrix>
accuracy_score(test_y,y_pred)
0.7724444444444445
confusion_matrix(test_y,y_pred)
array([[3202, 229],
[ 795, 274]])
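The confusion matrix shows many leavers predicted as stayers (795 false negatives vs. 274 true positives), so recall on the "left" class is low. Per-class precision and recall make this explicit; a small sketch:
from sklearn.metrics import classification_report
print(classification_report(test_y, y_pred))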
plot_confusion_matrix(lr, test_x, test_y,cmap=plt.cm.PuBu)
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x7ffaf2863710>
from sklearn import metrics
y_true = test_y    # true labels
y_probas = y_pred  # predicted labels
fpr, tpr, thresholds = metrics.roc_curve(y_true, y_probas, pos_label=0)
# Plot ROC curve
plt.plot(fpr, tpr, linewidth=4, color='black')
plt.show()
# Print AUC
auc = np.trapz(tpr, fpr)
print('AUC:', auc)
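The curve above is built from hard 0/1 predictions. A more conventional ROC uses the predicted probability of the positive class (left = 1); a hedged sketch reusing the fitted lr:
# ROC/AUC from predicted probabilities rather than hard labels
y_score = lr.predict_proba(test_x)[:, 1]           # probability of class 1 (left)
fpr_p, tpr_p, _ = metrics.roc_curve(test_y, y_score, pos_label=1)
print('AUC (probability-based):', metrics.auc(fpr_p, tpr_p))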