Employee - Preprocessing - Ipynb - Colab

Data Preprocessing

Uploaded by vivekagangwani

10/23/24, 8:44 PM Employee_Preprocessing.ipynb - Colab

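```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler  # imported but not used in this notebook
import seaborn as sns
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib versions
```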

```python
df = pd.read_csv('/content/Employee.csv')
```

```python
df
```

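| | Education | JoiningYear | City | PaymentTier | Age | Gender | EverBenched | ExperienceInCurrentDomain | LeaveOrNot |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Bachelors | 2017 | Bangalore | 3 | 34 | Male | No | 0 | 0 |
| 1 | Bachelors | 2013 | Pune | 1 | 28 | Female | No | 3 | 1 |
| 2 | Bachelors | 2014 | New Delhi | 3 | 38 | Female | No | 2 | 0 |
| 3 | Masters | 2016 | Bangalore | 3 | 27 | Male | No | 5 | 1 |
| 4 | Masters | 2017 | Pune | 3 | 24 | Male | Yes | 2 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4648 | Bachelors | 2013 | Bangalore | 3 | 26 | Female | No | 4 | 0 |
| 4649 | Masters | 2013 | Pune | 2 | 37 | Male | No | 2 | 1 |
| 4650 | Masters | 2018 | New Delhi | 3 | 27 | Male | No | 5 | 1 |
| 4651 | Bachelors | 2012 | Bangalore | 3 | 30 | Male | Yes | 2 | 0 |
| 4652 | Bachelors | 2015 | Bangalore | 3 | 33 | Male | Yes | 4 | 0 |

4653 rows × 9 columns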


https://colab.research.google.com/drive/1tWnpTY_yOiEgCIEt4fzEFJQ6TLjqirIu#scrollTo=s6IVoT8z7dxG&printMode=true

Categorical to Numerical Data

```python
df["Education"].unique()
```

array(['Bachelors', 'Masters', 'PHD'], dtype=object)

```python
def replace_education(education):
    # Ordinal encoding: Bachelors -> 0, Masters -> 1, anything else (PHD) -> 2
    if education == 'Bachelors':
        return 0
    elif education == 'Masters':
        return 1
    else:
        return 2

df['Education'] = df['Education'].apply(replace_education)
```

```python
df.head()
```

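| | Education | JoiningYear | City | PaymentTier | Age | Gender | EverBenched | ExperienceInCurrentDomain | LeaveOrNot |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2017 | Bangalore | 3 | 34 | Male | No | 0 | 0 |
| 1 | 0 | 2013 | Pune | 1 | 28 | Female | No | 3 | 1 |
| 2 | 0 | 2014 | New Delhi | 3 | 38 | Female | No | 2 | 0 |
| 3 | 1 | 2016 | Bangalore | 3 | 27 | Male | No | 5 | 1 |
| 4 | 1 | 2017 | Pune | 3 | 24 | Male | Yes | 2 | 1 |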


```python
df['City'].unique()
```


array(['Bangalore', 'Pune', 'New Delhi'], dtype=object)

```python
def replace_city(city):
    # Bangalore -> 0, Pune -> 1, New Delhi -> 2, any other city -> 3
    if city == 'Bangalore':
        return 0
    elif city == 'Pune':
        return 1
    elif city == 'New Delhi':
        return 2
    else:
        return 3

df['City'] = df['City'].apply(replace_city)
```

```python
df.head()
```

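| | Education | JoiningYear | City | PaymentTier | Age | Gender | EverBenched | ExperienceInCurrentDomain | LeaveOrNot |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2017 | 0 | 3 | 34 | Male | No | 0 | 0 |
| 1 | 0 | 2013 | 1 | 1 | 28 | Female | No | 3 | 1 |
| 2 | 0 | 2014 | 2 | 3 | 38 | Female | No | 2 | 0 |
| 3 | 1 | 2016 | 0 | 3 | 27 | Male | No | 5 | 1 |
| 4 | 1 | 2017 | 1 | 3 | 24 | Male | Yes | 2 | 1 |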


```python
def replace_gender(gender):
    # Male -> 0, Female -> 1, anything else -> 2
    if gender == 'Male':
        return 0
    elif gender == 'Female':
        return 1
    else:
        return 2

df['Gender'] = df['Gender'].apply(replace_gender)
```

```python
df.head()
```

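| | Education | JoiningYear | City | PaymentTier | Age | Gender | EverBenched | ExperienceInCurrentDomain | LeaveOrNot |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2017 | 0 | 3 | 34 | 0 | No | 0 | 0 |
| 1 | 0 | 2013 | 1 | 1 | 28 | 1 | No | 3 | 1 |
| 2 | 0 | 2014 | 2 | 3 | 38 | 1 | No | 2 | 0 |
| 3 | 1 | 2016 | 0 | 3 | 27 | 0 | No | 5 | 1 |
| 4 | 1 | 2017 | 1 | 3 | 24 | 0 | Yes | 2 | 1 |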


```python
def replace_bench_status(status):
    """
    Convert EverBenched status to numerical values
    No: 0
    Yes: 1
    Anything else: 2
    """
    if status == 'No':
        return 0
    elif status == 'Yes':
        return 1
    else:
        return 2

df['EverBenched'] = df['EverBenched'].apply(replace_bench_status)
```

```python
df.head(10)
```


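| | Education | JoiningYear | City | PaymentTier | Age | Gender | EverBenched | ExperienceInCurrentDomain | LeaveOrNot |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2017 | 0 | 3 | 34 | 0 | 0 | 0 | 0 |
| 1 | 0 | 2013 | 1 | 1 | 28 | 1 | 0 | 3 | 1 |
| 2 | 0 | 2014 | 2 | 3 | 38 | 1 | 0 | 2 | 0 |
| 3 | 1 | 2016 | 0 | 3 | 27 | 0 | 0 | 5 | 1 |
| 4 | 1 | 2017 | 1 | 3 | 24 | 0 | 1 | 2 | 1 |
| 5 | 0 | 2016 | 0 | 3 | 22 | 0 | 0 | 0 | 0 |
| 6 | 0 | 2015 | 2 | 3 | 38 | 0 | 0 | 0 | 0 |
| 7 | 0 | 2016 | 0 | 3 | 34 | 1 | 0 | 2 | 1 |
| 8 | 0 | 2016 | 1 | 3 | 23 | 0 | 0 | 1 | 0 |
| 9 | 1 | 2017 | 2 | 2 | 37 | 0 | 0 | 2 | 0 |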
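The four `replace_*` functions above can also be written as lookup tables. The sketch below is an alternative, not the notebook's method: it assumes the same category codes used above and relies on `Series.map`, which yields NaN for any value missing from the mapping, so an unexpected category raises an error instead of silently receiving a fallback code such as 2 or 3. The helper name `encode_categoricals`, the `CODES` dictionary, and the sample rows are hypothetical.

```python
import pandas as pd

# Category codes as used by the replace_* functions in this notebook
CODES = {
    "Education": {"Bachelors": 0, "Masters": 1, "PHD": 2},
    "City": {"Bangalore": 0, "Pune": 1, "New Delhi": 2},
    "Gender": {"Male": 0, "Female": 1},
    "EverBenched": {"No": 0, "Yes": 1},
}


def encode_categoricals(frame: pd.DataFrame, codes: dict) -> pd.DataFrame:
    """Return a copy of frame with each listed column mapped to integer codes.

    Raises ValueError if a column contains a value missing from its mapping.
    """
    out = frame.copy()
    for column, mapping in codes.items():
        encoded = out[column].map(mapping)  # unmapped values become NaN
        unknown = out.loc[encoded.isna(), column].unique()
        if len(unknown) > 0:
            raise ValueError(f"Unmapped values in {column!r}: {list(unknown)}")
        out[column] = encoded.astype("int64")
    return out


# Hypothetical sample rows in the raw Employee.csv format
sample = pd.DataFrame({
    "Education": ["Bachelors", "Masters", "PHD"],
    "City": ["Bangalore", "Pune", "New Delhi"],
    "Gender": ["Male", "Female", "Female"],
    "EverBenched": ["No", "Yes", "No"],
})
encoded = encode_categoricals(sample, CODES)
print(encoded)
```

Keeping the mappings in one dictionary also makes it easy to reuse them later, for example to turn codes back into labels for plotting.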


```python
df['JoiningYear'].unique()
```

array([2017, 2013, 2014, 2016, 2015, 2012, 2018])

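```python
# Min-max scale Age to [0, 1]; min and max are computed once on the original column
age_min, age_max = df["Age"].min(), df["Age"].max()
df["Age"] = (df["Age"] - age_min) / (age_max - age_min)
```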

```python
df.head()
```


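| | Education | JoiningYear | City | PaymentTier | Age | Gender | EverBenched | ExperienceInCurrentDomain | LeaveOrNot |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2017 | 0 | 3 | 0.631579 | 0 | 0 | 0 | 0 |
| 1 | 0 | 2013 | 1 | 1 | 0.315789 | 1 | 0 | 3 | 1 |
| 2 | 0 | 2014 | 2 | 3 | 0.842105 | 1 | 0 | 2 | 0 |
| 3 | 1 | 2016 | 0 | 3 | 0.263158 | 0 | 0 | 5 | 1 |
| 4 | 1 | 2017 | 1 | 3 | 0.105263 | 0 | 1 | 2 | 1 |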
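The Age cell above applies min-max scaling, x' = (x - min) / (max - min). A minimal sketch of the transform and its inverse follows; keeping the min and max that were used makes it possible to recover the raw ages later (for example, to bin them into age groups). The function names and the sample ages are hypothetical.

```python
import pandas as pd


def min_max_scale(values: pd.Series) -> tuple[pd.Series, float, float]:
    """Scale values to [0, 1] and return the scaled series plus the min and max used."""
    lo, hi = float(values.min()), float(values.max())
    return (values - lo) / (hi - lo), lo, hi


def min_max_unscale(scaled: pd.Series, lo: float, hi: float) -> pd.Series:
    """Invert min_max_scale using the stored min and max."""
    return scaled * (hi - lo) + lo


ages = pd.Series([34, 28, 38, 27, 24, 22, 41])  # hypothetical sample ages
scaled, lo, hi = min_max_scale(ages)
print(scaled.round(6).tolist())

restored = min_max_unscale(scaled, lo, hi)
print(restored.round(6).tolist())
```

With this sample (min 22, max 41), an age of 34 maps to 12/19 ≈ 0.631579. Note that the scaling depends on the min and max of the data it was fitted on, so new data should be scaled with the stored values rather than its own.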


```python
df.to_csv("/content/employee_preprocessed.csv", index=False)
new_df = pd.read_csv("/content/employee_preprocessed.csv")
new_df.head()
```

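| | Education | JoiningYear | City | PaymentTier | Age | Gender | EverBenched | ExperienceInCurrentDomain | LeaveOrNot |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2017 | 0 | 3 | 0.631579 | 0 | 0 | 0 | 0 |
| 1 | 0 | 2013 | 1 | 1 | 0.315789 | 1 | 0 | 3 | 1 |
| 2 | 0 | 2014 | 2 | 3 | 0.842105 | 1 | 0 | 2 | 0 |
| 3 | 1 | 2016 | 0 | 3 | 0.263158 | 0 | 0 | 5 | 1 |
| 4 | 1 | 2017 | 1 | 3 | 0.105263 | 0 | 1 | 2 | 1 |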
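Writing the encoded frame to CSV and reading it back, as the cell above does, keeps every column numeric because each category was already replaced by an integer code. A self-contained sketch of the same round trip, assuming an in-memory `io.StringIO` buffer in place of the `/content/employee_preprocessed.csv` file and two hypothetical rows:

```python
import io

import pandas as pd

# Two hypothetical rows already encoded like df above
encoded = pd.DataFrame({
    "Education": [0, 1],
    "City": [0, 1],
    "Age": [0.631579, 0.315789],
    "LeaveOrNot": [0, 1],
})

buffer = io.StringIO()
encoded.to_csv(buffer, index=False)  # index=False avoids an extra 'Unnamed: 0' column on reload
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(reloaded.dtypes)
```

Checking `dtypes` after reloading is a quick way to confirm that no column was read back as text (`object`).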


Correlation

```python
new_df.corr()
```

| | Education | JoiningYear | City | PaymentTier | Age | Gender | EverBenched | ExperienceInCurrentDo |
|---|---|---|---|---|---|---|---|---|
| Education | 1.000000 | 0.142670 | 0.390890 | -0.140741 | -0.010611 | 0.010889 | -0.052249 | -0.00 |
| JoiningYear | 0.142670 | 1.000000 | 0.138264 | -0.096078 | 0.013165 | 0.012213 | 0.049353 | -0.03 |
| City | 0.390890 | 0.138264 | 1.000000 | -0.232683 | -0.041364 | 0.209442 | -0.026699 | -0.01 |
| PaymentTier | -0.140741 | -0.096078 | -0.232683 | 1.000000 | 0.007631 | -0.235119 | 0.019207 | 0.01 |
| Age | -0.010611 | 0.013165 | -0.041364 | 0.007631 | 1.000000 | 0.003866 | -0.016135 | -0.13 |
| Gender | 0.010889 | 0.012213 | 0.209442 | -0.235119 | 0.003866 | 1.000000 | -0.019653 | -0.00 |
| EverBenched | -0.052249 | 0.049353 | -0.026699 | 0.019207 | -0.016135 | -0.019653 | 1.000000 | 0.00 |
| ExperienceInCurrentDomain | -0.004463 | -0.036525 | -0.011093 | 0.018314 | -0.134643 | -0.008745 | 0.001408 | 1.00 |
| LeaveOrNot | 0.080497 | 0.181705 | 0.076730 | -0.197638 | -0.051126 | 0.220701 | 0.078438 | -0.03 |
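To read the correlation matrix above from the target's point of view, the LeaveOrNot row can be sorted by absolute value. The sketch hard-codes the LeaveOrNot coefficients printed above (the ExperienceInCurrentDomain entry is left out because it is truncated in this export); in the notebook itself the same series would come from `new_df.corr()['LeaveOrNot'].drop('LeaveOrNot')`.

```python
import pandas as pd

# LeaveOrNot coefficients copied from the new_df.corr() output above
leave_corr = pd.Series({
    "Education": 0.080497,
    "JoiningYear": 0.181705,
    "City": 0.076730,
    "PaymentTier": -0.197638,
    "Age": -0.051126,
    "Gender": 0.220701,
    "EverBenched": 0.078438,
})

# Order features by the strength of their linear association with leaving,
# keeping the sign so positive and negative relationships stay visible
ranked = leave_corr.reindex(leave_corr.abs().sort_values(ascending=False).index)
print(ranked)
```

Gender, PaymentTier and JoiningYear show the strongest associations, but all of them are weak (|r| of roughly 0.18 to 0.22). Because City and the other categories are label-encoded, these Pearson coefficients only describe linear association with arbitrary integer codes and should be read as a rough screen, not as effect sizes.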

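```python
# Creates an empty 20x15-inch figure (no axes are added to it); the plots below are drawn in their own cells
plt.figure(figsize=(20, 15))
```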

<Figure size 2000x1500 with 0 Axes>

Results and Analysis

1. Education Distribution

```python
# 1. Education Distribution
# Map the codes back to labels for plotting only, so new_df stays numeric for the correlation heatmap
education_labels = new_df['Education'].map({0: "Bachelors", 1: "Masters", 2: "PHD"})
education_distribution = education_labels.value_counts()
print(education_distribution)
education_distribution.plot(kind='bar', color='skyblue')
plt.title('Distribution of Educational Qualifications')
plt.xlabel('Education Qualification')
plt.ylabel('Number of Employees')
plt.xticks(rotation=45)
plt.show()
```


Education
Bachelors 3601
Masters 873
PHD 179
Name: count, dtype: int64
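The counts above are easier to compare as shares of all 4653 employees. The sketch rebuilds the printed counts as a Series so it runs on its own and converts them to percentages; in the notebook, calling `value_counts(normalize=True)` instead of `value_counts()` on the labelled Education column gives the same proportions directly.

```python
import pandas as pd

# Counts copied from the printed education_distribution above
education_distribution = pd.Series({"Bachelors": 3601, "Masters": 873, "PHD": 179}, name="count")

# Share of employees per qualification, in percent
shares = (education_distribution / education_distribution.sum() * 100).round(2)
print(shares)
```

Roughly 77% of employees hold a Bachelors degree, about 19% a Masters and under 4% a PHD, so the PHD group is small and any per-group statistic for it will be noisy.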

2. Service Length by City


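```python
# 2. Service Length by City
# plt.figure gives each plot its own full-size figure (a 2x3 subplot slot in a separate cell only fills one panel)
plt.figure(figsize=(8, 5))
sns.boxplot(data=new_df, x='City', y='JoiningYear')
plt.title('Joining Year Distribution Across Cities')
plt.xlabel('City (0: Bangalore, 1: Pune, 2: New Delhi)')
plt.ylabel('Joining Year')
plt.show()
```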


3. Payment Tier vs Experience Scatter Plot

```python
# 3. Payment Tier vs Experience Scatter Plot
plt.figure(figsize=(8, 5))
sns.scatterplot(data=new_df, x='ExperienceInCurrentDomain', y='PaymentTier')
plt.title('Payment Tier vs Experience')
plt.xlabel('Experience in Current Domain')
plt.ylabel('Payment Tier')
plt.show()
```


4. Gender Distribution

```python
# 4. Gender Distribution
plt.figure(figsize=(8, 5))
sns.countplot(data=new_df, x='Gender')
plt.title('Gender Distribution')
plt.xlabel('Gender (0: Male, 1: Female)')  # matches replace_gender: Male -> 0, Female -> 1
plt.ylabel('Count')
plt.show()
```


5. Leave Analysis

```python
# 5. Leave Analysis
plt.figure(figsize=(8, 5))
sns.countplot(data=new_df, x='LeaveOrNot')
plt.title('Leave Distribution')
plt.xlabel('Leave (0: No, 1: Yes)')
plt.ylabel('Count')
plt.show()
```


```python
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 6))
sns.countplot(x='LeaveOrNot', hue='Gender', data=new_df)
plt.title('Leave Status Distribution by Gender')
plt.xlabel('Leave Status (0: No, 1: Yes)')
plt.ylabel('Number of Employees')
plt.legend(title='Gender (0: Male, 1: Female)')
plt.show()
```


```python
# Create age groups
# new_df['Age'] was min-max scaled to [0, 1] above, so bin the raw ages from the original CSV instead
# (no rows were dropped, so the row order matches new_df)
raw_age = pd.read_csv('/content/Employee.csv')['Age']
bins = [20, 30, 40, 50, 60, 70]  # Adjust bins as per your dataset
labels = ['20-30', '30-40', '40-50', '50-60', '60-70']
new_df['Age Group'] = pd.cut(raw_age, bins=bins, labels=labels)

plt.figure(figsize=(10, 6))
sns.countplot(x='Age Group', hue='LeaveOrNot', data=new_df)
plt.title('Leave Status by Age Group')
plt.xlabel('Age Group')
plt.ylabel('Number of Employees')
plt.legend(title='Leave Status')
plt.show()
```
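`pd.cut` uses right-closed intervals by default, so with the bins above the label '20-30' means the interval (20, 30]: an age of exactly 30 lands in '20-30', not '30-40'. Values outside the outer edges become NaN, which is why the min-max scaled Age column (values between 0 and 1) cannot be binned with these edges. A quick check on a few hypothetical ages:

```python
import pandas as pd

bins = [20, 30, 40, 50, 60, 70]
labels = ['20-30', '30-40', '40-50', '50-60', '60-70']

ages = pd.Series([22, 30, 31, 40, 41])  # hypothetical raw ages

# Default right=True: intervals are (20, 30], (30, 40], ...
groups = pd.cut(ages, bins=bins, labels=labels)
print(groups.tolist())

# right=False: intervals become [20, 30), [30, 40), ... so 30 and 40 move up a group
groups_left = pd.cut(ages, bins=bins, labels=labels, right=False)
print(groups_left.tolist())

# Values outside the bin edges, such as min-max scaled ages, all become NaN
scaled_groups = pd.cut(pd.Series([0.0, 0.5, 1.0]), bins=bins, labels=labels)
print(scaled_groups.isna().all())
```

Which convention is right is a labelling choice; `right=False` matches the common reading of '30-40' as "30 up to but not including 40".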


```python
plt.figure(figsize=(10, 6))
sns.countplot(x='PaymentTier', hue='LeaveOrNot', data=new_df)
plt.title('Leave Status by Payment Tier')
plt.xlabel('Payment Tier')
plt.ylabel('Number of Employees')
plt.legend(title='Leave Status')
plt.show()
```
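Count plots show absolute numbers, which makes groups of very different sizes (such as payment tiers) hard to compare. Because LeaveOrNot is coded 0/1, its mean within a group is the fraction of employees in that group who left. A sketch on a small hypothetical frame with the same encoding; in the notebook the equivalent call would be `new_df.groupby('PaymentTier')['LeaveOrNot'].mean()`.

```python
import pandas as pd

# Hypothetical rows with the same encoding as new_df (Gender: 0 = Male, 1 = Female; LeaveOrNot: 1 = left)
toy = pd.DataFrame({
    "PaymentTier": [1, 1, 2, 2, 2, 3, 3, 3, 3],
    "Gender": [0, 1, 1, 0, 1, 0, 0, 1, 0],
    "LeaveOrNot": [1, 0, 1, 1, 0, 0, 0, 1, 0],
})

# Mean of a 0/1 column per group = leave rate per group
leave_rate_by_tier = toy.groupby("PaymentTier")["LeaveOrNot"].mean()
leave_rate_by_gender = toy.groupby("Gender")["LeaveOrNot"].mean()
print(leave_rate_by_tier)
print(leave_rate_by_gender)
```

The rates can be plotted with `leave_rate_by_tier.plot(kind='bar')`, which complements the count plots above by normalising for group size.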


```python
# Adjust layout
plt.tight_layout()
plt.show()
```

<Figure size 800x550 with 0 Axes>

6. Scatter Plot, Correlation heatmap, Histograms

```python
# Create a 3D scatter plot
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')

scatter = ax.scatter(new_df['ExperienceInCurrentDomain'],
                     new_df['PaymentTier'],
                     new_df['Age'],
                     c=new_df['LeaveOrNot'],
                     cmap='viridis')

ax.set_xlabel('Experience')
ax.set_ylabel('Payment Tier')
ax.set_zlabel('Age')
plt.colorbar(scatter, label='Leave or Not')
plt.title('3D Scatter Plot: Experience, Payment Tier, and Age')
plt.show()
```


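```python
# Create a correlation heatmap
# numeric_only=True skips non-numeric columns (e.g. text labels or the 'Age Group' categories);
# without it, DataFrame.corr() raises an error on such columns in pandas 2.x
plt.figure(figsize=(10, 8))
sns.heatmap(new_df.corr(numeric_only=True), annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Heatmap')
plt.show()
```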


```python
# Create histograms for numerical variables
# Note: Age was min-max scaled to [0, 1] earlier, so its histogram is on that scale
new_df[['Age', 'ExperienceInCurrentDomain']].hist(bins=10, figsize=(10, 4))
plt.suptitle('Distributions of Age and Experience')
plt.tight_layout()
plt.show()
```
