10/23/24, 8:44 PM Employee_Preprocessing.ipynb - Colab
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler
import seaborn as sns
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

df = pd.read_csv('/content/Employee.csv')

df
Education JoiningYear City PaymentTier Age Gender EverBenched ExperienceInCurrentDomain LeaveOrNot
0 Bachelors 2017 Bangalore 3 34 Male No 0 0
1 Bachelors 2013 Pune 1 28 Female No 3 1
2 Bachelors 2014 New Delhi 3 38 Female No 2 0
3 Masters 2016 Bangalore 3 27 Male No 5 1
4 Masters 2017 Pune 3 24 Male Yes 2 1
... ... ... ... ... ... ... ... ... ...
4648 Bachelors 2013 Bangalore 3 26 Female No 4 0
4649 Masters 2013 Pune 2 37 Male No 2 1
4650 Masters 2018 New Delhi 3 27 Male No 5 1
4651 Bachelors 2012 Bangalore 3 30 Male Yes 2 0
4652 Bachelors 2015 Bangalore 3 33 Male Yes 4 0
4653 rows × 9 columns
Categorical to Numerical Data
df["Education"].unique()
array(['Bachelors', 'Masters', 'PHD'], dtype=object)
def replace_education(education):
    if education == 'Bachelors':
        return 0
    elif education == 'Masters':
        return 1
    else:
        return 2

df['Education'] = df['Education'].apply(replace_education)

df.head()
Education JoiningYear City PaymentTier Age Gender EverBenched ExperienceInCurrentDomain LeaveOrNot
0 0 2017 Bangalore 3 34 Male No 0 0
1 0 2013 Pune 1 28 Female No 3 1
2 0 2014 New Delhi 3 38 Female No 2 0
3 1 2016 Bangalore 3 27 Male No 5 1
4 1 2017 Pune 3 24 Male Yes 2 1
df['City'].unique()
array(['Bangalore', 'Pune', 'New Delhi'], dtype=object)
def replace_city(city):
    if city == 'Bangalore':
        return 0
    elif city == 'Pune':
        return 1
    elif city == 'New Delhi':
        return 2
    else:
        return 3

df['City'] = df['City'].apply(replace_city)

df.head()
Education JoiningYear City PaymentTier Age Gender EverBenched ExperienceInCurrentDomain LeaveOrNot
0 0 2017 0 3 34 Male No 0 0
1 0 2013 1 1 28 Female No 3 1
2 0 2014 2 3 38 Female No 2 0
3 1 2016 0 3 27 Male No 5 1
4 1 2017 1 3 24 Male Yes 2 1
def replace_gender(gender):
    if gender == 'Male':
        return 0
    elif gender == 'Female':
        return 1
    else:
        return 2

df['Gender'] = df['Gender'].apply(replace_gender)

df.head()
Education JoiningYear City PaymentTier Age Gender EverBenched ExperienceInCurrentDomain LeaveOrNot
0 0 2017 0 3 34 0 No 0 0
1 0 2013 1 1 28 1 No 3 1
2 0 2014 2 3 38 1 No 2 0
3 1 2016 0 3 27 0 No 5 1
4 1 2017 1 3 24 0 Yes 2 1
def replace_bench_status(status):
    """
    Convert EverBenched status to numerical values
    No: 0
    Yes: 1
    """
    if status == 'No':
        return 0
    elif status == 'Yes':
        return 1
    else:
        return 2

df['EverBenched'] = df['EverBenched'].apply(replace_bench_status)

df.head(10)
Education JoiningYear City PaymentTier Age Gender EverBenched ExperienceInCurrentDomain LeaveOrNot
0 0 2017 0 3 34 0 0 0 0
1 0 2013 1 1 28 1 0 3 1
2 0 2014 2 3 38 1 0 2 0
3 1 2016 0 3 27 0 0 5 1
4 1 2017 1 3 24 0 1 2 1
5 0 2016 0 3 22 0 0 0 0
6 0 2015 2 3 38 0 0 0 0
7 0 2016 0 3 34 1 0 2 1
8 0 2016 1 3 23 0 0 1 0
9 1 2017 2 2 37 0 0 2 0
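The replace_* functions send any unexpected value to a catch-all code (2 or 3), which can hide typos or new categories. As a possible alternative, the sketch below writes the same codes as lookup tables and applies them with Series.map, which leaves NaN for anything not listed. The names sample, encodings, encoded and unmapped are made up for this illustration, and the rows are the first five records printed earlier.

```python
import pandas as pd

# First five records of the raw table (categorical columns only).
sample = pd.DataFrame({
    "Education": ["Bachelors", "Bachelors", "Bachelors", "Masters", "Masters"],
    "City": ["Bangalore", "Pune", "New Delhi", "Bangalore", "Pune"],
    "Gender": ["Male", "Female", "Female", "Male", "Male"],
    "EverBenched": ["No", "No", "No", "No", "Yes"],
})

# Same codes as the replace_* functions, written as lookup tables.
encodings = {
    "Education": {"Bachelors": 0, "Masters": 1, "PHD": 2},
    "City": {"Bangalore": 0, "Pune": 1, "New Delhi": 2},
    "Gender": {"Male": 0, "Female": 1},
    "EverBenched": {"No": 0, "Yes": 1},
}

encoded = sample.copy()
for column, mapping in encodings.items():
    # Series.map returns NaN for values missing from the mapping,
    # so unexpected categories become visible instead of getting a fallback code.
    encoded[column] = sample[column].map(mapping)

# Number of unmapped values per column (all zero for these rows).
unmapped = encoded.isna().sum()
print(encoded)
print(unmapped)
```

Checking unmapped before saving the file is a cheap way to confirm that every category was covered.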
df['JoiningYear'].unique()
array([2017, 2013, 2014, 2016, 2015, 2012, 2018])
# Min-max scale Age to [0, 1] (vectorized; min and max are computed once)
df["Age"] = (df["Age"] - df["Age"].min()) / (df["Age"].max() - df["Age"].min())

df.head()
Education JoiningYear City PaymentTier Age Gender EverBenched ExperienceInCurrentDomain LeaveOrNot
0 0 2017 0 3 0.631579 0 0 0 0
1 0 2013 1 1 0.315789 1 0 3 1
2 0 2014 2 3 0.842105 1 0 2 0
3 1 2016 0 3 0.263158 0 0 5 1
4 1 2017 1 3 0.105263 0 1 2 1
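The Age column is rescaled with min-max normalisation, x' = (x - min) / (max - min). The scaled values printed above are consistent with a minimum age of 22 and a maximum of 41 (for example, age 34 gives 12/19 ≈ 0.631579). The sketch below reproduces that arithmetic on a handful of ages; min_max_scale is a hypothetical helper, and returning zeros for a constant column is an assumed convention, not something the notebook defines.

```python
import pandas as pd

def min_max_scale(values: pd.Series) -> pd.Series:
    """Rescale a numeric Series linearly onto [0, 1]."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        # Constant column: avoid dividing by zero (assumed convention: all zeros).
        return pd.Series(0.0, index=values.index)
    return (values - lo) / (hi - lo)

# Ages of the first six employees in the raw table, plus 41,
# the maximum implied by the printed scaled values.
ages = pd.Series([34, 28, 38, 27, 24, 22, 41])
scaled = min_max_scale(ages)
print(scaled.round(6).tolist())
```

The first five results match the Age column of df.head() above.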
df.to_csv("/content/employee_preprocessed.csv", index=False)
new_df = pd.read_csv("/content/employee_preprocessed.csv")
new_df.head()
Education JoiningYear City PaymentTier Age Gender EverBenched ExperienceInCurrentDomain LeaveOrNot
0 0 2017 0 3 0.631579 0 0 0 0
1 0 2013 1 1 0.315789 1 0 3 1
2 0 2014 2 3 0.842105 1 0 2 0
3 1 2016 0 3 0.263158 0 0 5 1
4 1 2017 1 3 0.105263 0 1 2 1
Correlation
new_df.corr()
Education JoiningYear City PaymentTier Age Gender EverBenched ExperienceInCurrentDo
Education 1.000000 0.142670 0.390890 -0.140741 -0.010611 0.010889 -0.052249 -0.00
JoiningYear 0.142670 1.000000 0.138264 -0.096078 0.013165 0.012213 0.049353 -0.03
City 0.390890 0.138264 1.000000 -0.232683 -0.041364 0.209442 -0.026699 -0.01
PaymentTier -0.140741 -0.096078 -0.232683 1.000000 0.007631 -0.235119 0.019207 0.01
Age -0.010611 0.013165 -0.041364 0.007631 1.000000 0.003866 -0.016135 -0.13
Gender 0.010889 0.012213 0.209442 -0.235119 0.003866 1.000000 -0.019653 -0.00
EverBenched -0.052249 0.049353 -0.026699 0.019207 -0.016135 -0.019653 1.000000 0.00
ExperienceInCurrentDomain -0.004463 -0.036525 -0.011093 0.018314 -0.134643 -0.008745 0.001408 1.00
LeaveOrNot 0.080497 0.181705 0.076730 -0.197638 -0.051126 0.220701 0.078438 -0.03
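In the matrix above, the LeaveOrNot row is the one that matters for attrition, and it is easier to read when sorted by magnitude. The sketch below shows one way to do that. rank_by_target is a hypothetical helper, and the data are only the ten encoded rows from the df.head(10) output (a subset of columns), so the numbers it prints will differ from the full-data correlations.

```python
import pandas as pd

def rank_by_target(frame: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of each numeric column with `target`, strongest first."""
    corr = frame.corr(numeric_only=True)[target].drop(target)
    # Order by absolute value so strong negative correlations are not buried.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)

# Ten encoded rows copied from the df.head(10) output (subset of columns).
sample = pd.DataFrame({
    "PaymentTier": [3, 1, 3, 3, 3, 3, 3, 3, 3, 2],
    "Gender": [0, 1, 1, 0, 0, 0, 0, 1, 0, 0],
    "ExperienceInCurrentDomain": [0, 3, 2, 5, 2, 0, 0, 2, 1, 2],
    "LeaveOrNot": [0, 1, 0, 1, 1, 0, 0, 1, 0, 0],
})

ranked = rank_by_target(sample, "LeaveOrNot")
print(ranked)
```

On the full data the same call would be rank_by_target(new_df, "LeaveOrNot"). In the printed matrix, Gender (0.220701) and PaymentTier (-0.197638) have the largest magnitudes in the LeaveOrNot row.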
plt.figure(figsize=(20, 15))
<Figure size 2000x1500 with 0 Axes>
Results and Analysis
1. Education Distribution
# 1. Education Distribution
new_df['Education'] = new_df['Education'].replace({0: "Bachelors", 1: "Masters", 2: "PHD"})
education_distribution = new_df['Education'].value_counts()
print(education_distribution)
education_distribution.plot(kind='bar', color='skyblue')
plt.title('Distribution of Educational Qualifications')
plt.xlabel('Education Qualification')
plt.ylabel('Number of Employees')
plt.xticks(rotation=45)
plt.show()
Education
Bachelors 3601
Masters 873
PHD 179
Name: count, dtype: int64
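The raw counts are easier to compare as shares of the workforce. The sketch below converts the three printed counts to percentages. The variable names are made up for this illustration, and the counts are typed in from the output above rather than recomputed from the file.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
education_counts = pd.Series(
    {"Bachelors": 3601, "Masters": 873, "PHD": 179}, name="count"
)

# Share of each qualification as a percentage of all employees.
education_share = (education_counts / education_counts.sum() * 100).round(1)
print(education_share)
```

On the DataFrame itself, new_df['Education'].value_counts(normalize=True) returns the same proportions as fractions.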
2. Service Length by City
# 2. Service Length by City
plt.subplot(2, 3, 2)
sns.boxplot(data=new_df, x='City', y='JoiningYear')
plt.title('Joining Year Distribution Across Cities')
plt.xlabel('City')
plt.ylabel('Joining Year')
Text(0, 0.5, 'Joining Year')
3. Payment Tier vs Experience Scatter Plot
# 3. Payment Tier vs Experience Scatter Plot
plt.subplot(2, 3, 3)
sns.scatterplot(data=new_df, x='ExperienceInCurrentDomain', y='PaymentTier')
plt.title('Payment Tier vs Experience')
plt.xlabel('Experience in Current Domain')
plt.ylabel('Payment Tier')
Text(0, 0.5, 'Payment Tier')
4. Gender Distribution
# 4. Gender Distribution
plt.subplot(2, 3, 4)
sns.countplot(data=new_df, x='Gender')
plt.title('Gender Distribution')
# Label matches the encoding used in replace_gender: Male -> 0, Female -> 1
plt.xlabel('Gender (0: Male, 1: Female)')
plt.ylabel('Count')
Text(0, 0.5, 'Count')
5. Leave Analysis
# 5. Leave Analysis
plt.subplot(2, 3, 5)
sns.countplot(data=new_df, x='LeaveOrNot')
plt.title('Leave Distribution')
plt.xlabel('Leave (0: No, 1: Yes)')
plt.ylabel('Count')
Text(0, 0.5, 'Count')
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 6))
sns.countplot(x='LeaveOrNot', hue='Gender', data=new_df)
plt.title('Leave Status Distribution by Gender')
plt.xlabel('Leave Status')
plt.ylabel('Number of Employees')
plt.legend(title='Gender')
plt.show()
# Create age groups from the raw ages: new_df['Age'] was min-max scaled to [0, 1],
# so bin edges given in years have to be applied to the original column
raw_age = pd.read_csv('/content/Employee.csv')['Age']
bins = [20, 30, 40, 50, 60, 70]  # Adjust bins as per your dataset
labels = ['20-30', '30-40', '40-50', '50-60', '60-70']
new_df['Age Group'] = pd.cut(raw_age, bins=bins, labels=labels)

plt.figure(figsize=(10, 6))
# Plot the binned column so the x-axis matches the title and label
sns.countplot(x='Age Group', hue='LeaveOrNot', data=new_df)
plt.title('Leave Status by Age Group')
plt.xlabel('Age Group')
plt.ylabel('Number of Employees')
plt.legend(title='Leave Status')
plt.show()
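pd.cut assigns NaN to any value that falls outside the bin edges, so edges given in years only work on ages in years, not on the scaled Age column. The sketch below shows both cases and computes a leave rate per age group. It uses the ten raw ages and LeaveOrNot values from the df.head(10) output. The names sample, leave_rate and scaled are hypothetical, and ten rows are far too few to draw conclusions about the full data.

```python
import pandas as pd

# Raw ages (in years) and LeaveOrNot for the ten rows printed by df.head(10).
sample = pd.DataFrame({
    "Age": [34, 28, 38, 27, 24, 22, 38, 34, 23, 37],
    "LeaveOrNot": [0, 1, 0, 1, 1, 0, 0, 1, 0, 0],
})

bins = [20, 30, 40, 50]
labels = ["20-30", "30-40", "40-50"]

# pd.cut uses right-closed intervals by default: (20, 30], (30, 40], (40, 50].
sample["Age Group"] = pd.cut(sample["Age"], bins=bins, labels=labels)

# The mean of a 0/1 column is the fraction of employees who left in each group.
# observed=True drops groups that contain no rows (here "40-50").
leave_rate = sample.groupby("Age Group", observed=True)["LeaveOrNot"].mean()
print(leave_rate)

# The same edges applied to values scaled into [0, 1] match no bin at all.
scaled = (sample["Age"] - sample["Age"].min()) / (sample["Age"].max() - sample["Age"].min())
all_missing = pd.cut(scaled, bins=bins, labels=labels).isna().all()
print(all_missing)  # True
```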
plt.figure(figsize=(10, 6))
sns.countplot(x='PaymentTier', hue='LeaveOrNot', data=new_df)
plt.title('Leave Status by Payment Tier')
plt.xlabel('Payment Tier')
plt.ylabel('Number of Employees')
plt.legend(title='Leave Status')
plt.show()
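A count plot shows absolute numbers, so a tier with many employees dominates the picture even if its leave rate is modest. A row-normalised cross-tabulation puts the tiers on the same footing. The sketch below is a minimal illustration on the ten rows from the df.head(10) output; sample, counts and rates are hypothetical names, and the resulting rates say nothing reliable about the full dataset.

```python
import pandas as pd

# PaymentTier and LeaveOrNot for the ten rows printed by df.head(10).
sample = pd.DataFrame({
    "PaymentTier": [3, 1, 3, 3, 3, 3, 3, 3, 3, 2],
    "LeaveOrNot": [0, 1, 0, 1, 1, 0, 0, 1, 0, 0],
})

# Raw counts per tier and leave status: what the count plot bars show.
counts = pd.crosstab(sample["PaymentTier"], sample["LeaveOrNot"])

# Row-normalised shares: each tier's row sums to 1.
rates = pd.crosstab(sample["PaymentTier"], sample["LeaveOrNot"], normalize="index")

print(counts)
print(rates)
```

On the full data, the same two calls with new_df in place of sample would give the table behind the plot and the per-tier leave rates.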
# Adjust layout
plt.tight_layout()
plt.show()
<Figure size 800x550 with 0 Axes>
6. Scatter Plot, Correlation heatmap, Histograms
# Create a 3D scatter plot
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')

scatter = ax.scatter(new_df['ExperienceInCurrentDomain'],
                     new_df['PaymentTier'],
                     new_df['Age'],
                     c=new_df['LeaveOrNot'],
                     cmap='viridis')

ax.set_xlabel('Experience')
ax.set_ylabel('Payment Tier')
ax.set_zlabel('Age')
plt.colorbar(scatter, label='Leave or Not')
plt.title('3D Scatter Plot: Experience, Payment Tier, and Age')
plt.show()
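# Create a correlation heatmap
# Education was mapped back to text labels and 'Age Group' is categorical,
# so restrict the correlation to numeric columns
plt.figure(figsize=(10, 8))
sns.heatmap(new_df.corr(numeric_only=True), annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Heatmap')
plt.show()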
# Create histograms for numerical variables
new_df[['Age', 'ExperienceInCurrentDomain']].hist(bins=10, figsize=(10, 4))
plt.suptitle('Distributions of Age and Experience')
plt.tight_layout()
plt.show()