Assignment2 Datamining

The report presents data modifications made to the Employee Attrition and Factors dataset, including the removal of irrelevant columns, handling missing values, encoding categorical variables, and outlier management. It also details feature engineering efforts, such as creating new features to analyze employee retention and earnings. Additionally, visualizations are provided to illustrate attrition rates by department and the distribution of monthly income, aiding in the analysis of employee retention strategies.

Uploaded by

Esmael Elkot

Assiut University

Faculty of Computers and Information

Report in Data Mining

Dataset name:
Employee Attrition and Factors

Student name:
Esmail Mohamed Esmail Elkot
Section: CS 2
Data Modifications
• Dropped Irrelevant Columns:
o Removed EmployeeCount, Over18, StandardHours, and EmployeeNumber because
these columns did not add analytical value (e.g., StandardHours was the same for
all entries).
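The column removal described above can be sketched as follows. This is a minimal illustration with pandas, assuming the data is loaded into a DataFrame named df; the toy rows are made up and stand in for the real dataset.

```python
import pandas as pd

# Toy stand-in for the dataset; the values are invented for illustration.
df = pd.DataFrame({
    "EmployeeCount": [1, 1, 1],
    "Over18": ["Y", "Y", "Y"],
    "StandardHours": [80, 80, 80],
    "EmployeeNumber": [1, 2, 4],
    "MonthlyIncome": [5993, 5130, 2090],
})

# Columns the report lists as adding no analytical value.
irrelevant = ["EmployeeCount", "Over18", "StandardHours", "EmployeeNumber"]

# Optional check: columns with a single distinct value carry no information.
# EmployeeNumber is an identifier rather than a constant, so it is not flagged here.
constant_cols = [c for c in df.columns if df[c].nunique() == 1]
print(constant_cols)

df = df.drop(columns=irrelevant)
print(df.columns.tolist())
```

The nunique check is only a convenience for spotting constant columns; identifier columns such as EmployeeNumber still have to be dropped by name.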

• Handling Missing Values:
o Used NumPy to fill any missing values in numerical columns with the column
mean.
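One possible way to do the mean imputation is shown below. The report does not say which NumPy call was used, so np.nanmean is an assumption here, and the toy DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data with gaps; the values are invented for illustration.
df = pd.DataFrame({
    "MonthlyIncome": [2000.0, np.nan, 4000.0],
    "YearsAtCompany": [1.0, 3.0, np.nan],
    "Department": ["Sales", "Human Resources", "Sales"],
})

# Only numerical columns are imputed; text columns are left untouched.
numeric_cols = df.select_dtypes(include=np.number).columns

for col in numeric_cols:
    # np.nanmean ignores NaN entries when computing the column mean.
    col_mean = np.nanmean(df[col].to_numpy())
    df[col] = df[col].fillna(col_mean)

print(df)
```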

• Encoding Categorical Variables:
o Converted the binary categorical columns Attrition and Gender to 0/1
values for analysis (1 for Yes/Male, 0 for No/Female).
o Applied one-hot encoding to columns with multiple categories, such as
BusinessTravel, Department, EducationField, JobRole, MaritalStatus, and
OverTime, to convert them into numerical features.
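Both encoding steps can be sketched with pandas as below. The toy rows are made up, and only two of the one-hot columns are shown to keep the example short; Series.map and pd.get_dummies are one reasonable choice, not necessarily the calls used in the original notebook.

```python
import pandas as pd

# Toy rows; the values are invented for illustration.
df = pd.DataFrame({
    "Attrition": ["Yes", "No", "No"],
    "Gender": ["Male", "Female", "Male"],
    "Department": ["Sales", "Research & Development", "Sales"],
    "OverTime": ["Yes", "No", "No"],
})

# Binary columns: map each label to 1 or 0.
df["Attrition"] = df["Attrition"].map({"Yes": 1, "No": 0})
df["Gender"] = df["Gender"].map({"Male": 1, "Female": 0})

# Multi-category columns: one indicator column per category.
df = pd.get_dummies(df, columns=["Department", "OverTime"], dtype=int)

print(df.columns.tolist())
```

Each encoded column is replaced by indicator columns named after the original column and the category, for example Department_Sales.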

• Outlier Handling:
o Capped the MonthlyIncome column at the 95th percentile to manage
outliers, limiting the impact of extreme values on analysis.
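A minimal sketch of the capping step, assuming pandas' default linear interpolation for the percentile; the income values are made up so that one obvious outlier is present.

```python
import pandas as pd

# Toy incomes with one extreme value; invented for illustration.
df = pd.DataFrame({
    "MonthlyIncome": [1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 6000.0,
                      7000.0, 8000.0, 9000.0, 10000.0, 50000.0],
})

# 95th percentile of the column (linear interpolation by default).
cap = df["MonthlyIncome"].quantile(0.95)

# Values above the cap are replaced by the cap; the rest are unchanged.
df["MonthlyIncome"] = df["MonthlyIncome"].clip(upper=cap)

print(cap, df["MonthlyIncome"].max())
```

Capping keeps every row in the dataset, unlike dropping outliers, which is why the row count is unchanged afterwards.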
• Feature Engineering:
o Created a new feature called YearsInRoleRatio, computed as
YearsInCurrentRole / (YearsAtCompany + 1). Adding 1 to the denominator avoids
division by zero when YearsAtCompany is 0. This feature helps to understand
how long employees stay in their roles relative to their total time in the
company.
o Created a binary column HighEarner to indicate whether an employee earns
above a threshold of mean + standard deviation of MonthlyIncome, helping
identify high earners.
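The two engineered features can be sketched as below. The toy values are made up, and the use of the sample standard deviation (pandas' default, ddof=1) is an assumption, since the report does not say which variant was used.

```python
import pandas as pd

# Toy rows; the values are invented for illustration.
df = pd.DataFrame({
    "YearsInCurrentRole": [0, 2, 7, 4],
    "YearsAtCompany": [0, 3, 9, 4],
    "MonthlyIncome": [2000.0, 3000.0, 4000.0, 15000.0],
})

# Share of tenure spent in the current role; the +1 guards against
# division by zero for employees with YearsAtCompany == 0.
df["YearsInRoleRatio"] = df["YearsInCurrentRole"] / (df["YearsAtCompany"] + 1)

# Threshold = mean + standard deviation (sample std, ddof=1, is assumed).
threshold = df["MonthlyIncome"].mean() + df["MonthlyIncome"].std()
df["HighEarner"] = (df["MonthlyIncome"] > threshold).astype(int)

print(df[["YearsInRoleRatio", "HighEarner"]])
```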
• Using DataFrame functions:
o shape
o size
o head
o tail
o describe
o Series plot with kind="box"
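For reference, the inspection functions listed above can be called as in this short sketch. The toy DataFrame is made up, and the box plot line assumes that "Series kind(box)" refers to Series.plot with kind="box", which needs matplotlib installed.

```python
import pandas as pd

# Toy rows; the values are invented for illustration.
df = pd.DataFrame({
    "Age": [41, 49, 37, 33, 27],
    "MonthlyIncome": [5993, 5130, 2090, 2909, 3468],
})

print(df.shape)       # (number of rows, number of columns)
print(df.size)        # total number of cells = rows * columns
print(df.head(2))     # first two rows
print(df.tail(2))     # last two rows
print(df.describe())  # count, mean, std, min, quartiles, max per numeric column

# Box plot of a single column (a pandas Series); returns a matplotlib Axes.
ax = df["MonthlyIncome"].plot(kind="box")
```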

Visualizations and Insights

• Visualization 1: Attrition Rate by Department
o Created a bar chart to show the attrition rate for each department.
o Explanation: This visualization helps understand which departments
have the highest attrition rates, allowing the organization to identify
areas that may need attention for employee retention.
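A bar chart of this kind could be produced roughly as follows. The sketch assumes Attrition is already encoded as 0/1 and that the original Department column (before one-hot encoding) is still available; the toy rows are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy rows; the values are invented for illustration.
df = pd.DataFrame({
    "Department": ["Sales"] * 4 + ["Human Resources"] * 4
                  + ["Research & Development"] * 5,
    "Attrition": [1, 1, 0, 0,  1, 0, 0, 0,  0, 0, 0, 0, 1],
})

# The mean of a 0/1 column is the fraction of employees who left.
attrition_rate = (
    df.groupby("Department")["Attrition"].mean().sort_values(ascending=False)
)

fig, ax = plt.subplots()
ax.bar(attrition_rate.index.tolist(), attrition_rate.to_numpy() * 100)
ax.set_xlabel("Department")
ax.set_ylabel("Attrition rate (%)")
ax.set_title("Attrition Rate by Department")
# Call plt.show() or fig.savefig(...) to display or save the chart.
```

Sorting the rates before plotting puts the department with the highest attrition first, which makes the comparison easier to read.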
• Visualization 2: Monthly Income Distribution
o Created a histogram with capped values for MonthlyIncome to view
the distribution of income across employees.
o Explanation: This helps analyze salary distribution patterns and
identify salary ranges. Capping at the 95th percentile provides a
clearer view without distortion from outliers.
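The histogram can be sketched as below. The incomes are synthetic (drawn from a log-normal distribution with arbitrary parameters) purely to give a right-skewed shape, and the choice of 20 bins is an assumption.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic right-skewed incomes; not taken from the real dataset.
rng = np.random.default_rng(0)
income = pd.Series(rng.lognormal(mean=8.5, sigma=0.6, size=500),
                   name="MonthlyIncome")

# Cap at the 95th percentile so the long right tail does not stretch the axis.
cap = income.quantile(0.95)
capped = income.clip(upper=cap)

fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(capped, bins=20)
ax.set_xlabel("MonthlyIncome (capped at 95th percentile)")
ax.set_ylabel("Number of employees")
ax.set_title("Monthly Income Distribution")
# Call plt.show() or fig.savefig(...) to display or save the chart.
```

Because capped values are kept rather than dropped, the last bin also contains every employee whose income was above the cap.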
