Sample Discovery

Uploaded by

Rishtha Kothuri03

Sample_Discovery

August 6, 2024

[1]: import pandas as pd

# Load the dataset

df = pd.read_csv('messy_employee_data.csv')

# Descriptive Statistics
descriptive_stats = df.describe()
print(descriptive_stats)

       Employee_ID  Phone_Number  Total_Work_Hour_per_Month  Salary_per_Month
count   100.000000  9.000000e+01                  79.000000         74.000000
mean   5874.450000  5.014917e+09                2631.392405      24305.405405
std    2698.235527  2.946591e+09                4317.578213      37333.773655
min    1139.000000  2.058916e+08                 -10.000000      -5000.000000
25%    3639.250000  2.417400e+09                 160.000000       8000.000000
50%    6168.000000  4.735941e+09                 180.000000      10800.000000
75%    8183.250000  7.711621e+09                5099.500000      12600.000000
max    9957.000000  9.871025e+09                9999.000000     100000.000000
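
The summary above shows a minimum of -10 and a maximum of 9999 work hours per month, which look like placeholder or entry errors rather than real measurements. A minimal sketch of counting such values per column follows. The toy data and the bounds are assumptions made for illustration (744 is simply 31 days times 24 hours, and the salary ceiling is arbitrary); neither comes from the original notebook.

```python
import pandas as pd

# Toy data standing in for messy_employee_data.csv; values are made up.
sample = pd.DataFrame({
    "Total_Work_Hour_per_Month": [160.0, 180.0, -10.0, 9999.0, None],
    "Salary_per_Month": [8000.0, -5000.0, 10800.0, 100000.0, None],
})

# Hypothetical plausibility bounds (assumptions, not from the notebook).
bounds = {
    "Total_Work_Hour_per_Month": (0, 744),  # 744 = 31 days * 24 hours
    "Salary_per_Month": (0, 50000),
}


def count_out_of_range(frame, limits):
    """Count, per column, the non-null values outside [low, high]."""
    counts = {}
    for column, (low, high) in limits.items():
        values = frame[column].dropna()
        counts[column] = int(((values < low) | (values > high)).sum())
    return pd.Series(counts, name="out_of_range")


out_of_range = count_out_of_range(sample, bounds)
print(out_of_range)
```

Missing values are dropped before the comparison so that they are reported by the completeness metric only and are not double counted here.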

[2]: # Completeness Metric

completeness = df.notnull().mean() * 100
print("Completeness Metric:\n", completeness)

Completeness Metric:
Employee_Name 90.0
Employee_ID 100.0
Job_Role 91.0
Phone_Number 90.0
Email_ID 90.0
Total_Work_Hour_per_Month 79.0
Salary_per_Month 74.0
dtype: float64

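[3]: # Accuracy Metric for Salary_per_Month (ensure non-negative values)
# Note: NaN >= 0 evaluates to False, so the 26 missing salaries are counted
# as inaccurate here, together with the negative ones.

accuracy_salary = (df['Salary_per_Month'] >= 0).mean() * 100
print(f"Accuracy for Salary per Month: {accuracy_salary}%")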

Accuracy for Salary per Month: 59.0%
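
The 59.0% figure mixes two effects: salaries that are missing and salaries that are present but negative. One possible way to report them separately is sketched below on a made-up series; the variable names are illustrative and not part of the original notebook.

```python
import pandas as pd

# Made-up salaries: two missing, one negative, two plausible.
salary = pd.Series([8000.0, -5000.0, None, 10800.0, None],
                   name="Salary_per_Month")

# Share of rows that have any value at all.
completeness = salary.notna().mean() * 100

# Share of the *recorded* values that are non-negative.
validity = (salary.dropna() >= 0).mean() * 100

# The notebook's metric: missing values compare as False, so they count
# as failures and pull the figure down.
combined = (salary >= 0).mean() * 100

print(f"completeness={completeness:.1f}%  "
      f"validity={validity:.1f}%  combined={combined:.1f}%")
```

Reporting validity over non-null values keeps the accuracy metric from restating what the completeness metric already measures.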

[4]: # Consistency Metric for Job_Role
# Note: isin() returns False for NaN, so the 9 missing roles count as inconsistent.
valid_job_roles = ['Engineer', 'Data Scientist', 'Manager', 'Analyst', 'Developer']

consistency_job_role = df['Job_Role'].isin(valid_job_roles).mean() * 100

print(f"Consistency for Job Role: {consistency_job_role}%")

Consistency for Job Role: 91.0%
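
The membership test is case- and whitespace-sensitive, so a role typed as " manager" or "DATA SCIENTIST" would be counted as inconsistent even though it maps to a valid role. The sketch below, on made-up role strings, normalizes the text first and measures consistency over non-null values only. Whether such variants occur in the real file is not known from the output shown, so treat this as an optional check.

```python
import pandas as pd

valid_job_roles = ['Engineer', 'Data Scientist', 'Manager', 'Analyst',
                   'Developer']

# Made-up examples of messy role strings, including one missing value.
roles = pd.Series([' manager', 'DATA SCIENTIST', 'Analyst', 'Intern', None])

# Strip surrounding whitespace and apply title case before comparing.
normalized = roles.str.strip().str.title()

# Raw check: only the exact string 'Analyst' matches.
raw_rate = roles.isin(valid_job_roles).mean() * 100

# Normalized check over recorded values: 'Intern' is the only mismatch.
normalized_rate = normalized.dropna().isin(valid_job_roles).mean() * 100

print(f"raw={raw_rate:.1f}%  normalized={normalized_rate:.1f}%")
```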

[5]: # Correlation between Salary and Total Work Hours

correlation = df[['Total_Work_Hour_per_Month', 'Salary_per_Month']].corr()
print("Correlation:\n", correlation)

Correlation:
                            Total_Work_Hour_per_Month  Salary_per_Month
Total_Work_Hour_per_Month                   1.000000          0.600796
Salary_per_Month                            0.600796          1.000000
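
Pearson correlation is sensitive to extreme values, and the descriptive statistics show entries such as 9999 hours and a 100000 salary. The sketch below uses made-up numbers to show how a single extreme pair can dominate the Pearson coefficient, and compares it with a rank-based (Spearman-style) coefficient computed by ranking the columns first. It illustrates the effect only and makes no claim about what drives the 0.60 value above.

```python
import pandas as pd

# Made-up data: salary falls as hours rise, except for one extreme pair.
pairs = pd.DataFrame({
    "Total_Work_Hour_per_Month": [160.0, 170.0, 180.0, 200.0, 9999.0],
    "Salary_per_Month": [12000.0, 10800.0, 9000.0, 8000.0, 100000.0],
})

# Pearson on the raw values is dominated by the (9999, 100000) pair.
pearson = pairs.corr().iloc[0, 1]

# Spearman-style coefficient: Pearson correlation of the ranks.
spearman = pairs.rank().corr().iloc[0, 1]

print(f"pearson={pearson:.3f}  spearman={spearman:.3f}")
```

A large gap between the two coefficients is a hint that the linear correlation should be recomputed after the implausible values have been handled.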

[6]: import seaborn as sns

import matplotlib.pyplot as plt

# Plot the heatmap

plt.figure(figsize=(10, 8))
sns.heatmap(correlation, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Matrix Heatmap')
plt.show()

[7]: # Missing Data Analysis
missing_values_count = df.isnull().sum()
#missing_data_pattern = df.isnull()
print("Missing Values Count:\n", missing_values_count)
#print("Missing Data Pattern:\n", missing_data_pattern)

Missing Values Count:

Employee_Name 10
Employee_ID 0
Job_Role 9
Phone_Number 10
Email_ID 10
Total_Work_Hour_per_Month 21
Salary_per_Month 26
dtype: int64
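
Column totals do not show whether the gaps are spread across many rows or concentrated in a few. A minimal sketch of a row-wise view follows, using a small made-up frame; the column subset and values are illustrative only.

```python
import pandas as pd

# Made-up records with gaps in different places.
records = pd.DataFrame({
    "Employee_Name": ["A", None, "C", "D"],
    "Email_ID": ["a@example.com", None, None, "d@example.com"],
    "Salary_per_Month": [8000.0, None, 10800.0, None],
})

# Number of missing fields in each row.
missing_per_row = records.isna().sum(axis=1)

# How many rows have 0, 1, 2, ... missing fields.
distribution = missing_per_row.value_counts().sort_index()

print(distribution)
```

Rows missing most of their fields are often better candidates for removal than for imputation.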

[14]: # Fill missing values. Results are assigned back to the column because
# recent pandas versions warn about fillna(inplace=True) on a column selection.
df['Employee_Name'] = df['Employee_Name'].fillna('Unknown')
# Employee_ID is 100% complete (see the completeness metric), so no fill is needed.
df['Phone_Number'] = df['Phone_Number'].replace('unknown', float('nan'))
# Phone_Number was parsed as float; cast to object before adding the placeholder
df['Phone_Number'] = df['Phone_Number'].astype(object).fillna('Unknown')
# Mean imputation. The means are skewed by entries such as -10 and 9999 hours.
df['Total_Work_Hour_per_Month'] = df['Total_Work_Hour_per_Month'].fillna(
    df['Total_Work_Hour_per_Month'].mean())
df['Salary_per_Month'] = df['Salary_per_Month'].fillna(df['Salary_per_Month'].mean())

# Correct data types
df['Employee_ID'] = df['Employee_ID'].astype(str)
df['Total_Work_Hour_per_Month'] = df['Total_Work_Hour_per_Month'].astype(float)
df['Salary_per_Month'] = df['Salary_per_Month'].astype(float)

# Normalize Email IDs: append a placeholder domain when '@' is missing
df['Email_ID'] = df['Email_ID'].apply(
    lambda x: x if pd.isna(x) or '@' in x else x + '@example.com')

# Ensure consistent job role naming
df['Job_Role'] = df['Job_Role'].str.title()

# Format phone numbers (dummy formatting for demonstration). str(x) is needed
# because the values are floats, which have no .replace() method; this is also
# why the output shows a trailing '.0'.
df['Phone_Number'] = df['Phone_Number'].apply(
    lambda x: x if pd.isna(x) or x == 'Unknown' else str(x).replace('-', ''))
df['Phone_Number'] = df['Phone_Number'].astype(str)

# Ensure Employee Names are title case
df['Employee_Name'] = df['Employee_Name'].str.title()

# Remove duplicate rows (if any)
df.drop_duplicates(inplace=True)

print("Structured and Formatted Dataset:\n", df)

Structured and Formatted Dataset:
   Employee_Name Employee_ID        Job_Role  Phone_Number            Email_ID  \
0     Ospjpqptpe        9516         Manager  4733377351.0  [email protected]
1     Arwarmgzmo        8444         Manager       Unknown  [email protected]
2     Qwxbncqkag        2420         Manager       Unknown  [email protected]
3     Akmmthjndy        3445       Developer  4901793467.0  [email protected]
4     Croxaopkbi        9378         Manager  8679802795.0  [email protected]
..           ...         ...             ...           ...                 ...
95    Nsgdvhcolz        3727         Analyst  3957223288.0  [email protected]
96       Unknown        3857  Data Scientist  7050739609.0  [email protected]
97    Vbowqqbmye        2467  Data Scientist  8921949055.0  [email protected]
98    Kukowpctzv        3553       Developer  9197584574.0                 NaN
99    Pvvicpbxnk        9638         Analyst  3176133724.0  [email protected]

    Total_Work_Hour_per_Month  Salary_per_Month
0                  -10.000000      24305.405405
1                 9999.000000      -5000.000000
2                  180.000000      10800.000000
3                  200.000000      12000.000000
4                  180.000000      10800.000000
..                        ...               ...
95                 200.000000      10000.000000
96                 160.000000      11200.000000
97                 160.000000       8000.000000
98                2631.392405      12000.000000
99                 180.000000      10800.000000

[100 rows x 7 columns]
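
In the output above, row 98 receives an imputed 2631.39 work hours per month, while rows 0 and 1 keep -10 and 9999 hours, because the mean was taken over unfiltered data. One alternative, sketched below on made-up values, is to mask implausible entries first and then fill with the median. The 744-hour ceiling (31 days times 24 hours) is an assumption chosen for illustration, and median imputation is a substitute technique, not what the notebook does.

```python
import pandas as pd

# Made-up hours, including a negative value, a 9999 placeholder and a gap.
hours = pd.Series([160.0, 180.0, -10.0, 9999.0, None, 200.0],
                  name="Total_Work_Hour_per_Month")

MAX_HOURS = 744  # assumption: 31 days * 24 hours

# Keep values inside [0, MAX_HOURS]; everything else becomes NaN.
# between() is False for NaN, so existing gaps stay missing.
cleaned = hours.where(hours.between(0, MAX_HOURS))

# Median of the remaining plausible values (160, 180, 200).
median_fill = cleaned.median()
imputed = cleaned.fillna(median_fill)

# For comparison: mean imputation on the unfiltered series.
naive = hours.fillna(hours.mean())

print(imputed.tolist())
print(naive.tolist())
```

Masking before imputing keeps a handful of placeholder values from setting the fill value for every missing row.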

[17]: df['Employee_Name'].dtype

[17]: dtype('O')
