Data Analytics Practical Assignment 1

data analytics

Uploaded by

Maureen Njiinu

Data Analytics

Assignment 1
Practical Question: Machine Learning Using Decision Tree on Employment Dataset
Objective:

You are provided with an Employment Dataset containing information about
candidates who applied for jobs. Your task is to build a Decision Tree
Classification Model to predict whether a candidate should be employed or not
based on various features.

[Link]
dataset
Form groups with a minimum of 4 and a maximum of 6 members to complete the
task.

Dataset Description
Each row in the dataset represents a job applicant. The dataset includes the
following features:

• age
The age of the applicant in years.

• education_level
The highest education level attained by the applicant (e.g., High School,
Bachelor's, Master's, PhD).

• years_of_experience
Total number of years the applicant has worked professionally.

• technical_test_score
Score obtained by the applicant in a technical assessment (out of 100).

• interview_score
Score obtained by the applicant during the interview process (out of 10).

• previous_employment
Whether the applicant has previous employment experience (Yes/No).

• suitable_for_employment (Target)
Indicates whether the applicant is suitable for employment (Yes/No).

Tasks to Perform:
1. Data Loading and Exploration

o Load the dataset using Python libraries (e.g., pandas).
o Display the first few rows of the dataset.
o Perform basic EDA (Exploratory Data Analysis): check for null values,
data types, and distribution of features.
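A minimal sketch of the loading and exploration step follows. The dataset link is not reproduced in this handout, so the example reads a few invented rows from an in-memory buffer; with the real file, the buffer would be replaced by the path to the CSV. The sample values are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for the real file (the dataset link is not given here).
# With the actual dataset, pass the CSV path to pd.read_csv instead of the StringIO buffer.
csv_text = """age,education_level,years_of_experience,technical_test_score,interview_score,previous_employment,suitable_for_employment
25,Bachelor's,2,78,7,Yes,Yes
41,High School,15,52,4,Yes,No
33,Master's,8,88,9,Yes,Yes
22,Bachelor's,0,61,,No,No
50,PhD,20,93,8,Yes,Yes
29,High School,5,45,5,No,No
"""
df = pd.read_csv(io.StringIO(csv_text))

# First few rows
print(df.head())

# Null values per column (one interview_score is deliberately left blank above)
null_counts = df.isnull().sum()
print(null_counts)

# Data types and summary statistics of the numeric features
print(df.dtypes)
print(df.describe())

# Distribution of the target and of a categorical feature
target_counts = df["suitable_for_employment"].value_counts()
print(target_counts)
print(df["education_level"].value_counts())
```

On the real data, histograms or count plots (for example with seaborn) could complement these tables when describing feature distributions.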
2. Data Preprocessing

o Convert categorical variables into numeric format (e.g., one-hot
encoding or label encoding).
o Split the dataset into training and testing sets (e.g., 80% train, 20%
test).
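The preprocessing step could look roughly like the sketch below. It runs on synthetic data, because the real file is not available here; the generated values and the rule used to label rows are invented. One-hot encoding of education_level and a 0/1 mapping for the Yes/No columns are one reasonable choice among those the task allows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset; values and the labelling rule are invented.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "age": rng.integers(21, 60, n),
    "education_level": rng.choice(["High School", "Bachelor's", "Master's", "PhD"], n),
    "years_of_experience": rng.integers(0, 30, n),
    "technical_test_score": rng.integers(30, 101, n),
    "interview_score": rng.integers(1, 11, n),
    "previous_employment": rng.choice(["Yes", "No"], n),
})
suitable = (df["technical_test_score"] >= 60) & (df["interview_score"] >= 5)
df["suitable_for_employment"] = np.where(suitable, "Yes", "No")

# Binary Yes/No columns become 0/1; education_level is one-hot encoded.
yes_no = {"No": 0, "Yes": 1}
X = df.drop(columns="suitable_for_employment")
X["previous_employment"] = X["previous_employment"].map(yes_no)
X = pd.get_dummies(X, columns=["education_level"])
y = df["suitable_for_employment"].map(yes_no)

# 80% train, 20% test; stratify keeps the Yes/No ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```

Fixing random_state makes the split reproducible, which helps when group members compare results.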
3. Model Building

o Train a Decision Tree Classifier using the training data to predict
suitable_for_employment.
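As a sketch of the model-building step, the example below fits scikit-learn's DecisionTreeClassifier on a training split. The helper make_demo_data is hypothetical: it generates already-encoded synthetic data with an invented labelling rule, standing in for the real dataset. The max_depth value is an arbitrary choice to keep the tree small, not a requirement of the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def make_demo_data(n=300, seed=0):
    """Synthetic, already-encoded stand-in for the employment dataset (not the real data)."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame({
        "age": rng.integers(21, 60, n),
        "education_level": rng.integers(0, 4, n),      # 0=High School ... 3=PhD (label-encoded)
        "years_of_experience": rng.integers(0, 30, n),
        "technical_test_score": rng.integers(30, 101, n),
        "interview_score": rng.integers(1, 11, n),
        "previous_employment": rng.integers(0, 2, n),  # 0=No, 1=Yes
    })
    # Invented labelling rule so the demo has a learnable target; 1=Yes, 0=No.
    y = ((X["technical_test_score"] >= 60) & (X["interview_score"] >= 5)).astype(int)
    return X, y

X, y = make_demo_data()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# max_depth=4 is an arbitrary cap that keeps the tree readable and limits overfitting.
clf = DecisionTreeClassifier(max_depth=4, random_state=42)
clf.fit(X_train, y_train)

print("Tree depth:", clf.get_depth())
print("Number of leaves:", clf.get_n_leaves())
print("Training accuracy:", clf.score(X_train, y_train))
```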
4. Model Visualization

o Visualize the decision tree using appropriate tools like plot_tree() or
graphviz.
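One way to draw the fitted tree is with plot_tree() from scikit-learn, as sketched below. The data again comes from a hypothetical make_demo_data helper with invented values, and the figure size and depth cap are arbitrary. The Agg backend is selected only so the script also runs without a display; in a notebook that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove when plotting in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, plot_tree

def make_demo_data(n=300, seed=0):
    """Synthetic, already-encoded stand-in for the employment dataset (not the real data)."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame({
        "age": rng.integers(21, 60, n),
        "education_level": rng.integers(0, 4, n),      # 0=High School ... 3=PhD
        "years_of_experience": rng.integers(0, 30, n),
        "technical_test_score": rng.integers(30, 101, n),
        "interview_score": rng.integers(1, 11, n),
        "previous_employment": rng.integers(0, 2, n),  # 0=No, 1=Yes
    })
    # Invented labelling rule so the demo has a learnable target; 1=Yes, 0=No.
    y = ((X["technical_test_score"] >= 60) & (X["interview_score"] >= 5)).astype(int)
    return X, y

X, y = make_demo_data()
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)

fig, ax = plt.subplots(figsize=(14, 7))
annotations = plot_tree(
    clf,
    feature_names=list(X.columns),
    class_names=["No", "Yes"],  # order matches clf.classes_ == [0, 1]
    filled=True,                # colour nodes by majority class
    rounded=True,
    ax=ax,
)
ax.set_title("Decision tree for suitable_for_employment (synthetic demo data)")
# To keep a copy for the report: fig.savefig("decision_tree.png", dpi=150)
```

Each node box shows the split condition, the impurity, the number of samples, and the majority class, which is the information needed to explain a prediction path in the report.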
5. Model Testing and Prediction

o Predict the labels for the test dataset.
o Test the model using at least 3 hypothetical candidate profiles and
interpret the predictions.
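The sketch below predicts labels for the test split and then for three made-up candidate profiles. The training data is synthetic (hypothetical make_demo_data helper, invented labelling rule), and the three profiles are illustrative values only. The key point is that new profiles must be encoded with the same mapping and column order used for training; label encoding is assumed here.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

EDU_ORDER = {"High School": 0, "Bachelor's": 1, "Master's": 2, "PhD": 3}
YES_NO = {"No": 0, "Yes": 1}

def make_demo_data(n=300, seed=0):
    """Synthetic, already-encoded stand-in for the employment dataset (not the real data)."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame({
        "age": rng.integers(21, 60, n),
        "education_level": rng.integers(0, 4, n),      # encoded as in EDU_ORDER
        "years_of_experience": rng.integers(0, 30, n),
        "technical_test_score": rng.integers(30, 101, n),
        "interview_score": rng.integers(1, 11, n),
        "previous_employment": rng.integers(0, 2, n),  # encoded as in YES_NO
    })
    # Invented labelling rule so the demo has a learnable target; 1=Yes, 0=No.
    y = ((X["technical_test_score"] >= 60) & (X["interview_score"] >= 5)).astype(int)
    return X, y

X, y = make_demo_data()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
clf = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_train, y_train)

# Predicted labels for the held-out test set
y_pred = clf.predict(X_test)

# Three hypothetical candidates, written in the raw (unencoded) form of the dataset
profiles = pd.DataFrame([
    {"age": 30, "education_level": "Master's", "years_of_experience": 6,
     "technical_test_score": 85, "interview_score": 8, "previous_employment": "Yes"},
    {"age": 24, "education_level": "High School", "years_of_experience": 1,
     "technical_test_score": 40, "interview_score": 3, "previous_employment": "No"},
    {"age": 45, "education_level": "Bachelor's", "years_of_experience": 20,
     "technical_test_score": 75, "interview_score": 4, "previous_employment": "Yes"},
])

# Apply the same encoding and column order as the training data
encoded = profiles.copy()
encoded["education_level"] = encoded["education_level"].map(EDU_ORDER)
encoded["previous_employment"] = encoded["previous_employment"].map(YES_NO)
profile_preds = clf.predict(encoded[X.columns])

profiles["predicted_suitable"] = ["Yes" if p == 1 else "No" for p in profile_preds]
print(profiles)
```

When interpreting each prediction, it helps to trace the candidate through the visualized tree and name the splits that led to the outcome.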
6. Model Evaluation

o Evaluate the model using:
   • Accuracy Score
   • Confusion Matrix
   • Classification Report (Precision, Recall, F1-Score)
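The three metrics listed above are available in sklearn.metrics. The sketch below computes them on a test split of synthetic data (hypothetical make_demo_data helper, invented labelling rule), so the numbers it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def make_demo_data(n=300, seed=0):
    """Synthetic, already-encoded stand-in for the employment dataset (not the real data)."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame({
        "age": rng.integers(21, 60, n),
        "education_level": rng.integers(0, 4, n),      # 0=High School ... 3=PhD
        "years_of_experience": rng.integers(0, 30, n),
        "technical_test_score": rng.integers(30, 101, n),
        "interview_score": rng.integers(1, 11, n),
        "previous_employment": rng.integers(0, 2, n),  # 0=No, 1=Yes
    })
    # Invented labelling rule so the demo has a learnable target; 1=Yes, 0=No.
    y = ((X["technical_test_score"] >= 60) & (X["interview_score"] >= 5)).astype(int)
    return X, y

X, y = make_demo_data()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
clf = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Share of test rows predicted correctly
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)

# Rows are true classes, columns are predicted classes, in the order [No, Yes]
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(pd.DataFrame(cm, index=["true No", "true Yes"], columns=["pred No", "pred Yes"]))

# Precision, recall and F1-score per class
report = classification_report(
    y_test, y_pred, labels=[0, 1], target_names=["No", "Yes"], zero_division=0
)
print(report)
```

The confusion matrix is worth discussing separately from accuracy: a false "Yes" and a false "No" have different consequences in a hiring setting.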

Bonus Task (Optional):

• Perform feature importance analysis to determine which features contribute
most to the employment decision.
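For the bonus task, a fitted DecisionTreeClassifier exposes impurity-based importances through its feature_importances_ attribute. The sketch below ranks them on synthetic data (hypothetical make_demo_data helper). Because the demo labels are generated from an invented rule, the ranking it prints reflects that rule and not the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def make_demo_data(n=300, seed=0):
    """Synthetic, already-encoded stand-in for the employment dataset (not the real data)."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame({
        "age": rng.integers(21, 60, n),
        "education_level": rng.integers(0, 4, n),      # 0=High School ... 3=PhD
        "years_of_experience": rng.integers(0, 30, n),
        "technical_test_score": rng.integers(30, 101, n),
        "interview_score": rng.integers(1, 11, n),
        "previous_employment": rng.integers(0, 2, n),  # 0=No, 1=Yes
    })
    # Invented labelling rule so the demo has a learnable target; 1=Yes, 0=No.
    y = ((X["technical_test_score"] >= 60) & (X["interview_score"] >= 5)).astype(int)
    return X, y

X, y = make_demo_data()
clf = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X, y)

# Impurity-based importances: one value per feature, normalised to sum to 1
importances = pd.Series(clf.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

A horizontal bar chart of this Series (for example with matplotlib or seaborn) is a convenient way to present the ranking in the report.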

📦 Required Libraries:

pandas, numpy, scikit-learn (imported as sklearn), matplotlib, seaborn

Expected Output:

• Clear and well-commented Python code
• Visualized decision tree
• Model performance metrics
• Interpretation of predictions

