Problem Statement

This document gives instructions for two questions on data analysis and machine learning modeling. Question 1 asks you to import a movie review dataset, perform 10-fold cross-validation, extract TF-IDF features, train GaussianNB, BernoulliNB and MultinomialNB classifiers, and output the accuracy, confusion matrices and predictions. Question 2 asks you to import a diabetes dataset, handle missing values, visualize the data, perform 10-fold cross-validation, train a logistic regression model, output its coefficients and decision boundary, and compute the accuracy and confusion matrix.

Uploaded by

Brianearl

Instructions

1. Read the instructions in each question carefully.
2. A Jupyter notebook along with output for each cell is expected.
3. Assignments submitted using other Python IDEs will not be considered for
grading.
4. Use appropriate labels for all visualizations.
5. Upload the [Link] file along with the notebook when required.
6. If a dataset link has expired, search for the same dataset online in any
repository and use it.

Question 1

1. Import the dataset from [Link] data/review_polarity.[Link].
2. Split the data into training and testing sets. Use 10-fold cross-validation.
3. Extract features using TF-IDF and display the features.
4. Model the classifier using GaussianNB, BernoulliNB and MultinomialNB and
train the classifiers.
5. Compute the accuracy and confusion matrix for each model.
6. Create an output .csv file containing the actual test-set values of Y (column
name: Actual) and the predictions of Y (column name: Predicted).
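The steps in Question 1 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference solution: the tiny inline corpus stands in for the review polarity dataset (which is not loaded here), the variable names are hypothetical, and the CSV is built from out-of-fold MultinomialNB predictions as one possible reading of step 6. The TF-IDF vectorizer is refitted inside each fold so that test documents do not influence the vocabulary or IDF weights.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Toy stand-in corpus (an assumption, NOT the real dataset): 1 = positive, 0 = negative.
pos = [
    "a wonderful and moving film", "great acting and a brilliant script",
    "i loved every minute of it", "a charming, funny and touching story",
    "superb direction and great performances", "an excellent and enjoyable movie",
    "brilliant, funny and wonderful", "a touching film with superb acting",
    "great story and an enjoyable script", "i loved the charming performances",
]
neg = [
    "a dull and boring film", "terrible acting and a weak script",
    "i hated every minute of it", "a tedious, awful and boring story",
    "poor direction and terrible performances", "a bad and forgettable movie",
    "awful, dull and tedious", "a weak film with poor acting",
    "bad story and a forgettable script", "i hated the dull performances",
]
texts = np.array(pos + neg)
labels = np.array([1] * len(pos) + [0] * len(neg))

# Step 3: extract TF-IDF features on the whole corpus and display them.
full_vectorizer = TfidfVectorizer()
tfidf = full_vectorizer.fit_transform(texts)
feature_names = sorted(full_vectorizer.vocabulary_)
print(tfidf.shape, feature_names)

# Steps 2, 4, 5: 10-fold cross-validation for each Naive Bayes variant.
models = {"GaussianNB": GaussianNB, "BernoulliNB": BernoulliNB,
          "MultinomialNB": MultinomialNB}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for name, make_model in models.items():
    predicted = np.empty_like(labels)
    for train_idx, test_idx in cv.split(texts, labels):
        vectorizer = TfidfVectorizer()  # refit per fold to avoid leakage
        # GaussianNB needs dense input, so convert the sparse matrices.
        X_train = vectorizer.fit_transform(texts[train_idx]).toarray()
        X_test = vectorizer.transform(texts[test_idx]).toarray()
        model = make_model().fit(X_train, labels[train_idx])
        predicted[test_idx] = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(labels, predicted),
        "confusion": confusion_matrix(labels, predicted),
        "predicted": predicted,
    }
    print(name, results[name]["accuracy"])
    print(results[name]["confusion"])

# Step 6: Actual vs. Predicted as CSV text; pass a file path to to_csv to write a file.
output = pd.DataFrame({"Actual": labels,
                       "Predicted": results["MultinomialNB"]["predicted"]})
csv_text = output.to_csv(index=False)
```

With the real dataset, `texts` and `labels` would instead be read from the downloaded review files, and one CSV per classifier may be preferable.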

Question 2

The diabetes data ([Link]) has a response variable indicating whether a
person has diabetes, which is given by a 1.

1. Import the dataset from [Link] database.
2. Identify the columns with missing values (1 point). Fill the missing values
with the mean for numerical attributes and the mode for categorical attributes.
3. Extract X as all columns except the last column and Y as the last column.
4. Visualize the dataset.
5. Split the data into a training set and a testing set. Perform 10-fold
cross-validation.
6. Train a logistic regression model for the dataset.
7. Display the coefficients and form the logistic regression equation.
8. Compute the accuracy and confusion matrix.
9. Plot the decision boundary.
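A rough sketch of the modeling steps in Question 2 is shown below. It rests on several assumptions: the data is synthetic (the real diabetes file is not loaded), the column names `Glucose`, `BMI` and `Outcome` are hypothetical placeholders, and only two features are used so that the decision boundary is a straight line. Visualization is left out; the boundary is computed as coordinates that could then be passed to a plotting library.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in data (an assumption): two numeric features, binary outcome last.
rng = np.random.default_rng(0)
n = 200
glucose = rng.normal(120, 30, n)
bmi = rng.normal(32, 6, n)
true_logit = 0.04 * (glucose - 120) + 0.1 * (bmi - 32)
outcome = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(int)
df = pd.DataFrame({"Glucose": glucose, "BMI": bmi, "Outcome": outcome})
df.loc[rng.choice(n, 10, replace=False), "Glucose"] = np.nan  # inject missing values

# Step 2: identify columns with missing values, then fill with mean or mode.
missing_counts = df.isna().sum()
print(missing_counts[missing_counts > 0])
for col in df.columns[:-1]:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].mean())
    else:
        df[col] = df[col].fillna(df[col].mode().iloc[0])

# Step 3: X is every column except the last, Y is the last column.
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# Steps 5, 6, 8: 10-fold cross-validated predictions, accuracy, confusion matrix.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)
predicted = cross_val_predict(model, X, y, cv=cv)
accuracy = accuracy_score(y, predicted)
confusion = confusion_matrix(y, predicted)
print(accuracy)
print(confusion)

# Step 7: fit on all data, show coefficients, and form the equation.
model.fit(X, y)
w0 = model.intercept_[0]
w1, w2 = model.coef_[0]
terms = " + ".join(f"({w:.4f} * {name})" for w, name in zip(model.coef_[0], X.columns))
print(f"logit(p) = {w0:.4f} + {terms}")

# Step 9: the boundary is where logit(p) = 0, i.e. BMI = -(w0 + w1 * Glucose) / w2.
glucose_grid = np.linspace(df["Glucose"].min(), df["Glucose"].max(), 50)
bmi_on_boundary = -(w0 + w1 * glucose_grid) / w2
```

Plotting `glucose_grid` against `bmi_on_boundary` over a scatter of the two features, colored by outcome, would give the decision-boundary figure. With more than two features, the boundary is a hyperplane and is usually shown for a chosen pair of features or after a projection.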
