MACHINE LEARNING LAB
Paper Code: CIE-421P
Faculty Name: Dr. Sudha Narang (Associate Professor)
Student Name: Jatin Bansal
Roll No: 03596402722
Semester: 7
Group: 7AIML-3C
Maharaja Agrasen Institute of Technology, PSP Area,
Sector – 22, Rohini, New Delhi - 110085
MAHARAJA AGRASEN INSTITUTE OF TECHNOLOGY
COMPUTER SCIENCE & ENGINEERING DEPARTMENT
VISION
"To attain global excellence through education, innovation, research, and work ethics in the
field of Computer Science and engineering with the commitment to serve humanity."
MISSION
M1: To lead in the advancement of computer science and engineering through internationally
recognized research and education.
M2: To prepare students for full and ethical participation in a diverse society and encourage
lifelong learning.
M3: To foster development of problem solving and communication skills as an integral
component of the profession.
M4: To impart knowledge, skills and cultivate an environment supporting incubation, product
development, technology transfer, capacity building and entrepreneurship in the field of
computer science and engineering.
M5: To encourage faculty and student networking with alumni, industry, institutions, and other
stakeholders for collective engagement.
Rubrics for Lab Assessment (10 Marks):
Each rubric is graded as 0 Marks (No), 1 Mark (Partially), or 2 Marks (Completely), and maps to the POs and PSOs covered.

R1: Is the student able to identify and define the objective of the given problem? (PO1, PO2; PSO1, PSO2)
R2: Does the proposed design/procedure/algorithm solve the problem? (PO1, PO2, PO3; PSO1, PSO2)
R3: Does the student understand the tool/programming language used to implement the proposed solution? (PO1, PO3, PO5; PSO1, PSO2)
R4: Are the result(s) verified using sufficient test data to support the conclusions? (PO2, PO4, PO5; PSO2)
R5: Individuality of submission. (PO8, PO12; PSO1, PSO3)
INDEX

Columns: S.No | Date of Performance | Experiment | R1 | R2 | R3 | R4 | R5 | Total Marks | Faculty Signature
(R1–R5 are the five rubrics listed above, each worth 2 marks.)
EXPERIMENT NO. 1
AIM: Introduction to JUPYTER IDE and its libraries Pandas and NumPy.
THEORY:
The Jupyter Notebook has become an integral tool in data science, machine learning, and
scientific research. It is an open-source web application that allows you to create and share
documents containing live code, equations, visualizations, and narrative text. It is part of
Project Jupyter, a non-profit, open-source project that evolved from the IPython project in
2014. The name 'Jupyter' refers to the three core languages it was originally designed to
support: Julia, Python, and R; today it supports over 40 programming languages.
NumPy (Numerical Python) is a fundamental Python library used for scientific computing.
It provides:
Support for multi-dimensional arrays (ndarray).
Fast mathematical and statistical operations.
Functions for linear algebra, random numbers, and numerical analysis.
Much faster performance than normal Python lists when working with large datasets.
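A minimal sketch illustrating these NumPy features (assuming NumPy is installed; the array contents below are purely illustrative):

import numpy as np

# Create a 2-D array (ndarray) and inspect its shape
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])
print(data.shape)          # (2, 3)

# Fast vectorized mathematical and statistical operations
print(data.mean(axis=0))   # column-wise means
print(data * 10)           # element-wise scaling

# Linear algebra and random-number utilities
rng = np.random.default_rng(seed=0)
noise = rng.normal(size=(2, 3))
print(np.linalg.norm(data + noise))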
Pandas is a Python library built on top of NumPy, mainly used for data analysis and
manipulation.
It provides two main data structures:
Series → One-dimensional labeled array (like a column in Excel).
DataFrame → Two-dimensional labeled data structure (like an Excel table).
With Pandas, one can:
Import/export data (CSV, Excel, SQL, JSON, etc.).
Clean and transform datasets.
Perform filtering, grouping, merging, and statistical analysis.
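A short Pandas sketch of these operations (the column names and values are made up for demonstration; the CSV calls are commented out since no file is assumed to exist):

import pandas as pd

# A Series is a one-dimensional labeled array
marks = pd.Series([85, 92, 78], index=["exp1", "exp2", "exp3"])

# A DataFrame is a two-dimensional labeled structure
df = pd.DataFrame({
    "student": ["A", "B", "C", "D"],
    "group":   ["3C", "3C", "3D", "3D"],
    "score":   [85, 92, 78, 88],
})

# Filtering, grouping, and simple statistics
passed = df[df["score"] >= 80]
print(passed)
print(df.groupby("group")["score"].mean())

# Import/export, e.g. CSV (file name is illustrative)
# df.to_csv("scores.csv", index=False)
# df = pd.read_csv("scores.csv")
print(marks.describe())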
Features of Jupyter Notebook
1. Interactive Development Environment
2. Rich Text and Media Support
3. Collaboration and Sharing
4. Extensibility and Customization
5. Reproducibility and Portability
6. Data Science and Visualization
7. Open Source and Community-Driven
JUPYTER NOTEBOOK:
EXPERIMENT NO. 2
AIM: Program to demonstrate Simple Linear Regression.
THEORY:
Linear regression predicts the relationship between two variables by assuming a linear
connection between the independent and dependent variables. It seeks the optimal line that
minimizes the sum of squared differences between predicted and actual values. In a simple
linear regression, there is one independent variable and one dependent variable. The model
estimates the slope and intercept of the line of best fit, which represents the relationship
between the variables. The slope represents the change in the dependent variable for each unit
change in the independent variable, while the intercept represents the predicted value of the
dependent variable when the independent variable is zero. To calculate the best-fit line, linear
regression uses the traditional slope-intercept form given below, where β0 is the intercept and
β1 is the slope: Yi = β0 + β1Xi
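A minimal simple linear regression sketch using scikit-learn on synthetic data (the data is generated only for illustration, with true slope 2 and intercept 5):

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y ≈ 2x + 5 with Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))            # one independent variable
y = 2.0 * X.ravel() + 5.0 + rng.normal(0, 1, 100)

# Fit the line that minimizes the sum of squared differences
model = LinearRegression()
model.fit(X, y)

print("slope (beta1):", model.coef_[0])
print("intercept (beta0):", model.intercept_)
print("prediction at x = 4:", model.predict([[4.0]])[0])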
JUPYTER NOTEBOOK:
EXPERIMENT NO. 3
AIM: Program to demonstrate Logistic Regression.
THEORY:
Logistic regression is a supervised machine learning algorithm that accomplishes binary
classification tasks by predicting the probability of an outcome, event, or observation. The
model delivers a binary or dichotomous outcome limited to two possible outcomes: yes/no,
0/1, or true/false. Logistic regression analyses the relationship between one or more
independent variables and classifies data into discrete classes. It is extensively used in
predictive modelling, where the model estimates the mathematical probability of whether an
instance belongs to a specific category or not.
Logistic regression uses a logistic function called a sigmoid function to map predictions and
their probabilities. The sigmoid function refers to an S-shaped curve that converts any real
value to a range between 0 and 1.
Moreover, if the output of the sigmoid function (estimated probability) is greater than a
predefined threshold on the graph, the model predicts that the instance belongs to that class.
If the estimated probability is less than the predefined threshold, the model predicts that the
instance does not belong to the class.
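A short logistic regression sketch on the breast cancer dataset bundled with scikit-learn (the dataset choice and the 0.5 threshold are illustrative assumptions):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Binary classification data: malignant vs. benign
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Standardize features so the solver converges quickly
scaler = StandardScaler().fit(X_train)
clf = LogisticRegression()
clf.fit(scaler.transform(X_train), y_train)

# The sigmoid gives estimated probabilities; a 0.5 threshold turns them into labels
probs = clf.predict_proba(scaler.transform(X_test))[:, 1]
preds = (probs >= 0.5).astype(int)
print("accuracy:", accuracy_score(y_test, preds))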
JUPYTER NOTEBOOK:
EXPERIMENT NO. 4
AIM: Program to demonstrate Decision Tree-ID3 Algorithm.
THEORY:
A decision tree is a structure that contains nodes (rectangular boxes) and edges (arrows) and
is built from a dataset (a table whose columns represent features/attributes and whose rows
correspond to records). Each node either makes a decision (a decision node) or represents an
outcome (a leaf node). ID3 stands for Iterative Dichotomiser 3 and is
named such because the algorithm iteratively (repeatedly) dichotomizes (divides) features
into two or more groups at each step. Invented by Ross Quinlan, ID3 uses a top-down greedy
approach to build a decision tree.
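A small sketch of the information-gain computation at the heart of ID3, plus an entropy-based tree on the Iris dataset (note: scikit-learn's DecisionTreeClassifier implements CART with an entropy option rather than textbook ID3, so it is shown only as an approximation; the toy labels are made up):

import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, groups):
    """Parent entropy minus the weighted entropy of the child groups."""
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Toy example: a perfect split of six labels on some attribute
parent = ["yes", "yes", "no", "no", "yes", "no"]
split = [["yes", "yes", "yes"], ["no", "no", "no"]]
print("information gain:", information_gain(parent, split))   # 1.0

# Entropy-criterion tree on the Iris dataset
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X, y)
print("training accuracy:", tree.score(X, y))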
JUPYTER NOTEBOOK:
EXPERIMENT NO. 5
AIM: To demonstrate k-Nearest Neighbor flowers classification.
THEORY:
The K-Nearest Neighbors (KNN) algorithm is a popular machine learning technique used for
classification and regression tasks. It relies on the idea that similar data points tend to have
similar labels or values. During the training phase, the KNN algorithm stores the entire
training dataset as a reference. When making predictions, it calculates the distance between
the input data point and all the training examples, using a chosen distance metric such as
Euclidean distance. Next, the algorithm identifies the K nearest neighbors to the input data
point based on their distances. In the case of classification, the algorithm assigns the most
common class label among the K neighbors as the predicted label for the input data point. For
regression, it calculates the average or weighted average of the target values of the K
neighbors to predict the value for the input data point. KNN Algorithm can be used for both
classification and regression predictive problems.
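A minimal k-NN sketch for flower classification on the Iris dataset (k = 5 is an arbitrary illustrative choice; Euclidean distance is scikit-learn's default metric):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Store the training set and classify each test point by majority vote
# among its 5 nearest neighbors under Euclidean distance
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, knn.predict(X_test)))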
JUPYTER NOTEBOOK:
EXPERIMENT NO. 6
AIM: Program to demonstrate Naive-Bayes Classifier.
THEORY:
The Naïve Bayes Classifier is a probabilistic machine learning model used for classification
tasks based on Bayes’ Theorem, assuming independence among features.
Bayes’ theorem states:
P(C | X) = P(X | C) · P(C) / P(X)
where:
P(C | X): Posterior probability of the class given the features
P(X | C): Likelihood
P(C): Prior probability of the class
P(X): Evidence (constant for comparison)
The “naïve” assumption simplifies the computation by treating all features as independent,
making it efficient for large datasets.
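A short Gaussian Naïve Bayes sketch on the Iris dataset (GaussianNB additionally assumes each feature is normally distributed within a class; the dataset choice is an assumption for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit class priors P(C) and per-feature likelihoods P(X|C),
# then predict the class with the largest posterior P(C|X)
nb = GaussianNB()
nb.fit(X_train, y_train)

print("class priors:", nb.class_prior_)
print("accuracy:", accuracy_score(y_test, nb.predict(X_test)))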
JUPYTER NOTEBOOK:
EXPERIMENT NO. 7
AIM: To demonstrate Principal Component Analysis (PCA) and Linear Discriminant
Analysis (LDA) on the Iris dataset.
THEORY:
PCA (Principal Component Analysis):
PCA is a dimensionality reduction technique that transforms a large set of variables into a
smaller one while retaining most of the variance.
It works by:
1. Standardizing the data.
2. Computing the covariance matrix.
3. Calculating eigenvalues and eigenvectors.
4. Choosing the top k eigenvectors (principal components).
5. Projecting data onto the new feature space.
LDA (Linear Discriminant Analysis):
LDA is a supervised dimensionality reduction technique that aims to maximize class
separability.
It projects data onto a lower-dimensional space where the ratio of between-class variance to
within-class variance is maximized.
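A compact sketch applying both PCA and LDA to the Iris dataset, reducing to two components each (the data is standardized before PCA, following the steps above; two components is an illustrative choice):

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA: unsupervised, keeps the directions of maximum variance
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)
print("explained variance ratio:", pca.explained_variance_ratio_)

# LDA: supervised, maximizes between-class vs. within-class variance
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)
print("PCA shape:", X_pca.shape, "LDA shape:", X_lda.shape)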
JUPYTER NOTEBOOK:
EXPERIMENT NO. 8
AIM: Program to demonstrate DBSCAN clustering algorithm
THEORY:
DBSCAN is a density-based clustering algorithm that groups together closely packed points
and marks points in low-density areas as outliers.
Key Parameters:
eps: Maximum distance between two samples for them to be considered as neighbors.
min_samples: Minimum number of points to form a dense region.
Advantages:
Can find arbitrarily shaped clusters.
Robust to noise.
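A minimal DBSCAN sketch on synthetic two-moon data (the eps and min_samples values are illustrative and would normally need tuning for real data):

from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

# Non-spherical clusters that centroid-based methods struggle with
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5)
labels = db.fit_predict(X)

# Label -1 marks points in low-density regions (noise/outliers)
print("clusters found:", len(set(labels) - {-1}))
print("noise points:", list(labels).count(-1))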
JUPYTER NOTEBOOK:
EXPERIMENT NO. 9
AIM: Program to demonstrate K-Medoid clustering algorithm
THEORY:
K-Medoids is a partition-based clustering algorithm similar to K-Means but uses actual data
points as cluster centers (medoids), making it more robust to outliers.
Steps:
1. Choose k random medoids.
2. Assign each data point to the nearest medoid.
3. Update medoids by minimizing total dissimilarity.
4. Repeat until convergence.
Difference from K-Means:
K-Means uses centroids (mean points),
K-Medoids uses actual points → more stable with noisy data.
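K-Medoids is not included in scikit-learn itself, so below is a tiny NumPy sketch of the alternating assign/update loop described above (a simplified PAM-style variant for illustration, not an optimized implementation; the toy blob data is made up):

import numpy as np

def k_medoids(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all points
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        # Assign every point to its nearest medoid
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            # New medoid = member minimizing total dissimilarity within the cluster
            costs = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):   # convergence
            break
        medoids = new_medoids
    return medoids, labels

# Toy data: two well-separated blobs
X = np.vstack([np.random.default_rng(1).normal(0, 0.5, (20, 2)),
               np.random.default_rng(2).normal(5, 0.5, (20, 2))])
medoids, labels = k_medoids(X, k=2)
print("medoid points:\n", X[medoids])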
JUPYTER NOTEBOOK:
EXPERIMENT NO. 10
AIM: Program to demonstrate K-Means Clustering Algorithm on Handwritten Dataset
THEORY:
K-Means is an unsupervised learning algorithm used to partition a dataset into k clusters
based on feature similarity.
Steps:
1. Select k initial centroids randomly.
2. Assign each data point to the nearest centroid.
3. Recalculate centroids as the mean of all assigned points.
4. Repeat until centroids stabilize.
Applications:
Image compression, customer segmentation, pattern recognition.
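A short K-Means sketch on the handwritten digits dataset bundled with scikit-learn (k = 10 matches the ten digit classes; n_init is set explicitly so behavior is the same across scikit-learn versions):

from sklearn.datasets import load_digits
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# 8x8 grayscale images of handwritten digits, flattened to 64 features
X, y = load_digits(return_X_y=True)

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Cluster labels are arbitrary, so compare against the true digits with a
# label-permutation-invariant score
print("adjusted Rand index:", adjusted_rand_score(y, labels))
print("centroid shape:", kmeans.cluster_centers_.shape)   # (10, 64)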
JUPYTER NOTEBOOK: