21CSC305P - MACHINE LEARNING
UNIT-I
Machine learning - What and Why, Supervised Learning,
Unsupervised learning, Polynomial curve fitting, Probability theory -
discrete random variables, Fundamental rules, Bayes rule,
Independence and conditional independence, Continuous random
variables, Quantiles, mean and variance, Probability densities,
Expectation and covariance
Machine learning- What and Why
• The rise of big data demands machine learning for efficient data analysis and
decision-making.
• For instance, there are around 1 trillion web pages, and every second, one hour of
video content is uploaded to YouTube, equating to 10 years of content every day.
Additionally, thousands of human genomes, each consisting of approximately 3.8
billion base pairs, have been sequenced, and Walmart handles over 1 million
transactions per hour, resulting in databases containing more than 2.5 petabytes of
information.
• Machine learning comprises techniques that can automatically detect patterns within
data and leverage these patterns to predict future data or make decisions under
uncertainty.
• The optimal approach to addressing such challenges is through probability theory,
which applies to any problem involving uncertainty.
Types of Machine Learning
Predictive or Supervised Learning:
•Goal: Learn a mapping from inputs x to outputs y, given a labeled set of input-output pairs D = {(xi, yi)}, i = 1,…,N.
•Training set D: the set of input-output pairs; N: the number of training examples.
•Training input xi:
• Typically a D-dimensional vector of numbers.
• Represents features, attributes, or covariates (e.g., height and weight of a person).
• Can be complex structured objects (e.g., images, sentences, time series, molecular shapes, graphs).
•Output or response variable yi:
• Can be categorical/nominal (e.g., male or female) or real-valued (e.g., income level).
• Categorical problems are known as classification or pattern recognition.
• Real-valued problems are known as regression.
• Ordinal regression: Label space Y has a natural ordering (e.g., grades A-F).
Descriptive or Unsupervised Learning:
•Goal: Find "interesting patterns" in the data.
•Given only inputs D = {xi}, i = 1,…,N, with no corresponding outputs. Also known as knowledge discovery.
•No well-defined problem as patterns are not specified in advance.
Reinforcement Learning:
•Useful for learning how to act or behave when given occasional reward or punishment signals.
•Example: How a baby learns to walk.
Supervised learning
1. Classification:
• Goal of Classification: Learn a mapping from inputs x to outputs y,
where y∈{1,…,C} with C being the number of classes.
• Binary Classification: When C=2, known as binary classification (e.g.,
y∈{0,1})
• Multiclass Classification: When C>2, known as multiclass classification.
• Multi-label Classification: When class labels are not mutually exclusive
(e.g., someone classified as tall and strong), predicting multiple related
binary class labels (multiple output model).
• One way to formalize the problem is as function approximation.
Assume y = f(x) for an unknown function f; learning aims to estimate f
from a labeled training set and then predict with ŷ = f̂(x).
• Generalization: The main goal is to make accurate predictions on novel inputs not
seen before, emphasizing the importance of generalization over merely fitting the
training set.
Supervised learning-Cont.
Example
• Two classes of objects with labels 0 and 1.
• Inputs are colored shapes, described by D features or attributes.
• Features are stored in an N×D design matrix.
• Input features x can be discrete, continuous, or both. Vector of training labels y.
• Test objects: blue crescent, yellow circle, and blue arrow.
• These test objects have not been seen before, requiring generalization beyond the training set.
Generalization:
•Blue crescent likely has y=1 since all blue shapes in the training set are labeled 1.
•Yellow circle's label is unclear due to mixed labels for yellow objects and circles.
•Blue arrow's label is also unclear due to lack of specific information from the training set.
Supervised learning-Cont.
The need for probabilistic predictions:
• In classification, ambiguous cases should be handled by returning a probability
distribution over possible labels given the input and training set, denoted by p(y∣x,D)
•Compute the best guess as the most probable class label, ŷ = argmax_{c=1,…,C} p(y=c∣x,D)
(a small numeric sketch follows this list).
•This is the mode of the distribution p(y∣x,D) and is known as a MAP estimate (maximum a
posteriori).
• Confidence in predictions is crucial, especially in risk-averse domains like medicine
and finance.
• IBM's Watson for Jeopardy uses a confidence module to decide when to answer.
• Google's SmartASS (ad selection system) predicts the click-through rate (CTR) to
maximize expected profit.
• Systems like Watson and SmartASS assess the risk of their predictions, making
decisions based on confidence levels to optimize performance and minimize errors.
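The following minimal sketch (assuming NumPy and a made-up posterior vector, not any of the systems mentioned above) shows how a MAP prediction and its confidence can be read off p(y∣x,D):

```python
import numpy as np

# Hypothetical posterior p(y = c | x, D) over C = 3 classes for one test input.
class_probs = np.array([0.15, 0.70, 0.15])

# MAP estimate: the mode of the distribution, i.e. the most probable class label.
y_map = int(np.argmax(class_probs))

# Confidence of the prediction; a risk-averse system might abstain below a threshold.
confidence = class_probs[y_map]
print(y_map, confidence)  # 1 0.7
```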
Supervised learning-Cont.
Real-world applications:
(i) Document classification and email spam filtering
• In document classification, the primary objective is to categorize documents like web
pages or email messages into predefined classes C, determining p(y=c∣x,D), where x
represents the document's text representation.
• A classic example is email spam filtering, where classes are typically labeled as spam (
y=1 ) or non-spam ( y=0).
• Most classifiers assume a fixed-size input vector x. To handle variable-length documents,
a common approach is the bag of words (BoW) representation.
• Bag of Words (BoW):
• Documents are transformed into fixed-size feature vectors.
• Each vector element corresponds to a word from a predefined vocabulary.
• If a word appears in the document, its corresponding vector element is set to 1; otherwise,
it remains 0.
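A minimal sketch of this binary bag-of-words encoding, assuming Python and a tiny made-up vocabulary (not taken from any real spam corpus):

```python
import re

# Illustrative, hand-picked vocabulary; real systems use thousands of words.
vocabulary = ["free", "money", "meeting", "tomorrow", "offer"]

def bag_of_words(document):
    # Lowercase the text, keep only alphabetic tokens, and note which vocabulary words occur.
    words = set(re.findall(r"[a-z]+", document.lower()))
    return [1 if w in words else 0 for w in vocabulary]

print(bag_of_words("Free money offer!!!"))       # [1, 1, 0, 0, 1]
print(bag_of_words("Meeting tomorrow at noon"))  # [0, 0, 1, 1, 0]
```

Every document, regardless of its length, is thus mapped to a fixed-size vector that a standard classifier can consume.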
Supervised learning-Cont.
(ii) Classifying flowers
• The goal is to classify iris flowers into three types: setosa, versicolor, and virginica,
based on four extracted features: sepal length, sepal width, petal length, and petal
width.
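As an illustration, the iris task can be set up in a few lines with scikit-learn (assuming it is installed); logistic regression is just one of many classifiers that could be used here:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)          # 4 features: sepal/petal length and width
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))               # accuracy on held-out flowers
print(clf.predict_proba(X_te[:1]))         # p(y = c | x, D) for one test flower
```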
Supervised learning-Cont.
(iii) Image classification and handwriting recognition
• Image classification involves categorizing images
based on their content, such as indoor vs. outdoor
scenes, orientation (horizontal vs. vertical), or
presence of specific objects like dogs.
• MNIST (which stands for “Modified National
Institute of Standards and Technology”) is a widely used dataset for
handwritten digit recognition, containing 60,000
training images and 10,000 test images of digits (0-9).
• Each image is grayscale, sized 28x28 pixels, and
represents handwritten digits by various individuals.
• Images are represented as feature vectors, where each
pixel's grayscale value (ranging from 0 to 255) serves
as a feature.
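A small sketch of this pixel-based representation, assuming the images are already loaded as a NumPy array (the random array below merely stands in for the real MNIST images):

```python
import numpy as np

# Stand-in batch of grayscale digit images, shaped (N, 28, 28) with values 0-255.
images = np.random.randint(0, 256, size=(60000, 28, 28), dtype=np.uint8)

# Flatten each image into a 784-dimensional feature vector and scale pixels to [0, 1].
X = images.reshape(len(images), -1).astype(np.float32) / 255.0
print(X.shape)  # (60000, 784)
```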
Supervised learning-Cont.
(iv) Face detection and recognition
• Object detection, or localization, involves
identifying specific objects within an image. A
notable application is face detection, which is
crucial for tasks like autofocus in cameras and
privacy features in services like Google's
StreetView.
• One approach to face detection is the sliding
window detector method. It divides the image into
small overlapping patches at various locations,
scales, and orientations.
• Each patch is classified based on whether it
exhibits face-like textures or features. Locations
where the probability of containing a face is high
are identified as potential face locations.
• Modern digital cameras often integrate face
detection systems to assist with autofocus by
identifying and focusing on faces within the frame.
• Services like Google's StreetView use face
detection to automatically blur faces to protect
privacy.
Supervised learning-Cont.
2. Regression:
• Regression is just like classification, except that the response variable is continuous.
Here are some examples of real-world regression problems.
• Predict tomorrow’s stock market price given current market conditions and other possible side information.
• Predict the age of a viewer watching a given video on YouTube.
• Predict the location in 3d space of a robot arm end effector, given control signals (torques) sent to its various
motors.
• Predict the amount of prostate specific antigen (PSA) in the body as a function of a number of different clinical
measurements.
• Predict the temperature at any location inside a building using weather data, time, door sensors, etc.
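A minimal regression sketch on synthetic data (unrelated to the applications above), showing a continuous response being fit and predicted; it assumes scikit-learn is available:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic example: a continuous response with noise (data are made up for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, size=200)

reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)   # recovered slope and intercept
print(reg.predict([[5.0]]))        # prediction at a novel input
```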
Unsupervised learning
• The goal is to discover “interesting structure” in the data; this is sometimes called
knowledge discovery.
• Unsupervised learning formalizes tasks as density estimation, aiming to model the
probability distribution p(xi∣θ) of input data xi given parameters θ. Unlike supervised
learning, where the focus is on predicting yi given xi and θ, unsupervised learning
directly estimates the density p(xi∣θ).
• Supervised learning involves conditional density estimation p(yi∣xi,θ), where yi is the
target variable. In contrast, unsupervised learning focuses on unconditional density
estimation p(xi∣θ), where xi represents feature vectors.
• In unsupervised learning, xi is typically a vector of features, necessitating the creation
of multivariate probability models to capture dependencies between different features.
• Supervised learning often uses simpler univariate probability models with input-
dependent parameters, focusing on predicting a single variable yi. This simplification is
not applicable in unsupervised settings due to the absence of labeled output.
• Unsupervised learning is more widely applicable than supervised learning since it does not
require costly and often scarce labeled data, making it feasible for modeling complex systems
where labeled data is limited or unavailable.
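As an illustrative sketch of unconditional density estimation p(xi∣θ), one could fit a Gaussian mixture to unlabeled inputs (synthetic data, scikit-learn assumed; the choice of model is not prescribed by the text):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 2-D inputs xi with no labels; the goal is to model p(xi | θ) directly.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
log_density = gm.score_samples(X[:3])   # log p(xi | θ) for a few points
print(log_density)
```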
Unsupervised learning-Cont.
1. Discovering clusters:
•Clustering involves grouping data points into clusters based on similarities in their
features, without predefined labels.
•The goal is to estimate the distribution p(K∣D) over the number of clusters K, indicating
the presence of subgroups within the data.
•Model selection in clustering aims to determine the optimal number of clusters K∗ often
approximated by the mode of p(K∣D). Unlike supervised learning where classes are
predefined, unsupervised learning allows flexibility in choosing the number of clusters that
best represent the underlying structure of the data.
•Each data point i is assigned to a cluster zi∈{1,…,K} based on the probability
p(zi=k∣xi,D), where xi is the feature vector of the data point.
•Assignments zi∗ are inferred to determine the cluster membership of each data point,
illustrated by different colors representing clusters in visualizations.
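One common way to realize these soft assignments, sketched here with a Gaussian mixture on synthetic data (scikit-learn assumed; K is fixed by hand rather than inferred from p(K∣D)):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic feature vectors xi; K = 3 is chosen by hand for illustration.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.5, (50, 2)) for c in (0, 4, 8)])

gm = GaussianMixture(n_components=3, random_state=0).fit(X)
resp = gm.predict_proba(X)     # p(zi = k | xi, D) for each point and cluster
z_star = resp.argmax(axis=1)   # hard assignments zi*
print(resp[0].round(3), z_star[:5])
```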
Unsupervised learning-Cont.
Applications of Clustering:
•Astronomy: Clustering methods like Autoclass have been used to discover new
types of stars based on astrophysical measurements.
•E-commerce: Clustering users based on purchasing or web-surfing behavior
allows for targeted advertising and personalized recommendations.
•Biology: Clustering flow-cytometry data helps identify different sub-populations of
cells, aiding in biological research such as understanding disease mechanisms.
Unsupervised learning-Cont.
2. Discovering latent factors:
• Dimensionality reduction involves projecting high-dimensional data into a lower-
dimensional subspace that captures essential characteristics of the data.
• Despite high-dimensional appearances, data often exhibit variability across a smaller
number of latent factors. Dimensionality reduction helps in focusing on these key
factors, such as lighting, pose, or identity in face image modeling.
• PCA is a common approach for dimensionality reduction, resembling an unsupervised
form of multi-output linear regression.
• Given high-dimensional responses y, PCA infers latent low-dimensional factors z that
explain most of the variability in y.
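A small PCA sketch on synthetic data (scikit-learn assumed), recovering low-dimensional latent factors z from higher-dimensional observations y:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic high-dimensional responses that actually vary along a few latent directions.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 2))                  # hidden low-dimensional factors
W = rng.normal(size=(2, 50))
Y = Z @ W + 0.1 * rng.normal(size=(500, 50))   # observed 50-dimensional data

pca = PCA(n_components=2).fit(Y)
Z_hat = pca.transform(Y)                       # inferred latent factors z
print(pca.explained_variance_ratio_)           # most variability captured by 2 components
```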
Unsupervised learning-Cont.
Applications:
• In biology, it is common to use PCA to interpret gene microarray data, to account for the
fact that each measurement is usually the result of many genes which are correlated in
their behavior by the fact that they belong to different biological pathways.
• In natural language processing, it is common to use a variant of PCA called latent
semantic analysis for document retrieval.
• In signal processing (e.g., of acoustic or neural signals), it is common to use ICA (which
is a variant of PCA) to separate signals into their different sources.
• In computer graphics, it is common to project motion capture data to a low dimensional
space, and use it to create animations.
Unsupervised learning-Cont.
3. Discovering graph structure
• Learning sparse graphical models involves representing relationships between correlated
variables using a graph G, where nodes depict variables and edges denote direct
dependencies. This approach is pivotal in both discovering new knowledge and enhancing
joint probability density estimators.
• In systems biology, sparse graphical models are used to uncover relationships among
biological entities. For instance, graphs derived from protein phosphorylation data reveal
complex interactions within cellular networks. Similarly, neural wiring diagrams in birds can
be reconstructed from EEG data, highlighting functional connectivity patterns.
• In fields like financial portfolio management, sparse graphs help model covariance between
stocks for better prediction and decision-making. Utilizing sparse graph structures has proven
beneficial in outperforming traditional methods, enabling more effective trading strategies.
• Applications extend to traffic prediction systems, such as JamBayes, which leverage learned
graphical models to forecast traffic flow dynamics. These models contribute to accurate
predictions and efficient management of transportation networks, illustrating the broad
applicability and utility of sparse graphical learning in real-world scenarios.
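As a hedged sketch, a sparse graph over correlated variables can be estimated from data with the graphical lasso (scikit-learn's GraphicalLasso assumed; the data and regularization strength below are illustrative only):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Synthetic data standing in for correlated variables (e.g. stock returns).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] += 0.8 * X[:, 0]        # make variables 0 and 1 directly dependent

model = GraphicalLasso(alpha=0.1).fit(X)
precision = model.precision_    # zeros in the precision matrix correspond to missing edges in G
print(np.round(precision, 2))
```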
Unsupervised learning-Cont.
4. Matrix completion
• Sometimes we have missing data, that is, variables whose values are unknown. For example, we might have
conducted a survey, and some people might not have answered certain questions.
• The corresponding design matrix will then have “holes” in it; these missing entries are often represented by
NaN, which stands for “not a number”. The goal of imputation is to infer plausible values for the missing
entries. This is sometimes called matrix completion.
• Image Inpainting: Technique to fill in missing parts of images due to scratches or occlusions, achieved by
modeling joint probability of pixels from clean images.
• Collaborative Filtering: Predicting user preferences for items (like movies) based on sparse ratings matrices,
aiming to fill in missing ratings for better recommendation systems.
• Market basket analysis:
❖ Involves examining a large, sparse binary matrix where columns represent items/products and rows
represent transactions.
❖ Each entry in the matrix indicates whether an item was purchased in a specific transaction. By analyzing
correlations among items often bought together, predictions can be made about additional items a consumer
might buy based on partial transaction data.
❖ This technique is also applicable in other domains, such as predicting file dependencies in software systems.
❖ Common methods for market basket analysis include frequent itemset mining, which generates association
rules, and probabilistic modeling, which fits a joint density model to the data.
❖ Data mining emphasizes interpretability of models, whereas machine learning focuses on model accuracy.
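A toy matrix-completion sketch using a rank-1 SVD reconstruction of a small ratings matrix (NumPy assumed; real collaborative-filtering systems use more sophisticated models):

```python
import numpy as np

# Made-up users x items ratings matrix with missing entries marked as NaN.
R = np.array([[5.0, 4.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 5.0, 2.0],
              [1.0, 1.0, 5.0]])

missing = np.isnan(R)
filled = np.where(missing, np.nanmean(R, axis=0), R)   # start from column means

U, s, Vt = np.linalg.svd(filled, full_matrices=False)
low_rank = (U[:, :1] * s[:1]) @ Vt[:1, :]              # rank-1 approximation

imputed = np.where(missing, low_rank, R)               # keep observed entries, fill the holes
print(np.round(imputed, 2))
```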