Low-Dimensional Analysis
By: Preethi P Palankar
MCA23905
Agenda
• Introduction to Low-Dimensional Data
• Types of Low-Dimensional Analysis
• Techniques of Low-Dimensional Analysis
• Applications of Low-Dimensional Analysis
Introduction to Low-Dimensional Data
• Also known as dimensionality-reduced data.
• Dimensionality reduction is the process of reducing the number of input variables or features in a dataset while preserving its essential patterns and structures.
• It aims to eliminate irrelevant, noisy, or redundant data and convert high-dimensional data into a more manageable and interpretable form.
Types of Dimensionality Reduction
1. Feature Selection
Selects a subset of the most relevant original features without altering them.
Example: removing unnecessary columns such as ID numbers or constant-valued features.
2. Feature Extraction
Transforms the data into a new feature space, often combining multiple original features into a smaller number of informative ones.
Example: in a housing dataset, square footage and number of rooms are related features, so they can be combined into a single "house size" feature (see the sketch after this list).
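A minimal sketch of both types on a toy housing table, assuming pandas; the column names (id, sqft, rooms) and the "house size" formula are illustrative assumptions, not a standard recipe.

import pandas as pd

# Toy housing data (hypothetical columns).
df = pd.DataFrame({"id": [1, 2, 3],
                   "sqft": [1200, 1500, 900],
                   "rooms": [3, 4, 2]})

# Feature selection: drop an irrelevant column, keep the rest unchanged.
selected = df.drop(columns=["id"])

# Feature extraction: combine two related features into one new feature
# (here, a simple sum of max-scaled values; the scaling choice is arbitrary).
selected["house_size"] = (selected["sqft"] / selected["sqft"].max()
                          + selected["rooms"] / selected["rooms"].max())
print(selected)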
Techniques of Low-Dimensional Analysis
1. Principal Component Analysis (PCA)
2. Linear Discriminant Analysis (LDA)
3. t-Distributed Stochastic Neighbor Embedding (t-SNE)
Principal Component Analysis (PCA)
• PCA is an unsupervised linear transformation technique that projects data onto a lower-dimensional space by identifying directions (principal components) that maximize variance.
• Preserves maximum information with fewer dimensions.
• Used in exploratory data analysis and pre-processing.
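A minimal sketch with scikit-learn's PCA on the Iris dataset; keeping 2 components is an illustrative choice, not a rule.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)        # 150 samples, 4 features

pca = PCA(n_components=2)                # keep the 2 directions of maximum variance
X_2d = pca.fit_transform(X)              # project onto the principal components

print(X_2d.shape)                        # (150, 2)
print(pca.explained_variance_ratio_)     # share of variance each component preserves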
Linear Discriminant Analysis (LDA)
• LDA is a supervised technique that projects data in a way that maximizes the separation between classes.
• Effective for classification tasks.
• Considers both within-class and between-class variance.
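A minimal sketch with scikit-learn's LinearDiscriminantAnalysis on Iris; unlike PCA it needs the class labels, and with 3 classes it can produce at most 2 discriminant components.

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)          # supervised: uses the labels y

print(X_lda.shape)                       # (150, 2)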
t-Distributed Stochastic Neighbor Embedding (t-SNE)
• t-SNE is a non-linear technique primarily used for visualizing high-dimensional data by reducing it to two or three dimensions while preserving local structure.
• Ideal for visualizing clusters and complex relationships.
• Commonly used on image and text data.
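A minimal sketch with scikit-learn's TSNE on the 64-dimensional digits dataset; perplexity=30 is the library default, quoted here as a starting point rather than a tuned value.

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)      # 1797 samples, 64 features

tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X)             # non-linear embedding into 2-D

print(X_2d.shape)                        # (1797, 2)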
Criterion Functions for Clustering

Agenda
• Introduction to Clustering
• What Are Criterion Functions?
• Types of Criterion Functions
• Real-World Use Cases
Introduction to Clustering
• Clustering is an unsupervised machine learning technique that involves partitioning a dataset into groups, or clusters, such that data points within the same cluster are more similar to each other than to those in other clusters.
• A company uses clustering to group customers based on age, income, and buying habits.
• For example: one group may be frequent buyers, another occasional buyers.
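A minimal sketch of that customer-segmentation idea with k-means, assuming scikit-learn; the toy rows and the choice of 2 clusters are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

# Toy customers: age, annual income (thousands), purchases per month.
X = np.array([[25, 30, 12], [27, 32, 10], [30, 35, 11],
              [58, 75, 3], [60, 80, 2], [62, 78, 1]], dtype=float)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)   # e.g. frequent buyers vs. occasional buyers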
What Are Criterion Functions?
• Criterion functions are mathematical measures used to evaluate the quality of clusters formed by a clustering algorithm.
• These functions help to optimize the clustering process by measuring:
• Intra-cluster similarity: how close data points in a cluster are to each other.
• Inter-cluster dissimilarity: how different one cluster is from another.
Types of Criterion Functions
• Within-Cluster Sum of Squares (WCSS)
• Between-Cluster Sum of Squares (BCSS)
• Silhouette Score
• Davies–Bouldin Index (DBI)
Within-Cluster Sum of Squares (WCSS)
➤ WCSS measures how close the data points in a cluster are to the centroid (mean point) of that cluster.
➤ A lower WCSS means that points are tightly packed (compact), which is ideal in clustering.
Formula: WCSS = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2, where \mu_k is the centroid of cluster C_k.
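A minimal sketch that computes WCSS directly from the formula on a toy 2-D dataset; for k-means, scikit-learn exposes the same quantity as the fitted model's inertia_ attribute.

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Sum of squared distances of each point to its own cluster centroid.
wcss = sum(np.sum((X[km.labels_ == k] - km.cluster_centers_[k]) ** 2)
           for k in range(km.n_clusters))
print(wcss, km.inertia_)   # the two values agree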
Between-Cluster Sum of Squares (BCSS)
➤ BCSS measures how far apart the clusters are from each other. It looks at the distance between each cluster centroid and the overall dataset centroid.
➤ A higher BCSS is better because it shows that clusters are well-separated.
Formula: BCSS = \sum_{k=1}^{K} n_k \lVert \mu_k - \mu \rVert^2, where n_k is the size of cluster C_k, \mu_k its centroid, and \mu the overall centroid.
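A minimal sketch computing BCSS on the same toy data; it also checks the identity TSS = WCSS + BCSS, where TSS is the total sum of squares around the overall mean.

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

mu = X.mean(axis=0)                       # overall dataset centroid
sizes = np.bincount(km.labels_)           # points per cluster (n_k)
bcss = sum(sizes[k] * np.sum((km.cluster_centers_[k] - mu) ** 2)
           for k in range(km.n_clusters))

tss = np.sum((X - mu) ** 2)
print(bcss, tss - km.inertia_)            # the two values agree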
Silhouette Score
➤ The silhouette score checks how well each point fits in its own cluster compared to other clusters.
➤ Values range from -1 to 1:
• Close to 1: good clustering
• Close to 0: borderline
• Below 0: likely assigned to the wrong cluster
Formula: s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}, where a(i) is the mean distance from point i to the other points in its own cluster and b(i) is the mean distance from i to the points in the nearest other cluster.
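A minimal sketch using scikit-learn's silhouette_score, which averages s(i) over all points; on this well-separated toy data the score is close to 1.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print(silhouette_score(X, labels))   # near 1 for compact, well-separated clusters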
Davies–Bouldin Index (DBI)
➤ DBI measures the similarity between clusters, based on the ratio of intra-cluster distances to inter-cluster separation.
➤ Lower DBI = better clusters
➤ Higher DBI = clusters are too similar/overlapping
Formula: DBI = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \frac{\sigma_i + \sigma_j}{d(\mu_i, \mu_j)}, where \sigma_i is the average distance of points in cluster i to its centroid \mu_i and d(\mu_i, \mu_j) is the distance between centroids i and j.
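A minimal sketch using scikit-learn's davies_bouldin_score on the same toy data; lower is better.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print(davies_bouldin_score(X, labels))   # lower values mean better-separated clusters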
Real-World Use Cases
• Customer segmentation
• Image segmentation
• Document or topic clustering
• Bioinformatics (e.g., gene clustering)
Thank you