
Criterion Functions

The document discusses low-dimensional analysis, focusing on dimensionality reduction techniques such as feature selection and extraction, as well as methods like PCA, LDA, and t-SNE. It also covers clustering as an unsupervised learning technique, explaining criterion functions used to evaluate clustering quality, including WCSS, BCSS, Silhouette Score, and DBI. Real-world applications of these concepts include customer segmentation and bioinformatics.


Low-Dimensional Analysis
By: Preethi P Palankar
MCA23905
Agenda

• Introduction to Low-Dimensional Data

• Types of Low-Dimensional Analysis

• Techniques of Low-Dimensional Analysis

• Applications of Low-Dimensional Analysis
Introduction to Low-Dimensional Data

• Also known as dimensionality-reduced data.

• Dimensionality reduction is the process of reducing the number of input variables or features in a dataset while preserving its essential patterns and structures.

• It aims to eliminate irrelevant, noisy, or redundant data and convert high-dimensional data into a more manageable and interpretable form.
Types of Dimensionality Reduction

1. Feature Selection
Selects a subset of the most relevant original features without altering them.
Example: Removing unnecessary columns such as ID numbers or constant values.

2. Feature Extraction
Transforms the data into a new feature space, often combining multiple original features into a smaller number of informative ones.
Example: In a housing dataset, square footage and number of rooms are related features, so they can be combined into a single "house size" feature.
Techniques of Low-Dimensional Analysis

1. Principal Component Analysis (PCA)

2. Linear Discriminant Analysis (LDA)

3. t-Distributed Stochastic Neighbor Embedding (t-SNE)
Principal Component Analysis (PCA)

• PCA is an unsupervised linear transformation technique that projects data onto a lower-dimensional space by identifying directions (principal components) that maximize variance.

• Preserves maximum information with fewer dimensions.

• Used in exploratory data analysis and pre-processing.
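The projection described above can be sketched directly with NumPy; this is a minimal illustration (the data here is synthetic, and real pipelines would typically use a library implementation such as scikit-learn's PCA):

```python
# Minimal PCA sketch: project 3-D points onto the two directions
# of maximum variance (the principal components).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # toy "high-dimensional" data

Xc = X - X.mean(axis=0)                 # 1. center the data
cov = np.cov(Xc, rowvar=False)          # 2. covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # 3. eigen-decomposition
order = np.argsort(eigvals)[::-1]       # 4. sort by variance explained
components = eigvecs[:, order[:2]]      # 5. keep the top-2 components
X_reduced = Xc @ components             # 6. project to 2-D

print(X_reduced.shape)                  # (100, 2)
```

The first projected column carries at least as much variance as the second, which is exactly the "maximize variance" property the slide describes.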


Linear Discriminant Analysis (LDA)

• LDA is a supervised technique that projects data in a way that maximizes the separation between classes.

• Effective for classification tasks.

• Considers both within-class and between-class variance.
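A short sketch of the supervised projection, assuming scikit-learn is available (the two-class data below is synthetic):

```python
# LDA sketch: project labeled 4-D data onto the direction that
# best separates the two classes (scikit-learn assumed available).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# two classes with shifted means
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

# LDA can produce at most (number of classes - 1) components
lda = LinearDiscriminantAnalysis(n_components=1)
X_lda = lda.fit_transform(X, y)         # supervised: uses the labels y

print(X_lda.shape)                      # (100, 1)
```

Note the contrast with PCA: `fit_transform` takes the labels `y`, because class separation (not variance alone) drives the projection.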
t-Distributed Stochastic Neighbor Embedding (t-SNE)

• t-SNE is a non-linear technique primarily used for visualizing high-dimensional data by reducing it to two or three dimensions while preserving local structure.

• Ideal for visualizing clusters and complex relationships.

• Commonly used for image and text data.
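A minimal usage sketch, again assuming scikit-learn is available (synthetic data; the perplexity value is an illustrative choice, not a recommendation):

```python
# t-SNE sketch: embed 10-D points into 2-D for visualization
# (scikit-learn assumed available).
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 10))           # toy high-dimensional data

tsne = TSNE(n_components=2, perplexity=15, random_state=2)
X_2d = tsne.fit_transform(X)            # preserves local neighborhoods

print(X_2d.shape)                       # (60, 2)
```

Unlike PCA, the 2-D coordinates have no global meaning; only local neighborhood structure is preserved, which is why t-SNE is used for visualization rather than general pre-processing.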
Criterion Functions for Clustering

Agenda

• Introduction to Clustering

• What Are Criterion Functions?

• Types of Criterion Functions

• Real-World Use Cases
Introduction to Clustering

• Clustering is an unsupervised machine learning technique that involves partitioning a dataset into groups, or clusters, such that data points within the same cluster are more similar to each other than to those in other clusters.

• Example: a company uses clustering to group customers based on age, income, and buying habits; one group may be frequent buyers, another occasional buyers.
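The customer-grouping example above can be sketched with k-means (scikit-learn assumed available; the "customers" are synthetic):

```python
# Sketch of the customer-grouping example: cluster synthetic
# customers on [age, income, purchases/month] with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
frequent = rng.normal([30, 60, 12], [5, 8, 2], (40, 3))    # frequent buyers
occasional = rng.normal([50, 40, 2], [5, 8, 1], (40, 3))   # occasional buyers
X = np.vstack([frequent, occasional])

km = KMeans(n_clusters=2, n_init=10, random_state=3).fit(X)
print(km.labels_[:5])                   # cluster id per customer
```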
What Are Criterion Functions?

• Criterion functions are mathematical evaluation measures used to assess the quality of clusters formed by a clustering algorithm.

• These functions help to optimize the clustering process by measuring:

• Intra-cluster similarity: how close the data points in a cluster are to each other.

• Inter-cluster dissimilarity: how different one cluster is from another.
Types of Criterion Functions

• Within-Cluster Sum of Squares (WCSS)

• Between-Cluster Sum of Squares (BCSS)

• Silhouette Score

• Davies–Bouldin Index (DBI)
Within-Cluster Sum of Squares (WCSS)

➤ WCSS measures how close the data points in a cluster are to the centroid (mean point) of that cluster.

➤ A lower WCSS means that points are tightly packed (compact), which is ideal in clustering.

Formula: WCSS = Σ_{i=1..k} Σ_{x ∈ C_i} ||x − μ_i||², where μ_i is the centroid of cluster C_i.
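The formula can be computed directly on a fitted clustering; as a sanity check, it matches the `inertia_` value that scikit-learn's KMeans reports (scikit-learn assumed available; data is synthetic):

```python
# WCSS computed from the formula, compared against KMeans.inertia_.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
km = KMeans(n_clusters=2, n_init=10, random_state=4).fit(X)

wcss = sum(
    np.sum((X[km.labels_ == i] - c) ** 2)   # squared distances to centroid
    for i, c in enumerate(km.cluster_centers_)
)
print(round(wcss, 4), round(km.inertia_, 4))   # the two values agree
```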
Between-Cluster Sum of Squares (BCSS)

➤ BCSS measures how far apart the clusters are from each other. It looks at the distance between each cluster centroid and the overall dataset centroid.

➤ A higher BCSS is better because it shows that clusters are well-separated.

Formula: BCSS = Σ_{i=1..k} n_i ||μ_i − μ||², where n_i is the number of points in cluster C_i, μ_i is its centroid, and μ is the overall dataset centroid.
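BCSS complements WCSS: when centroids are cluster means, WCSS + BCSS equals the total sum of squares around the overall centroid. A sketch verifying this (scikit-learn assumed available; data is synthetic):

```python
# BCSS from the formula; WCSS + BCSS equals the total sum of squares.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
km = KMeans(n_clusters=2, n_init=10, random_state=5).fit(X)

mu = X.mean(axis=0)                              # overall centroid
bcss = sum(
    np.sum(km.labels_ == i) * np.sum((c - mu) ** 2)   # n_i * ||mu_i - mu||^2
    for i, c in enumerate(km.cluster_centers_)
)
total_ss = np.sum((X - mu) ** 2)
print(round(bcss + km.inertia_, 4), round(total_ss, 4))   # equal
```

This decomposition is why minimizing WCSS and maximizing BCSS are two views of the same objective.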
Silhouette Score

➤ The silhouette score checks how well each point fits in its own cluster compared to other clusters.

➤ Values range from −1 to 1:

• Close to 1: good clustering

• Close to 0: borderline

• Below 0: likely assigned to the wrong cluster

Formula: s(i) = (b(i) − a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other points in its own cluster and b(i) is the mean distance from i to the points in the nearest other cluster.
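On well-separated data the mean silhouette score should land near 1, as the slide suggests. A short check (scikit-learn assumed available; data is synthetic):

```python
# Silhouette score on two well-separated blobs: expect a value near 1.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(8, 0.5, (30, 2))])
labels = KMeans(n_clusters=2, n_init=10, random_state=6).fit_predict(X)

s = silhouette_score(X, labels)         # mean s(i) over all points
print(round(s, 3))                      # near 1 => good clustering
```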
Davies–Bouldin Index (DBI)

➤ DBI measures the similarity between clusters, based on the ratio of intra-cluster distances to inter-cluster separation.

➤ Lower DBI = better clusters

➤ Higher DBI = clusters are too similar/overlapping

Formula: DBI = (1/k) Σ_{i=1..k} max_{j ≠ i} (σ_i + σ_j) / d(μ_i, μ_j), where σ_i is the average distance of points in cluster i to its centroid μ_i, and d(μ_i, μ_j) is the distance between the centroids of clusters i and j.
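A quick illustration of "lower is better" (scikit-learn assumed available; data is synthetic): the correct blob labelling scores far lower than a labelling that splits each blob between clusters.

```python
# Davies-Bouldin index: a good labelling scores lower than a bad one.
import numpy as np
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(8, 0.5, (30, 2))])
good = np.array([0] * 30 + [1] * 30)    # true blob membership
bad = np.tile([0, 1], 30)               # each blob split between clusters

print(davies_bouldin_score(X, good), davies_bouldin_score(X, bad))
```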
Real-World Use Cases

• Customer segmentation

• Image segmentation

• Document or topic clustering

• Bioinformatics (e.g., gene clustering)

Thank you
