
Testing Unsupervised Learning

The document discusses techniques for testing overfitting in customer segmentation models using unsupervised learning, particularly through adapted cross-validation. It outlines the process of training a clustering algorithm on a training set and evaluating its performance on a validation set to check for generalization. Additionally, it explains the use of silhouette scores to identify overfitting by comparing the scores of training clusters to those of new data.

Uploaded by Swaroop Vanteru
Copyright © All Rights Reserved

TESTING UNSUPERVISED LEARNING
This slide provides an overview of the techniques used to test for overfitting in customer segmentation models.
CROSS-VALIDATION

Adapting Cross-Validation for Unsupervised Learning

While cross-validation is typically used in supervised learning, it can be adapted for unsupervised scenarios as well. The idea is to randomly split the data into training and "validation" sets, train the clustering algorithm on the training set, and then apply it to the validation set to see whether similar cluster structures emerge.

Training on the Training Set

The first step is to train the clustering algorithm on the training set. This allows the model to learn the underlying patterns and structure in the data.

Applying to the Validation Set

After training, the next step is to apply the trained clustering model to the validation set (for example, by assigning each validation point to its nearest learned cluster). This lets you assess how well the model generalizes to new, unseen data.

Evaluating Cluster Similarity

If the cluster structures observed in the validation set are similar to those in the training set, the model has likely captured real structure in the data rather than quirks of the training sample, and is not overfitting. Significant differences in the cluster structures may indicate overfitting.
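The four steps above can be sketched in Python with scikit-learn. This is a minimal sketch under assumptions the slide does not state: k-means as the clustering algorithm, synthetic two-dimensional blobs standing in for customer features, a 70/30 split, and the adjusted Rand index (ARI) as the measure of cluster similarity. `cluster_stability` is a hypothetical helper name. The comparison shown here clusters the validation set independently and checks how closely those labels agree with the labels the training-set model assigns to the same points.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split


def cluster_stability(X, n_clusters, test_size=0.3, seed=0):
    """Hypothetical helper: train/validation check for a k-means model.

    Returns the adjusted Rand index between (a) the labels the training-set
    model assigns to validation points and (b) the labels of a model fitted
    directly on the validation set. Values near 1 mean the same cluster
    structure emerges in both; values near 0 mean it does not.
    """
    # Randomly split the data into training and validation sets.
    X_train, X_val = train_test_split(X, test_size=test_size, random_state=seed)

    # Train the clustering algorithm on the training set only.
    train_model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_train)

    # Apply the trained model to the unseen validation points.
    transferred = train_model.predict(X_val)

    # Cluster the validation set independently and compare the two labelings.
    # ARI ignores label permutation, so cluster "0" need not match cluster "0".
    val_model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_val)
    return adjusted_rand_score(val_model.labels_, transferred)


if __name__ == "__main__":
    # Synthetic stand-in (assumption) for customer features: four separated groups.
    X, _ = make_blobs(
        n_samples=600,
        centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
        cluster_std=1.0,
        random_state=0,
    )
    score = cluster_stability(X, n_clusters=4)
    print(f"ARI between training-model and validation-model labels: {score:.3f}")
```

On real segmentation data, an ARI well below 1 suggests the training-set clusters do not reproduce on new samples. Repeating the split with several seeds gives a sense of how much the score varies by chance.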
SCORE ANALYSIS

What Are Silhouette Scores?

Silhouette scores measure how similar an object is to its own cluster compared with other clusters. A high silhouette score indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters.

Identifying Overfitting with Silhouette Scores

If the training clusters have very high silhouette scores but these scores drop drastically when new data is assigned to the same clusters, the model may have overfit to the training data and may not generalize well to new samples.
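For a point i, let a(i) be its mean distance to the other members of its own cluster and b(i) its mean distance to the members of the nearest other cluster. Its silhouette is s(i) = (b(i) - a(i)) / max(a(i), b(i)), which lies between -1 and 1, and scikit-learn's `silhouette_score` returns the mean over all points. The sketch below compares the mean score on the training data with the mean score on held-out data assigned to the same fitted clusters. K-means, the synthetic data, the 70/30 split, and the helper name `silhouette_train_vs_new` are illustrative assumptions, and the slide does not say how large a drop counts as "drastic".

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split


def silhouette_train_vs_new(X, n_clusters, test_size=0.3, seed=0):
    """Hypothetical helper: return (train_score, new_score) for a k-means model."""
    X_train, X_new = train_test_split(X, test_size=test_size, random_state=seed)

    # Fit the clusters on the training data only.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_train)

    # Mean silhouette of the training points under the clusters learned on them.
    train_score = silhouette_score(X_train, model.labels_)

    # Mean silhouette of held-out points under the same fitted clusters.
    # Note: silhouette_score needs at least two distinct labels among X_new.
    new_score = silhouette_score(X_new, model.predict(X_new))
    return train_score, new_score


if __name__ == "__main__":
    # Synthetic stand-in (assumption) for customer features.
    X, _ = make_blobs(
        n_samples=600,
        centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
        cluster_std=1.0,
        random_state=0,
    )
    train_score, new_score = silhouette_train_vs_new(X, n_clusters=4)
    print(f"train silhouette: {train_score:.3f}")
    print(f"new-data silhouette: {new_score:.3f}")
    print(f"drop: {train_score - new_score:.3f}")
```

A large drop from the training score to the held-out score is the warning sign described above. Rerunning with several random splits helps separate a genuine drop from sampling noise.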
