Unsupervised learning is a type of machine learning where models are trained on
unlabeled data, meaning there are no predefined output labels. The goal is to
discover underlying patterns, structures, or relationships within the data without
prior knowledge of the outcomes. Unsupervised learning is often used for
clustering, dimensionality reduction, and anomaly detection.
Some common types of unsupervised learning methods include:
1. Clustering - Groups similar data points together. Popular algorithms
include:
o k-Means: Partitions data into a predefined number of clusters (k) by
minimizing the sum of squared distances between points and their
assigned cluster centroids.
o Hierarchical Clustering: Builds a hierarchy of clusters, which can
be visualized as a tree (dendrogram).
o DBSCAN: Groups points that lie in dense regions, which lets it find
clusters of arbitrary shape and flag points in sparse regions as noise.
2. Dimensionality Reduction - Reduces the number of features while
retaining the essential information. Techniques include:
o Principal Component Analysis (PCA): Finds orthogonal directions
(principal components) along which the data varies most, then projects
the data onto the leading components to obtain a lower-dimensional
representation.
o t-SNE: Embeds high-dimensional data in two or three dimensions for
visualization, preserving local neighborhoods; distances between
well-separated groups in the embedding are not reliable.
3. Anomaly Detection - Identifies rare or unusual data points that deviate
from the norm. It is useful for detecting fraud, network intrusions, or
equipment malfunctions.
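To make the k-Means entry in the list above concrete, here is a minimal sketch of the usual alternating procedure (often called Lloyd's algorithm) in plain Python. The function names, the sample points, and the choice to pass the starting centroids explicitly are assumptions made for illustration; in practice the starting centroids are usually chosen at random or with a scheme such as k-means++, and a library implementation would normally be preferred.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, initial_centroids, max_iters=100):
    """Alternate assignment and centroid updates until labels stop changing.

    The number of clusters k is the number of initial centroids supplied
    (a simplification chosen to keep this sketch deterministic).
    """
    centroids = [tuple(c) for c in initial_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return centroids, labels


# Hypothetical toy data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = k_means(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centroids)
```

Each pass either changes at least one assignment or stops, and every step can only lower (or keep) the total squared distance to the centroids, which is why the loop terminates. The result can still depend on the starting centroids, so running from several starts and keeping the best result is a common precaution.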
Unsupervised learning is widely used in exploratory data analysis,
recommendation systems, and image and text processing, where the structure it
uncovers can later inform other machine learning models or business decisions.
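As an illustration of the PCA idea described earlier (find the direction of greatest variance, then project onto it), the sketch below estimates only the first principal component using power iteration on the sample covariance matrix. This is one simple way to compute it, not the only one; library implementations typically use a singular value decomposition instead. The function name and the sample rows are hypothetical, and the sketch assumes the data is not constant and that the largest eigenvalue is unique.

```python
import math


def first_principal_component(rows, iters=200):
    """Approximate the top principal direction of `rows` by power iteration."""
    n, d = len(rows), len(rows[0])
    # Center the data: PCA works on deviations from the feature means.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiplying by cov and renormalizing
    # converges to the eigenvector with the largest eigenvalue, provided
    # the start vector is not orthogonal to it.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance captured along v is the quadratic form v^T cov v.
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                   for a in range(d))
    # Project each centered row onto v: a one-dimensional representation.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, variance, scores


# Hypothetical toy data: the second feature is roughly twice the first.
rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
direction, variance, scores = first_principal_component(rows)
print(direction)
print(variance)
```

Because the two features here are almost perfectly correlated, a single component captures nearly all of the variance, which is the situation in which dropping the remaining components loses little information.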