Presentation Number – 1
Anomaly Detection Using Unsupervised Learning
Jaya Vishnu Priya
22W91A6620
B.Tech. IV Year – I Sem. CSM-A
Deep Learning
Under the Guidance of
Mr. T. Sathish
Assistant Professor
Department of Computer Science and Engineering
Malla Reddy Institute of Engineering & Technology
Introduction to Anomaly Detection Using Unsupervised Learning
• Anomaly detection involves identifying data
points that deviate significantly from the
norm.
• Unsupervised learning techniques are
commonly used when labeled data is
unavailable or scarce.
• This approach is essential in various
applications such as fraud detection,
network security, and fault diagnosis.
What Are Anomalies?
• Anomalies, also known as outliers or
novelties, are data points that differ
markedly from other observations.
• They can indicate critical incidents, such as
system failures or security breaches.
• Detecting anomalies helps in maintaining
system integrity and improving decision-
making processes.
Types of Anomalies
• Point anomalies are individual data points
that are significantly different from the rest.
• Contextual anomalies are abnormal only in a
specific context, such as a value that is
unusual for its season in time-series data.
• Collective anomalies involve a group of data
points that collectively exhibit abnormal
behavior.
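The difference between a point anomaly and a contextual anomaly can be sketched with a small example. The readings, the "day"/"night" context labels, and the 3.5 cutoff below are illustrative assumptions, and the median-based score is just one possible choice of scoring rule. A reading of 30.0 is ordinary across the whole data set but unusual among the night readings.

```python
from statistics import median

def robust_z(values):
    """Modified z-score using the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]  # no spread, so nothing stands out
    return [0.6745 * abs(v - med) / mad for v in values]

# Hypothetical temperature readings with a context label for each one.
contexts = ["day"] * 5 + ["night"] * 7
values = [29.0, 30.0, 31.0, 30.5, 29.5,
          10.0, 11.0, 9.5, 10.5, 10.0, 9.0, 30.0]
THRESHOLD = 3.5  # assumed cutoff for this sketch

# Global view: score every reading against the whole data set.
global_scores = robust_z(values)
global_flags = [i for i, s in enumerate(global_scores) if s > THRESHOLD]

# Contextual view: score each reading only against its own context.
context_flags = []
for ctx in sorted(set(contexts)):
    idx = [i for i, c in enumerate(contexts) if c == ctx]
    scores = robust_z([values[i] for i in idx])
    context_flags += [i for i, s in zip(idx, scores) if s > THRESHOLD]

print(global_flags)   # []
print(context_flags)  # [11]
```

The last reading (index 11) is flagged only when it is compared with other night readings, which is what makes it contextual rather than a point anomaly.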
Unsupervised Learning Techniques for Anomaly Detection
• Clustering algorithms flag poorly fitting
points: DBSCAN labels points outside every
cluster as noise, while k-means assigns every
point to a cluster, so outliers are those far
from their centroid.
• Density-based methods treat dense regions
as normal and flag points in the low-density
regions between or around them as outliers.
• Dimensionality reduction techniques, such
as PCA, help visualize and detect anomalies
in high-dimensional data.
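For k-means, which assigns every point to some cluster, one common workaround is to score each point by its distance to its assigned centroid. The sketch below assumes scikit-learn is available; the synthetic data, the choice of two clusters, and the 99th-percentile cutoff are illustrative assumptions, not recommended settings.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: two tight clusters plus one point far from both.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2))
cluster_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(100, 2))
outlier = np.array([[2.5, -2.5]])
X = np.vstack([cluster_a, cluster_b, outlier])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Distance from each point to the centroid of the cluster it was assigned to.
own_center = kmeans.cluster_centers_[kmeans.labels_]
dist = np.linalg.norm(X - own_center, axis=1)

# Flag the points with the largest distances (assumed cutoff: top 1%).
threshold = np.percentile(dist, 99)
anomaly_idx = np.where(dist > threshold)[0]
print(anomaly_idx)
```

A percentile cutoff always flags a fixed share of the data, so in practice the threshold would need to be chosen with some knowledge of how rare anomalies are expected to be.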
Clustering-Based Anomaly Detection
• Clustering groups similar data points
together, making outliers stand out as points
that belong to no cluster or lie far from
every cluster center.
• DBSCAN is particularly effective because it
can detect arbitrarily shaped clusters and
noise points.
• DBSCAN's parameters, epsilon (the
neighborhood radius) and minimum samples
(the points needed to form a dense region),
greatly influence detection performance.
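A minimal sketch of DBSCAN-based detection, assuming scikit-learn is available. The synthetic data and the values `eps=0.5` and `min_samples=5` are assumptions chosen for this toy example; real data would need its own tuning. DBSCAN gives noise points the label -1, and these are treated as anomalies here.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic data: two dense clusters and three isolated points.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2))
cluster_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(100, 2))
outliers = np.array([[2.5, 2.5], [-3.0, 4.0], [8.0, -1.0]])
X = np.vstack([cluster_a, cluster_b, outliers])

# eps is the neighborhood radius; min_samples is the number of points
# (including the point itself) needed within eps to form a dense region.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Points labeled -1 belong to no cluster and are reported as noise.
anomaly_idx = np.where(labels == -1)[0]
print(anomaly_idx)
```

Points at the sparse fringe of a cluster may also come back as noise; increasing `eps` or lowering `min_samples` makes the detector less strict.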
Density-Based Methods
• Density-based algorithms identify anomalies
as points in low-density regions relative to
their surroundings.
• These methods are robust to clusters of
varying shapes and sizes.
• Popular algorithms include Local Outlier
Factor (LOF) and Density-Based Spatial
Clustering of Applications with Noise
(DBSCAN).
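The idea of density "relative to surroundings" can be sketched with LOF, assuming scikit-learn is available. The data and `n_neighbors=20` are illustrative assumptions. The candidate point sits close to a very dense cluster, so it is unusual for its neighborhood even though a much sparser cluster elsewhere has points that are just as far from their neighbors.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: one dense cluster, one sparse cluster, one candidate point.
rng = np.random.default_rng(0)
dense = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(100, 2))
sparse = rng.normal(loc=[5.0, 5.0], scale=1.0, size=(100, 2))
candidate = np.array([[1.0, 1.0]])
X = np.vstack([dense, sparse, candidate])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)            # -1 marks outliers, 1 marks inliers
scores = -lof.negative_outlier_factor_  # larger score = more anomalous

print(labels[-1], round(float(scores[-1]), 2))
```

A score near 1 means a point is about as dense as its neighbors; scores well above 1 indicate a point that is much more isolated than the points around it.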
Dimensionality Reduction Techniques
• Techniques like Principal Component
Analysis (PCA) reduce the number of
dimensions while preserving most of the
variance.
• PCA helps visualize high-dimensional data
and spot outliers that deviate from the main
distribution.
• Combining dimensionality reduction with
clustering can enhance anomaly detection
performance.
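One way to turn PCA into a detector is to measure reconstruction error: points that the leading components cannot reproduce well deviate from the main distribution. The sketch below uses plain NumPy (PCA via SVD); the synthetic data and the choice of keeping one component are assumptions made for illustration.

```python
import numpy as np

# Synthetic 3-D data lying close to a single line, plus one point off that line.
rng = np.random.default_rng(0)
direction = np.array([1.0, 2.0, 3.0])
t = rng.normal(size=(200, 1))
normal_points = t * direction + rng.normal(scale=0.05, size=(200, 3))
off_axis = np.array([[2.0, -1.0, 0.0]])
X = np.vstack([normal_points, off_axis])

# PCA via SVD of the centered data; rows of vt are the principal directions.
X_centered = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(X_centered, full_matrices=False)
components = vt[:1]  # keep only the first principal component (assumed)

# Project onto the kept component, map back, and measure what was lost.
projected = X_centered @ components.T
reconstructed = projected @ components
errors = np.linalg.norm(X_centered - reconstructed, axis=1)

print(int(np.argmax(errors)))  # index of the worst-reconstructed point
```

The number of components to keep is a design choice: too many and anomalies are reconstructed well, too few and normal points also show large errors.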
Challenges in Unsupervised Anomaly Detection
• The lack of labeled data makes it difficult to
evaluate the accuracy of anomaly detection
models.
• In high-dimensional data, the "curse of
dimensionality" makes distances between
points nearly uniform, so outliers are harder
to separate from normal points.
• Choosing appropriate parameters and
methods requires domain knowledge and
experimentation.
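The effect of dimensionality on distance-based detection can be illustrated with a small NumPy experiment. The helper name `relative_contrast`, the uniform random data, and the sample size are assumptions for this sketch. It compares the farthest and nearest neighbor of a query point: as the number of dimensions grows, the gap shrinks relative to the nearest distance.

```python
import numpy as np

def relative_contrast(dim, n_points=500, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))  # uniform in the unit hypercube
    query = rng.random(dim)
    dist = np.linalg.norm(points - query, axis=1)
    return (dist.max() - dist.min()) / dist.min()

for dim in (2, 10, 100, 1000):
    print(dim, round(float(relative_contrast(dim)), 2))
```

When the contrast is small, "far from everything" and "close to something" are barely distinguishable, which is one reason dimensionality reduction is often applied before distance- or density-based detectors.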
Applications of Unsupervised Anomaly Detection
• Fraud detection in finance relies heavily on
identifying unusual transactions.
• Network security uses anomaly detection to
identify potential cyber-attacks.
• Manufacturing and industrial systems
employ anomaly detection for predictive
maintenance.
References
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly Detection: A Survey. ACM Computing
Surveys, 41(3), 1-58.
Hodge, V. J., & Austin, J. (2004). A Survey of Outlier Detection Methodologies. Artificial
Intelligence Review, 22(2), 85-126.
Ahmed, M., Mahmood, A. N., & Islam, M. R. (2016). A Survey of Anomaly Detection Techniques in
Financial Domain. Future Generation Computer Systems, 55, 278-288.