Pattern Recognition and Anomaly Detection
Assignment – Models for Anomaly detection
Submitted by:
Vinit Malik
500107219
R2142220298
B1 AIML NH SEM 6
Classification Based
Model : One-Class Support Vector Machine (OCSVM)
OCSVM learns a boundary that encloses the majority of normal data points in a
high-dimensional feature space. Data points that fall outside this boundary are
classified as anomalies.
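A minimal sketch of this idea, assuming scikit-learn and NumPy are installed. The synthetic data, the RBF kernel and the value nu=0.05 are illustrative assumptions, not settings taken from the assignment.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic "normal" data: 200 points scattered around the origin (made up).
rng = np.random.RandomState(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# nu is an upper bound on the fraction of training points treated as outliers;
# 0.05 and the RBF kernel are assumed choices for this sketch.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
model.fit(X_train)

# predict() returns +1 for points inside the learned boundary and -1 outside.
X_new = np.array([[0.0, 0.0], [8.0, 8.0]])
labels = model.predict(X_new)
```

Here the first test point lies in the dense centre of the training data and the second lies far away from it, so the second is the one expected to fall outside the boundary.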
Nearest Neighbour Based
Model : k-Nearest Neighbours (k-NN)
In this approach, we assume that normal data points lie in dense neighbourhoods while
anomalies are isolated. For each data point, the distance to its kth nearest neighbour
is calculated. Points whose distance is significantly larger than that of the rest of
the points are considered anomalies.
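The scoring step can be sketched in plain Python as follows. The sample points and the threshold (three times the median score) are assumptions made for illustration; in practice the cut-off would be tuned to the data.

```python
import math

def kth_neighbour_distances(points, k):
    """Return, for each point, the Euclidean distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Five points in a tight group and one isolated point (made-up data).
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8), (6.0, 6.0)]
scores = kth_neighbour_distances(points, k=2)

# Flag points whose score exceeds an assumed rule-of-thumb threshold.
median = sorted(scores)[len(scores) // 2]
anomalies = [p for p, s in zip(points, scores) if s > 3 * median]
```

This brute-force version compares every pair of points, so it is only suitable for small datasets.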
Clustering Based
Model : DBSCAN (Density-Based Spatial Clustering of Applications with
Noise)
DBSCAN groups data points that are closely packed together into clusters. Points that
lie in low-density regions do not belong to any dense cluster, so they are labelled as
noise and treated as anomalies.
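A compact, unoptimised sketch of the DBSCAN procedure in plain Python is given below. The parameter names eps and min_pts follow common usage, and the sample points and parameter values are assumptions chosen for illustration.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)   # j is a core point: keep expanding
    return labels

# Two dense groups and one isolated point (made-up data).
points = [(0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5),
          (5, 5), (5, 5.5), (5.5, 5), (5.5, 5.5),
          (10, 0)]
labels = dbscan(points, eps=1.0, min_pts=3)
anomalies = [p for p, label in zip(points, labels) if label == NOISE]
```

The isolated point has too few neighbours within eps to join either cluster, so it keeps the noise label.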
Statistical Methods
Model : Gaussian Model based detection
These methods assume that the normal data points are generated from some underlying
statistical distribution (here, a Gaussian), whose parameters are estimated from the
data. Data points with a very low probability of being generated from this
distribution are then considered anomalies.
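A one-dimensional sketch using only the Python standard library is shown below. The readings, the choice of reference sample and the three-sigma cut-off are assumptions for illustration.

```python
import statistics

# Illustrative one-dimensional readings; the last value is deliberately extreme.
data = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 9.7, 10.1, 10.0, 15.0]

# Estimate the Gaussian parameters from a reference sample assumed to be normal.
# Here the first nine readings play that role.
reference = data[:9]
mu = statistics.mean(reference)
sigma = statistics.stdev(reference)
dist = statistics.NormalDist(mu, sigma)

# Flag points whose density under the fitted Gaussian is very low.
# The threshold is the density at three standard deviations from the mean,
# which is a common but assumed cut-off.
threshold = dist.pdf(mu + 3 * sigma)
anomalies = [x for x in data if dist.pdf(x) < threshold]
```

For multivariate data the same idea applies with a mean vector and covariance matrix in place of mu and sigma.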
Contextual Anomaly Detection
Model : Conditional / Contextual Statistical models
This method identifies anomalies relative to a specific context. Contextual attributes
(such as time or location) define the context, and behavioural attributes are evaluated
within it, so a value that is normal in one context may be anomalous in another.
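As a small illustration, the sketch below treats the season as the contextual attribute and the temperature as the behavioural attribute. The data, the function name is_contextual_anomaly and the z-score cut-off of 3 are all hypothetical.

```python
import statistics

# Historical values of the behavioural attribute (temperature in degrees
# Celsius), grouped by the contextual attribute (season). Made-up data.
history = {
    "summer": [30, 32, 31, 29, 33, 31],
    "winter": [2, 4, 3, 1, 5, 3],
}

def is_contextual_anomaly(context, value, z_cut=3.0):
    """Compare a value against the statistics of its own context only."""
    values = history[context]
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return abs(value - mu) / sigma > z_cut

# The same reading is judged differently depending on its context.
summer_flag = is_contextual_anomaly("summer", 30)
winter_flag = is_contextual_anomaly("winter", 30)
```

A reading of 30 is unremarkable against the summer history but lies far outside the winter history, so only the winter case is flagged.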
Collective Anomaly Detection
Model : Markov Models or LSTMs
A collective anomaly is a sequence of data points that is anomalous as a whole, even if
the individual data points are not anomalous on their own. In this form of detection,
the model looks for unusual subsequences or patterns in ordered data such as time series
or log files.
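The Markov-model variant can be sketched as follows: a first-order Markov chain is fitted on a normal sequence, and sliding windows of a new sequence are scored by their average transition log-probability. The symbol sequences, the window width and the add-one smoothing are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def fit_transitions(sequence, alphabet, smoothing=1.0):
    """Estimate first-order transition probabilities with additive smoothing."""
    counts = defaultdict(Counter)
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1
    probs = {}
    for a in alphabet:
        total = sum(counts[a].values()) + smoothing * len(alphabet)
        probs[a] = {b: (counts[a][b] + smoothing) / total for b in alphabet}
    return probs

def window_scores(sequence, probs, width):
    """Average log-probability of the transitions inside each sliding window."""
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        logp = sum(math.log(probs[a][b]) for a, b in zip(window, window[1:]))
        scores.append(logp / (width - 1))
    return scores

alphabet = "ABC"
normal_log = "ABC" * 30                 # made-up "normal" event sequence
probs = fit_transitions(normal_log, alphabet)

# Every symbol below is individually common, but the run of C's is unusual.
test_log = "ABCABCABCCCCCABCABC"
width = 4
scores = window_scores(test_log, probs, width)
worst_start = min(range(len(scores)), key=scores.__getitem__)
worst_window = test_log[worst_start:worst_start + width]
```

The lowest-scoring window is the repeated run of C, although C on its own appears throughout the normal sequence.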
Online Anomaly Detection
Model : Streaming k-NN, Incremental PCA, Hoeffding Trees and Half-Space
Trees
These methods are designed to process data points sequentially as they arrive, without
requiring storage of the entire dataset.
The model is updated incrementally with each new data point or small batch.
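One way to sketch the streaming k-NN case is a sliding window that keeps only the most recent points. The class name StreamingKNN, the window size and the synthetic stream are hypothetical choices for this illustration, not a reference implementation.

```python
import math
from collections import deque

class StreamingKNN:
    """Sliding-window k-NN scorer: only the most recent `window` points are kept."""

    def __init__(self, k=3, window=50):
        self.k = k
        self.buffer = deque(maxlen=window)   # old points are dropped automatically

    def score_and_update(self, point):
        # Score the new point against the current window, then add it.
        dists = sorted(math.dist(point, q) for q in self.buffer)
        score = dists[self.k - 1] if len(dists) >= self.k else 0.0
        self.buffer.append(point)
        return score

detector = StreamingKNN(k=3, window=20)

# Forty points from a small repeating pattern, followed by one distant point.
stream = [(i % 5 * 0.1, i % 3 * 0.1) for i in range(40)] + [(9.0, 9.0)]
scores = [detector.score_and_update(p) for p in stream]
```

Memory use stays bounded by the window size regardless of how long the stream runs, and the final point receives a much larger score than the earlier ones.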
Distributed Anomaly Detection
Model : Federated Algorithms
Each node processes its local data subset to compute partial results such as local
anomaly scores or model updates. These partial results are then aggregated, either by a
central coordinator or through peer-to-peer communication, which keeps communication
overhead low while still maintaining high detection accuracy.
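The coordinator-based variant can be illustrated with a simple sketch in which each node sends only summary statistics rather than raw data. The node names, the data and the z-score rule are assumptions; a real federated algorithm would exchange model updates in a similar way.

```python
import math

def local_summary(values):
    """Computed on each node: count, sum and sum of squares of its local data."""
    return len(values), sum(values), sum(v * v for v in values)

def aggregate(summaries):
    """Run by the coordinator: combine partial results into a global mean and std."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)

def local_anomalies(values, mean, std, z_cut=3.0):
    """Run on each node again, using the global statistics sent back to it."""
    return [v for v in values if abs(v - mean) > z_cut * std]

# Made-up local datasets held by three nodes.
node_data = {
    "node_a": [10.0, 10.2, 9.9, 10.1, 9.8, 10.0],
    "node_b": [10.1, 9.9, 10.0, 10.3, 9.7, 10.0],
    "node_c": [10.0, 10.1, 9.9, 10.2, 9.8, 14.0],
}

summaries = [local_summary(v) for v in node_data.values()]
mean, std = aggregate(summaries)
flagged = {name: local_anomalies(v, mean, std) for name, v in node_data.items()}
```

Only three numbers per node travel to the coordinator, which is what keeps the communication overhead small in this sketch.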