
OCS351 ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FUNDAMENTALS

UNIT V UNSUPERVISED LEARNING

Unsupervised Learning - Principal Component Analysis - Neural Network: Fixed Weight Competitive Nets - Kohonen Self-Organizing Feature Maps - Clustering: Definition - Types of Clustering - Hierarchical clustering algorithms - k-means algorithm

What is Unsupervised Learning

Unsupervised learning is a type of machine learning that analyzes and models data without
labelled responses or predefined categories. Unlike supervised learning, where the algorithm
learns from input-output pairs, unsupervised learning algorithms work solely with input data
and aim to discover hidden patterns, structures or relationships within the dataset
independently, without any human intervention or prior knowledge of the data's meaning.

The image shows a set of animals such as elephants, camels and cows, which represents the raw data that the unsupervised learning algorithm will process.

• The "Interpretation" stage signifies that the algorithm doesn't have predefined labels
or categories for the data. It needs to figure out how to group or organize the data
based on inherent patterns.

• The algorithm represents the unsupervised learning process, which can be clustering, dimensionality reduction or anomaly detection, applied to identify patterns in the data.

• The processing stage shows the algorithm working on the data.

The output shows the results of the unsupervised learning process. In this case, the algorithm
might have grouped the animals into clusters based on their species (elephants, camels,
cows).


Working of Unsupervised Learning

The working of unsupervised machine learning can be explained in these steps:

1. Collect Unlabeled Data

• Gather a dataset without predefined labels or categories.

• Example: Images of various animals without any tags.

2. Select an Algorithm

• Choose a suitable unsupervised algorithm such as clustering like K-Means, association rule learning like Apriori or dimensionality reduction like PCA, based on the goal.

3. Train the Model on Raw Data

• Feed the entire unlabeled dataset to the algorithm.

• The algorithm looks for similarities, relationships or hidden structures within the data.

4. Group or Transform Data

• The algorithm organizes data into groups (clusters), rules or lower-dimensional forms
without human input.

• Example: It may group similar animals together or extract key patterns from large
datasets.

5. Interpret and Use Results

• Analyze the discovered groups, rules or features to gain insights or use them for further
tasks like visualization, anomaly detection or as input for other models.

Unsupervised Learning Algorithms

There are mainly three types of unsupervised learning algorithms:

1. Clustering Algorithms

Clustering is an unsupervised machine learning technique that groups unlabeled data into
clusters based on similarity. Its goal is to discover patterns or relationships within the data
without any prior knowledge of categories or labels.

• Groups data points that share similar features or characteristics.

• Helps find natural groupings in raw, unclassified data.

• Commonly used for customer segmentation, anomaly detection and data organization.

• Works purely from the input data without any output labels.


• Enables understanding of data structure for further analysis or decision-making.

Some common clustering algorithms:

• K-means Clustering: Groups data into K clusters based on how close the points are to
each other.

• Hierarchical Clustering: Creates clusters by building a tree step-by-step, either merging or splitting groups.

• Density-Based Clustering (DBSCAN): Finds clusters in dense areas and treats scattered
points as noise.

• Mean-Shift Clustering: Discovers clusters by moving points toward the most crowded
areas.

• Spectral Clustering: Groups data by analyzing connections between points using graphs.

2. Association Rule Learning

Association rule learning is a rule-based unsupervised learning technique used to discover interesting relationships between variables in large datasets. It identifies patterns in the form of "if-then" rules, showing how the presence of some items in the data implies the presence of others.

• Finds frequent item combinations and the rules connecting them.

• Commonly used in market basket analysis to understand product purchase


relationships.

• Helps retailers design promotions and cross-selling strategies.

Some common Association Rule Learning algorithms (a small frequent-itemset sketch follows this list):

• Apriori Algorithm: Finds patterns by exploring frequent item combinations step-by-step.

• FP-Growth Algorithm: An efficient alternative to Apriori that quickly identifies frequent patterns without generating candidate sets.

• Eclat Algorithm: Uses intersections of itemsets to efficiently find frequent patterns.

• Efficient Tree-based Algorithms: Scale to handle large datasets by organizing data in tree structures.
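To make the "if-then" idea concrete, here is a minimal pure-Python sketch (not the API of any particular library) that counts frequent item pairs in a toy set of transactions and computes the confidence of one rule; the transactions and the min_support threshold are made up for illustration.

from itertools import combinations
from collections import Counter

# Hypothetical transactions for illustration only
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

min_support = 0.4  # assumed threshold: a pair must appear in at least 40% of transactions

# Count how often each unordered pair of items occurs together
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

n = len(transactions)
frequent_pairs = {p: c / n for p, c in pair_counts.items() if c / n >= min_support}
print("Frequent pairs with support:", frequent_pairs)

# Confidence of the rule {bread} -> {milk}: how often milk appears when bread does
bread = sum(1 for t in transactions if "bread" in t)
bread_and_milk = sum(1 for t in transactions if {"bread", "milk"} <= t)
print("confidence(bread -> milk) =", bread_and_milk / bread)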

3. Dimensionality Reduction

Dimensionality reduction is the process of decreasing the number of features or variables in a dataset while retaining as much of the original information as possible. This technique helps simplify complex data, making it easier to analyze and visualize. It also improves the efficiency and performance of machine learning algorithms by reducing noise and computational cost.

• It reduces the dataset’s feature space from many dimensions to fewer, more
meaningful ones.

• Helps focus on the most important traits or patterns in the data.

• Commonly used to improve model speed and reduce overfitting.

Here are some popular Dimensionality Reduction algorithms:

• Principal Component Analysis (PCA): Reduces dimensions by transforming data into uncorrelated principal components.

• Linear Discriminant Analysis (LDA): Reduces dimensions while maximizing class separability for classification tasks.

• Non-negative Matrix Factorization (NMF): Breaks data into non-negative parts to simplify representation.

• Locally Linear Embedding (LLE): Reduces dimensions while preserving the relationships between nearby points.

• Isomap: Captures global data structure by preserving distances along a manifold.

Applications of Unsupervised learning

Unsupervised learning has diverse applications across industries and domains. Key
applications include:

• Customer Segmentation: Algorithms cluster customers based on purchasing behavior or demographics, enabling targeted marketing strategies.

• Anomaly Detection: Identifies unusual patterns in data, aiding fraud detection, cybersecurity and equipment failure prevention.

• Recommendation Systems: Suggests products, movies or music by analyzing user behavior and preferences.

• Image and Text Clustering: Groups similar images or documents for tasks like organization, classification or content recommendation.

• Social Network Analysis: Detects communities or trends in user interactions on social media platforms.

Advantages

• No need for labeled data: Works with raw, unlabeled data, saving time and effort on data annotation.


• Discovers hidden patterns: Finds natural groupings and structures that might be
missed by humans.

• Handles complex and large datasets: Effective for high-dimensional or vast amounts
of data.

• Useful for anomaly detection: Can identify outliers and unusual data points without
prior examples.

Challenges

Here are the key challenges of unsupervised learning:

• Noisy Data: Outliers and noise can distort patterns and reduce the effectiveness of
algorithms.

• Assumption Dependence: Algorithms often rely on assumptions (e.g., cluster shapes) which may not match the actual data structure.

• Overfitting Risk: Overfitting can occur when models capture noise instead of
meaningful patterns in the data.

• Limited Guidance: The absence of labels restricts the ability to guide the algorithm
toward specific outcomes.

• Cluster Interpretability: Results such as clusters may lack clear meaning or alignment
with real-world categories.

• Sensitivity to Parameters: Many algorithms require careful tuning of hyperparameters such as the number of clusters in k-means.

• Lack of Ground Truth: Unsupervised learning lacks labeled data making it difficult to
evaluate the accuracy of results.

Principal Component Analysis (PCA)

PCA (Principal Component Analysis) is a dimensionality reduction technique used in data analysis and machine learning. It helps you to reduce the number of features in a dataset while keeping the most important information. It transforms your original features into new features; these new features do not overlap with each other, and the first few keep most of the important differences found in the original data.

PCA is commonly used for data preprocessing for use with machine learning algorithms. It
helps to remove redundancy, improve computational efficiency and make data easier to
visualize and analyze especially when dealing with high-dimensional data.


How Principal Component Analysis Works

PCA uses linear algebra to transform data into new features called principal components. It finds these by calculating eigenvectors (directions) and eigenvalues (importance) from the covariance matrix. PCA selects the top components with the highest eigenvalues and projects the data onto them to simplify the dataset.

Note: It prioritizes the directions where the data varies the most because more variation =
more useful information.

Imagine you’re looking at a messy cloud of data points like stars in the sky and want to simplify
it. PCA helps you find the "most important angles" to view this cloud so you don’t miss the big
patterns. Here’s how it works step by step:

Step 1: Standardize the Data

Different features may have different units and scales, like salary vs. age. To compare them fairly, PCA first standardizes the data by making each feature have:

• A mean of 0

• A standard deviation of 1

Z = (X − μ) / σ

where:

• μ is the mean of the independent features, μ = {μ1, μ2, ⋯, μm}

• σ is the standard deviation of the independent features, σ = {σ1, σ2, ⋯, σm}

Step 2: Calculate Covariance Matrix

Next, PCA calculates the covariance matrix to see how features relate to each other, that is, whether they increase or decrease together. The covariance between two features x1 and x2 is:

cov(x1, x2) = Σ_{i=1}^{n} (x1i − x̄1)(x2i − x̄2) / (n − 1)

Where:

• x̄1 and x̄2 are the mean values of features x1 and x2

• n is the number of data points

The value of covariance can be positive, negative or zero.

Step 3: Find the Principal Components

PCA identifies new axes where the data spreads out the most:

• 1st Principal Component (PC1): The direction of maximum variance (most spread).


• 2nd Principal Component (PC2): The next best direction, perpendicular to PC1 and so
on.

These directions come from the eigenvectors of the covariance matrix and their importance
is measured by eigenvalues. For a square matrix A an eigenvector X (a non-zero vector) and
its corresponding eigenvalue λ satisfy:

AX = λX

This means:

• When A acts on X it only stretches or shrinks X by the scalar λ.

• The direction of X remains unchanged; hence eigenvectors define "stable directions" of A.

Eigenvalues help rank these directions by importance.

Step 4: Pick the Top Directions & Transform Data

After calculating the eigenvalues and eigenvectors PCA ranks them by the amount of
information they capture. We then:

1. Select the top k components that capture most of the variance (e.g., 95%).

2. Transform the original dataset by projecting it onto these top components.

This means we reduce the number of features (dimensions) while keeping the important
patterns in the data.


In the above image the original dataset has two features "Radius" and "Area" represented by
the black axes. PCA identifies two new directions: PC₁ and PC₂ which are the principal
components.

• These new axes are rotated versions of the original ones. PC₁ captures the maximum
variance in the data meaning it holds the most information while PC₂ captures the
remaining variance and is perpendicular to PC₁.

• The spread of data is much wider along PC₁ than along PC₂. This is why PC₁ is chosen
for dimensionality reduction. By projecting the data points (blue crosses) onto PC₁ we
effectively transform the 2D data into 1D and retain most of the important structure
and patterns.
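Before moving to the scikit-learn implementation, here is a minimal NumPy sketch of the four steps described above (standardize, covariance matrix, eigen decomposition, projection); the small 2-feature array X is made-up demonstration data.

import numpy as np

# Made-up 2-feature data for illustration
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
              [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1]])

# Step 1: standardize each feature (mean 0, standard deviation 1)
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized features
cov = np.cov(Z, rowvar=False)

# Step 3: eigenvalues (importance) and eigenvectors (directions)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: the covariance matrix is symmetric
order = np.argsort(eigvals)[::-1]        # sort directions by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: keep the top k components and project the data onto them
k = 1
X_reduced = Z @ eigvecs[:, :k]

print("explained variance ratio:", eigvals[:k] / eigvals.sum())
print("reduced shape:", X_reduced.shape)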

Implementation of Principal Component Analysis in Python

Hence PCA uses a linear transformation that is based on preserving the most variance in the
data using the least number of dimensions. It involves the following steps:

Step 1: Importing Required Libraries

We import the necessary libraries such as pandas, numpy, scikit-learn, seaborn and matplotlib to visualize the results.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

Step 2: Creating Sample Dataset

We make a small dataset with three features (Height, Weight and Age) and a Gender column used as the target.

data = {
    'Height': [170, 165, 180, 175, 160, 172, 168, 177, 162, 158],
    'Weight': [65, 59, 75, 68, 55, 70, 62, 74, 58, 54],
    'Age': [30, 25, 35, 28, 22, 32, 27, 33, 24, 21],
    'Gender': [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]  # 1 = Male, 0 = Female
}

df = pd.DataFrame(data)
print(df)

Output:

Step 3: Standardizing the Data

Since the features have different scales (Height vs Age), we standardize the data. This makes all features have mean = 0 and standard deviation = 1 so that no feature dominates just because of its units.

X = df.drop('Gender', axis=1)
y = df['Gender']

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

Step 4: Applying PCA algorithm

• We reduce the data from 3 features to 2 new features called principal components.
These components capture most of the original information but in fewer dimensions.

• We split the data into 70% training and 30% testing sets.


• We train a logistic regression model on the reduced training data and predict gender
labels on the test set.

pca = PCA(n_components=2)

X_pca = pca.fit_transform(X_scaled)

X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=0.3, random_state=42)

model = LogisticRegression()

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

Step 5: Evaluating with Confusion Matrix

The confusion matrix compares actual vs predicted labels. This makes it easy to see where
predictions were correct or wrong.

cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(5, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['Female', 'Male'], yticklabels=['Female', 'Male'])
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix')
plt.show()

Output:


Step 6: Visualizing PCA Result

y_numeric = pd.factorize(y)[0]

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y_numeric, cmap='coolwarm', edgecolor='k', s=80)
plt.xlabel('Original Feature 1')
plt.ylabel('Original Feature 2')
plt.title('Before PCA: Using First 2 Standardized Features')
plt.colorbar(label='Target classes')

plt.subplot(1, 2, 2)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y_numeric, cmap='coolwarm', edgecolor='k', s=80)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('After PCA: Projected onto 2 Principal Components')
plt.colorbar(label='Target classes')

plt.tight_layout()
plt.show()

Output:

• Left Plot Before PCA: This shows the original standardized data plotted using the first
two features. There is no guarantee of clear separation between classes as these are
raw input dimensions.

• Right Plot After PCA: This displays the transformed data using the top 2 principal
components. These new components capture the maximum variance often showing
better class separation and structure making it easier to analyze or model.

Advantages of Principal Component Analysis

1. Multicollinearity Handling: Creates new, uncorrelated variables to address issues when original features are highly correlated.

2. Noise Reduction: Eliminates components with low variance, enhancing data clarity.

3. Data Compression: Represents data with fewer components, reducing storage needs and speeding up processing.


4. Outlier Detection: Identifies unusual data points by showing which ones deviate
significantly in the reduced space.

Disadvantages of Principal Component Analysis

1. Interpretation Challenges: The new components are combinations of original variables which can be hard to explain.

2. Data Scaling Sensitivity: Requires proper scaling of data before application or results
may be misleading.

3. Information Loss: Reducing dimensions may lose some important information if too
few components are kept.

4. Assumption of Linearity: Works best when relationships between variables are linear
and may struggle with non-linear data.

5. Computational Complexity: Can be slow and resource-intensive on very large datasets.

6. Risk of Overfitting: Using too many components or working with a small dataset might
lead to models that don't generalize well.

Introduction to Competitive Networks

Competitive networks are a family of neural network models in which neurons compete
among themselves to become active. Unlike feedforward networks where many neurons can
be active simultaneously, competitive nets implement a mechanism where typically only one
neuron, or a small subset of neurons, is declared the "winner." This principle is often called
the winner-takes-all (WTA) strategy.

The primary motivation behind competitive networks is to enable a system to categorize inputs into classes or clusters without requiring complex weight adjustments. Such networks are particularly useful when the categories are already known or when the system must operate in an environment with fixed prototypes.

Concept of Fixed Weight Competitive Nets

In fixed weight competitive networks, the weights of the neurons are predefined and do not
change during learning. Each neuron corresponds to a prototype or class, and the neuron
whose weight vector most closely resembles the input vector becomes the winner.

The fixed weights act as reference vectors or templates against which input data is compared.
This allows the network to classify new inputs efficiently without performing iterative
learning.

Structure of a Fixed Weight Competitive Net


A typical fixed weight competitive net consists of the following components:

1. Input Layer:

o Accepts the input vector x = (x_1, x_2, …, x_n).

o Inputs are usually normalized to ensure fair competition.

2. Competition Layer (Output Layer):

o Contains a set of neurons, each associated with a fixed weight vector w_j.

o Each neuron computes a similarity measure between the input and its weight
vector.

3. Winner-Takes-All Mechanism:

o The neuron with the highest similarity score produces an output of 1 (winner).

o All other neurons output 0 (losers).

Working Principle

The working of a fixed weight competitive net can be explained step by step:

1. Input Presentation:
An input vector is presented to all neurons simultaneously.

2. Activation Calculation:
Each neuron computes its activation using a similarity measure, often the dot
product:

y_j = x · w_j = Σ_{i=1}^{n} x_i w_{ji}

where w_j is the weight vector of neuron j.

3. Competition Phase:
The network compares activations across all neurons.

4. Selection of Winner:
The neuron with the highest activation is selected as the winner.

5. Output Generation:
The winner neuron outputs 1, and all other neurons output 0.

This mechanism ensures that only the neuron whose weight vector is closest to the input
vector represents the input.


Mathematical Representation

• Suppose there are m neurons, each with an n-dimensional weight vector.

• Input vector: x = (x_1, x_2, …, x_n).

• Weight vectors: w_j = (w_{j1}, w_{j2}, …, w_{jn}) for j = 1, 2, …, m.

• Activation of neuron j:

y_j = Σ_{i=1}^{n} x_i w_{ji}

• Winner selection rule:

Winner = argmax_j (y_j)

Example

Let us consider a competitive network with three neurons:

• w_1 = (1, 0)

• w_2 = (0, 1)

• w_3 = (1, 1)

If the input vector is x = (0.9, 0.2):

• y_1 = (0.9)(1) + (0.2)(0) = 0.9

• y_2 = (0.9)(0) + (0.2)(1) = 0.2

• y_3 = (0.9)(1) + (0.2)(1) = 1.1

Neuron 3 has the maximum response (1.1), so it wins. Output = (0, 0, 1).

This shows that the input is most similar to the prototype represented by w_3.
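The same winner-takes-all computation can be written in a few lines of Python; this is a minimal sketch of the worked example above using plain dot products, not a library implementation.

import numpy as np

# Fixed prototype weight vectors from the example, one row per neuron
W = np.array([[1, 0],
              [0, 1],
              [1, 1]], dtype=float)

x = np.array([0.9, 0.2])            # input vector

activations = W @ x                 # y_j = sum_i x_i * w_ji for each neuron j
winner = int(np.argmax(activations))

output = np.zeros(len(W), dtype=int)
output[winner] = 1                  # the winner outputs 1, all other neurons output 0

print("activations:", activations)  # [0.9 0.2 1.1]
print("winner neuron:", winner + 1) # neuron 3
print("output:", output)            # [0 0 1]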

Characteristics

• Weights are fixed → no training or adaptation is required.

• Fast classification → simple dot product or distance calculations.

• Prototype-based → neurons represent predefined categories.

• Winner-takes-all mechanism ensures clarity in decision making.

Applications

Fixed weight competitive networks are used in areas where prototypes are known in advance,
and rapid classification is required. Some applications include:


• Pattern Classification: Recognizing input patterns against stored templates (e.g., character recognition).

• Clustering with predefined centroids: Assigning inputs to the nearest group.

• Vector Quantization: Used in image compression and speech processing.

• Signal Processing: Mapping noisy inputs to standard reference signals.

• Speech Recognition: Classifying speech features into known phoneme categories.

• Biological Modeling: Mimicking competitive behavior observed in brain neurons.

Advantages

• Simple and computationally efficient.

• Requires no training or weight adjustment.

• Robust for applications where reference prototypes are stable.

• Can handle real-time classification tasks.

Limitations

• Lack of adaptability: Cannot learn new patterns or adjust to changes in input distribution.

• Performance heavily depends on the choice of fixed weights.

• Not suitable for dynamic or evolving datasets.

Fixed weight competitive networks are simple yet powerful models for classification and
pattern recognition tasks where reference prototypes are predetermined. Their strength lies
in their efficiency and ability to perform fast recognition without requiring complex learning
algorithms. However, their limitation is their inability to adapt, which makes them suitable
only for static problems.

Self Organizing Maps - Kohonen Maps

A Self Organizing Map (SOM) or Kohonen Map is an unsupervised neural network algorithm
based on biological neural models from the 1970s. It uses a competitive learning approach
and is primarily designed for clustering and dimensionality reduction. SOM effectively maps
high-dimensional data to a lower-dimensional grid enabling easier interpretation and
visualization of complex datasets.


It consists of two primary layers:

• Input Layer: Represents the features of the data.

• Output Layer: Arranged as a 2D grid of neurons, with each neuron representing a cluster in the data.

Working of Self Organizing Maps (SOM)

Here is a step-by-step explanation of its working:

1. Initialization

The weights of the output neurons are randomly initialized. These weights represent the
features of each neuron and will be adjusted during training.

2. Competition

For each input vector, SOM computes the Euclidean distance between the input and the
weight vectors of all neurons. The neuron with the smallest distance is the winning neuron.

Formula: D(j) = Σ_{i=1}^{n} (w_{ij} − x_i)²

where

• D(j) is the distance for neuron j

• n is the number of features.

3. Weight Update


The winning neuron’s weights are updated to move closer to the input vector. The weights of
neighboring neurons are also adjusted but with smaller changes.

Formula: w_{ij}(new) = w_{ij}(old) + α · (x_i − w_{ij}(old))

where

• α is the learning rate

• x_i is the input feature.

4. Learning Rate Decay

The learning rate α decreases over time, allowing the map to converge to stable values.

Formula: α(t+1) = 0.5 · α(t)

5. Stopping Condition

The training stops when the maximum number of epochs is reached or when the weights
converge.

Implementation of SOM in Python

Now let’s walk through a Python implementation of the SOM algorithm. The code is divided
into blocks for clarity.

1. Importing Required Libraries

We will use the math library to compute the Euclidean distance between the input vector and
the weight vector.

import math

2. Defining the SOM Class

In this class, we define two important functions, winner() to compute the winning neuron by
calculating the Euclidean distance between the input and weight vectors of each cluster and
update() to update the weight vectors of the winning neuron according to the weight update
rule.

class SOM:

    def winner(self, weights, sample):
        D0 = 0
        D1 = 0
        for i in range(len(sample)):
            D0 += math.pow((sample[i] - weights[0][i]), 2)
            D1 += math.pow((sample[i] - weights[1][i]), 2)
        return 0 if D0 < D1 else 1

    def update(self, weights, sample, J, alpha):
        for i in range(len(weights[0])):
            weights[J][i] = weights[J][i] + alpha * (sample[i] - weights[J][i])
        return weights

3. Defining the Main Function

In this section, we define the training data and initialize the weights. We also specify the
number of epochs and the learning rate.

• T: This is the training data with four examples, each having four features.

• weights: These are the initial weights for two clusters, each with four features.

• epochs: Specifies the number of iterations for training.

• alpha: The learning rate used for updating weights.

def main():
    T = [[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]]
    m, n = len(T), len(T[0])
    weights = [[0.2, 0.6, 0.5, 0.9], [0.8, 0.4, 0.7, 0.3]]
    ob = SOM()
    epochs = 3
    alpha = 0.5

4. Training the SOM Network

Here, we loop through each training example for the specified number of epochs, compute
the winning cluster and update the weights. For each epoch and training sample we:

• Compute the winning cluster using the winner() method.

• Update the winning cluster’s weights using the update() method.

# Inside the "main" function

for i in range(epochs):
19
OCS351 ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FUNDAMENTALS

for j in range(m):

sample = T[j]

J = [Link](weights, sample)

weights = [Link](weights, sample, J, alpha)

5. Classifying Test Sample

After training the SOM network, we use a test sample s and classify it into one of the clusters
by computing which cluster has the closest weight vector to the input sample. Finally, we print
the cluster assignment and the trained weights for each cluster.

# Inside the "Main" function

s = [0, 0, 0, 1]

J = [Link](weights, s)

print("Test Sample s belongs to Cluster: ", J)

print("Trained weights: ", weights)

6. Running the Main Function

The following block runs the main() function when the script is executed.

if __name__ == "__main__":
    main()

Output:

The output will display which cluster the test sample belongs to and the final trained weights
of the clusters.

Test Sample s belongs to Cluster: 0


Trained weights: [[0.6000000000000001, 0.8, 0.5, 0.9], [0.3333984375, 0.0666015625, 0.7,
0.3]]

• Cluster 0 is assigned to the test sample [0, 0, 0, 1].

• The trained weights for both clusters are displayed after 3 epochs.

Clustering in Machine Learning

Clustering is an unsupervised machine learning technique that groups similar data points together into clusters based on their characteristics, without using any labeled data. The objective is to ensure that data points within the same cluster are more similar to each other than to those in different clusters, enabling the discovery of natural groupings and hidden patterns in complex datasets.

• Goal: Discover the natural grouping or structure in unlabeled data without predefined categories.

• How: Data points are assigned to clusters based on similarity or distance measures.

• Similarity Measures: Can include Euclidean distance, cosine similarity or other metrics depending on data type and clustering method.

• Output: Each group is assigned a cluster ID, representing shared characteristics within
the cluster.

For example, if we have customer purchase data, clustering can group customers with similar
shopping habits. These clusters can then be used for targeted marketing, personalized
recommendations or customer segmentation.

Types of Clustering

Let's see the types of clustering,


1. Hard Clustering: In hard clustering, each data point strictly belongs to exactly one cluster,
no overlap is allowed. This approach assigns a clear membership, making it easier to interpret
and use for definitive segmentation tasks.

• Example: If clustering customer data into 2 segments, each customer belongs fully to
either Cluster 1 or Cluster 2 without partial memberships.

• Use cases: Market segmentation, customer grouping, document clustering.

• Limitations: Cannot represent ambiguity or overlap between groups; boundaries are crisp.

Let's look at an example showing the difference between hard and soft clustering using a distribution:

Data Point Hard Clustering Soft Clustering

A Cluster 1 Cluster 1: 0.91, Cluster 2: 0.09

B Cluster 2 Cluster 1: 0.30, Cluster 2: 0.70

C Cluster 3 Cluster 1: 0.17, Cluster 2: 0.83

D Cluster 4 Cluster 1: 1.00, Cluster 2: 0.00

2. Soft Clustering: Soft clustering assigns each data point a probability or degree of
membership to multiple clusters simultaneously, allowing data points to partially belong to
several groups.

• Example: A data point may have a 70% membership in Cluster 1 and 30% in Cluster 2,
reflecting uncertainty or overlap in group characteristics.

• Use cases: Situations with overlapping class boundaries, fuzzy categories like customer
personas or medical diagnosis.

• Benefits: Captures ambiguity in data, models gradual transitions between clusters.

Types of Clustering Methods

Clustering methods can be classified on the basis of how they form clusters:

1. Centroid-based Clustering (Partitioning Methods)


Centroid-based clustering organizes data points around central prototypes called centroids,
where each cluster is represented by the mean (or medoid) of its members. The number of
clusters is specified in advance and the algorithm allocates points to the nearest centroid,
making this technique efficient for spherical and similarly sized clusters but sensitive to
outliers and initialization.

Algorithms:

• K-means: Iteratively assigns points to the nearest centroid and recalculates centroids to minimize intra-cluster variance (a short library sketch follows the pros and cons below).

• K-medoids: Similar to K-means but uses actual data points (medoids) as centers, making it robust to outliers.

Pros:

• Fast and scalable for large datasets.

• Simple to implement and interpret.

Cons:

• Requires pre-knowledge of k.

• Sensitive to initialization and outliers.

• Not suitable for non-spherical clusters.
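As referenced above, here is a minimal sketch of centroid-based clustering with scikit-learn's KMeans; the blob data and the choice of k = 3 are assumptions made for illustration.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Made-up data: 300 points around 3 centers
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # k must be chosen in advance
labels = kmeans.fit_predict(X)                             # cluster index for each point

print("centroids:\n", kmeans.cluster_centers_)
print("first 10 labels:", labels[:10])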

2. Density-based Clustering (Model-based Methods)

Density-based clustering defines clusters as contiguous regions of high data density separated
by areas of lower density. This approach can identify clusters of arbitrary shapes, handles
noise well and does not require predefining the number of clusters, though its effectiveness
depends on chosen density parameters.

Algorithms:

• DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Groups points with sufficient neighbors and labels sparse points as noise (a short sketch follows the pros and cons below).

• OPTICS (Ordering Points To Identify Clustering Structure): Extends DBSCAN to handle varying densities.

Pros:

• Handles clusters of varying shapes and sizes.

• Does not require cluster count upfront.

• Effective in noisy datasets.


Cons:

• Difficult to choose parameters like epsilon and min points.

• Less effective for varying density clusters (except OPTICS).
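As referenced above, here is a minimal DBSCAN sketch with scikit-learn; the two-moons data and the eps and min_samples values are assumptions chosen for illustration.

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Made-up non-spherical data: two interleaving half circles with some noise
X, _ = make_moons(n_samples=300, noise=0.08, random_state=42)

db = DBSCAN(eps=0.2, min_samples=5)   # density parameters usually need tuning
labels = db.fit_predict(X)            # label -1 marks points treated as noise

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters, "| noise points:", list(labels).count(-1))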

3. Connectivity-based Clustering (Hierarchical Clustering)

Connectivity-based (or hierarchical) clustering builds nested groupings of data by evaluating how data points are connected to their neighbors. It creates a dendrogram (a tree-like structure) that reflects relationships at various granularity levels and does not require specifying cluster numbers in advance, but it can be computationally intensive.

Approaches:

• Agglomerative (Bottom-up): Start with each point as a cluster; iteratively merge the closest clusters.

• Divisive (Top-down): Start with one cluster; iteratively split it into smaller clusters.

Pros:

• Provides a full hierarchy, easy to visualize.

• No need to specify number of clusters upfront.

Cons:

• Computationally intensive for large datasets.

• Merging/splitting decisions are irreversible.

4. Distribution-based Clustering

Distribution-based clustering assumes data is generated from a mixture of probability distributions, such as Gaussian distributions, and assigns points to clusters based on statistical likelihood. This method supports clusters with flexible shapes and overlaps, but usually requires specifying the number of distributions.

Algorithm:

• Gaussian Mixture Model (GMM): Fits the data as a weighted mixture of Gaussian distributions and assigns data points based on likelihood (a short sketch follows the pros and cons below).

Pros:

• Flexible cluster shapes.

• Provides probabilistic memberships.

• Suitable for overlapping clusters.


Cons:

• Requires specifying number of components.

• Computationally more expensive.

• Sensitive to initialization.
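As referenced above, here is a minimal Gaussian Mixture Model sketch with scikit-learn; the blob data and n_components = 3 are assumptions made for illustration. predict_proba returns the soft, probabilistic memberships mentioned in the pros.

from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs

# Made-up data drawn around 3 centers
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.5, random_state=42)

gmm = GaussianMixture(n_components=3, random_state=42)  # number of Gaussians assumed known
gmm.fit(X)

hard_labels = gmm.predict(X)               # most likely component for each point
soft_memberships = gmm.predict_proba(X)    # probability of each component for each point
print("first point memberships:", soft_memberships[0].round(3))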

5. Fuzzy Clustering

Fuzzy clustering extends traditional methods by allowing each data point to belong to multiple
clusters with varying degrees of membership. This approach captures ambiguity and soft
boundaries in data and is particularly useful when the clusters overlap or boundaries are not
clear-cut.

Algorithm:

• Fuzzy C-Means: Similar to K-means but with fuzzy memberships updated iteratively.

Pros:

• Models data ambiguity explicitly.

• Useful for complex or imprecise data.

Cons:

• Choosing fuzziness parameter can be tricky.

• Computational overhead compared to hard clustering.

Use Cases

• Customer Segmentation: Grouping customers based on behavior or demographics for targeted marketing and personalized services.

• Anomaly Detection: Identifying outliers or fraudulent activities in finance, network security and sensor data.

• Image Segmentation: Dividing images into meaningful parts for object detection, medical diagnostics or computer vision tasks.

• Recommendation Systems: Clustering user preferences to recommend movies, products or content tailored to different groups.

• Market Basket Analysis: Discovering products frequently bought together to optimize store layouts and promotions.


Hierarchical Clustering in Machine Learning

Hierarchical clustering is an unsupervised learning technique used to group similar data points into clusters by building a hierarchy (tree-like structure). Unlike flat clustering methods such as k-means, hierarchical clustering does not require specifying the number of clusters in advance.

The algorithm builds clusters step by step either by progressively merging smaller clusters or
by splitting a large cluster into smaller ones. The process is often visualized using
a dendrogram, which helps to understand data similarity.

Imagine we have four fruits with different weights: an apple (100g), a banana (120g), a cherry
(50g) and a grape (30g). Hierarchical clustering starts by treating each fruit as its own group.

• Start with each fruit as its own cluster.

• Merge the closest items: grape (30g) and cherry (50g) are grouped first.

• Next, apple (100g) and banana (120g) are grouped.

• Finally, these two clusters merge into one.

Finally all the fruits are merged into one large group, showing how hierarchical clustering
progressively combines the most similar data points.
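The fruit example can be reproduced with SciPy's linkage function. This is a minimal sketch; the 1-D weight values come from the example above, and the single-linkage (minimum distance) choice is an assumption for illustration.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Fruit weights in grams: apple, banana, cherry, grape
weights = np.array([[100.0], [120.0], [50.0], [30.0]])

# Agglomerative merging: the closest clusters are joined first
Z = linkage(weights, method='single')  # 'single' = minimum distance between clusters
print(Z)  # each row: [cluster i, cluster j, merge distance, points in the new cluster]

# Cutting the tree into 2 groups separates {grape, cherry} from {apple, banana}
print(fcluster(Z, t=2, criterion='maxclust'))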

Dendrogram

A dendrogram is like a family tree for clusters. It shows how individual data points or groups
of data merge together. The bottom shows each data point as its own group and as we move
up, similar groups are combined. The lower the merge point, the more similar the groups are.
It helps us see how things are grouped step by step.


• At the bottom of the dendrogram the points P, Q, R, S and T are all separate.

• As we move up, the closest points are merged into a single group.

• The lines connecting the points show how they are progressively merged based on
similarity.

• The height at which they are connected shows how similar the points are to each other; the shorter the line, the more similar they are.

Types of Hierarchical Clustering

Now we understand the basics of hierarchical clustering. There are two main types of
hierarchical clustering.

1. Agglomerative Clustering

2. Divisive clustering

1. Hierarchical Agglomerative Clustering

It is also known as the bottom-up approach or hierarchical agglomerative clustering (HAC). Bottom-up algorithms treat each data point as a singleton cluster at the outset and then successively agglomerate pairs of clusters until all clusters have been merged into a single cluster that contains all the data.


Workflow for Hierarchical Agglomerative clustering

1. Start with individual points: Each data point is its own cluster. For example if we have
5 data points we start with 5 clusters each containing just one data point.

2. Calculate distances between clusters: Calculate the distance between every pair of
clusters. Initially since each cluster has one point this is the distance between the two
data points.

3. Merge the closest clusters: Identify the two clusters with the smallest distance and
merge them into a single cluster.

4. Update distance matrix: After merging we now have one less cluster. Recalculate the
distances between the new cluster and the remaining clusters.

5. Repeat steps 3 and 4: Keep merging the closest clusters and updating the distance
matrix until we have only one cluster left.

6. Create a dendrogram: As the process continues we can visualize the merging of clusters using a tree-like diagram called a dendrogram. It shows the hierarchy of how clusters are merged.

Implementation

Let's see the implementation of Agglomerative Clustering,

• Start with each data point as its own cluster.


• Compute distances between all clusters.

• Merge the two closest clusters based on a linkage method.

• Update the distances to reflect the new cluster.

• Repeat merging until the desired number of clusters or one cluster remains.

• The dendrogram visualizes these merges as a tree, showing cluster relationships and
distances.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, cluster_std=10, random_state=42)

clustering = AgglomerativeClustering(n_clusters=3)
labels = clustering.fit_predict(X)

agg = AgglomerativeClustering(distance_threshold=0, n_clusters=None)
agg.fit(X)

def plot_dendrogram(model, **kwargs):
    counts = np.zeros(model.children_.shape[0])
    n_samples = len(model.labels_)
    for i, merge in enumerate(model.children_):
        current_count = 0
        for child_idx in merge:
            if child_idx < n_samples:
                current_count += 1
            else:
                current_count += counts[child_idx - n_samples]
        counts[i] = current_count
    linkage_matrix = np.column_stack(
        [model.children_, model.distances_, counts]).astype(float)
    dendrogram(linkage_matrix, **kwargs)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

ax1.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=70)
ax1.set_title("Agglomerative Clustering")
ax1.set_xlabel("Feature 1")
ax1.set_ylabel("Feature 2")

plt.sca(ax2)
plot_dendrogram(agg, truncate_mode='level', p=5)
plt.title("Hierarchical Clustering Dendrogram")
plt.xlabel("Sample index")
plt.ylabel("Distance")

plt.tight_layout()
plt.show()

Output :


2. Hierarchical Divisive clustering

Divisive clustering is also known as a top-down approach. Top-down clustering requires a method for splitting a cluster that contains the whole data and proceeds by splitting clusters recursively until individual data points have been split into singleton clusters.

Workflow for Hierarchical Divisive clustering :

1. Start with all data points in one cluster: Treat the entire dataset as a single large
cluster.

2. Split the cluster: Divide the cluster into two smaller clusters. The division is typically
done by finding the two most dissimilar points in the cluster and using them to
separate the data into two parts.

3. Repeat the process: For each of the new clusters, repeat the splitting process: Choose
the cluster with the most dissimilar points and split it again into two smaller clusters.

4. Stop when each data point is in its own cluster: Continue this process until every data
point is its own cluster or the stopping condition (such as a predefined number of
clusters) is met.


Implementation

Let's see the implementation of Divisive Clustering,

• Starts with all data points as one big cluster.

• Finds the largest cluster and splits it into two using KMeans.

• Repeats splitting the largest cluster until reaching the desired number of clusters.

• Assigns cluster labels to each data point based on the splits.

• Returns history of clusters at each step and final labels.

• Visualizes data points colored by their final cluster.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from scipy.cluster.hierarchy import dendrogram, linkage

X, _ = make_blobs(n_samples=30, centers=5, cluster_std=10, random_state=42)

def divisive_clustering(data, max_clusters=3):
    clusters = [data]  # start with all points in a single cluster
    while len(clusters) < max_clusters:
        # pick the largest cluster and split it into two with KMeans
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        cluster_to_split = clusters.pop(idx)
        kmeans = KMeans(n_clusters=2, random_state=42).fit(cluster_to_split)
        cluster1 = cluster_to_split[kmeans.labels_ == 0]
        cluster2 = cluster_to_split[kmeans.labels_ == 1]
        clusters.extend([cluster1, cluster2])
    return clusters

clusters = divisive_clustering(X, max_clusters=3)

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
colors = ['r', 'g', 'b', 'c', 'm', 'y']
for i, cluster in enumerate(clusters):
    plt.scatter(cluster[:, 0], cluster[:, 1], s=50,
                c=colors[i], label=f'Cluster {i+1}')
plt.title('Divisive Clustering Result')
plt.legend()

linked = linkage(X, method='ward')

plt.subplot(1, 2, 2)
dendrogram(linked, orientation='top',
           distance_sort='descending', show_leaf_counts=True)
plt.title('Hierarchical Clustering Dendrogram')

plt.tight_layout()
plt.show()

Output:

Computing Distance Matrix

While merging two clusters, we check the distance between every pair of clusters and merge the pair with the least distance (most similarity). But the question is how that distance is determined. There are different ways of defining inter-cluster distance/similarity (a short sketch comparing them follows the lists below). Some of them are:

• Min Distance: Find the minimum distance between any two points of the cluster.

• Max Distance: Find the maximum distance between any two points of the cluster.

• Group Average: Find the average distance between every two points of the clusters.

• Ward's Method: The similarity of two clusters is based on the increase in squared error
when two clusters are merged.


The image compares cluster distance methods:

• Min uses the shortest distance between clusters

• Max uses the longest

• Group Average computes the mean of all pairwise distances

• Ward’s method minimizes the increase in within-cluster variance during merging
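As referenced above, here is a minimal sketch comparing these linkage criteria with SciPy; the random blob data is an assumption, and 'single', 'complete', 'average' and 'ward' correspond to Min, Max, Group Average and Ward's method respectively.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

# Made-up 2-D data for illustration
X, _ = make_blobs(n_samples=40, centers=3, random_state=42)

for method in ['single', 'complete', 'average', 'ward']:
    Z = linkage(X, method=method)                    # build the merge tree with this criterion
    labels = fcluster(Z, t=3, criterion='maxclust')  # cut the tree into 3 clusters
    print(method, "-> cluster sizes:", np.bincount(labels)[1:])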

K means Clustering
K-Means Clustering is an unsupervised machine learning algorithm that helps group data
points into clusters based on their inherent similarity. Unlike supervised learning, where we
train models using labeled data, K-Means is used when we have data that is not labeled and
the goal is to uncover hidden patterns or structures. For example, an online store can use K-
Means to segment customers into groups like "Budget Shoppers," "Frequent Buyers," and
"Big Spenders" based on their purchase history.

Working of K-Means Clustering

Suppose we are given a data set of items with certain features and values for these features, like a vector. The task is to categorize those items into groups. To achieve this we will use the K-means algorithm. "k" represents the number of groups or clusters we want to classify our items into.

The algorithm will categorize the items into "k" groups or clusters of similarity. To calculate that similarity we will use the Euclidean distance as a measurement. The algorithm works as follows:
1. Initialization: We begin by randomly selecting k cluster centroids.

2. Assignment Step: Each data point is assigned to the nearest centroid, forming clusters.
3. Update Step: After the assignment, we recalculate the centroid of each cluster by
averaging the points within it.
4. Repeat: This process repeats until the centroids no longer change or the maximum
number of iterations is reached.

The goal is to partition the dataset into k clusters such that data points within each cluster are more similar to each other than to those in other clusters.
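Formally, K-Means minimizes the within-cluster sum of squared distances, a standard way to state the goal described above:

J = Σ_{j=1}^{k} Σ_{x_i ∈ C_j} ||x_i − μ_j||²

where C_j is the set of points assigned to cluster j and μ_j is its centroid. The assignment step reduces J by moving each point to its nearest centroid, and the update step reduces it further by recomputing each centroid as the mean of its points, so J never increases between iterations.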

Why Use K-Means Clustering?


K-Means is popular in a wide variety of applications due to its simplicity, efficiency and
effectiveness. Here’s why it is widely used:
1. Data Segmentation: One of the most common uses of K-Means is segmenting data
into distinct groups. For example, businesses use K-Means to group customers based
on behavior, such as purchasing patterns or website interaction.
2. Image Compression: K-Means can be used to reduce the complexity of images by
grouping similar pixels into clusters, effectively compressing the image. This is useful
for image storage and processing.
3. Anomaly Detection: K-Means can be applied to detect anomalies or outliers by
identifying data points that do not belong to any of the clusters.
4. Document Clustering: In natural language processing (NLP), K-Means is used to group
similar documents or articles together. It’s often used in applications like
recommendation systems or news categorization.
5. Organizing Large Datasets: When dealing with large datasets, K-Means can help in
organizing the data into smaller, more manageable chunks based on similarities,
improving the efficiency of data analysis.

Implementation of K-Means Clustering

We will be using blobs datasets and show how clusters are made using Python programming
language.
Step 1: Importing the necessary libraries
We will be importing the following libraries.
• Numpy: for numerical operations (e.g., distance calculation).
• Matplotlib: for plotting data and results.
• Scikit learn: to create a synthetic dataset using make_blobs

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

Step 2: Creating Custom Dataset


We will generate a synthetic dataset with make_blobs.
• make_blobs(n_samples=500, n_features=2, centers=3): Generates 500 data points in
a 2D space, grouped into 3 clusters.
• plt.scatter(X[:, 0], X[:, 1]): Plots the dataset in 2D, showing all the points.
• plt.show(): Displays the plot


X,y = make_blobs(n_samples = 500,n_features = 2,centers = 3,random_state = 23)

fig = plt.figure(0)
plt.grid(True)
plt.scatter(X[:, 0], X[:, 1])
plt.show()
Output:

Step 3: Initializing Random Centroids


We will randomly initialize the centroids for K-Means clustering
• np.random.seed(23): Ensures reproducibility by fixing the random seed.
• The for loop initializes k random centroids, with values between -2 and 2, for a 2D
dataset.
k = 3

clusters = {}
np.random.seed(23)

for idx in range(k):
    center = 2 * (2 * np.random.random((X.shape[1],)) - 1)
    points = []
    cluster = {
        'center': center,
        'points': []
    }
    clusters[idx] = cluster

clusters
Output:

Step 4: Plotting Random Initialized Center with Data Points

We will now plot the data points and the initial centroids.

• plt.grid(): Plots a grid.

• plt.scatter(center[0], center[1], marker='*', c='red'): Plots the cluster center as a red star (* marker).

plt.scatter(X[:, 0], X[:, 1])
plt.grid(True)

for i in clusters:
    center = clusters[i]['center']
    plt.scatter(center[0], center[1], marker='*', c='red')

plt.show()

Output:


Step 5: Defining Euclidean Distance

To assign data points to the nearest centroid, we define a distance function:

• np.sqrt(): Computes the square root of a number or array element-wise.

• np.sum(): Sums all elements in an array or along a specified axis

def distance(p1, p2):
    return np.sqrt(np.sum((p1 - p2) ** 2))

Step 6: Creating Assign and Update Functions

Next, we define functions to assign points to the nearest centroid and update the centroids
based on the average of the points assigned to each cluster.

• dist.append(dis): Appends the calculated distance to the list dist.

• curr_cluster = np.argmin(dist): Finds the index of the closest cluster by selecting the minimum distance.

• new_center = points.mean(axis=0): Calculates the new centroid by taking the mean of the points in the cluster.

def assign_clusters(X, clusters):
    for idx in range(X.shape[0]):
        dist = []
        curr_x = X[idx]
        for i in range(k):
            dis = distance(curr_x, clusters[i]['center'])
            dist.append(dis)
        curr_cluster = np.argmin(dist)
        clusters[curr_cluster]['points'].append(curr_x)
    return clusters

def update_clusters(X, clusters):
    for i in range(k):
        points = np.array(clusters[i]['points'])
        if points.shape[0] > 0:
            new_center = points.mean(axis=0)
            clusters[i]['center'] = new_center
            clusters[i]['points'] = []
    return clusters

Step 7: Predicting the Cluster for the Data Points

We create a function to predict the cluster for each data point based on the final centroids.

• pred.append(np.argmin(dist)): Appends the index of the closest cluster (the one with the minimum distance) to pred.

def pred_cluster(X, clusters):
    pred = []
    for i in range(X.shape[0]):
        dist = []
        for j in range(k):
            dist.append(distance(X[i], clusters[j]['center']))
        pred.append(np.argmin(dist))
    return pred

Step 8: Assigning, Updating and Predicting the Cluster Centers

We assign points to clusters, update the centroids and predict the final cluster labels.

• assign_clusters(X, clusters): Assigns data points to the nearest centroids.

• update_clusters(X, clusters): Recalculates the centroids.

• pred_cluster(X, clusters): Predicts the final clusters for all data points.

clusters = assign_clusters(X,clusters)

clusters = update_clusters(X,clusters)

pred = pred_cluster(X,clusters)


Step 9: Plotting Data Points with Predicted Cluster Centers

Finally, we plot the data points, colored by their predicted clusters, along with the updated
centroids.

• center = clusters[i]['center']: Retrieves the center (centroid) of the current cluster.

• plt.scatter(center[0], center[1], marker='^', c='red'): Plots the cluster center as a red triangle (^ marker).

plt.scatter(X[:, 0], X[:, 1], c=pred)

for i in clusters:
    center = clusters[i]['center']
    plt.scatter(center[0], center[1], marker='^', c='red')

plt.show()

Output:

Challenges with K-Means Clustering

K-Means algorithm has the following limitations:

• Choosing the Right Number of Clusters (k): One of the biggest challenges is deciding how many clusters to use.


• Sensitive to Initial Centroids: The final clusters can vary depending on the initial
random placement of centroids.

• Non-Spherical Clusters: K-Means assumes that the clusters are spherical and equally
sized. This can be a problem when the actual clusters in the data are of different shapes
or densities.

• Outliers: K-Means is sensitive to outliers, which can distort the centroid and,
ultimately, the clusters.
