MODULE 5
Clustering, Artificial Neural Networks and Instance-Based Learning
Clustering: k-means, Hierarchical Clustering. Artificial Neural Networks: Neural Network representation, Perceptron, Multi-Layer Networks and the Backpropagation algorithm. Instance-Based Learning: k-Nearest Neighbor Learning. Ensemble Learning: Random Forest classifier.
K-Means Clustering
Problem:
Use K-means clustering to divide these points into 2 clusters:
Point  Coordinates
a₁     (1, 1)
a₂     (2, 1)
a₃     (2, 3)
a₄     (3, 2)
a₅     (4, 3)
a₆     (5, 5)

Step 1: Initialization
Randomly select 2 points as the initial cluster centers:
• P0 = (2, 1)
• P1 = (2, 3)
Step 2: Calculate Euclidean Distances and Assign Clusters
Compute the Euclidean distance from each point to both centers and assign each point to the nearer one. For example, d(a₁, P0) = √((1-2)² + (1-1)²) = 1 and d(a₁, P1) = √((1-2)² + (1-3)²) = √5, so a₁ joins P0's cluster.
Step 3: Recalculate Cluster Centers (means)
Step 4: Recalculate Distances and Reassign Clusters
Step 5: Recalculate Cluster Centers Again
Step 6: Final Distance Calculation and Assignment
The cluster memberships are the same as in the previous iteration, so the algorithm has converged.
Cluster 1 = { (1,1), (2,1), (2,3), (3,2) }
Cluster 2 = { (4,3), (5,5) }
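The same result can be reproduced with a short script. Below is a minimal k-means sketch in plain Python; the points and the two initial centers are taken from the problem above, and the stopping rule is "assignments unchanged", as in Step 6:

def kmeans(points, centers, max_iter=100):
    # Plain k-means: assign each point to its nearest center,
    # recompute centers as cluster means, stop when assignments repeat.
    assign = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center (squared Euclidean distance)
        new_assign = [min(range(len(centers)),
                          key=lambda c: (p[0] - centers[c][0]) ** 2 +
                                        (p[1] - centers[c][1]) ** 2)
                      for p in points]
        if new_assign == assign:   # same clusters as the previous iteration -> stop
            break
        assign = new_assign
        # Update step: each center becomes the mean of its assigned points
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return assign

points = [(1, 1), (2, 1), (2, 3), (3, 2), (4, 3), (5, 5)]
assign = kmeans(points, centers=[(2, 1), (2, 3)])
for c in (0, 1):
    print(f"Cluster {c + 1} =", [p for p, a in zip(points, assign) if a == c])
# Cluster 1 = [(1, 1), (2, 1), (2, 3), (3, 2)]
# Cluster 2 = [(4, 3), (5, 5)]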
Biological vs. Artificial Neural Networks
Dendrites in a Biological Neural Network correspond to inputs in an Artificial Neural Network, the cell nucleus corresponds to nodes, synapses correspond to weights, and the axon corresponds to the output.

BNN            ANN
Dendrites      Inputs
Cell nucleus   Nodes
Synapse        Weights
Axon           Output
Artificial Neurons
An artificial neuron receives inputs x₁, …, xₙ, multiplies each by a weight wᵢ, sums them together with a bias b, and passes the result through an activation function f. This computation is represented in the form of a transfer function:

y = f(w₁x₁ + w₂x₂ + … + wₙxₙ + b)
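As a quick sanity check of the formula, here is a single neuron in Python; the weights, bias, and sigmoid activation are arbitrary illustrative choices, not values from the notes:

import math

def neuron(inputs, weights, bias):
    # One artificial neuron: weighted sum of inputs plus bias,
    # passed through a sigmoid transfer function.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid activation f(z)

print(neuron([1.0, 0.5], weights=[0.4, -0.2], bias=0.1))  # ≈ 0.599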
Types of Neural Networks
Perceptron
A Perceptron is a single-layer neural network used for binary classification (e.g., yes/no or spam/not spam). It classifies an input by comparing the weighted sum of its features against a threshold.
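A minimal sketch of the perceptron learning rule, trained here on the AND function; the learning rate and epoch count are illustrative assumptions:

def train_perceptron(samples, lr=0.1, epochs=20):
    # Perceptron learning rule: nudge the weights whenever the
    # thresholded weighted sum misclassifies an example.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            y = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0  # threshold unit
            err = target - y
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# AND gate: output is 1 only when both inputs are 1
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
for (x1, x2), t in data:
    print((x1, x2), "->", 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0)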
Multi-Layer Perceptron (MLP)
An MLP has multiple layers: input, hidden, and output. It can model non-linear relationships and solve problems such as XOR and image recognition.
Backpropagation Algorithm
Backpropagation is the training algorithm for MLPs. It adjusts the weights using gradient descent to minimize the prediction error.
Backpropagation in MLP
Algorithm:
Initialize weights randomly
Repeat:
    For each training example:
        - Forward pass: compute the prediction
        - Compute the error: prediction - actual
        - Backward pass: compute the gradients
        - Update the weights
Until all examples are classified correctly or a stopping condition is met
Return the trained network
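A compact, runnable version of this loop, assuming a one-hidden-layer network with sigmoid units, squared error, and full-batch gradient descent, trained on XOR; the architecture, learning rate, and epoch count are illustrative choices, not part of the notes:

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)        # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)        # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for _ in range(10000):
    # Forward pass: compute prediction
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Backward pass: gradients via the chain rule (sigmoid' = s * (1 - s))
    d_out = (y - Y) * y * (1 - y)                    # error at the output layer
    d_hid = (d_out @ W2.T) * h * (1 - h)             # error propagated to the hidden layer
    # Update weights by gradient descent
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid;  b1 -= lr * d_hid.sum(axis=0)

print(y.round(3).ravel())   # should approach [0, 1, 1, 0]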
Ensemble Learning
Ensemble Learning is a technique in machine learning where multiple models (learners) are combined to solve a problem and improve performance.
Why use Ensemble Learning?
• More accurate than individual models
• Reduces overfitting and variance
• Makes robust predictions
Random Forest
Random Forest is an Ensemble Learning technique that builds multiple
Decision Trees and combines their outputs to make a final prediction.
It is mainly used for:
• Classification (e.g., predicting yes/no)
• Regression (e.g., predicting a number)
Simple Definition
A Random Forest creates many decision trees using random subsets of data and features,
and combines their outputs to make more accurate and stable predictions.
Random Forest Classifier
Random Forest Classifier is a supervised machine learning algorithm that
combines multiple Decision Trees using an ensemble method (specifically,
bagging) to improve prediction accuracy and control overfitting.
Random Forest Classifier Algorithm
1.Input:
1. Training dataset with features and labels (X, Y)
2. Number of trees (N) to build
3. Number of features to consider for each split (optional)
2.For each of the N trees:
a. Bootstrap Sampling:
1. Randomly select data points from the training set with replacement to form a new training set.
b. Train a Decision Tree:
2. At each node, randomly select a subset of features and choose the best feature for splitting (not all features).
c. Grow the tree fully (no pruning).
3.Prediction:
1. For a new data point, get predictions from all decision trees.
2. Use majority voting to assign the final class label.
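The same recipe is available off the shelf in scikit-learn; below is a minimal usage sketch, where the synthetic dataset and parameter values are just for illustration:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data, purely for the demo
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_estimators = N trees; max_features='sqrt' = random feature subset per split;
# bootstrap=True = sampling with replacement, as in steps 2a-2b above
clf = RandomForestClassifier(n_estimators=100, max_features='sqrt',
                             bootstrap=True, random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))  # majority vote over the trees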