Self-organizing Map Implementation
Peter Leow, 25 Jul 2014, CPOL (CodeProject). Rated 5.00 (6 votes).
Get real with an implementation of a mini SOM project.
Download source - 122.6 KB
Click on the following image to view a demo video on YouTube
Introduction
In my previous article Self-organizing Map Demystified, you learned the concept, architecture, and algorithm of the self-organizing map (SOM). From here on, you will embark on a journey of designing and implementing a mini SOM for clustering handwritten digits. For training the SOM, I have obtained the training dataset from the Machine Learning Repository of the Center for Machine Learning and Intelligent Systems. MATLAB will be used for programming the SOM. Some of the design considerations are generic and can be applied to other types of machine learning projects. Let's get started.
Preparing the Ingredients
The original training dataset file is called "optdigits-orig.tra.Z". It consists of a total of 1934 handwritten digits from 0 to 9 collected from a total of 30 people. Each digit sample has been normalized to a 32x32 binary image that has 0 or 1 for each pixel. The distribution of the training dataset is considered well balanced, as shown in Table 1. It is important to have a balanced training dataset, where the number of samples in every class is more or less equal, so as to prevent bias in training: classes with an overwhelmingly large number of samples tend to get chosen more often than those in the minority, thus affecting the accuracy and reliability of the training result.
Table 1: Distribution of Classes
Class Number of Samples
0 189
1 198
2 195
3 199
4 186
5 187
6 195
7 201
8 180
9 204
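As a quick sanity check on Table 1, the counts can be tallied in a few lines of Python (used here purely for illustration; the article's own code is MATLAB):

```python
# Class counts copied from Table 1 of the article.
counts = {0: 189, 1: 198, 2: 195, 3: 199, 4: 186,
          5: 187, 6: 195, 7: 201, 8: 180, 9: 204}

total = sum(counts.values())
ratio = max(counts.values()) / min(counts.values())

print(total)            # 1934, matching the dataset size
print(round(ratio, 2))  # 1.13: largest class only ~13% above smallest
```

The largest class (9, with 204 samples) exceeds the smallest (8, with 180) by only about 13%, which is why the dataset is described as well balanced.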
Figure 1 shows some of the binary images contained in the training dataset:
Figure 1: Some of the Binary Images in the Training Dataset
In the original data file, each block of 32x32 binary image is followed by a class label that indicates the digit that this sample belongs to. For example, the "8" bitmap block (in Figure 1) is followed by its class label of 8, and so on. To facilitate processing by MATLAB, I have further pre-processed the data as follows:

1. Separate the class labels from their bitmaps in the original data file into two different files.
2. Make the class labels, a total of 1934 digits, into a 1x1934 vector called "train_label" and save it as a MATLAB data file called "training_label.mat".
3. Make the bitmap data, from its original form of 61888(rows)x32(columns) bits after removing the class labels, into a 1024x1934 matrix called "train_data", and save it as "training_data.mat". In the train_data matrix, each column represents a training sample and each sample has 1024 bits, i.e. each training sample has been transformed from its original 32x32 bitmap into a 1024x1 vector.

These two files, "training_label.mat" and "training_data.mat", are available for download. Unzip and place them in a folder, say "som_experiment".
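For readers who prefer to do this pre-processing outside MATLAB, step 3 can be sketched in Python with NumPy. This is a hypothetical sketch, assuming the raw bitmaps have already been parsed into a 61888x32 array of 0/1 values; the variable names are mine, not the article's:

```python
import numpy as np

# Stand-in for the parsed file contents: 1934 samples, each occupying
# 32 consecutive rows of 32 columns (0/1 pixel values).
n_samples, side = 1934, 32
raw = np.zeros((n_samples * side, side), dtype=np.uint8)

# Each consecutive block of 32 rows is one 32x32 bitmap; flatten each
# bitmap into 1024 values and place one sample per column.
train_data = raw.reshape(n_samples, side * side).T
print(train_data.shape)   # (1024, 1934)
```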
If you are curious to see what these digits look like, fire up MATLAB, set the Current Folder to "som_experiment", enter the following code in the Command Window, and run it.
% View a Digit from the train_data
clear
clc
load 'training_data';
img = reshape(train_data(:,10), 32, 32)';
imshow(double(img))
This code will:

* load the "training_data.mat" file that contains the 1024x1934 train_data matrix into memory;
* train_data(:, n) will extract the nth column vector (1024x1) from the train_data matrix, where n coincides with the position of the digit in the dataset. In this code, n is 10, so it will extract the tenth digit; you can change it to any number up to the total number of digits in the dataset, i.e. 1934;
* reshape(train_data(:,10), 32, 32)' will first reshape the column vector into a 32x32 matrix (MATLAB's reshape fills the matrix column by column) and then transpose it, effectively reverting it back to its original shape like those shown in Figure 1;
* imshow(double(img)) will display the digit as a binary image where pixels with the value 0 are shown as black and 1 as white, such as the one shown in Figure 2.
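A note on the transpose in the snippet above: MATLAB's reshape fills matrices column by column, so the transpose is what restores the row-major pixel layout of the bitmap. The same effect can be illustrated in Python with NumPy's Fortran-order reshape (an illustration only, not part of the article's code):

```python
import numpy as np

# MATLAB's reshape fills column by column (Fortran order); transposing
# afterwards recovers the row-major layout of the original bitmap.
v = np.arange(9)                               # stand-in for a flattened image
matlab_result = v.reshape(3, 3, order='F').T   # like reshape(v, 3, 3)' in MATLAB
print(matlab_result)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]
```

After the transpose, the result matches a plain row-major reshape, i.e. the pixels appear in their original reading order.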
Figure 2: A Binary Image on Screen
Setting the Parameters
Let's set the parameters for subsequent development.
* The size of the SOM map is 10x10.
* The total number of iterations is set at 1000. In this experiment, we will only attempt the first sub-phase of the adaptation phase.
* The neighborhood function:

  h_{j,c(x)}(n) = exp( -d²_{j,c(x)} / (2σ²(n)) )

  where

  d_{j,c(x)} is the Euclidean distance from neuron j to the winning neuron c(x);

  σ(n) is the effective width of the topological neighborhood at the nth iteration:

  σ(n) = σ₀ exp(-n/τ₁),  n = 0, 1, 2, ...,  with τ₁ = N / log σ₀

  where

  σ₀ is the initial effective width, which is set to 5, i.e. the radius of the 10x10 map;

  τ₁ is the time constant.
* The weight updating equation:

  w_j(n+1) = w_j(n) + η(n) h_{j,c(x)}(n) (x - w_j(n))

  where

  η(n) is the time-varying learning rate at the nth iteration and is computed as

  η(n) = η₀ exp(-n/τ₂)

  where

  η₀ is the initial learning rate, which is set to 0.1;

  τ₂ is a time constant, which is set to N.
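Taken together, the schedules and the update rule above can be sketched in Python (used here for illustration only; the article's implementation is in MATLAB). The symbols follow the definitions above: σ₀ = 5, η₀ = 0.1, N = 1000, τ₁ = N/log σ₀, τ₂ = N.

```python
import math
import numpy as np

# Illustrative sketch of the decay schedules and weight update defined
# above (the article's actual implementation is in MATLAB).
sigma0, eta0, N = 5.0, 0.1, 1000
tau1 = N / math.log(sigma0)   # time constant for sigma
tau2 = N                      # time constant for eta

def sigma(n):
    """Effective width of the neighbourhood at iteration n."""
    return sigma0 * math.exp(-n / tau1)

def neighbourhood(dist_sq, n):
    """Gaussian neighbourhood h_{j,c(x)}(n) for squared distance dist_sq."""
    return math.exp(-dist_sq / (2.0 * sigma(n) ** 2))

def eta(n):
    """Time-varying learning rate at iteration n."""
    return eta0 * math.exp(-n / tau2)

def update(w, x, dist_sq, n):
    """w(n+1) = w(n) + eta(n) * h_{j,c(x)}(n) * (x - w(n))."""
    return w + eta(n) * neighbourhood(dist_sq, n) * (x - w)

# The winning neuron (dist_sq = 0) at n = 0 moves 10% of the way to x:
w = np.zeros(4)
x = np.ones(4)
print(update(w, x, dist_sq=0.0, n=0))   # [0.1 0.1 0.1 0.1]
print(round(sigma(N), 2))               # width decays from 5.0 to 1.0
```

Note how the choice τ₁ = N/log σ₀ makes σ(N) come out to exactly 1, i.e. the neighborhood shrinks from the map radius down to a single-neuron width over the 1000 iterations.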
We have designed the parameters for the mini SOM. It's time to make things happen.
Ready to Cook (Code)

The MATLAB script implementing the mini SOM is saved in the file "training_som.m" and is available for download. I have created this script as a proof of concept for the sole purpose of reinforcing the learning of the SOM algorithm. You are free to implement it in any programming language; after all, programming languages are just media for implementing problem-solving techniques on a computer.

Unzip and place it in the "som_experiment" folder. Open the "training_som.m" script in MATLAB's Editor, and you will see the recipe as shown below. The code has been painstakingly commented and is therefore sufficiently self-explanatory. Nevertheless, I will still round up those parts of the code that correspond to the various phases of the SOM algorithm so that you can understand and relate them better.
% Self-organizing Map
% Clustering of Handwritten Digits
% training_som.m
% Peter Leow
% 10 July 2014

% clean up the previous act
close all;
clear;  % delete all memory
clc;    % clear command window
clf;    % clear figure screen
shg;    % put figure screen on top of all screens

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Ground breaking!
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Load training_data.mat that consists of train_data
% of 1024x1934 matrix
% Consists of a total of 1934 input samples and
% each sample possesses 1024 attributes (dimensions)
load training_data;

% dataRow = number of attributes (dimensions) of each sample, i.e. 1024
% dataCol = total number of training samples, i.e. 1934
[dataRow, dataCol] = size(train_data);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SOM Architecture
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Determine the number of rows and columns of SOM map
somRow = 10;
somCol = 10;

% Initialize 10x10x1024 som matrix
% This is the SOM map of 10x10 neurons
% Each neuron carries a weight vector of 1024 elements
som = zeros(somRow, somCol, dataRow);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Parameters Settings
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Max number of iterations
N = 1000;

% Initial effective width
sigmaInitial = 5;

% Time constant for sigma
t1 = N / log(sigmaInitial);

% Initialize 10x10 matrix to store Euclidean distances
% of each neuron on the map
euclidean = zeros(somRow, somCol);

% Initialize 10x10 matrix to store neighbourhood functions
% of each neuron on the map
neighbourhoodF = zeros(somRow, somCol);

% Initial learning rate
etaInitial = 0.1;

% Time constant for eta
t2 = N;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Initialization
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Generate random weight vectors [dataRow x 1]
% and assign them to the third dimension of som
for r = 1:somRow
    for c = 1:somCol
        som(r, c, :) = rand(dataRow, 1);
    end
end

% Initialize iteration count to one
n = 1;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Start of Iterative Training
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Start of one iterative loop
while n <= N
    sigma = sigmaInitial * exp(-n/t1);
    variance = sigma^2;
    eta = etaInitial * exp(-n/t2);
    % Prevent eta from falling below 0.01
    if (eta < 0.01)