Unsupervised Classification
• Commonly referred to as Clustering
• Unsupervised classification normally requires only a minimal
amount of initial input from the analyst
• This is because clustering does not normally require training
data
• Process that searches for natural groupings of the spectral properties of
pixels
• The clustering process results in a classification map consisting of m spectral
classes (clusters)
• The analyst then uses prior knowledge to assign the spectral classes to
thematic information classes of interest (e.g., forest, agriculture)
• Some spectral clusters may represent mixed classes of Earth surface
materials
• The analyst must understand the spectral characteristics of the terrain well
enough to be able to label spectral clusters as specific information classes
• Two-step process:
– Iterative formation of clusters
– Assignment of clusters to thematic classes
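The second step, assigning clusters to thematic classes, amounts to recoding the cluster map through a lookup table that the analyst fills in. The sketch below assumes NumPy; the cluster map, the class codes, and the names `cluster_map` and `cluster_to_class` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical cluster map: each pixel holds a spectral cluster id (0..4).
cluster_map = np.array([[0, 1, 1],
                        [2, 3, 4],
                        [4, 4, 0]])

# Analyst-defined lookup (illustrative codes): index = cluster id,
# value = thematic class id (0 = water, 1 = forest, 2 = agriculture).
cluster_to_class = np.array([0, 1, 1, 2, 2])

# Integer-array indexing recodes every pixel in one step.
thematic_map = cluster_to_class[cluster_map]
print(thematic_map)
```

Several spectral clusters may map to the same thematic class, which is why the lookup table has more entries than there are classes.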
Unsupervised Classification - Methods
• Two widely used unsupervised classification methods:
– K-Means
– Iterative Self-Organizing Data Analysis Technique (ISODATA)
• The ‘K’ in K-Means is the number of groups (clusters) into which
the items are to be classified
K-Means
• K-Means clustering is an unsupervised machine learning
algorithm that groups an unlabeled dataset into clusters
• The analyst specifies in advance the number of clusters to be
generated (as a constraint)
• A set of arbitrary pixels is chosen as the initial cluster means,
one for each cluster (the number of clusters is known a priori)
• Clustering starts from this fixed, a priori specified number of
clusters
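One possible way to choose the arbitrary starting pixels is to draw k pixel positions at random. This is a sketch under assumptions: the image is a NumPy array of shape (rows, cols, bands), and the helper name `initial_means` and its `seed` parameter are hypothetical.

```python
import numpy as np

def initial_means(image, k, seed=0):
    """Use the spectra of k randomly chosen pixel positions as initial cluster means.

    image: array of shape (rows, cols, bands). Returns an array of shape (k, bands).
    """
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    rng = np.random.default_rng(seed)
    # replace=False guarantees k distinct pixel positions.
    idx = rng.choice(len(pixels), size=k, replace=False)
    return pixels[idx]

# Synthetic 4-band image used only to exercise the function.
image = np.random.default_rng(1).integers(0, 256, size=(50, 50, 4))
means = initial_means(image, k=5)
print(means.shape)  # (5, 4)
```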
• The algorithm calculates the spectral distance between each pixel
of the image and each initial cluster mean (centroid)
• Each pixel in the image is assigned to the cluster with the closest
mean
• The cluster means are then recalculated from the assigned pixels,
and the pixel assignments are updated against the new means
• This iterative assignment process continues until there is no
substantial change between successive iterations
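The loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: it assumes Euclidean distance as the spectral distance, takes "no substantial change" to mean that no cluster mean moves by more than a tolerance `tol`, and uses hypothetical names (`kmeans`, `init_means`, `max_iter`).

```python
import numpy as np

def kmeans(pixels, init_means, max_iter=100, tol=1e-4):
    """Minimal K-Means. pixels: (n, bands); init_means: (k, bands)."""
    pixels = np.asarray(pixels, dtype=float)
    means = np.asarray(init_means, dtype=float).copy()

    def assign(current_means):
        # Euclidean distance from every pixel to every cluster mean: shape (n, k).
        dist = np.linalg.norm(pixels[:, None, :] - current_means[None, :, :], axis=2)
        return dist.argmin(axis=1)

    for _ in range(max_iter):
        labels = assign(means)
        new_means = means.copy()
        for j in range(len(means)):
            members = pixels[labels == j]
            if len(members):  # an empty cluster keeps its previous mean
                new_means[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_means - means, axis=1).max()
        means = new_means
        if shift < tol:  # assumed meaning of "no substantial change"
            break
    return assign(means), means

# Two well-separated synthetic "spectral" groups in three bands.
rng = np.random.default_rng(42)
pixels = np.vstack([rng.normal(20, 1, size=(100, 3)),
                    rng.normal(200, 1, size=(100, 3))])
# Arbitrary starting pixels: the first and the last pixel.
labels, means = kmeans(pixels, init_means=pixels[[0, -1]])
print(np.bincount(labels))
```

For an image of shape (rows, cols, bands), the pixels can be flattened with `image.reshape(-1, bands)` and the labels reshaped back to (rows, cols) to obtain the cluster map.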
ISODATA Clustering
• The ISODATA algorithm is a modification of the k-means
clustering algorithm that adds
– Merging and splitting of clusters
• ISODATA is iterative: it makes many passes through
the remote sensing dataset until the specified results are
obtained
• Uses the minimum spectral distance formula to form clusters
• Begins with either arbitrary cluster means or the means of an existing
signature set
• Each time the clustering repeats, the means of these clusters are shifted
• The new cluster means are used for the next iteration
• The process stops when either
– the maximum number of iterations has been performed, or
– the maximum percentage of unchanged pixels has been reached
between two iterations
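The two stopping rules can be expressed as a single check on consecutive label arrays. The sketch assumes the convergence threshold is given as a fraction (0.95 for 95%); the function name `should_stop` and its parameter names are hypothetical.

```python
import numpy as np

def should_stop(prev_labels, labels, iteration, max_iter=20, convergence=0.95):
    """Stop when the iteration limit is hit or enough pixels kept their cluster."""
    unchanged = np.mean(prev_labels == labels)  # fraction of unchanged pixels
    return bool(iteration >= max_iter or unchanged >= convergence)

prev = np.array([0, 0, 1, 1, 2, 2, 2, 1, 0, 0])
curr = np.array([0, 0, 1, 1, 2, 2, 2, 1, 0, 1])  # one pixel in ten changed cluster

print(should_stop(prev, curr, iteration=3))    # False: 90% unchanged, limit not reached
print(should_stop(prev, curr, iteration=20))   # True: iteration limit reached
```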
• ISODATA algorithms normally require the analyst to specify the
following criteria:
• Maximum number of clusters (classes) to be identified by the
algorithm
• Minimum distance: cluster pairs separated by less than this
distance are merged into one cluster
• Number of iterations: the maximum number of times ISODATA
classifies pixels and recalculates cluster mean vectors. The ISODATA
algorithm terminates when this number is reached
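The minimum-distance criterion can be illustrated with a single merge step. The sketch below merges only the closest pair per call and places the merged mean at the pixel-count-weighted average of the two means; both choices are assumptions made for illustration, and the function name `merge_closest_pair` is hypothetical.

```python
import numpy as np

def merge_closest_pair(means, counts, min_distance):
    """Merge the closest pair of clusters if they are closer than min_distance.

    means: (k, bands) cluster means; counts: (k,) pixels per cluster.
    Returns (means, counts, merged), where merged is True if a merge happened.
    """
    means = np.asarray(means, dtype=float)
    counts = np.asarray(counts, dtype=float)
    k = len(means)
    if k < 2:
        return means, counts, False
    dist = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # ignore the distance of a cluster to itself
    i, j = np.unravel_index(dist.argmin(), dist.shape)
    if dist[i, j] >= min_distance:
        return means, counts, False
    # Assumed rule: the merged mean is weighted by the number of pixels in each cluster.
    merged_mean = (counts[i] * means[i] + counts[j] * means[j]) / (counts[i] + counts[j])
    keep = np.ones(k, dtype=bool)
    keep[[i, j]] = False
    new_means = np.vstack([means[keep], merged_mean])
    new_counts = np.append(counts[keep], counts[i] + counts[j])
    return new_means, new_counts, True

means = [[10, 10], [12, 10], [100, 100]]
counts = [30, 10, 60]
new_means, new_counts, merged = merge_closest_pair(means, counts, min_distance=5)
print(merged, new_means, new_counts)
```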
• Minimum members in a cluster (%):
– If a cluster contains less than the minimum percentage of pixels
(relative to the total number of pixels in the image), it is deleted
and its pixels are assigned to an alternative cluster
• Maximum standard deviation (smax):
– When the standard deviation of a cluster exceeds the specified
maximum standard deviation in any band, the cluster is split into two
clusters
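The deletion and splitting rules can be sketched together as one pass over the current clusters. Where the two new means are placed after a split is not stated above; the sketch assumes they are placed one standard deviation on either side of the old mean along the band with the largest spread. The function name `split_and_prune` and its parameters are hypothetical.

```python
import numpy as np

def split_and_prune(pixels, labels, max_std, min_percent):
    """Return new cluster means after deleting small clusters and splitting wide ones."""
    pixels = np.asarray(pixels, dtype=float)
    new_means = []
    for c in np.unique(labels):
        members = pixels[labels == c]
        percent = 100.0 * len(members) / len(pixels)
        if percent < min_percent:
            continue  # cluster deleted; its pixels are reassigned on the next pass
        mean = members.mean(axis=0)
        std = members.std(axis=0)
        band = std.argmax()  # band with the largest spread
        if std[band] > max_std:
            # Assumed split rule: shift the mean by +/- one standard deviation in that band.
            offset = np.zeros_like(mean)
            offset[band] = std[band]
            new_means.extend([mean - offset, mean + offset])
        else:
            new_means.append(mean)
    return np.array(new_means)

# Cluster 0 is wide in band 0, cluster 1 is compact, cluster 2 holds only 1% of pixels.
pixels = np.vstack([np.tile([0.0, 5.0], (50, 1)),
                    np.tile([10.0, 5.0], (50, 1)),
                    np.tile([50.0, 50.0], (98, 1)),
                    np.tile([90.0, 90.0], (2, 1))])
labels = np.array([0] * 100 + [1] * 98 + [2] * 2)
new_means = split_and_prune(pixels, labels, max_std=2.0, min_percent=2.0)
print(new_means)
```

In a full ISODATA loop these new means would feed the next assignment pass, where every pixel, including those of deleted clusters, is assigned to its closest mean.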
Post Clustering
• In this phase, the analyst compares the spectral classes with
reference data to identify them
• Spectral reflectance curves can be used to identify the
spectral classes
• The level of classification is defined (Level I, II, etc.)
• Different classes are merged to reach the final outcome
• Accuracy is assessed through field truthing or reference data
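As one illustration of accuracy assessment, the labelled map can be compared with reference samples through a confusion matrix and its overall accuracy. The sample values below are made up for the example, and the helper `confusion_matrix` is a hypothetical name, not a library call.

```python
import numpy as np

def confusion_matrix(reference, predicted, n_classes):
    """Rows are reference classes, columns are classes assigned by the classification."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (reference, predicted), 1)  # count each (reference, predicted) pair
    return cm

# Illustrative class ids at eight reference locations.
reference = np.array([0, 0, 1, 1, 2, 2, 2, 1])
predicted = np.array([0, 1, 1, 1, 2, 2, 0, 1])

cm = confusion_matrix(reference, predicted, n_classes=3)
overall_accuracy = np.trace(cm) / cm.sum()  # correctly labelled samples / all samples
print(cm)
print(overall_accuracy)  # 0.75
```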
Unsupervised Classification
• Initialize from Statistics: generates arbitrary clusters from the
statistics of the image file
• Use Signature Means: uses only the selected signatures to generate
the clusters
• Convergence Threshold: the maximum percentage of pixels whose
cluster assignments can go unchanged between iterations
• Skip Factors (X, Y):
– 1 processes all pixels
– 2 processes every other pixel
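The effect of the skip factors can be mimicked with array slicing. This is a sketch of the idea only, not of how any particular software applies the setting; it assumes an image array of shape (rows, cols, bands), and the function name `subsample` is hypothetical.

```python
import numpy as np

def subsample(image, skip_x=1, skip_y=1):
    """Keep every skip_y-th row and every skip_x-th column of the image."""
    return image[::skip_y, ::skip_x]

image = np.arange(6 * 8 * 3).reshape(6, 8, 3)  # 6 rows, 8 columns, 3 bands
print(subsample(image, 1, 1).shape)  # (6, 8, 3): all pixels
print(subsample(image, 2, 2).shape)  # (3, 4, 3): every other pixel in X and Y
```

Larger skip factors reduce the number of pixels used to form the clusters, which shortens processing at the cost of a coarser sample of the image.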
Assignment of color to the clusters
[Figure: image alongside its ISODATA cluster map (15 clusters), with a key listing the classes]
Reference/Reading
• Chapter 12, Section 12.3, Campbell
• Chapter 8, Section 8.3.2, Mather