Digital Image Processing
Thematic Information Extraction III:
Unsupervised Classification
Arnis Asmat
Environmental information System
EVT 552
Outline
• Introduction
• The “Chain Method”
• ISODATA
• Unsupervised “Cluster Busting”
Introduction
• Unsupervised classification requires minimal
input from the analyst.
• Numerical operators search for natural groupings
of the spectral properties of pixels in
multispectral feature space.
• After classification, the analyst must assign
spectral classes to information classes of interest.
Introduction
• This is not always an easy task.
• Some clusters are meaningless because they
represent mixed classes of Earth surface
materials.
• An analyst must understand the spatial and
spectral characteristics of the terrain in order to
accurately label spectral clusters.
The “Chain Method”
• This algorithm works in two passes (examines
the data set twice).
• The first pass reads through the data and builds
spectral clusters (calculates mean vectors).
• The second pass uses a minimum distance
decision rule to assign pixels to a cluster.
The “Chain Method”
• During the first pass, the analyst must supply four pieces of
information:
– R, radius in multispectral space used to determine when a new
cluster should be formed (e.g. 15);
– C, a spectral space distance used when merging clusters (e.g. 30);
– N, the number of pixels to be evaluated between each major
merging (e.g. 2000 pixels).
– Cmax, the maximum number of clusters to be identified (e.g. 20).
Consider an image with 2
bands only.
Algorithm looks at one pixel
at a time beginning with the
first line of data (this is
where the term “chain”
comes from).
Pixel values are plotted in
multispectral space.
Adapted from Jensen (1996)
Pixel 1 (10,10) becomes
Cluster 1.
R is set to 15.
Pixel 2 is plotted in
multispectral space and
compared to Cluster 1.
The distance (D) between
Pixel 2 and Cluster 1 is
14.14.
Thus, Pixel 2 does not
satisfy the criteria of being Adapted from Jensen (1996)
its own cluster, as D < 15.
The mean data vectors for
Pixel 1 and Pixel 2 are
averaged, yielding a new
center for Cluster 1.
Next, Pixel 3 is evaluated.
D between Pixel 3 and
Cluster 1 is > 15, so Pixel 3
becomes the center of a new
cluster (Cluster 2).
Adapted from Jensen (1996)
The next pixel will be
compared to both clusters 1
and 2.
The cluster accumulation continues until the number of
pixels evaluated is greater than N.
At that point, the program stops evaluating individual
pixels and looks at the nature of the clusters formed so far.
A distance between each cluster and every other cluster is
calculated.
Any two clusters separated by a distance less than C are
merged into a single cluster.
This process proceeds until there are no clusters with
separation less than C.
Then, the next pixel is considered, and the process iterates
until the entire data set is examined.
The “Chain Method”
• During the second pass, final cluster means are used
to classify all image pixels into one of the Cmax
clusters.
• The process follows the minimum distance approach
(minimizes the distance between cluster mean and
pixel value).
• Then comes the difficult task … assigning class
names (informational classes) to spectral clusters.
ISODATA
• ISODATA stands for Iterative Self-
Organizing Data Analysis Technique.
• ISODATA is iterative because it makes
several passes through the data set.
• ISODATA is self-organizing because it
requires little human input.
ISODATA
• ISODATA does not allocate its initial mean
vectors based upon the analysis of pixels in
the first line of data.
• Rather, an initial arbitrary assignment of
Cmax cluster means takes place in feature
space (based on band means and standard
deviations).
Here, five mean
vectors are distributed
along the vector
beginning at 3 - 3, 4
- 4 and ending at 3 +
3, 4 + 4.
This ensures that the
first few lines of data
do not bias the
creation of cluster
means.
Note that the
parallelpiped does not
capture all pixel Adapted from Jensen (1996)
values in this example.
ISODATA
• First iteration:
– When Cmax clusters are in place, each pixel in
the image is compared to each cluster mean and
assigned to a cluster based upon minimum
distance.
– Those pixels with values outside a distance
threshold are excluded from clustering.
Adapted from Jensen (1996)
ISODATA
• Second through nth iterations:
– After the first iteration, a new mean for each cluster is
calculated based upon actual spectral locations of pixels
included in each cluster.
– The process is repeated with each candidate pixel again
compared and assigned to the nearest cluster mean.
– Some pixels switch class alignment.
Adapted from Jensen (1996)
ISODATA
• The process continues until:
– 1) There is little change in class assignment
between iterations (a threshold is reached); or
– 2) A maximum number of iterations is
reached.
Adapted from Jensen (1996)
Unsupervised “Cluster Busting”
• It is not uncommon to be able to label only
about 60-70% of spectral clusters with
confidence.
• Confused classes result from terrain
distortion, soil moisture and soil type
variation, and spectral mixing within pixels.
Unsupervised “Cluster Busting”
• When this occurs, it is possible to perform
cluster busting.
• First, mixed clusters are recoded to a value
of 1 to create a binary mask.
• The mask is used to extract confused pixels
from the original data set.
Unsupervised “Cluster Busting”
• The output file is a new multi-band image
file consisting of only those pixels that
could not be labeled with confidence.
• A new unsupervised classification is run on
only those pixels, yielding an additional
Cmax number of clusters.