Pattern Recognition
Lecturer
Prof. Dr. Khaled M. Mahr
email: [email protected]
Contents
⬥ Introduction to Patterns and Pattern
Recognition
⬥ Clustering vs. Classification
⬥ Template Matching
⬥ Features
⬥ Classification
Object recognition and
interpretation
⬥ Having segmented regions of interest out of an image, the next task
is to identify them. Identification is usually divided into two tasks:
evaluating suitable quantitative descriptors and then matching those
descriptors to known objects. In this lecture we will look at
matching the descriptors to known objects.
⬥ Subtopics
■ Pattern matching
■ Pattern classification
■ Decision theoretic methods
■ Correlation matching
■ Structural methods
Object recognition and interpretation
■ The principal goal of image analysis by computer is to endow a
machine with a capability that approximates, in some sense, the
corresponding capability in humans.
■ Thus, an automated image analysis system (e.g. on a robot) should
be capable of exhibiting a degree of intelligence.
■ What do we mean by intelligence?
- The capability to learn from examples and to generalize this
knowledge so that it will apply in new and different
circumstances.
- The ability to make inferences from incomplete information.
(Gonzalez and Woods)
Pattern
A pattern is the opposite of chaos; it is an
entity that can be given a name
Recognition
⬥ Identification of a pattern as a member of a
category
■ Classification (Supervised: known categories)
■ Clustering (Unsupervised: learning categories)
Classification vs. Clustering
(Figure: the same set of samples grouped into Category “A” and Category “B”, once by clustering and once by classification.)
Pattern Recognition
Given an input pattern, make a decision about
the “category” or “class” of the pattern
Pattern Class
⬥ A collection of similar (not necessarily identical)
objects
⬥ Intra-class variability
⬥ Inter-class similarity
Pattern Class
Intra-class variability
The letter “T” in different typefaces
Same face under different expressions, poses and illumination
Pattern Class
Inter-class similarity
Characters that look similar
Identical twins
Object Recognition
■ We can define a pattern as an arrangement of descriptors (‘features’).
■ A pattern class is a set of patterns ω1, ω2, …, ωn which share a
common set of characteristics.
■ Object recognition can be described as the task of automatically
assigning a pattern to a pattern class which is labelled with the object
name.
Three common structures are used to define a pattern:
• Vectors (for quantitative descriptions)
• Strings (for structural descriptions)
• Trees (for structural descriptions)
Here we will concentrate on the descriptor vector.
Object Recognition Approaches
⬥ Statistical PR:
■ Pattern classes represented by statistical measures.
⬥ Structural (or Syntactic) PR:
■ Pattern classes represented by means of formal structures such as
grammars, automata, strings, etc.
Template Matching
The image is converted into a 12×12 bitmap.
Template Matching
The bitmap is represented by a 12×12 matrix, or equivalently by a
144-element vector with 0/1 entries.
0 0 0 0 0 0 1 1 0 0 0 0
0 0 0 0 0 1 1 1 0 0 0 0
0 0 0 0 0 1 0 1 1 0 0 0
0 0 0 0 1 1 0 1 1 0 0 0
0 0 0 0 1 0 0 0 1 0 0 0
0 0 0 1 1 0 0 0 1 1 0 0
0 0 0 1 1 0 0 0 1 1 0 0
0 0 1 1 1 1 1 1 1 1 1 0
0 0 1 1 0 0 0 0 0 1 1 0
0 1 1 0 0 0 0 0 0 1 1 0
0 1 1 0 0 0 0 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 1 1
Template Matching
Training samples: templates with their corresponding classes.
Template of the image to be recognized: class unknown (“?”).
Algorithm: compare the unknown bitmap with every stored template and assign
it the class of the best-matching template, e.g. the template with the fewest
differing pixels (a sketch follows below).
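A minimal sketch of this idea, assuming the templates and the unknown image are stored as 12×12 binary NumPy arrays; the function and variable names below are illustrative, not from the slides:

```python
import numpy as np

def classify_by_template(bitmap, templates, labels):
    """Return the label of the stored template with the fewest
    differing pixels (Hamming distance) from the input bitmap."""
    distances = [int(np.sum(bitmap != t)) for t in templates]
    return labels[int(np.argmin(distances))]

# Hypothetical usage with two stored templates:
# templates = [template_A, template_B]; labels = ["A", "B"]
# print(classify_by_template(unknown_bitmap, templates, labels))
```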
Template Matching
Number of templates to store: if too few templates are stored, some
images might not be recognized.
Improvements?
Features
⬥ Features are the individual measurable properties of the
signal being observed.
⬥ The set of features used for pattern recognition is called the
feature vector.
⬥ The number of features used is the dimensionality of
the feature vector.
⬥ n-dimensional feature vectors can be represented as
points in an n-dimensional feature space.
The Feature vector
Assuming we use a set of descriptors x1, x2, …, xn to describe
a particular object, we can define the descriptor vector as x = (x1, x2, …, xn)T.
The use of discriminant analysis was pioneered by Fisher in 1936
and his test data on separating out three species of iris flowers is still
a classic illustration.
In this example the vector has two elements: petal width and length.
Note that only the pattern class for Iris setosa is clearly discriminated.
(Figure: petal width versus petal length for the three types of iris.)
Feature Vector
(Figure: a two-dimensional feature space with axes height and weight, containing samples from Class 1 and Class 2.)
Features
⬥ Feature extraction aims to create discriminative features that are good
for classification
⬥ Good Features
• Objects from the same class have similar feature values.
• Objects from different classes have different values.
(Figure: example feature spaces with “good” features, where the classes separate well, and “bad” features, where they overlap.)
Features
⬥ Use fewer features if possible
⬥ Use features that differentiate classes well
⬥ Character recognition example (a small feature-extraction sketch follows below)
■ Good features: aspect ratio, presence of loops
■ Bad features: number of black pixels, number of connected
components
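As a rough illustration of extracting such features from a binary character bitmap, here is a sketch only; which features are worthwhile depends on the task, the helper name is hypothetical, and detecting loops (which needs connected-component analysis) is omitted:

```python
import numpy as np

def extract_features(bitmap):
    """Return a small feature vector for a binary character image:
    bounding-box aspect ratio and normalized fill ratio.
    Assumes the bitmap contains at least one foreground pixel."""
    rows, cols = np.nonzero(bitmap)
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    fill = bitmap.sum() / (height * width)   # foreground fraction inside the box
    return np.array([height / width, fill])
```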
Classifier
Identify the class of a given pattern
Classifier
Distance between Feature Vectors
⬥ Instead of looking for a template that exactly matches the input pattern, look
at how close the feature vectors are.
⬥ Nearest neighbor classification algorithm:
1. Find the template closest to the input pattern.
2. Classify the pattern to the same class as the closest template.
(Figure: the input pattern in feature space together with stored templates from Class 1 and Class 2.)
Classifier
K-Nearest Neighbor Classifier
⬥ Use the k nearest neighbors instead of just 1 to classify a pattern: assign the
pattern the majority class among its k closest templates (see the sketch below).
(Figure: the same feature space, with the k nearest templates from Class 1 and Class 2 highlighted around the input pattern.)
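A minimal k-nearest-neighbor sketch over stored feature vectors; the variable names and data layout are assumptions, and with k = 1 it reduces to the nearest-neighbor rule above:

```python
import numpy as np
from collections import Counter

def knn_classify(x, train_X, train_y, k=1):
    """Assign x the majority class among its k closest training
    vectors, using Euclidean distance in feature space."""
    dists = np.linalg.norm(train_X - x, axis=1)   # distance to every template
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```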
Classifier
A classifier partitions the feature space X into class-labeled
regions X1, X2, …, XM such that:
X = X1 ∪ X2 ∪ … ∪ XM   and   Xi ∩ Xj = ∅ for i ≠ j
Classification consists of determining to which region a feature vector x belongs.
Borders between regions are called decision boundaries.
A (Simplified) PR System
Two modes:
Classification mode: test pattern → preprocessing → feature measurement → classification.
Training mode: training pattern → preprocessing → feature extraction/selection → learning.
Classification - Example
⬥ Sorting incoming fish on a
conveyor according to
species, using optical
sensing
⬥ Salmon or Sea Bass?
Classification - Example
⬥ Set up a camera and take some sample images to
extract features
■ Length
■ Lightness
■ Width
■ Number and shape of fins
■ Position of the mouth, etc…
Classification - Example
Classification - Example
The length is a poor feature alone!
Select the lightness as a possible feature.
Classification - Example
Classification - Example
Customers do not want sea bass in
their cans of salmon
Classification - Example
⬥ There is a relationship between the position of the decision
boundary (threshold) and the cost of a misclassification.
⬥ Move the decision boundary toward
smaller values of lightness in order to
minimize the cost (i.e. reduce the number
of sea bass that are classified as salmon), as sketched below.
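One way to picture this is a small sketch that scans candidate lightness thresholds and keeps the one with the lowest total cost. The feature values, labels, cost ratio and the convention that salmon are darker than sea bass are assumptions for illustration, not values from the lecture:

```python
import numpy as np

def best_threshold(lightness, labels, cost_bass_as_salmon=10.0, cost_salmon_as_bass=1.0):
    """Classify 'salmon' if lightness < t, 'sea bass' otherwise, and
    return the threshold t with the minimum total misclassification cost."""
    lightness, labels = np.asarray(lightness), np.asarray(labels)
    best_t, best_cost = None, np.inf
    for t in np.unique(lightness):
        pred = np.where(lightness < t, "salmon", "sea bass")
        cost = (cost_bass_as_salmon * np.sum((pred == "salmon") & (labels == "sea bass"))
                + cost_salmon_as_bass * np.sum((pred == "sea bass") & (labels == "salmon")))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t
```

Raising the cost of misclassifying sea bass as salmon pushes the chosen threshold toward smaller lightness values, which is exactly the effect described above.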
Classification - Example
FISH: feature vector x = [x1, x2]T, where x1 = lightness and x2 = width.
Classification - Example
Classification - Example
⬥ We might add further features that are not correlated with the
ones we already have. Care should be taken not to
reduce performance by adding such “noisy features”.
⬥ Ideally, the best decision boundary is the one which
provides optimal performance.
Classification - Example
Generalization?
Decision-Theoretic methods
■ The main problem is: how do we use such pattern arrangements in the
recognition stage of our image analysis system?
■ Decision-theoretic approaches to recognition are based on the use of
decision (or discriminant) functions.
■ If x = (x1, x2, …, xn)T is an n-dimensional pattern vector, for M
pattern classes ω1, ω2, …, ωM, the basic problem in decision-theoretic
pattern recognition is to find M decision functions d1(x), d2(x), …, dM(x)
with the property that if pattern x belongs to class ωi, then:
di(x) > dj(x)   for j = 1, 2, …, M; j ≠ i
■ The decision boundary separating class ωi from ωj is given by the
values for which di(x) = dj(x) i.e. di(x) - dj(x) = 0
■ So how do we find the required decision function?
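Whatever form the decision functions take, the recognition step itself is just a comparison of their values; a minimal sketch, where the decision functions passed in are placeholders to be supplied by the chosen method:

```python
import numpy as np

def decide(x, decision_functions):
    """Evaluate every decision function d_i(x) and return the index
    of the class whose value is largest."""
    values = [d(x) for d in decision_functions]
    return int(np.argmax(values))
```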
Pattern matching
• To assign a specific vector to a particular class we need to define a
distance classifier.
• One approach (but not the only one) is to define the prototype of a
class by the mean vector of all the training instances of that class:
mj = (1/Nj) Σx∈ωj x,   j = 1, 2, …, W
where W is the number of classes and Nj is the number of pattern vectors in class ωj.
An unknown vector x is assigned to the class of its closest prototype.
Often we interpret ‘closest’ as the minimum Euclidean distance:
Dj(x) = ||x − mj||,   j = 1, 2, …, W
(a code sketch follows below).
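A minimal sketch of this minimum-distance (nearest-prototype) classifier, assuming the training data are given as a NumPy array of feature vectors with one label per row; the names are illustrative:

```python
import numpy as np

def class_means(train_X, train_y, classes):
    """Prototype m_j: the mean of all training vectors of class j."""
    train_y = np.asarray(train_y)
    return np.array([train_X[train_y == c].mean(axis=0) for c in classes])

def min_distance_classify(x, means, classes):
    """Assign x to the class whose prototype mean is closest
    in Euclidean distance."""
    D = np.linalg.norm(means - x, axis=1)
    return classes[int(np.argmin(D))]
```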
Evaluating the Euclidean
decision boundary
For any vector a, the Euclidean norm is ||a|| = (aTa)^(1/2), so
Dj(x) = ||x − mj|| = [(x − mj)T(x − mj)]^(1/2)
Since Dj(x) ≥ 0 we can ignore the square root and work with
Dj(x)² = (x − mj)T(x − mj) = xTx − 2xTmj + mjTmj
Note that the first term xTx is independent of ‘j’, so choosing the smallest
value of Dj is equivalent to choosing the largest value of the second part,
dj(x) = xTmj − ½ mjTmj
which is normally taken as the decision function.
The decision boundary
If dj(x) gives the maximum value for a particular ‘j’ then we can
assign the region with vector x to class ωj. If we have only two classes
‘j’ and ‘k’, the minimum-distance boundary between the classes is the set of
points for which dj(x) = dk(x), i.e.
djk(x) = dj(x) − dk(x) = xT(mj − mk) − ½ (mj − mk)T(mj + mk) = 0
For two element vectors the decision boundary is a line, for three
element vectors the boundary is a plane. For more than three
elements it is a hyperplane.
Example of calculating a decision
boundary
In Fisher’s original data, the two classes Iris versicolor (ω1)
and Iris setosa (ω2) have class mean vectors (x1 = petal length, x2 = petal width) of
m1 = (4.3, 1.3)T and m2 = (1.5, 0.3)T
Show that the equation of the decision boundary line is:
2.8x1 + 1.0x2 − 8.9 = 0
Solution
d1(x) = xTm1 − ½ m1Tm1 = 4.3x1 + 1.3x2 − 10.09
Similarly:
d2(x) = xTm2 − ½ m2Tm2 = 1.5x1 + 0.3x2 − 1.17
Hence:
d12(x) = d1(x) − d2(x) = 2.8x1 + 1.0x2 − 8.92 ≈ 2.8x1 + 1.0x2 − 8.9 = 0
Graphical representation
(Figure: the (x1, x2) feature plane showing the two class means at (4.3, 1.3) and (1.5, 0.3) and the decision boundary line 2.8x1 + 1.0x2 − 8.9 = 0 between them.)
Minimum distance classifier works well when the distance
between means is large compared to the spread of each class
with respect to its mean.
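A quick numerical check of the worked example, using the mean vectors read off the plot above:

```python
import numpy as np

m1 = np.array([4.3, 1.3])   # Iris versicolor mean (x1, x2)
m2 = np.array([1.5, 0.3])   # Iris setosa mean (x1, x2)

w = m1 - m2                        # boundary coefficients: [2.8, 1.0]
b = 0.5 * (m1 - m2) @ (m1 + m2)    # constant term: 8.92 (approx. 8.9)
print(f"d12(x) = {w[0]:.1f}*x1 + {w[1]:.1f}*x2 - {b:.2f} = 0")
```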
Correlation matching
■ In some circumstances if we have available a copy of the object(s)
we are attempting to locate within an image we can bypass the use
of descriptors and attempt a direct correlation matching.
■ For example consider the task of finding all the occurrences of the
letter ‘a’ in the following image with the available standard mask.
(Figure: a text image and the standard mask containing the letter ‘a’.)
An obvious approach would
be to pass the mask over the
image looking for points at
which the fit is best.
Correlation of f(x,y) and w(x,y) at the point (s,t)
(Figure: an M × N image f(x, y) with its origin at the top left; the subimage w, displaced to w(x − s, y − t), is positioned at the point (s, t) inside the image.)
Notes about previous figure
■ The origin of subimage w(x,y) is at its centre.
■ In its simplest form, the correlation between f(x,y) and w(x,y) is
given by:
c(s, t) = Σx Σy f(x, y) w(x − s, y − t)
where the summation is taken over the region in which w and f overlap
(a small code sketch follows after these notes).
■ As s and t are varied, w(x, y) moves around the image area giving one
value for c(s, t) for every position (s, t).
■ The maximum value of c(s, t) indicates the position where w(x, y) best
matches f(x, y).
■ Disadvantage: sensitive to changes in amplitude of f(x, y) and w(x, y).
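A direct sketch of this correlation; for simplicity it anchors w at its top-left corner rather than its centre and only evaluates displacements where w lies entirely inside f (the loops are kept explicit to mirror the formula):

```python
import numpy as np

def correlate(f, w):
    """c(s, t) = sum over x, y of f(x, y) * w(x - s, y - t),
    evaluated only where the mask w fits fully inside the image f."""
    M, N = f.shape
    J, K = w.shape
    c = np.zeros((M - J + 1, N - K + 1))
    for s in range(M - J + 1):
        for t in range(N - K + 1):
            c[s, t] = np.sum(f[s:s + J, t:t + K] * w)
    return c
```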
Definition of correlation
coefficient
An approach frequently used to overcome this disadvantage is to
perform matching via the correlation coefficient, given by:
γ(s, t) = Σx Σy [f(x, y) − f̄(s, t)][w(x − s, y − t) − w̄] / { Σx Σy [f(x, y) − f̄(s, t)]² · Σx Σy [w(x − s, y − t) − w̄]² }^(1/2)
where w̄ is the mean value of w and f̄(s, t) is the mean value of f in the region
coincident with the current location of w. γ(s, t) is scaled to the range [−1, 1]
and is insensitive to amplitude changes in f(x, y) and w(x, y).
Note when using this function both matrices must be the same
size. This can be achieved by padding out the w array with an
appropriate number of zeros.
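A matching sketch for the correlation coefficient, using the same sliding scheme as the previous sketch rather than zero-padding; both the mask and the image patch are mean-centred at every position, which is what makes the result insensitive to amplitude changes:

```python
import numpy as np

def correlation_coefficient(f, w):
    """gamma(s, t) in [-1, 1] for every position of w inside f."""
    M, N = f.shape
    J, K = w.shape
    wc = w - w.mean()                              # mean-centred mask
    g = np.zeros((M - J + 1, N - K + 1))
    for s in range(M - J + 1):
        for t in range(N - K + 1):
            patch = f[s:s + J, t:t + K]
            pc = patch - patch.mean()              # mean-centred image region
            denom = np.sqrt(np.sum(pc**2) * np.sum(wc**2))
            g[s, t] = np.sum(pc * wc) / denom if denom > 0 else 0.0
    return g
```

The best match is the displacement (s, t) at which gamma(s, t) is largest.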