
THE HONG KONG POLYTECHNIC UNIVERSITY

DEPARTMENT OF ELECTRICAL AND ELECTRONIC ENGINEERING

EIE1005 Fundamental AI and Data Analytics


Lecture 2: AI and Data Analytics for Computer Vision

Lecturer: Dr. Y.L.Chan


Room: DE606
Tel: 27666213
Email: [email protected]

Schedule
Week 4 (10/2)  Lecture: AI and Data Analytics for Computer Vision
Week 5         Workshop
  Lab001: 19/2, Wed., 12:30-14:20
  Lab002: 17/2, Mon., 14:30-16:20
  Lab003: 21/2, Fri., 12:30-14:30
  Lab004: 17/2, Mon., 12:30-14:20
Week 6         Workshop (optional)
  Lab001: 26/2, Wed., 12:30-14:20
  Lab002: 24/2, Mon., 14:30-16:20
  Lab003: 28/2, Fri., 12:30-14:30
  Lab004: 24/2, Mon., 12:30-14:20
Submission
• Quiz at week 6
• Group project
• This group project and demonstration gives students the opportunity to become familiar with the
process of training neural networks for a computer vision problem. The open-ended task is to use Teachable
Machine (https://teachablemachine.withgoogle.com/) to train an AI model to recognize objects that may
be applied to their disciplines. It aims to encourage self-learning, to train logical thinking, and to
practice reporting/demonstration skills.
• Form a group of four to five students who will work together to select a topic and develop an AI
model to recognize objects.
• Demonstration: create a demonstration video of 3-5 minutes.
• Report writing
• Deadlines and important dates:
• Submission of the group composition via Blackboard by 7 Mar., 2025
• Submission of the group report and video demonstration via Blackboard by 11 Apr., 2025

• Sometimes children mix up cats and dogs, especially if they have not been exposed to different
breeds, colors, or sizes of animals before.
• Once they have seen enough cats, dogs, and other furry animals, they learn the difference.
Pattern recognition (PR): we rely on patterns or features to figure out what type of object we are
looking at, e.g. big pointy ears, a long tail, fur.
(Figure: light enters the eye, is captured by the receptors, and is processed in the visual cortex.)
How Computer Vision Applications Work
Computer Vision
• “Computer Vision” is an area of Machine Learning that deals with image recognition and classification.
• Computer Vision models can be developed to accomplish tasks such as facial recognition, identifying
which breed a dog belongs to, and even identifying a tumor from CT scans.
• The possibilities are endless.
Face Detection and Recognition

Face Detection for Privacy Protection

Human Action/Activity Recognition
(Figure: example actions - step, run, hop, jump, skip, swivel turn, crawl.)


Optical Character Recognition (OCR)
• Printed text
• Handwritten text

License Plate Recognition (LPR) is the ability to capture video or photographic images of license
plates and transform the optical data into digital information in real time. It is a widely used
technology for vehicle management operations such as ticketless parking (off-street and on-street),
tolling, stolen-vehicle detection, smart billing, and more.

Identification and Authentication
Components of an Image Recognition and Classification/Pattern Recognition (PR) system

Real world (observed pattern) -> Sensor (measurement) -> Pre-processing/enhancement ->
Feature extraction -> Classification algorithm -> Class assignment/label

• A PR system needs some input from the real world, which it perceives with sensors.
• Such a system can work with any type of data: images, videos, numbers, or texts.
How do computers “see” images?

• Computers don’t see images the same way humans do.
• They can only understand numbers.
• So the first step in any computer vision problem is to convert the information that an image
contains into a machine-readable form.

How do computers “see” images?
• A camera can capture photos and convert them into digital form, closely mimicking how the human
eye captures light and color - this is the easy task.
• Understanding what is in the photo is much more difficult.
Digital Image Basics
• All images are displayed in the form of a two-dimensional matrix of individual picture elements
called pixels; each pixel value ranges from 0 to 255.
• Resolution = horizontal resolution × vertical resolution.
• Grayscale Image: Good-quality black-and-white pictures can be obtained by using 8 bits per picture
element (pixel), giving 256 different levels of gray per element.
• Color Image: A whole spectrum of colors can be produced by using different proportions of the 3
primary colors - red, green and blue. With 8 bits per component (R, G, B), each color pixel uses 24 bits.
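The pixel-matrix view above maps directly onto arrays. A minimal sketch using NumPy (not part of the lecture; the array sizes are illustrative):

```python
import numpy as np

# A 4x4 grayscale image: one 8-bit value (0-255) per pixel.
gray = np.zeros((4, 4), dtype=np.uint8)
gray[1, 2] = 255            # set one pixel to white

# A 4x4 color image: three 8-bit values (R, G, B) per pixel, i.e. 24 bits in total.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[0, 0] = [255, 0, 0]     # top-left pixel is pure red

print(gray.shape, gray.dtype)   # (4, 4) uint8
print(rgb.shape, rgb.dtype)     # (4, 4, 3) uint8
```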
Exercise
• Derive the size of an image in RGB digitized format. Assume a resolution of 1920×1080 and that
each color component of a pixel is represented by 8 bits.

1920 × 1080 × 3 × 8 = 49,766,400 bits (= 6,220,800 bytes, roughly 6 MB uncompressed)
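The same arithmetic in a short Python sketch (not from the lecture; the variable names are my own):

```python
# Uncompressed size of an 8-bit-per-component RGB image (matches the exercise above).
width, height = 1920, 1080
channels = 3                 # R, G, B
bits_per_channel = 8

size_bits = width * height * channels * bits_per_channel
print(size_bits)             # 49766400 bits
print(size_bits // 8)        # 6220800 bytes (~6 MB before any compression)
```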
Real world -> Sensor -> Pre-processing/enhancement -> Feature extraction -> Classification algorithm -> Class assignment

• Having received some information as input, the algorithm performs pre-processing and enhancement,
e.g. segmenting something interesting from the background: given a group photo, a familiar face
attracts your attention.
Image Enhancement
• Sharpening image features such as edges, boundaries, or contrast to make an image easier for
analysis/feature extraction.
• The enhancement does not increase the inherent information content of the data, but it increases
the dynamic range of the chosen features so that they can be detected easily.
Real world -> Sensor -> Pre-processing/enhancement -> Feature extraction -> Classification algorithm -> Class assignment/label

• A feature extractor is a program that inputs the data (image) and extracts features that can be
used in classification/clustering.
Real world -> Sensor -> Pre-processing/enhancement -> Feature extraction -> Classification algorithm -> Class assignment/label

• For the machine to search for patterns in data, the data should be pre-processed and converted
into a form (features) that a computer can understand.
• Classification algorithms, depending on the information available about the problem, can then be
used to get valuable results.
• In classification, the algorithm assigns labels to data based on the predefined features.
Classification in a PR system
• A class is a set of objects having some important properties in common.
• A feature extractor is a program that inputs the image and extracts features that can be used in
classification.
• A classifier is a program that inputs the feature vector and assigns it to one of a set of
designated classes or to the “reject” class.
Feature Selection and Extraction
• It is important to choose and to extract features that
  1. are computationally feasible,
  2. lead to a “good” PR system, and
  3. reduce the data to a manageable scale without losing valuable information.
• Feature selection is the process of choosing the input to the classifier and involves judgement.
• Extracted features should be relevant to the PR task at hand.
Feature Selection
• Discriminating between Apples and Apricots:

x1 = radius
x2 = R + G + B (color)

Which feature is more discriminative?
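As a rough illustration of such a feature extractor (this snippet is my own sketch, not part of the course material; the feature definitions are assumptions):

```python
import numpy as np

def extract_features(rgb_image, mask):
    """Toy features for the apple-vs-apricot example.
    rgb_image: HxWx3 uint8 array; mask: boolean HxW array marking the fruit's pixels."""
    area = mask.sum()
    x1 = np.sqrt(area / np.pi)                  # radius of a circle with the same area
    x2 = rgb_image[mask].sum(axis=1).mean()     # average R+G+B over the fruit pixels
    return x1, x2
```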
Possible features for Character Recognition

character (class)  area    height  width  #holes  #strokes  center (cx,cy)  ...
'A'                medium  high    3/4    1       3         1/2, 2/3
'B'                medium  high    3/4    2       1         1/3, 1/2
'8'                medium  high    2/3    2       0         1/2, 1/2
'0'                medium  high    2/3    1       0         1/2, 1/2
'1'                low     high    1/4    0       1         1/2, 1/2
'W'                high    high    1      0       4         1/2, 2/3
'I'                high    high    3/4    0       2         1/2, 1/2
'*'                medium  low     1/2    0       0         1/2, 1/2
'-'                low     low     2/3    0       1         1/2, 1/2
'/'                low     high    2/3    0       1         1/2, 1/2
Feature Space – Scatter plots
• Scatter plots are plots of sample feature vectors, x, in feature space.
• Excellent visualization tools for determining feature vector distribution in Rn, where n ≤ 3.
• Scatter plots often facilitate identifying natural or obvious clustering of class-specific feature
data and the partitioning of Rn into “decision regions” for classification.
(Figure: scatter plot of the fruit samples with radius and color as the two axes.)
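A quick way to produce such a scatter plot (my own sketch with made-up sample values, using matplotlib):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Illustrative (radius, R+G+B) feature vectors for the two classes.
apples   = rng.normal(loc=[40, 350], scale=[4, 30], size=(50, 2))
apricots = rng.normal(loc=[25, 500], scale=[3, 30], size=(50, 2))

plt.scatter(apples[:, 0], apples[:, 1], label="apple")
plt.scatter(apricots[:, 0], apricots[:, 1], label="apricot")
plt.xlabel("x1 = radius")
plt.ylabel("x2 = R + G + B")
plt.legend()
plt.show()
```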
Decision Regions & Boundaries
• A classifier partitions feature space into class-labeled decision regions.
• The border of each decision region is a decision boundary.
• A PR system is looking for these decision boundaries.
(Figure: three example partitions of feature space into regions R1-R4, with linear (piecewise),
quadratic (hyperbolic), and relatively general decision boundaries.)
Training Set

• Training set, H - a set of “typical” patterns whose attributes/features and class (or structure)
are known
  => it provides significant information on how to associate input data with an output decision.
• H contains many samples/observations of input/output pairs:
  Samples:      x(1), x(2), x(3), ..., x(m)        Number of samples: m
  Class labels: w1, w2, w1, ..., wc                Number of classes: c
Given a collection of records (training set), each record contains a set of features and the class.

              Instances (samples, observations)
              x(1)     x(2)     ...  x(50)    ...  x(m)

Features      x1(1)    x1(2)    ...  x1(50)   ...  x1(m)
              x2(1)    x2(2)    ...  x2(50)   ...  x2(m)
              :        :             :             :
              xd-1(1)  xd-1(2)  ...  xd-1(50) ...  xd-1(m)
              xd(1)    xd(2)    ...  xd(50)   ...  xd(m)

Class         w1       w3       ...  w1       ...  w2
(labels, target)
Classification: Definition
• Given a collection of records (training set )
• Each record contains a set of features, and the class.
• Find a model for the class as a function of the values of other
features.
• Goal: previously unseen records should be assigned a class as
accurately as possible.
• A test set is used to determine the accuracy of the model. Usually, the given data set is divided
into training and test sets, with the training set used to build the model and the test set used to
validate it.
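A minimal sketch of this train/test workflow using scikit-learn (my own example on a standard dataset, not part of the lecture):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)      # feature vectors and class labels

# Split the records: the training set builds the model, the test set validates it.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = KNeighborsClassifier().fit(X_train, y_train)
print(model.score(X_test, y_test))     # accuracy on previously unseen records
```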
Illustrating Classification Task

Training set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Training set -> Learning algorithm -> (Induction) -> Learned Model

Test set (records x(m+1), x(m+2), ...):

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Test set -> Apply Model -> (Deduction) -> predicted class


Classification

1. Learn a model from the training data.   2. Map unseen (new) data to a class.
Example: Image Classification
Image Classification: Problem

How Neural Networks Help Pattern Recognition
Feature Selection
• Discriminating between Apples, Apricots, Oranges:

Real world -> Sensor -> Pre-processing/enhancement -> Feature extraction -> Classification algorithm -> Class assignment
The Brain
• 2% of the weight of a person,
but consumes about 20% of the
energy.
• This energy is being used to
process and transmit all kinds
of signals, in particular, visual
information.
• The brain is essentially a network of neurons: a very complex network that allows us to perform
all these sophisticated perception tasks.
CONNECTIONS: Whenever we see a new type of cat, the same connection strengthens, making it easier
for us to recognize an animal as a cat next time.
NEURAL NETWORK -> ARTIFICIAL NEURAL NETWORK

Disclaimer: we are using simplified explanations and formulations to make the concepts more digestible.
Neuron
• ‘Input’ layer: neurons that simply receive data and pass it on.
• ‘Hidden’ layer(s): neurons between the input and output layers.
• ‘Output’ layer: neurons that produce the final result.
• Fully connected layer: all pairwise neurons between adjacent layers are connected.
• Neural networks are loosely inspired by biology.
• But they certainly are not a model of how the brain works, or even of how neurons work.
• In a biological neuron, the dendrites receive signals from, for instance, other neurons; a very
simple computation produces an output; the output then travels through the axon to the synapses,
where one neuron makes a connection with another neuron.

A simplified artificial neuron computes a weighted sum of its inputs x1, x2, x3 with weights
w1, w2, w3:

    output = w1·x1 + w2·x2 + w3·x3
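The same weighted sum in a couple of lines of NumPy (my own sketch; the input and weight values are made up):

```python
import numpy as np

x = np.array([0.5, 0.2, 0.9])    # inputs x1, x2, x3
w = np.array([0.4, -0.6, 0.3])   # weights w1, w2, w3

output = np.dot(w, x)            # w1*x1 + w2*x2 + w3*x3
print(output)
```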
For a fully connected network with 3 inputs, 4 hidden neurons and 2 output neurons:

How many neurons (perceptrons)?  4 + 2 = 6 (the input units only pass data on)
How many weights (edges)?        (3 × 4) + (4 × 2) = 20
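The same counting generalizes to any layer sizes; a tiny sketch (my own, not from the slides):

```python
# Count neurons and weights for a fully connected network, here the 3-4-2 example above.
layer_sizes = [3, 4, 2]            # input, hidden, output

neurons = sum(layer_sizes[1:])     # input units just pass data on: 4 + 2 = 6
weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))  # 3*4 + 4*2 = 20
print(neurons, weights)            # 6 20
```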
A Neural Network with multiple hidden layers
(Figure: inputs x1 ... x5 feed an input layer (Layer 1), hidden layers (Layers 2-3), and an output
layer (Layer 4) with a single neuron whose activation a is the final output.)
• A single neuron in the output layer gives the final output.
• Note: the output layer may instead be a set of neurons, which gives you multiple outputs depending
on your application.

But how does a computer know which weights to multiply by?
Case 1: Learning/Training
• We know nothing about animals and start to learn to recognize “Cat” and “Dog”.
• Learning is based on the obvious feature - color - and the classification labels:
  White pet -> Cat
  Black pet -> Dog

Case 2: Testing
• Apply the learned feature (color):
  White pet -> Cat
  Black pet -> Dog
• Two new pets to classify: ?  ?
Case 3: Learning
• Color alone cannot classify them.
• Learn abstract and hidden features instead: ears and nose (whiskers).
(Figure: a decision over these features - ears (prick vs. lop) and whiskers (long vs. short/none) -
maps an image to Cat or Dog.)
We do the same thing with machines: we show them pictures, tell them what is in the pictures, and
hope they figure out all the important features by themselves.
(Figure: the same multi-layer network - inputs x1 ... x5, hidden layers, output a - learning the
abstract and hidden features.)


Recognizing Handwritten Decimal Digits
• MNIST Database:
  • A handwritten-digits database.
  • Widely used for testing the performance of various networks.
  • Each handwritten digit is binarized and then rescaled into an image that is roughly 20 by 20 pixels.
  • The digit is centered in a larger image, which is about 28 by 28 pixels in size.
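For reference, MNIST can be loaded in one line if TensorFlow/Keras happens to be installed (a sketch of my own, not a requirement of the course):

```python
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)   # (60000, 28, 28): 60,000 training images of 28x28 pixels
print(y_train[:10])    # the first few digit labels
```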
(Figure: a handwritten “3” is normalized and fed into the neural network; of the ten outputs for the
digits 0-9, only the output for 3 shows a high activation - “Yes” for 3, “No” for all the others.)
Character Recognition Network
• Input layer: 784 neurons (the 28 × 28 = 784 pixels of the input image).
• Hidden layer: 30 neurons.
• Output layer: 10 neurons, one per digit 0-9.
(Figure: the desired activation/output a is a vector with a 1 at the position of the correct digit
and 0 everywhere else.)
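One way to write down this 784-30-10 network, using Keras as an assumed implementation (the layer types and activation functions are my choices, not specified in the slides):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(30, activation="sigmoid", input_shape=(784,)),  # hidden layer: 30 neurons
    Dense(10, activation="softmax"),                       # output layer: one neuron per digit
])
model.summary()   # (784*30 + 30) + (30*10 + 10) = 23,860 trainable parameters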
What will happen if we simply randomize the weights?
(Figure: the same 784-30-10 network with random weights; the ten outputs for the digits 0-9 are
arbitrary values such as 0.2, 0.1, 0.3, ..., 0.8, ..., so the activation pattern tells us nothing
about which digit is in the input image.)
How do we adjust the weights in such a way that our outputs end up approaching the desired results?

=> Training
Training Data
• Use training data with known desired activations.
• MNIST dataset: 60,000 training samples.
• For a training image x with label 4, the desired activation a(x) is the 10-element vector with a 1
in position 4 and 0 everywhere else; likewise for the other labels (0, 4, 1, 5, 9, 3, 9, 5, 6, 6, ...).
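Building that desired activation is just one-hot encoding; a small sketch (mine, not from the slides):

```python
import numpy as np

def one_hot(label, num_classes=10):
    """Desired activation a(x): 1 at the label's index, 0 elsewhere."""
    a = np.zeros(num_classes)
    a[label] = 1.0
    return a

print(one_hot(4))   # [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
```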
Training Process
Training data: training images -> Neural Network -> network activation

1) Randomly initialize the weights.
2) Determine the network activation/result for each training image.
Compute Activations: Feedforward
(Figure: the 784-30-10 network computes its activations layer by layer - a feedforward pass - and,
while the weights are still random, produces arbitrary output activations such as 0.2, 0.1, 0.3, ...,
0.8, ... for the digits 0-9.)
Training Process
Training images -> Neural Network -> network activation; compare with the desired activations -> compute Loss

1) Randomly initialize the weights and biases.
2) Determine the network activation for each training image.
3) Compute the Loss over the entire set of training images.

The weaker the performance of the network, the larger the Loss.
Training Process
Training images -> Neural Network -> network activation; compare with the desired activations ->
compute Loss -> Gradient Descent -> update weights & biases

1) Randomly initialize the weights and biases.
2) Determine the network activation for each training image.
3) Compute the Loss over the entire set of training images.
4) Update the weights using Gradient Descent.
5) Epoch: repeat Steps 2-4 until the Loss reduces to an acceptable level.
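Steps 1-5 correspond to a standard training loop. A compact Keras sketch of the same procedure (one possible implementation; the optimizer, loss function and epoch count are my assumptions):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

(x_train, y_train), _ = mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0     # normalized 784-pixel inputs
y_train = to_categorical(y_train, 10)          # desired one-hot activations

model = Sequential([Dense(30, activation="sigmoid", input_shape=(784,)),
                    Dense(10, activation="softmax")])       # step 1: weights start random
model.compile(optimizer="sgd", loss="mse")                  # steps 3-4: loss + gradient descent
model.fit(x_train, y_train, epochs=5, batch_size=32)        # steps 2-5, repeated each epoch
```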
https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw 63
Convolutional Neural Network (CNN)
• A neural network architecture designed for image/video applications.
Image Classification: Character Recognition
• Input layer: 784 neurons (28 × 28 = 784 pixels), hidden layer: 30 neurons, output layer: 10 neurons.
• The first fully connected layer alone already needs 784 × 30 weights.
(Figure: the desired activation is a one-hot vector (1 for the correct digit, 0 elsewhere); it is
compared with the network's actual activation, e.g. 0.2, 0.1, ..., 0.8, ..., to give the error/loss.)
Image Object Classification
• Input layer: 65,536 neurons (a 256 × 256 image has 65,536 pixels), hidden layer: 30 neurons,
output layer: 10 neurons for classes such as cat, dog, bird, tree, flower, ...
• The first fully connected layer alone already needs 65,536 × 30 weights.
(Figure: as before, the desired one-hot activation is compared with the actual activation to give
the error/loss.)
Observation: Some patterns are smaller than the whole image.
• A neuron does not have to see the whole image to discover a pattern, e.g. a “dog nose” detector.
• Objective: connect each neuron to a small region of the image, using fewer parameters.
Recognizing some critical patterns for dogs
(Figure: inputs x1 ... xN feed Layer 1 and Layer 2; each neuron looks only at a small region, and the
detected patterns are combined to answer “Dog?”.)
Inspiration: it is likely that humans identify dogs in a similar way.
CNN - Summary
• A neuron's job is to judge whether a particular pattern appears.
• We do not necessarily need neurons that take the whole picture as input.
• Each neuron only needs to take a small part of the picture as input, which is enough for it to
detect whether some critical pattern exists there.
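Putting these ideas together, a convolutional layer slides small filters over local regions instead of connecting every pixel to every neuron. A minimal Keras sketch of such a CNN (layer sizes, filter counts and the class count are illustrative assumptions, not taken from the slides):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(16, (3, 3), activation="relu", input_shape=(256, 256, 3)),  # small local filters
    MaxPooling2D((2, 2)),                                              # shrink the feature maps
    Conv2D(32, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation="softmax"),   # one output per class (cat, dog, bird, ...)
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()   # far fewer weights per layer than a fully connected 65,536-input layer
```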
