02-AI and Data Analytics For Computer Vision
Schedule
Week 4 (10/2) Lecture AI and Data Analytics for
Computer Vision
Week 5 Workshop
Lab001: 19/2, Wed., 12:30-14:20
Lab002: 17/2, Mon., 14:30-16:20
Lab003: 21/2, Fri., 12:30-14:30
Lab004: 17/2, Mon., 12:30-14:20
Week 6 (Optional)
Lab001: 26/2, Wed., 12:30-14:20
Lab002: 24/2, Mon., 14:30-16:20
Lab003: 28/2, Fri., 12:30-14:30
Lab004: 24/2, Mon., 12:30-14:20
Submission
• Quiz at week 6
• Group project
• This group project and demonstration gives students the opportunity to become familiar with the process of
training a neural network for a computer vision problem. The open-ended task is to use Teachable
Machine (https://teachablemachine.withgoogle.com/) to train an AI model that recognizes objects relevant
to the students' own disciplines. It aims to encourage self-learning, to train logical thinking, and to
practice reporting and demonstration skills.
• Form a group of four to five students who will work together to select a topic and develop an AI
model to recognize objects.
• Demonstration: create a 3-5 minute demonstration video.
• Report writing
• Deadlines and important dates:
• Submission of the group composition via Blackboard by 7 Mar., 2025
• Submission of the group report and video demonstration via Blackboard by 11 Apr., 2025
• Sometimes children mix up cats and dogs, especially if they have not been exposed to different breeds,
colors, or sizes of animals before.
Once they have seen enough cats, dogs, and other furry animals, they learn the difference.
Pattern recognition (PR): we rely on patterns or features to figure out
what type of object we are looking at.
• Example features: big pointy ears, a long tail, fur
[Figure labels: Eye, Receptors, Visual Cortex]
How a Computer Vision Application Works
Computer Vision
• “Computer Vision” is an area of Machine Learning that deals with
image recognition and classification.
• Computer Vision models can be developed to accomplish tasks such as
facial recognition, identifying which breed a dog belongs to, and even
identifying a tumor from CT scans.
Face Detection and Recognition
Face Detection for Privacy Protection
Human Action/Activity Recognition
easy task
Digital Image Basics
• All images are displayed as a two-dimensional matrix of
individual picture elements called pixels.
[Figure labels: horizontal resolution; pixel value: 0 to 255]
• Grayscale image: good-quality black-and-white pictures can be obtained by using
8 bits per picture element (pixel).
• This gives 256 different levels of gray per element.
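As a rough illustration of the two points above, a grayscale image can be held as a list of rows of pixel values. The tiny 2×3 image and the variable names below are made up for this sketch; real images are far larger.

```python
# A tiny, made-up 2x3 grayscale image: each entry is one pixel value.
image = [
    [0, 128, 255],
    [64, 192, 32],
]

bits_per_pixel = 8
levels = 2 ** bits_per_pixel           # number of distinct gray levels

height = len(image)                    # vertical resolution (rows)
width = len(image[0])                  # horizontal resolution (columns)

# Every pixel value must lie in the range 0 .. levels - 1.
all_valid = all(0 <= p < levels for row in image for p in row)

print(f"{width}x{height} image, {levels} gray levels, valid: {all_valid}")
```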
[Figure: RGB color image with 8 bits for each of the R, G, and B components, 24 bits per pixel in total]
Exercise
• Derive the size of an image in RGB digitization
format. Assume that each color component has a resolution of
1920×1080 and that each pixel of a component is
represented by 8 bits.
1920×1080×3×8 = 49,766,400 bits
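The arithmetic in the exercise can be checked with a few lines of Python. This is only a sketch: the function name `image_size_bits` and its parameters are chosen here for illustration, and the result is the uncompressed size.

```python
def image_size_bits(width, height, channels=3, bits_per_channel=8):
    """Uncompressed size of an image in bits."""
    return width * height * channels * bits_per_channel

bits = image_size_bits(1920, 1080)
size_bytes = bits // 8                 # 8 bits per byte

print(bits)                            # 49766400
print(size_bytes / 1_000_000, "MB")    # size in megabytes (10**6 bytes)
```

The same function gives the grayscale case with `channels=1`.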
[Figure: Real world → Sensor → Pre-processing/enhancement → Feature extraction → Classification algorithm → Class assignment/label]
Image Enhancement
Classification in PR system
• A class is a set of objects having some
important properties in common.
• A feature extractor is a program that inputs
the image and extracts features that can be
used in classification.
• A classifier is a program that inputs the
feature vector and assigns it to one of a set
of designated classes or to the “reject”
class.
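The three definitions above can be sketched as two small functions. Everything here is hypothetical and chosen only for illustration: the features (mean brightness and brightness spread), the classes "dark" and "bright", and the thresholds.

```python
def extract_features(image):
    """Hypothetical feature extractor: (mean brightness, brightness spread)."""
    pixels = [p for row in image for p in row]
    return (sum(pixels) / len(pixels), max(pixels) - min(pixels))

def classify(features, dark_limit=85, bright_limit=170, max_spread=100):
    """Hypothetical classifier that falls back to the "reject" class."""
    mean, spread = features
    if spread <= max_spread and mean < dark_limit:
        return "dark"
    if spread <= max_spread and mean > bright_limit:
        return "bright"
    return "reject"                    # the feature vector fits no class well

night = [[10, 20], [30, 40]]           # made-up images
fog = [[120, 130], [125, 135]]

print(classify(extract_features(night)))  # dark
print(classify(extract_features(fog)))    # reject
```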
Feature Selection and Extraction
• It is important to choose and to extract
features that
1. are computationally feasible
2. lead to a “good” PR system
3. reduce the data to a manageable scale without
losing valuable information
• Feature selection is the process of choosing the
input to the classifier and involves
judgement.
• Extracted features should be relevant to the PR task at
hand.
Feature Selection
• Discriminating between Apples and Apricots:
x1 = radius
x2 = R+G+B (color)
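A small sketch of how the feature vector x = (x1, x2) could be built from measurements. The radius and RGB values below are invented for illustration and are not real fruit data.

```python
def feature_vector(radius_cm, rgb):
    """Build x = (x1, x2) with x1 = radius and x2 = R + G + B."""
    r, g, b = rgb
    return (radius_cm, r + g + b)

# Hypothetical measurements, not real data.
apple = feature_vector(4.0, (200, 30, 40))
apricot = feature_vector(2.0, (250, 170, 60))

print(apple)    # (4.0, 270)
print(apricot)  # (2.0, 480)
```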
Possible features for Character Recognition
Feature Space – Scatter plots
• Scatter plots are plots of sample feature vectors, x,
in feature space.
• They are excellent visualization tools for determining the
feature vector distribution in $\mathbb{R}^n$, where $n \le 3$.
• Scatter plots often facilitate identifying natural or
obvious clustering of class-specific feature data
and the partitioning of $\mathbb{R}^n$ into “decision regions”
for classification.
[Figure: scatter plots with axes Radius and Color]
Decision Regions & Boundaries
• A classifier partitions feature space into class-labeled decision regions.
• The border of each decision region is a decision boundary.
• A PR system looks for these decision boundaries.
[Figure: examples of feature spaces partitioned into decision regions R1, R2, R3, and R4]
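One simple way to obtain decision regions is a nearest-centre rule: each point of feature space belongs to the region of the closest class centre, and the decision boundary is where two centres are equally close. The sketch below assumes two made-up centres in (radius, color) space and a made-up scaling of the two features; it is not the only way to define a boundary.

```python
import math

# Hypothetical class centres in (radius, color) feature space.
centres = {"R1": (2.0, 480.0), "R2": (4.0, 270.0)}
# The two features have very different ranges, so each is divided by a rough scale.
scales = (1.0, 100.0)

def region(x):
    """Return the label of the decision region that contains point x."""
    def distance(label):
        c = centres[label]
        return math.sqrt(sum(((xi - ci) / s) ** 2
                             for xi, ci, s in zip(x, c, scales)))
    return min(centres, key=distance)

print(region((2.2, 450.0)))  # R1
print(region((3.8, 300.0)))  # R2
```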
Given a collection of records (training set):
Each record contains a set of features and the class.
[Figure: table of training records; each column holds the features of one record x, and the bottom row gives its class (label, target), e.g. w1, w3, …, w1, …, w2]
Classification: Definition
• Given a collection of records (training set):
• Each record contains a set of features and the class.
• Find a model for the class as a function of the values of the other
features.
• Goal: previously unseen records should be assigned a class as
accurately as possible.
• A test set is used to determine the accuracy of the model. Usually, the given data set is
divided into training and test sets, with the training set used to build the model and the test set used
to validate it.
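The split into training and test sets can be sketched as follows. The records, the fixed split, and the very simple threshold model are all assumptions made for this example; in practice the split is usually random and the model is more capable.

```python
# Hypothetical records: (feature value, class label).
records = [(1.8, "apricot"), (2.1, "apricot"), (2.3, "apricot"),
           (3.7, "apple"), (4.0, "apple"), (4.4, "apple"),
           (2.0, "apricot"), (4.1, "apple")]

train, test = records[:6], records[6:]   # simple fixed split

def fit_threshold(data):
    """Build the model: the midpoint between the two class means."""
    means = {}
    for label in ("apricot", "apple"):
        values = [x for x, y in data if y == label]
        means[label] = sum(values) / len(values)
    return (means["apricot"] + means["apple"]) / 2

def predict(threshold, x):
    """Assign a class to a record from its feature value."""
    return "apple" if x > threshold else "apricot"

threshold = fit_threshold(train)          # uses the training set only
correct = sum(predict(threshold, x) == y for x, y in test)
accuracy = correct / len(test)            # measured on unseen records

print(accuracy)  # 1.0
```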
Illustrating Classification Task
Training set:
Tid   Attrib1   Attrib2   Attrib3   Class
3     No        Small     70K       No
6     No        Medium    60K       No
…
Test set (apply the model to deduce the class):
Tid   Attrib1   Attrib2   Attrib3   Class
11    No        Small     55K       ?
12    Yes       Medium    80K       ?
13    Yes       Large     110K      ?
14    No        Small     95K       ?
15    No        Large     67K       ?
Example: Image Classification
Image Classification: Problem
How Neural Networks Help Pattern Recognition
Feature Selection
• Discriminating between Apples, Apricots, Oranges:
The Brain
• The brain makes up about 2% of a person's weight
but consumes about 20% of the
energy.
• This energy is used to
process and transmit all kinds
of signals, in particular visual
information.
• The brain is a
network of neurons: a very
complex network that allows us
to perform all kinds of
sophisticated perception tasks.
[Figure label: Eye]
Whenever we see a new type of cat,
[Figure panels: Neural Network; Artificial Neural Network]
Neuron: inputs come in, and the neuron performs a very simple computation to produce an output. The output then
travels through what is called the axon to the other end, where it reaches what are called
synapses, which is where one neuron makes a connection with another neuron.
Inputs $x_1, x_2, x_3$ with weights $w_1, w_2, w_3$ feed a summing unit Σ:
$\text{output} = w_1 x_1 + w_2 x_2 + w_3 x_3$
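The weighted sum of a single artificial neuron can be written directly in code. This is a minimal sketch; the weights and inputs are arbitrary example numbers, and no bias or activation function is included because the formula above has none.

```python
def neuron_output(weights, inputs):
    """Weighted sum: output = w1*x1 + w2*x2 + w3*x3 (and so on for more inputs)."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * x for w, x in zip(weights, inputs))

# 0.5*2.0 + (-1.0)*1.0 + 2.0*0.5 = 1.0 - 1.0 + 1.0
print(neuron_output([0.5, -1.0, 2.0], [2.0, 1.0, 0.5]))  # 1.0
```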
How many neurons (perceptrons)? 4+2=6
A Neural Network with multiple hidden layers
[Figure: inputs $x_1$ to $x_5$ feed the hidden layers; a single neuron in the output layer gives the final output $a$]
Note: the output layer may be a set of neurons, which gives you multiple outputs, depending on your
application.
[Figure labels: Cat? Dog?]
Case 3: Learning
[Figure: an image is classified as Cat or Dog from features such as ears (prick or lop), whiskers (long or short/none), and nose]
We do the same thing with machines: we show them pictures, tell them what is in them, and hope they
figure out all the important features by themselves.
[Figure: neural network with inputs $x_1$ to $x_5$ and output $a$]
[Figure: a normalized input image of the digit 3 is fed to the neural network; of the ten outputs 0 to 9, every output reads “No” except output 3, which reads “Yes” (high activation or output for 3)]
Character Recognition Network
Input Layer: 784 neurons (input of 28 × 28 = 784 pixels, $x_1$ to $x_{784}$)
Hidden Layer: 30 neurons
Output Layer: 10 neurons (digits 0 to 9)
[Figure: the activation or output $a$ is a vector of ten values, one per digit, with a single 1 and 0 elsewhere]
What will happen if we simply randomize the weights?
[Figure: the same network with randomized weights; the ten results are arbitrary values such as 0.4, 0.2, 0.6]
How do we adjust the weights so that our outputs approach the
desired results?
Training
Training Data
• Use training data with known desired activations.
• MNIST dataset: 60,000 sample training images
Training image x, its label, and the desired activation a(x) (one column per training image, one row per digit):
Label:    0 4 1 5 9 3 9 5 6 6
Digit 0:  1 0 0 0 0 0 0 0 0 0
Digit 1:  0 0 1 0 0 0 0 0 0 0
Digit 2:  0 0 0 0 0 0 0 0 0 0
Digit 3:  0 0 0 0 0 1 0 0 0 0
Digit 4:  0 1 0 0 0 0 0 0 0 0
Digit 5:  0 0 0 1 0 0 0 1 0 0
Digit 6:  0 0 0 0 0 0 0 0 1 1
Digit 7:  0 0 0 0 0 0 0 0 0 0
Digit 8:  0 0 0 0 0 0 0 0 0 0
Digit 9:  0 0 0 0 1 0 1 0 0 0
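The desired activation for a label is a vector with a 1 at the position of the label and 0 elsewhere. A minimal sketch follows; the function name `one_hot` is chosen here for illustration, and the labels are the ten listed above.

```python
def one_hot(label, num_classes=10):
    """Desired activation: 1 at the position of the label, 0 elsewhere."""
    vector = [0] * num_classes
    vector[label] = 1
    return vector

labels = [0, 4, 1, 5, 9, 3, 9, 5, 6, 6]    # the ten labels listed above
desired = [one_hot(label) for label in labels]

print(desired[1])  # [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```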
Training Process
[Figure: Training Data: Training Images → Neural Network → Network Activation]
Compute Activations: Feedforward
Input Layer: 784 neurons
Hidden Layer: 30 neurons
Output Layer: 10 neurons
[Figure: the input pixels $x_1$ to $x_{784}$ are fed forward through the network to give the network activations $a$, one value per digit 0 to 9, with values such as 0.2, 0.1, 0.3, 0.8, 0.4]
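A feedforward pass through a 784-30-10 network can be sketched in plain Python. Several things here are assumptions, since the slides do not specify them: the sigmoid activation function, the small random starting weights, the zero biases, and the random stand-in for an input image.

```python
import math
import random

def sigmoid(z):
    """Squash a weighted sum into the range (0, 1); an assumed activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def feedforward(x, layers):
    """Propagate input x through every layer and return the final activations."""
    a = x
    for weights, biases in layers:
        a = [sigmoid(sum(w * v for w, v in zip(row, a)) + b)
             for row, b in zip(weights, biases)]
    return a

rng = random.Random(0)
layers = [make_layer(784, 30, rng), make_layer(30, 10, rng)]
x = [rng.random() for _ in range(784)]    # stand-in for a 28 x 28 image
activations = feedforward(x, layers)

print(len(activations))  # 10
```

With random weights the ten activations carry no meaning yet; training is what makes the correct output the largest.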
Training Process
[Figure: Training Data: Training Images → Neural Network → Network Activation → Compute Loss (compared with the Desired Activations) → Loss]
1) Randomly initialize the weights and biases
2) Determine the network activation for each training image
3) Compute the loss over the entire set of training images
The weaker the performance of the network, the larger the loss.
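Step 3 can be illustrated with a quadratic loss, which is one common choice and an assumption here, since the slides do not name the loss function. The two activation vectors below are made up: one close to the desired activation and one far from it.

```python
def quadratic_loss(activation, desired):
    """Sum of squared differences between network and desired activations."""
    return sum((a - d) ** 2 for a, d in zip(activation, desired))

def total_loss(activations, desired_list):
    """Average loss over a whole set of training images."""
    losses = [quadratic_loss(a, d) for a, d in zip(activations, desired_list)]
    return sum(losses) / len(losses)

desired = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]             # one-hot vector for digit 3
good = [0.0, 0.1, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
poor = [0.4, 0.2, 0.6, 0.4, 0.4, 0.2, 0.6, 0.4, 0.2, 0.6]

# The weaker network output gives the larger loss.
print(quadratic_loss(good, desired) < quadratic_loss(poor, desired))  # True
```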
Training Process
[Figure: Training Images → Neural Network → Network Activation → Compute Loss (compared with the Desired Activations) → Loss → Gradient Descent, which feeds back into the Neural Network]
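Gradient descent repeatedly nudges the weights in the direction that lowers the loss. The sketch below shows the idea on the smallest possible case, a single weight w in output = w * x with a mean squared loss; the data, learning rate, and number of epochs are assumptions for illustration. A real network applies the same update to every weight and bias.

```python
def train(samples, learning_rate=0.1, epochs=200):
    """Fit output = w * x by gradient descent on the mean squared loss."""
    w = 0.0                                    # arbitrary starting weight
    for _ in range(epochs):
        # Derivative of mean((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= learning_rate * grad              # step against the gradient
    return w

# Hypothetical data generated by y = 3 * x.
samples = [(0.0, 0.0), (1.0, 3.0), (2.0, 6.0)]
w = train(samples)

print(round(w, 3))  # 3.0
```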
https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw
Convolutional Neural Network (CNN): a neural network
architecture designed for
image/video applications
Image Classification: Character Recognition
Input Layer: 784 neurons (input of 28 × 28 = 784 pixels)
Hidden Layer: 30 neurons
Output Layer: 10 neurons (digits 0 to 9)
784×30 weights between the input layer and the hidden layer
Desired activation (one-hot vector):  0    0    0    0    0    1    0    0    0    0
Network activation a:                 0.2  0.1  0.2  0.3  0.1  0.8  0.1  0.1  0.1  0.2
Error/loss: the difference between the desired activation and the network activation
Image Object Classification
Input Layer: 65,536 neurons (input of 256 × 256 = 65,536 pixels)
Hidden Layer: 30 neurons
Output Layer: 10 neurons (classes such as cat, dog, bird, tree, flower)
65,536×30 weights between the input layer and the hidden layer
Desired activation (one-hot vector):  0    0    0    0    0    1    0    0    0    0
Network activation a:                 0.2  0.1  0.2  0.3  0.1  0.8  0.1  0.1  0.1  0.2
Error/loss: the difference between the desired activation and the network activation
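The number of weights between the input layer and the hidden layer grows with the number of pixels. The short sketch below compares the two networks above; the function name `dense_weights` is an assumption, and biases are left out of the count.

```python
def dense_weights(n_in, n_out):
    """Number of weights in a fully connected layer (biases not counted)."""
    return n_in * n_out

digits = dense_weights(28 * 28, 30)        # 784 x 30
objects = dense_weights(256 * 256, 30)     # 65,536 x 30

print(digits)   # 23520
print(objects)  # 1966080
```

Going from a 28 × 28 image to a 256 × 256 image multiplies the first-layer weights by about 84, which is one motivation for architectures designed specifically for images.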
Observation: Some patterns are smaller than
the whole image.
A neuron does not have to see the whole image to discover the pattern.
[Figure: a “dog nose” detector neuron that looks at only a small region of the image]