LECTURE 5: LEARNING
EL 367
KOBINA ABAKAH-PAINTSIL
Department of Electrical and Electronics Engineering
LECTURE OBJECTIVES
■ Supervised Learning
■ Hypothesis Evaluation
■ Reinforcement Learning
■ Unsupervised Learning
Learning
■ In learning, the AI is not given explicit instructions on how to
solve a problem; instead, it is provided with data and left to
find patterns in the data and make inferences from it.
■ Supervised Learning
– The AI is provided with a data set of input-output pairs and
is expected to learn a function that maps inputs to outputs.
■ Classification
– A supervised learning task where the goal is to learn a
function mapping an input point to a discrete category
Learning: Classification
■ Example: Weather Prediction
– Here the AI is given historical data on the weather
conditions (humidity and pressure) of previous days.
Because this data has been labelled by a human (rain or
no rain), this is supervised learning.
Learning: Classification
■ Example: Weather Prediction
– Mathematically this can be written as
f(humidity, pressure) = Rain or No Rain
h(humidity, pressure) ≈ f(humidity, pressure)
– Hypothesis function h: an estimation of the target function f.
Learning: Classification
■ Example: Weather Prediction
– The data can be plotted on a 2D graph for this example
(2 inputs give 2 dimensions; more inputs would need
more dimensions).
– Hence the job of the AI is to train a model such that any
new input without a label can be classified under these
conditions.
[Scatter plot: humidity vs. pressure, with rainy days and
no-rain days as the two classes]
Learning: Classification
■ Example: Weather Prediction
– Given a new input (the white circle on the graph), we can
conclude with some degree of certainty that the day may
be rainy.
Learning: Classification
■ Nearest-Neighbor Classification
– It is an algorithm that, given an input, chooses the class of
the nearest data point to that input
– In the plotted example, this algorithm would classify our
white circle as a rainy day, because all the historical data
points around it signify rainy days.
Learning: Classification
■ k-Nearest-Neighbor Classification
– It is an algorithm that, given an input, chooses the most
common class out of the k nearest data points to that
input
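A minimal Python sketch of k-nearest-neighbor classification (illustrative only; the data points and the choice of k are made-up stand-ins for the humidity/pressure example above):

import math
from collections import Counter

def k_nearest_neighbors(data, query, k=3):
    # data: list of ((humidity, pressure), label) pairs.
    # Sort the points by Euclidean distance to the query...
    by_distance = sorted(data, key=lambda point: math.dist(point[0], query))
    # ...then take a majority vote among the k closest labels.
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]

days = [((0.93, 999.7), "rain"), ((0.88, 1001.2), "rain"),
        ((0.49, 1015.5), "no rain"), ((0.52, 1012.1), "no rain")]
print(k_nearest_neighbors(days, (0.90, 1000.0)))  # -> "rain"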
Learning: Classification
■ Linear Regression
– The goal here is to find a boundary that separates the data
points, overcoming the drawbacks of the previously discussed
algorithms. Sometimes this boundary is not perfect.
Learning: Classification
■ Linear Regression
– The hypothesis function h will multiply each input variable by a
weight to determine whether a new input falls in one
category or the other.
– Mathematically:
h(x1, x2) = Rain if w0 + w1·x1 + w2·x2 ≥ 0, and No Rain otherwise
Learning: Classification
■ Linear Regression
[Plot: a linear decision boundary separating the Rain points
from the No-rain points]
Learning: Classification
■ Weights: How do we select/tune these weights?
– Perceptron learning rule: Given data point (x, y), update
each weight according to
wi ← wi + α(y − h(x)) · xi
where α is the learning rate.
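A minimal sketch of one perceptron update in Python (the 0/1 class encoding and the default α are illustrative assumptions):

def perceptron_update(weights, x, y, alpha=0.1):
    # weights[0] is the bias w0; x = (x1, x2, ...); y is the true class (0 or 1).
    inputs = (1.0,) + tuple(x)  # prepend 1 so w0 acts as a bias term
    # Current prediction h(x): 1 if the weighted sum crosses the threshold.
    h = 1 if sum(w * xi for w, xi in zip(weights, inputs)) >= 0 else 0
    # wi <- wi + alpha * (y - h(x)) * xi
    return [w + alpha * (y - h) * xi for w, xi in zip(weights, inputs)]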
Learning: Classification
■ Thresholds
– A hard threshold outputs only the category: 0 on one side
of the boundary, 1 on the other.
– A soft threshold (logistic regression) passes the weighted
sum through the logistic (sigmoid) function instead, giving
a probability between 0 and 1 that expresses confidence.
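The two thresholds side by side, as a small Python sketch (here z stands for the weighted sum w0 + w1·x1 + w2·x2):

import math

def hard_threshold(z):
    # Step function: the output jumps straight from 0 to 1.
    return 1 if z >= 0 else 0

def soft_threshold(z):
    # Logistic (sigmoid) function: a probability between 0 and 1.
    return 1 / (1 + math.exp(-z))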
Learning: Classification
■ Support Vector Machines
– It is an algorithm that analyses data for classification and
regression by transforming the inputs into a higher-
dimensional feature space.
Learning: Classification
■ Support Vector Machines
– Support Vectors: Vectors closest to the boundary
separator.
– Maximum margin separator: the boundary that maximizes
the distance to the nearest data points (the support vectors)
– In higher dimensions the separator is not a line but a
hyperplane. Working in a higher-dimensional feature space
helps in cases where the dataset is not linearly separable.
Learning: Classification
■ Support Vector Machines
– Example: a linearly inseparable dataset, where no straight
line can separate the two classes (see the sketch below).
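In practice SVMs are rarely written by hand; a minimal sketch using scikit-learn (assuming the library is installed), on an XOR-like dataset that no straight line can separate:

from sklearn import svm

# XOR-like data: opposite corners share a class, so no line separates them.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# The RBF kernel implicitly maps inputs into a higher-dimensional
# feature space where a separating hyperplane can exist.
classifier = svm.SVC(kernel="rbf")
classifier.fit(X, y)
print(classifier.predict([[0.9, 0.1]]))  # expected: class 1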
Learning
■ Regression
– It is a supervised learning task of learning a function
mapping an input point to a continuous value.
Learning: Regression
■ Regression
– The goal is to find a function that correctly estimates a
continuous value from the data available.
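A minimal least-squares sketch for one input variable (the data values below are made up for illustration):

def fit_line(xs, ys):
    # Fit y = a + b*x by ordinary least squares.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(f"h(x) = {a:.2f} + {b:.2f}x")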
Learning
■ Hypothesis Evaluation
– Loss Function: function that expresses how poorly a
hypothesis performs
– 0-1 loss function
– L1 loss function
Learning: Hypothesis Evaluation
■ 0-1 Loss Function
L(actual, predicted) = 0 if actual = predicted, 1 otherwise
Learning: Hypothesis Evaluation
■ L1 loss function
L(actual, predicted) = |actual − predicted|
Learning: Hypothesis Evaluation
■ L2 loss function
L(actual, predicted) = (actual − predicted)²
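The three loss functions, written out directly in Python:

def loss_01(actual, predicted):
    # 0-1 loss: 0 for a correct prediction, 1 otherwise (classification).
    return 0 if actual == predicted else 1

def loss_l1(actual, predicted):
    # L1 loss: absolute difference (regression).
    return abs(actual - predicted)

def loss_l2(actual, predicted):
    # L2 loss: squared difference; penalizes large errors more strongly.
    return (actual - predicted) ** 2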
■ Overfitting: In overfitting, a model fits too closely to a
particular data set and therefore may fail to generalize to
future data.
Learning: Hypothesis Evaluation
■ Overfitting
– Generally, the cost of a hypothesis is just its loss:
cost(h) = loss(h)
– But we can also charge for complexity:
cost(h) = loss(h) + complexity(h)
■ Regularization: penalizing hypotheses that are more complex
to favor simpler, more general hypotheses
cost(h) = loss(h) + λcomplexity(h)
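A minimal sketch of a regularized cost, taking the sum of squared weights as one common measure of complexity (an assumption; the slides leave complexity(h) abstract):

def regularized_cost(loss, weights, lam=0.1):
    # cost(h) = loss(h) + lambda * complexity(h)
    # Larger lambda pushes the learner toward simpler hypotheses.
    return loss + lam * sum(w * w for w in weights)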
Learning: Hypothesis Evaluation
■ Holdout cross-validation: splitting data into a training set and
a test set, such that learning happens on the training set and
is evaluated on the test set
■ k-fold cross-validation: splitting data into k sets and
experimenting k times, using each set as the test set once
and the remaining data as the training set.
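A minimal k-fold split sketch in Python (shuffling and the strided fold layout are illustrative choices):

import random

def k_fold_splits(data, k=5, seed=0):
    # Shuffle a copy, then slice the data into k roughly equal folds.
    data = data[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    # Use each fold as the test set exactly once.
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test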
Learning
■ Reinforcement Learning
– The agent is given a set of rewards or punishments to
learn what actions to take in the future.
Learning: Reinforcement Learning
■ Markov Decision Process
– A model for decision-making, representing states, actions,
and their rewards
Learning: Reinforcement Learning
■ Q-learning
– A method for learning a function Q(s, a) that estimates the
value of performing action a in state s.
■ Q-learning Approach
– Start with Q(s, a) = 0 for all s, a
– When an action is taken and a reward is received:
■ Estimate the value of Q(s, a) based on current reward and
expected future rewards
■ Update Q(s, a) to take into account old estimate as well as
our new estimate
Learning: Reinforcement Learning
■ Q-learning Approach
– Start with Q(s, a) = 0 for all s, a
– Every time we take an action a in state s and observe a
reward r, we update:
Q(s, a) ← Q(s, a) + α(new value estimate − old value estimate)
Q(s, a) ← Q(s, a) + α(new value estimate − Q(s, a))
α is the learning rate: α = 1 means rely only on the new
estimate; α = 0 means ignore all new information.
Q(s, a) ← Q(s, a) + α((r + future reward estimate) − Q(s, a))
Q(s, a) ← Q(s, a) + α((r + max_{a'} Q(s', a')) − Q(s, a))
Learning: Reinforcement Learning
■ Q-learning Approach
– Finally, a discount factor γ (between 0 and 1) weights how
much future rewards count relative to the immediate reward r:
Q(s, a) ← Q(s, a) + α((r + γ max_{a'} Q(s', a')) − Q(s, a))
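The final update rule as a minimal Python sketch over a dictionary-based Q table (the state/action types and default α, γ values are illustrative):

def q_update(Q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    # Q maps (state, action) pairs to values; missing entries default to 0.
    old_estimate = Q.get((state, action), 0.0)
    # Best estimated future reward: max over a' of Q(s', a').
    best_future = max(Q.get((next_state, a), 0.0) for a in actions)
    # Q(s, a) <- Q(s, a) + alpha * ((r + gamma * best_future) - Q(s, a))
    Q[(state, action)] = old_estimate + alpha * ((reward + gamma * best_future) - old_estimate)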
Learning: Reinforcement Learning
■ Greedy Decision-making
– When in state s, choose action a with highest Q(s, a)
■ Explore vs. Exploit
– Exploitation: Using information that the AI already has
– Exploration: Exploring other options available to the AI.
Learning: Reinforcement Learning
■ ε-greedy
– Set ε equal to how often we want to move randomly.
– With probability 1 - ε, choose estimated best move.
– With probability ε, choose a random move.
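A minimal ε-greedy action selector to pair with the Q table sketched above (again an illustration, not the slides' own code):

import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    # With probability epsilon: explore (random move).
    if random.random() < epsilon:
        return random.choice(actions)
    # With probability 1 - epsilon: exploit (best known move).
    return max(actions, key=lambda a: Q.get((state, a), 0.0))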
■ Function approximation
– It is the approach of approximating Q(s, a), often by a
function combining various features, rather than storing
one value for every state-action pair
Learning
■ Unsupervised Learning
– This involves giving the AI agent input data without any
additional feedback and allowing it to learn patterns
■ Clustering
– This involves organizing a set of objects into groups in
such a way that similar objects tend to be in the same
group
– Some applications of clustering include:
■ Genetic research
■ Image segmentation
■ Market research
■ Medical imaging
■ Social network analysis
Learning: Unsupervised Learning
■ k-means clustering
– It is an algorithm for clustering data based on repeatedly
assigning points to clusters and updating those clusters'
centers
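A minimal k-means sketch in Python (random initialization and a fixed iteration count are simplifying assumptions; practical implementations stop when the assignments stabilize):

import math
import random

def k_means(points, k=2, iterations=100, seed=0):
    # Start from k randomly chosen points as the initial centers.
    centers = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centers[i] = tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
    return centers, clusters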