CSCI218: Foundations
of Artificial Intelligence
Lectures on learning
§ Learning: a process for improving the performance of an agent
through experience
§ Learning I (today):
§ The general idea: generalization from experience
§ Supervised learning: classification and regression
§ Learning II: neural networks and deep learning
§ Reinforcement learning: learning complex V and Q functions
Supervised learning
§ To learn an unknown target function f
§ Input: a training set of labeled examples (xj,yj) where yj = f(xj)
§ E.g., xj is an image, f(xj) is the label “giraffe”
§ E.g., xj is a seismic signal, f(xj) is the label “explosion”
§ Output: hypothesis h that is “close” to f, i.e., predicts well on unseen
examples (“test set”)
§ Many possible hypothesis families for h
§ Linear models, logistic regression, neural networks, decision trees, nearest-neighbor (instance-based) methods, grammars, kernelized separators, etc.
§ Classification = learning f with discrete output value
§ Regression = learning f with real-valued output value
Inductive Learning (Science)
§ Simplest form: learn a function from examples
§ A target function: g
§ Examples: input-output pairs (x, g(x))
§ E.g. x is an email and g(x) is spam / ham
§ E.g. x is a house and g(x) is its selling price
§ Problem:
§ Given a hypothesis space H
§ Given a training set of examples (xi, g(xi))
§ Find a hypothesis h(x) such that h ≈ g
§ Includes:
§ Classification (outputs = class labels)
§ Regression (outputs = real numbers)
Classification example: Object recognition
[Figure: six training images x with labels f(x) = giraffe, giraffe, giraffe, llama, llama, llama; then a new image X with f(X) = ?]
Example: Spam Filter
§ Input: an email
§ Output: spam/ham
§ Setup:
§ Get a large collection of example emails, each labeled “spam” or “ham” (by hand)
§ Learn to predict labels of new incoming emails
§ Classifiers reject 200 billion spam emails per day
§ Features: The attributes used to make the ham / spam decision
§ Words: FREE!
§ Text Patterns: $dd, CAPS
§ Non-text: SenderInContacts, AnchorLinkMismatch
§ …
[Sidebar: an example spam email (“Dear Sir. First, I must solicit your confidence in this transaction, this is by virture of its nature as being utterly confidencial and top secret. …”, “TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT "REMOVE" IN THE SUBJECT.”, “99 MILLION EMAIL ADDRESSES FOR ONLY $99”) and an example ham email (“Ok, I know this is blatantly OT but I'm beginning to go insane. Had an old Dell Dimension XPS sitting in the corner and decided to put it to use. I know it was working pre being stuck in the corner, but when I plugged it in, hit the power, nothing happened.”)]
Example: Digit Recognition
§ Input: images / pixel grids
§ Output: a digit 0-9
§ Setup:
§ MNIST data set: a collection of 60K hand-labeled images
§ Note: someone has to hand label all this data!
§ Want to learn to predict labels of new digit images
§ Features: The attributes used to make the digit decision
§ Pixels: (6,8)=ON
§ Shape Patterns: NumComponents, AspectRatio, NumLoops
§ …
[Figure: example handwritten digit images labeled 0, 1, 2, 1, and an ambiguous one labeled ??]
Other Classification Tasks
§ Medical diagnosis
§ input: symptoms
§ output: disease
§ Automatic essay grading
§ input: document
§ output: grades
§ Fraud detection
§ input: account activity
§ output: fraud / no fraud
§ Email routing
§ input: customer complaint email
§ output: which department needs to ignore this email
§ Fruit and vegetable inspection
§ input: image (or gas analysis)
§ output: moldy or OK
§ … many more
Regression example: Curve fitting
Basic questions
§ Which hypothesis space H to choose?
§ How to measure degree of fit?
§ How to trade off degree of fit vs. complexity?
§ “Ockham’s razor”
§ How do we find a good h?
§ How do we know if a good h will predict well?
Training and Testing
A few important points about learning
§ Data: labeled instances, e.g. emails marked spam/ham
§ Training set
§ Held out set
§ Test set
§ Features: attribute-value pairs which characterize each x
§ Experimentation cycle
§ Learn parameters (e.g. model probabilities) on training set
§ (Tune hyperparameters on held-out set)
§ Compute accuracy on the test set
§ Very important: never “peek” at the test set!
§ Evaluation
§ Accuracy: fraction of instances predicted correctly
§ Overfitting and generalization
§ Want a classifier which does well on test data
§ Overfitting: fitting the training data very closely, but not generalizing well
§ Underfitting: fits the training set poorly
[Sidebar: the data split into Training Data, Held-Out Data (validation set), and Test Data]
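The three-way split above can be sketched in code. This is a minimal sketch: the dataset, the threshold “model”, and the split proportions are all hypothetical placeholders, not part of the slides.

```python
import random

# Hypothetical labeled dataset: (feature, label) pairs, label 1 iff x > 50.
data = [(x, 1 if x > 50 else 0) for x in range(100)]
random.seed(0)
random.shuffle(data)

# 60% training, 20% held-out (validation), 20% test.
n = len(data)
train = data[: int(0.6 * n)]
held_out = data[int(0.6 * n): int(0.8 * n)]
test = data[int(0.8 * n):]

def accuracy(h, examples):
    """Fraction of instances predicted correctly."""
    return sum(h(x) == y for x, y in examples) / len(examples)

# Tune a hyperparameter (here, a decision threshold) on the held-out set...
best_t = max(range(0, 100, 10),
             key=lambda t: accuracy(lambda x: int(x > t), held_out))
# ...then report final accuracy once, on the untouched test set.
final = accuracy(lambda x: int(x > best_t), test)
```

The key discipline the slide insists on: the test set is consulted exactly once, after all tuning is finished.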
A few important points about learning
§ What should we learn where?
§ Learn parameters from training data
§ Tune hyperparameters on different data
§ Why?
§ For each value of the hyperparameters, train on the training data and evaluate on the held-out data
§ Choose the best value and do a final test on the test data
§ What are examples of hyperparameters?
Supervised Learning
§ Classification = learning f with discrete output value
§ Regression = learning f with real-valued output value
Linear Regression
Hypothesis family: Linear functions
Linear Regression
(x, y=f(x)), x: house size, y: house price
[Figure: scatter plot of house price in $1000 (300-1000) against house size in square feet (500-3500). Berkeley house prices, 2009]
Linear regression = fitting a straight line/hyperplane
Prediction: hw(x) = w0 + w1x
[Figure: the same scatter plot with the fitted line hw(x), alongside a small schematic of a line fit through points in the (x, hw(x)) plane. Berkeley house prices, 2009]
Prediction error
Error on one instance: y – hw(x)
[Figure: fitted line with one data point; the vertical gap between the observation y and the prediction hw(x) is the error, or “residual”]
Find w
§ Define loss function
§ Find w* to minimize loss function
Least squares: Minimizing squared error
§ L2 loss function: sum of squared errors over all examples
§ Loss = ____________________________
§ We want the weights w* that minimize loss
§ At w* the derivatives of loss w.r.t. each weight are zero:
§ ∂Loss/∂w0 = __________________________
§ ∂Loss/∂w1 = __________________________
§ Exact solutions for N examples:
§ w1 = [N Σj xjyj – (Σj xj)(Σj yj)] / [N Σj xj² – (Σj xj)²] and w0 = (1/N)[Σj yj – w1 Σj xj]
§ For the general case where x is an n-dimensional vector
§ X is the data matrix (all the data, one example per row); y is the column of labels
§ w* = (XᵀX)⁻¹Xᵀy
Least squares: Minimizing squared error
§ L2 loss function: sum of squared errors over all examples
§ Loss = Σj (yj – hw(xj))² = Σj (yj – (w0 + w1xj))²
§ We want the weights w* that minimize loss
§ At w* the derivatives of loss w.r.t. each weight are zero:
§ ∂Loss/∂w0 = –2 Σj (yj – (w0 + w1xj)) = 0
§ ∂Loss/∂w1 = –2 Σj (yj – (w0 + w1xj)) xj = 0
§ Exact solutions for N examples:
§ w1 = [N Σj xjyj – (Σj xj)(Σj yj)] / [N Σj xj² – (Σj xj)²] and w0 = (1/N)[Σj yj – w1 Σj xj]
§ For the general case where x is an n-dimensional vector
§ X is the data matrix (all the data, one example per row); y is the column of labels
§ w* = (XᵀX)⁻¹Xᵀy
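The closed-form solution can be checked numerically. A minimal sketch, assuming NumPy and made-up house-price data lying exactly on a line (the numbers are invented for illustration):

```python
import numpy as np

# Made-up (house size, price) pairs, chosen to lie exactly on y = 100 + 0.2 x.
x = np.array([1000., 1500., 2000., 2500., 3000.])
y = np.array([300., 400., 500., 600., 700.])

# Data matrix X: one example per row, with a leading column of 1s
# so that w0 acts as the intercept.
X = np.column_stack([np.ones_like(x), x])

# Closed form w* = (X^T X)^{-1} X^T y; np.linalg.solve is numerically
# preferable to forming the inverse explicitly.
w = np.linalg.solve(X.T @ X, X.T @ y)
# w ≈ [100., 0.2], i.e. w0 = 100, w1 = 0.2
```

Since the data are exactly linear, the recovered weights match the generating line.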
Regression vs Classification
§ Linear regression when output is binary, y ∈ {−1, 1}
§ hw(x) = w0 + w1x
[Figure: binary data y ∈ {−1, 1} plotted against x, with the regression line w0 + w1x]
§ Linear classification
§ Used with discrete output values
§ Threshold a linear function
§ hw(x) = 1 if w0 + w1x ≥ 0
§ hw(x) = −1 if w0 + w1x < 0
§ w: weight vector
§ Activation function g
[Figure: the same data with the step function g(w0 + w1x)]
Threshold perceptron as linear classifier
Binary Decision Rule
§ A threshold perceptron is a single unit that outputs
§ y = hw(x) = 1 when w·x ≥ 0
§           = −1 when w·x < 0
§ In the input vector space
§ Examples are points x
§ The equation w·x = 0 defines a hyperplane
§ One side corresponds to y = 1
§ The other corresponds to y = −1
[Figure: feature space with axes free (0-1) and money (0-2); the line w·x = 0 separates y=1 (SPAM) from y=−1 (HAM). Weights: w0 = −3, wfree = 4, wmoney = 2]
Example
Email: “Dear Stuart, I’m leaving Macrosoft to return to academia. The money is great here but I prefer to be free to do my own research; and I really love teaching undergrads! Do I need to finish my BA first before applying? Best wishes, Bill”
Features: x0 = 1, xfree = 1, xmoney = 1
Weights: w0 = −3, wfree = 4, wmoney = 2
w·x = −3×1 + 4×1 + 2×1 = 3 ≥ 0, so the email is classified y = 1 (SPAM)
[Figure: the feature space with the boundary w·x = 0; this email lands on the SPAM side]
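The decision rule in the example is a dot product followed by a threshold. A minimal sketch, using the weights and features from the slide:

```python
# Weights and features from the example slide (bias feature x0 is always 1).
w = {"bias": -3, "free": 4, "money": 2}
x = {"bias": 1, "free": 1, "money": 1}   # both words appear in the email

def classify(w, x):
    """Threshold perceptron: SPAM if w.x >= 0, else HAM."""
    score = sum(w[f] * x[f] for f in w)
    return "SPAM" if score >= 0 else "HAM"

score = sum(w[f] * x[f] for f in w)   # -3*1 + 4*1 + 2*1 = 3
label = classify(w, x)                # "SPAM"
```

With score 3 ≥ 0, the academic email is (wrongly, as the next slides show) flagged as spam.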
Weight Updates
Need a different solution than for least squares: the thresholded output is not differentiable, so we cannot find the weights by setting derivatives of the loss to zero.
Perceptron learning rule
§ If true label y* ≠ hw(x) (an error), adjust the weights
§ If w.x < 0 but the output should be y*=1
§ This is called a false negative
§ Should increase weights on positive inputs
§ Should decrease weights on negative inputs
§ If w.x > 0 but the output should be y*=-1
§ This is called a false positive
§ Should decrease weights on positive inputs
§ Should increase weights on negative inputs
Perceptron Learning Rule
§ Start with weights = 0
§ y = hw(x) = 1 when w·x ≥ 0
§           = −1 when w·x < 0
§ For each training instance:
§ If wrong: adjust the weight vector by adding or subtracting the feature vector, where y* is the true label:
§ w ← w + α y* x
Example
Email: “Dear Stuart, I wanted to let you know that I have decided to leave Macrosoft and return to academia. The money is great here but I prefer to be free to pursue more interesting research and I really love teaching undergraduates! Do I need to finish my BA first before applying? Best wishes, Bill”
Features: x0 = 1, xfree = 1, xmoney = 1
Current weights: w0 = −3, wfree = 4, wmoney = 2
w·x = −3×1 + 4×1 + 2×1 = 3 ≥ 0, so predicted SPAM; the true label is HAM (here encoded as y* = 0, with prediction hw(x) = 1), a false positive
Update with α = 0.5 and rule w ← w + α (y* − hw(x)) x:
w ← (−3, 4, 2) + 0.5 (0 − 1) (1, 1, 1) = (−3.5, 3.5, 1.5)
[Figure: the feature space with the boundary w·x = 0; this email lands on the SPAM side but is labeled y = 0 (HAM)]
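The single update from the slide can be reproduced directly. A minimal sketch, assuming 0/1 labels and the mistake-driven rule w ← w + α (y* − hw(x)) x, which matches the slide’s arithmetic:

```python
# One perceptron weight update, matching the slide's numbers.
# Labels are 0 (HAM) / 1 (SPAM); the update fires only when the
# prediction is wrong, since (y_true - y_pred) is then nonzero.
alpha = 0.5
w = [-3.0, 4.0, 2.0]          # [w0, w_free, w_money]
x = [1.0, 1.0, 1.0]           # [x0, x_free, x_money]
y_true = 0                    # the email is actually HAM
y_pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0  # predicts SPAM

w = [wi + alpha * (y_true - y_pred) * xi for wi, xi in zip(w, x)]
# w is now [-3.5, 3.5, 1.5], as on the slide
```

Every weight on an active feature is pulled down by 0.5, moving the boundary toward classifying this email as HAM.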
Perceptron convergence theorem
§ A learning problem is linearly separable iff there is some hyperplane exactly separating positive from negative examples
§ Convergence: if the training data are linearly separable, perceptron learning applied repeatedly to the training set will eventually converge to a perfect separator
[Figure: a linearly separable dataset and a non-separable dataset]
Example: Earthquakes vs nuclear explosions
[Figure: left, scatter of x2 vs x1 (earthquakes vs nuclear explosions) with the learned linear separator; right, proportion correct vs number of weight updates, rising from about 0.4 to 1.0. 63 examples, 657 updates required]
Perceptron convergence theorem
§ A learning problem is linearly separable iff there is some hyperplane exactly separating positive from negative examples
§ Convergence (separable case): if the training data are separable, perceptron learning applied repeatedly to the training set will eventually converge to a perfect separator
§ Convergence (non-separable case): if the training data are non-separable, perceptron learning will converge to a minimum-error solution provided the learning rate α is decayed appropriately (e.g., α = 1/t)
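The repeated application of the learning rule with a decaying rate α = 1/t can be sketched end to end. The four-point dataset below is invented for illustration and is linearly separable, so the loop should reach a perfect separator:

```python
# Perceptron learning with decaying learning rate alpha = 1/t on a tiny
# hand-made linearly separable dataset; labels are in {-1, +1} and each
# feature vector x carries a leading 1 for the bias weight.
data = [([1.0,  2.0,  1.0], +1),
        ([1.0,  0.0,  3.0], +1),
        ([1.0, -1.0, -1.0], -1),
        ([1.0, -2.0,  0.5], -1)]

w = [0.0, 0.0, 0.0]
t = 0
for epoch in range(100):
    errors = 0
    for x, y_true in data:
        t += 1
        alpha = 1.0 / t                       # decaying learning rate
        y_pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
        if y_pred != y_true:                  # update only on mistakes
            w = [wi + alpha * y_true * xi for wi, xi in zip(w, x)]
            errors += 1
    if errors == 0:                           # perfect separator found
        break
```

On non-separable data the same decay schedule is what makes the weights settle on a minimum-error solution rather than oscillating forever.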
Perceptron learning with fixed α
[Figure: left, scatter of x2 vs x1 with the separator; right, proportion correct vs number of weight updates, oscillating without settling. 71 examples, 100,000 updates, fixed α = 0.2, no convergence]
Perceptron learning with decaying α
[Figure: left, scatter of x2 vs x1 with the separator; right, proportion correct vs number of weight updates, approaching 1.0. 71 examples, 100,000 updates, decaying α = 1000/(1000 + t), near-convergence]
Non-Separable Case
Even the best linear boundary makes at least one mistake
Other Linear Classifiers
§ Perceptron is perfectly happy as long as it separates the training data
[Figure: the hard threshold activation g_threshold(w·x)]
§ Logistic Regression
§ g_sigmoid(z) = 1 / (1 + e^(−z))
[Figure: the smooth sigmoid activation g_sigmoid(w·x)]
§ Support Vector Machines (SVM)
§ Maximize margin between boundary and nearest points
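Logistic regression replaces the hard threshold with the sigmoid, turning the score w·x into a probability. A minimal sketch, reusing the spam-example weights as hypothetical inputs:

```python
import math

def sigmoid(z):
    """Logistic function: squashes a real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    """P(y = 1 | x) under a logistic-regression model: sigmoid(w.x)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# Hypothetical weights from the earlier spam example.
w = [-3.0, 4.0, 2.0]
p_spam = predict_proba(w, [1.0, 1.0, 1.0])   # w.x = 3, well on the SPAM side
p_edge = predict_proba(w, [1.0, 0.5, 0.5])   # w.x = 0, exactly on the boundary
```

Points on the boundary get probability 0.5, and confidence grows smoothly with distance from it, unlike the perceptron’s all-or-nothing output.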
Perceptrons hopeless for XOR function
[Figure: the Boolean functions plotted in the (x1, x2) plane: (a) x1 and x2, (b) x1 or x2, (c) x1 xor x2; only (c) has no linear separator]
Basic questions
§ Which hypothesis space H to choose?
§ How to measure degree of fit?
§ How to trade off degree of fit vs. complexity?
§ “Ockham’s razor”
§ How do we find a good h?
§ How do we know if a good h will predict well?
Classical stats/ML: Minimize loss function
§ Which hypothesis space H to choose?
§ E.g., linear combinations of features: hw(x) = wTx
§ How to measure degree of fit?
§ Loss function, e.g., squared error Σj (yj – wTxj)²
§ How to trade off degree of fit vs. complexity?
§ Regularization: complexity penalty, e.g., ||w||2
§ How do we find a good h?
§ Optimization (closed-form, numerical); discrete search
§ How do we know if a good h will predict well?
§ Try it and see (cross-validation, bootstrap, etc.)
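The “try it and see” answer can be made concrete with k-fold cross-validation. A minimal sketch; the 1-D dataset and the toy threshold learner are hypothetical stand-ins for a real model:

```python
import random

# Hypothetical 1-D dataset: label is 1 exactly when x > 0.
random.seed(0)
data = [(x, int(x > 0)) for x in [random.uniform(-1, 1) for _ in range(40)]]

def train(examples):
    """Fit a 1-D threshold classifier by picking the cutoff with the
    fewest training errors (a toy stand-in for a real learner)."""
    xs = sorted(x for x, _ in examples)
    best = min(xs, key=lambda t: sum(int(x > t) != y for x, y in examples))
    return lambda x, t=best: int(x > t)

def accuracy(h, examples):
    return sum(h(x) == y for x, y in examples) / len(examples)

def k_fold_cv(data, k=5):
    """Average held-out accuracy over k train/test splits."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test_fold = folds[i]
        train_folds = [ex for j in range(k) if j != i for ex in folds[j]]
        scores.append(accuracy(train(train_folds), test_fold))
    return sum(scores) / k

cv_score = k_fold_cv(data)   # estimate of how well h will predict on unseen data
```

Each example serves once as held-out data, so the averaged score estimates generalization without ever touching a final test set.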
Probabilistic: Max. likelihood, max. a posteriori
§ Which hypothesis space H to choose?
§ Probability model P(y | x,h) , e.g., Y ~ N(wTx,σ2)
§ How to measure degree of fit?
§ Data likelihood Πj P(yj | xj,h)
§ How to trade off degree of fit vs. complexity?
§ Regularization or prior: argmaxh P(h) Πj P(yj | xj,h) (maximum a posteriori, MAP)
§ How do we find a good h?
§ Optimization (closed-form, numerical); discrete search
§ How do we know if a good h will predict well?
§ Empirical process theory (generalizes Chebyshev, CLT, PAC…);
§ Key assumption is i.i.d. (independent, identically distributed) data
Bayesian: Computing posterior over H
§ Which hypothesis space H to choose?
§ All hypotheses with nonzero a priori probability
§ How to measure degree of fit?
§ Data probability, as for MLE/MAP
§ How to trade off degree of fit vs. complexity?
§ Use prior, as for MAP
§ How do we find a good h?
§ Don’t! Bayes predictor P(y|x,D) = Σh P(y|x,h) P(h|D) ∝ Σh P(y|x,h) P(D|h) P(h)
§ How do we know if a good h will predict well?
§ Silly question! Bayesian prediction is optimal!!
Acknowledgement
The lecture slides are based on the materials from ai.berkeley.edu
Questions