What is Machine Learning?
“Learning is any process by which a system improves
performance from experience.”
- Herbert Simon
Definition by Tom Mitchell (1998):
Machine Learning is the study of algorithms that
* improve their performance P
* at some task T
* with experience E.
A well-defined learning task is given by <P, T, E>.

Defining the Learning Task:
Improve on task T, with respect to
performance metric P, based on experience E
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself

T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words

T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while observing a human driver

T: Categorize email messages as spam or legitimate
P: Percentage of email messages correctly classified
E: Database of emails, some with human-given labels
Types of Learning
* Supervised (inductive) learning
— Given: training data + desired outputs (labels)
* Unsupervised learning
— Given: training data (without desired outputs)
* Reinforcement learning
— Given: rewards from a sequence of actions

Supervised Learning: Regression
* Given (x1, y1), (x2, y2), ..., (xn, yn)
* Learn a function f(x) to predict y given x
— y is real-valued == regression
[Figure: September Arctic Sea Ice Extent (millions of sq km) plotted against year, 1970-2020 - an example of a real-valued prediction target]

Supervised Learning: Classification
* Given (x1, y1), (x2, y2), ..., (xn, yn)
* Learn a function f(x) to predict y given x
— y is categorical == classification
[Figure: breast cancer classification (Malignant / Benign) - tumor size on the horizontal axis, with a threshold on tumor size used to predict benign vs. malignant]
* x can be multi-dimensional
— Each dimension corresponds to an attribute
[Figure: a two-dimensional version of the example, with tumor size as one of the attributes]

Unsupervised Learning
* Given x1, x2, ..., xn (without labels)
* Output the hidden structure behind the x's
— E.g., clustering
[Figure: unlabeled points grouped into two clusters]

Reinforcement Learning
* Given a sequence of states and actions with
(delayed) rewards, output a policy
— A policy is a mapping from states to actions that tells you what to do in a given state
* Examples:
— Credit assignment problem
— Game playing
— Robot in a maze
— Balance a pole on your hand

Example: A Dataset and Its Attribute Types
[Table: sample records with attributes such as Tid, Home Owner, Marital Status, and Annual Income, together with a Yes/No class label]

Types of Data
* Categorical data: Nominal, Ordinal, Binary
* Numerical data: Discrete, Continuous

Main Steps of Data Preprocessing
* Data cleaning
- Fill in missing values, smooth noisy data, identify or remove outliers, resolve inconsistencies
* Data integration
- Integrate data from multiple sources and databases
* Data reduction
- Dimensionality reduction: reduce the number of attributes while keeping the informative ones
- Numerosity reduction: replace the data volume with a smaller representation (e.g., parametric models, sampling)
- Data compression
* Data transformation and discretization
- Normalization

Data Cleaning
* Real-world data tend to be dirty: many records have quality problems caused by human error, measurement error, or flaws in the collection process
* Dirty data should be cleaned before it is used to build a model
* Incomplete
- Missing attribute values, or missing attributes of interest
* Noisy
- Containing errors or outliers
- e.g., Salary = "-10"
* Inconsistent
- Containing discrepancies in codes or names
- e.g., Age = "42" while Birthday = "2010/07/03"
- e.g., a rating recorded as "1, 2, 3" in one source and as "A, B, C" in another
- Discrepancies between duplicate records
How should missing values be handled? A sketch of the main options is shown after this list.
* Ignore the tuple
* Fill in the missing value manually
- Tedious and infeasible on large datasets
* Fill in with a global constant such as "UNKNOWN"
- May mislead the learning algorithm into treating it as a meaningful value
* Fill in with the attribute mean
- Or with the attribute mean of all samples belonging to the same class
* Fill in with the most probable value, determined with inference-based tools such as a Bayesian formula or a decision tree
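A minimal sketch of the first few options in Python, assuming pandas and scikit-learn are available; the toy table and its column names are made up purely for illustration.

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical records with missing values (NaN)
df = pd.DataFrame({
    "age":    [25, np.nan, 47, 51, np.nan],
    "income": [48000, 52000, np.nan, 61000, 58000],
})

# Option 1: ignore the tuple (drop rows containing missing values)
dropped = df.dropna()

# Option 2: fill in with a global constant
constant = df.fillna(-1)

# Option 3: fill in with the attribute mean, column by column
imputer = SimpleImputer(strategy="mean")
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(imputed)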
Data Reduction
* Complex analysis on a huge volume of data can take a very long time to run
* Data reduction: obtain a reduced representation of the dataset that is much smaller in volume, yet produces the same (or almost the same) analytical results
* Data reduction strategies:
1. Dimensionality reduction
- e.g., principal component analysis and attribute subset selection
2. Numerosity reduction
- Regression and log-linear models
- Histograms, clustering, sampling
3. Data compression
- Encode the data in a compressed form from which the original can be reconstructed exactly or approximately

Normalization
* The measurement unit used can affect the analysis; for example, expressing height in meters vs. inches can lead to different results
* Expressing an attribute in smaller units gives it a larger range, and therefore a larger weight in the result
* To give all attributes equal weight, the data should be normalized or standardized, i.e. mapped to a smaller common range such as [0.0, 1.0] or [-1, 1]
* Normalization methods:
- Min-max normalization
- Z-score normalization
- Normalization by decimal scaling
Min-Max Normalization
* Performs a linear transformation on the original data
* Let min_A and max_A be the minimum and maximum values of attribute A; min-max normalization maps a value v of A to v' in the new range [new_min_A, new_max_A]:

v' = (v - min_A) / (max_A - min_A) * (new_max_A - new_min_A) + new_min_A
* Example: suppose the minimum and maximum values of the attribute income are $12,000 and $98,000, and we want to map income to the range [0.0, 1.0]. Then $73,600 is transformed to 0.716:

(73,600 - 12,000) / (98,000 - 12,000) * (1.0 - 0) + 0 = 0.716
98,000—12,000wile Jlos
(phe dike gible Jl,ib) z-score sjlu Jl
Bia oe ile JAB A jhene lye 9 abies polly, A cae ly polio *
ye Woes
on
Spb ge NTO ay fas VF + OF ee aloe
sy ceSile sll ©wile Jlos
Normalization by Decimal Scaling
* Normalizes by moving the decimal point of the values of attribute A
* The number of decimal places moved depends on the maximum absolute value of A:

v' = v / 10^j, where j is the smallest integer such that max(|v'|) < 1

* Example: suppose the values of A range from -986 to 917. The maximum absolute value is 986, so each value is divided by 1,000 (j = 3): -986 normalizes to -0.986 and 917 to 0.917
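The three normalization methods above can be checked numerically; the sketch below uses NumPy, and the income values other than the stated minimum, maximum, and 73,600 are made up for illustration.

import numpy as np

income = np.array([12_000, 47_000, 73_600, 98_000], dtype=float)

# Min-max normalization to the new range [0.0, 1.0]
mn, mx = income.min(), income.max()
minmax = (income - mn) / (mx - mn) * (1.0 - 0.0) + 0.0
print(minmax)                  # 73,600 maps to ~0.716, as on the slide

# Z-score normalization with the slide's mean (54,000) and std (16,000)
mu, sigma = 54_000, 16_000
print((income - mu) / sigma)   # 73,600 maps to 1.225

# Decimal scaling: divide by 10^j so that the largest |value| becomes < 1
values = np.array([-986, 917], dtype=float)
j = int(np.ceil(np.log10(np.abs(values).max())))
print(values / 10 ** j)        # -0.986 and 0.917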
Discretization
* Divides the range of a continuous attribute into intervals
* Interval labels can then be used in place of the actual data values
* Reduces the size of the data

Types of Classifiers
* K-nearest neighbor
+ Decision Trees
* Logistic regression
* Naive Bayes
* Neural networks
* SVM

K-Nearest Neighbors (KNN)
"Tell me who your friends are and I'll tell you who you are!"

Nearest Neighbors
* Suppose we're given a novel input vector x we'd like to classify.
* The idea: find the nearest input vector to x in the training set and copy its label.
* Can formalize "nearest" in terms of Euclidean distance:

||x - x'||_2 = sqrt( sum_j (x_j - x'_j)^2 )

[Figure: with one mislabeled training point, every example in the blue shaded area will be misclassified as the blue class; once the noisy point is outvoted, every example in the blue shaded area will be classified correctly as the red class]
* Nearest neighbors are sensitive to noise or mis-labeled data ("class noise")
* Solution?
* Smooth by having the k nearest neighbors vote
Algorithm (kNN):
1. Find the k examples {x^(i), t^(i)} closest to the test instance x
2. The classification output is the majority class among those k neighbors:

y = argmax_t sum_{i=1}^{k} I(t = t^(i))

Example of KNN
* Suppose we have measurements for a paper tissue product, labeled with two quality classes, Good and Bad
* Each sample has two attributes: acid durability X1 (in seconds) and strength X2 (in kg/square meter)

X1 (acid durability, seconds) | X2 (strength, kg/square meter) | Y = Classification
7 | 7 | Bad
7 | 4 | Bad
3 | 4 | Good
1 | 4 | Good

* We want to classify a new sample with X1 = 3 and X2 = 7 using KNN with K = 3

Example of KNN (continued)
* Step 2: compute the distance between the query instance (3, 7) and every training sample; here we use the squared Euclidean distance

X1 | X2 | Squared distance to (3, 7)
7 | 7 | (7-3)^2 + (7-7)^2 = 16
7 | 4 | (7-3)^2 + (4-7)^2 = 25
3 | 4 | (3-3)^2 + (4-7)^2 = 9
1 | 4 | (1-3)^2 + (4-7)^2 = 13

Example of KNN (continued)
1 i ae(aol!) KNN jf Jlso
Su29 eben K Steel y ahold polal » grjgal Gl sige uo ie 3
= geak Map GDGA NLS ie) alata
sileconds) —_Oce/square meter) oni Fen ay
7 7 ¥ =
7 7 ea 7 -
3 7 aa y =
; ‘ fa-3F+G-7F = a(aol!) KNN jf Jlso
* Step 4: gather the class labels Y of the nearest neighbors and use a simple majority vote

X1 | X2 | Squared distance | Y
3 | 4 | 9 | Good
1 | 4 | 13 | Good
7 | 7 | 16 | Bad

* Two of the three nearest neighbors are Good, so the new sample (3, 7) is classified as Good
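The whole worked example can be reproduced in a few lines of NumPy; the arrays below encode exactly the toy table from the previous slides, while the variable names and the use of Counter for the vote are just one way to write it.

import numpy as np
from collections import Counter

# Training data: (acid durability, strength) -> class
X_train = np.array([[7, 7], [7, 4], [3, 4], [1, 4]], dtype=float)
y_train = ["Bad", "Bad", "Good", "Good"]

x_query = np.array([3, 7], dtype=float)   # the new sample to classify
k = 3

# Step 2: squared Euclidean distance to every training sample
d2 = ((X_train - x_query) ** 2).sum(axis=1)
print(d2)                                 # [16. 25.  9. 13.]

# Steps 3-4: take the k nearest neighbors and vote
nearest = np.argsort(d2)[:k]              # indices of the 3 closest samples
votes = Counter(y_train[i] for i in nearest)
print(votes.most_common(1)[0][0])         # "Good" (2 votes vs. 1)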
Tradeoffs in choosing k?
* Small k
- Good at capturing fine-grained patterns
- May overfit, i.e., be sensitive to random idiosyncrasies in the training data
* Large k
- Makes stable predictions by averaging over lots of examples
- May underfit, i.e., fail to capture important regularities
* Balancing k
- The optimal choice of k depends on the number of data points n
- Nice theoretical properties if k → ∞ and k/n → 0
- Rule of thumb: choose k < sqrt(n)
- We can choose k using a validation set (next slides)
K-Nearest Neighbors in Python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Create a kNN classifier with 5 neighbors
knn = KNeighborsClassifier(n_neighbors=5)

# Fit the classifier to the training data
knn.fit(X_train, y_train)

# Evaluate the classifier on the test data
accuracy = knn.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2f}")
Decision Trees
* A decision tree classifies an instance by sorting it down the tree from the root to a leaf node, which provides the class label
* Each internal (non-leaf) node corresponds to an attribute; the node tests the value of that attribute in the input instance
* Each branch descending from a node corresponds to one of the possible values of the tested attribute
* Each leaf node carries a class label
* To classify an instance, start at the root, test the attribute specified by that node, move down the branch corresponding to the instance's value for the attribute, and repeat until a leaf is reached
* Learning a decision tree means constructing such a tree from a set of labeled training examples

Example 1: A Decision Tree for Predicting Loan Default
[Figure and table: ten training records with attributes Tid, Home Owner, Marital Status, and Annual Income, plus the class label Defaulted (Yes/No), together with a decision tree induced from them (Model: Decision Tree)]

Applying the Model to Test Data
* To classify a test record, start from the root of the tree and apply the attribute tests until a leaf is reached; the leaf's label is the predicted class

Example: A Decision Tree for Playing Tennis Based on Weather Conditions
[Table: weather test records with attributes Outlook, Temperature, Humidity, and Windy, e.g. (overcast, hot, high, false)]

Representing a Decision Tree
[Figure: the PlayTennis decision tree - the root tests Outlook with branches Sunny, Overcast, and Rain; the Sunny branch tests Humidity (High -> No, Normal -> Yes), the Overcast branch is a Yes leaf, and the Rain branch tests Wind (Strong -> No, Weak -> Yes)]

The tree corresponds to the formula

(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)

* Each path from the root to a leaf specifies a conjunction (AND) of attribute tests
* The whole tree is the disjunction (OR) of these conjunctions

Example: The Iris Flower Dataset
* The Iris dataset contains 150 samples: 50 from each of three species of iris flowers
* For every sample, four features were measured: the length and width of the sepals and of the petals
* The dataset ships with Scikit-Learn and can be loaded directly:
from sklearn.datasets import load_iris
dataset = load_iris()
Decision Tree Induction
* Most decision tree induction algorithms build the tree with a greedy, top-down, recursive partitioning of the training data
* Well-known algorithms include:
- Hunt's algorithm
- ID3 (Iterative Dichotomiser 3)
- C4.5
- CART (Classification And Regression Trees)

Computing Entropy
[Figure: Entropy(S) illustrated with three buckets - Bucket 1 (all items of one class): entropy 0; Bucket 2 (an uneven mix): entropy 0.811; Bucket 3 (an even mix): entropy 1]

Example of Computing Entropy
For a training set S with 9 positive and 5 negative examples:

Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

Information Gain
* Gain(S, A) is the expected reduction in the entropy of S caused by partitioning the examples on attribute A:

Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|S_v| / |S|) * Entropy(S_v)

* The attribute with the largest information gain is chosen for the split

Example of Computing Information Gain
For the attribute Wind, partition D into the Weak and Strong subsets, compute the entropy of each subset, and weight each by its relative size:

Gain(D, Wind) = E(D) - (|S_Weak|/|D| * E(Weak) + |S_Strong|/|D| * E(Strong))
             = 0.94 - (0.491 + 0.346) = 0.94 - 0.837 = 0.102

Example of Computing Information Gain (continued)
For the attribute Temperature (three values, hence three subsets):

Gain(D, Temperature) = E(D) - (weighted entropies of the three Temperature subsets)
                     = 0.94 - (0.231 + 0.346 + 0.346) = 0.94 - 0.932 = 0.008

Example of Computing Information Gain (continued)
For the attribute Outlook, the examples are grouped into the Sunny, Overcast, and Rain subsets:

E(Sunny) = 0.722
E(Overcast) = 0
E(Rain) = 0.722

Gain(D, Outlook) = E(D) - ((5/14) * E(Sunny) + (4/14) * E(Overcast) + (5/14) * E(Rain))
                 = 0.94 - (0.258 + 0 + 0.258) = 0.94 - 0.516 = 0.424

Example of Computing Information Gain (continued)
For the attribute Humidity:

E(High) = 0.863
E(Normal) = 0.985

Gain(D, Humidity) = E(D) - ((7/14) * E(High) + (7/14) * E(Normal))
                  = 0.94 - (0.431 + 0.493) = 0.94 - 0.924 = 0.016

Comparing the four attributes, Gain(D, Outlook) = 0.424 is the largest (vs. 0.102 for Wind, 0.016 for Humidity, and 0.008 for Temperature), so Outlook is selected as the root of the tree.

Example of Computing Information Gain (continued)
After splitting on Outlook at the root:
* The Overcast branch contains only Yes examples and becomes a Yes leaf
* The Sunny branch contains S_sunny = {D1, D2, D8, D9, D11}; the remaining attributes are compared by their information gain on this subset:

Gain(S_sunny, Humidity) = 0.970 - (3/5) * 0.0 - (2/5) * 0.0 = 0.970
Gain(S_sunny, Temperature) = 0.970 - (2/5) * 0.0 - (2/5) * 1.0 - (1/5) * 0.0 = 0.570
Gain(S_sunny, Wind) = 0.970 - (2/5) * 1.0 - (3/5) * 0.918 = 0.019

* Humidity has the highest gain, so it is used to split the Sunny branch
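A short sketch of how the entropy and gain numbers above can be computed; the two helper functions are an assumed, illustrative implementation of the formulas on the previous slides, and the label lists encode the counts used there (9 Yes / 5 No overall, and the Sunny subset D1, D2, D8, D9, D11).

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy (base 2) of a list of class labels
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    # Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)
    n = len(labels)
    subsets = {}
    for v, y in zip(attribute_values, labels):
        subsets.setdefault(v, []).append(y)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

play = ["Yes"] * 9 + ["No"] * 5
print(f"{entropy(play):.3f}")                                  # 0.940

# Sunny branch: the three High-humidity days are No, the two Normal-humidity days are Yes
sunny_play     = ["No", "No", "No", "Yes", "Yes"]
sunny_humidity = ["High", "High", "High", "Normal", "Normal"]
print(f"{entropy(sunny_play):.3f}")                            # 0.971 (the slide rounds to .970)
print(f"{information_gain(sunny_play, sunny_humidity):.3f}")   # 0.971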
Decision Trees in Python
Training set accuracy: 1.000
Test set accuracy: 0.051
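The accuracies printed above come from the instructor's run; the sketch below shows one way such numbers can be produced with scikit-learn. The choice of the iris data, the default train/test split, and the unpruned tree are assumptions for illustration, so the exact values will differ.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a dataset and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit an (unpruned) decision tree to the training data
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

print(f"Training set accuracy: {tree.score(X_train, y_train):.3f}")
print(f"Test set accuracy: {tree.score(X_test, y_test):.3f}")

# Per-feature importance scores learned by the tree (they sum to 1)
print(tree.feature_importances_)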
Computing Feature Importances After Fitting a Decision Tree

[Output: the array of per-feature scores printed for the fitted tree; most entries are 0.00000000e+00, with only a few features contributing nonzero values]

Evaluation Metrics for a Classifier
accuracy = (# correct predictions) / (# test instances)

error = 1 - accuracy = (# incorrect predictions) / (# test instances)

Confusion Matrix
* Given a dataset of P positive instances and N negative instances:

                        Predicted Class
                        Yes       No
  Actual Class   Yes    TP        FN
                 No     FP        TN

accuracy = (TP + TN) / (P + N)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
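All of these quantities are available in scikit-learn; the following is a minimal sketch with made-up labels and predictions. Note that for the label set {0, 1}, confusion_matrix prints the layout [[TN, FP], [FN, TP]], with rows as actual classes and columns as predicted classes.

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Hypothetical ground-truth labels and classifier predictions
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

print(confusion_matrix(y_true, y_pred))   # [[TN FP] [FN TP]]
print(accuracy_score(y_true, y_pred))     # (TP + TN) / (P + N)
print(precision_score(y_true, y_pred))    # TP / (TP + FP)
print(recall_score(y_true, y_pred))       # TP / (TP + FN)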
Training and Test Data
* Training data: data used to build the model
* Test data: new data, not used in the training process
* Training performance is often a poor indicator of
generalization performance
— Generalization is what we really care about in ML
— Easy to overfit the training data
— Performance on test data is a good indicator of
generalization performance
— i.e., test accuracy is more important than training accuracy

Training and Test Data
* Idea: split the full data set into training data and test data; train each model on the training data, and then test each model's accuracy on the test data

k-Fold Cross-Validation
* Why just choose one particular "split" of the data?
- In principle, we should do this multiple times, since performance may be different for each split
* k-Fold Cross-Validation (e.g., k = 10)
- Randomly partition the full data set of n instances into k disjoint subsets (each roughly of size n/k)
- Choose each fold in turn as the test set; train the model on the other folds and evaluate it
- Compute statistics over the k test performances, or choose the best of the k models
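A minimal sketch of k-fold cross-validation with scikit-learn, here used to compare a few candidate values of k for a kNN classifier; the dataset, the candidate values, and the use of 10 folds are assumptions for illustration.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

for k in [1, 3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=10)   # 10 held-out-fold accuracies
    print(k, scores.mean())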
Example: 3-Fold CV

[Figure: the full data set is split three ways; in each run a different partition serves as the test set, giving a test performance for the 1st, 2nd, and kth partitions, and summary statistics are computed over the k test performances]

Generalization
* Training set: labels known; test set: labels unknown
* How well does a learned model generalize from the data it was trained on to a new test set?
[Figure: predictive error vs. model complexity - the error on the training data decreases steadily as complexity grows, while the error on the test data first decreases and then rises; the ideal range for model complexity lies between the underfitting and overfitting regions]
Overfitting in Decision Trees

[Figure: accuracy vs. size of the tree (number of nodes) - accuracy on the training data keeps rising as the tree grows, while accuracy on the test data levels off and then declines]
Regression
* A statistical method for modeling the relationship between a dependent (response) variable and one or more independent (explanatory) variables
* Used to estimate or predict the value of the dependent variable from the values of the independent variables

Types of Regression
* Simple linear regression: a single independent variable

Y = β0 + β1 X + ε

* Multiple linear regression: several independent variables

Y = β0 + β1 X1 + β2 X2 + ... + βk Xk + ε

* Logistic regression: a two-class (binary) dependent variable

π = P(Y = 1 | X) = 1 / (1 + e^-(β0 + β1 X1 + ... + βk Xk))

Example: Simple (One-Variable) Linear Regression
[Table: a handful of (x, y) training pairs used to fit the line; the hypothesis is]

h_θ(x) = θ0 + θ1 x
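A minimal sketch of fitting h_θ(x) = θ0 + θ1 x with scikit-learn; since the slide's table is not readable, the (x, y) pairs below are made up for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[50], [80], [120], [160]], dtype=float)   # feature values
y = np.array([150.0, 220.0, 310.0, 400.0])              # real-valued targets

model = LinearRegression().fit(x, y)
print(model.intercept_, model.coef_)    # theta_0 and theta_1
print(model.predict([[100]]))           # prediction for a new x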
Multivariate Linear Regression
* The input x can contain several features x1, x2, x3, x4; the hypothesis becomes

h_θ(x) = θ0 + θ1 x1 + θ2 x2 + θ3 x3 + θ4 x4

Classification with Logistic Regression
* For two-class classification with logistic regression, threshold the classifier output h_θ(x) at 0.5:
- If h_θ(x) >= 0.5, predict y = 1
- If h_θ(x) < 0.5, predict y = 0
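A minimal sketch of this thresholding rule with scikit-learn's LogisticRegression; the one-dimensional data and the query points are made up for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature, two-class data
x = np.array([[1.0], [1.5], [2.0], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(x, y)

# h(x) = P(y = 1 | x); predict y = 1 when h(x) >= 0.5, else y = 0
probs = clf.predict_proba([[1.8], [3.2]])[:, 1]
print(probs)
print((probs >= 0.5).astype(int))   # the same rule predict() applies at threshold 0.5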