Decision Tree Learning

CONTENTS

Part-1 : Decision Tree Learning ............................. 3-2P to 3-5P
Part-2 : Decision Tree Algorithm ............................ 3-6P to 3-8P
Part-3 : Inductive Bias, Inductive Inference with Decision Trees ... 3-8P to 3-10P
Part-4 : Entropy and Information Theory, Information Gain ... 3-10P to 3-12P
Part-5 : ID-3 Algorithm ..................................... 3-12P to 3-13P
Part-6 : Issues in Decision Tree Algorithm .................. 3-13P to 3-14P
Part-7 : Instance-Based Learning ............................ 3-14P to 3-17P
Part-8 : K-Nearest Neighbor Learning ........................ 3-18P to 3-21P
Part-9 : Locally Weighted Regression ........................ 3-21P to 3-22P
Part-10 : Radial Basis Function Networks .................... 3-22P to 3-24P
Part-11 : Case-Based Learning ............................... 3-24P to 3-30P

PART-1
Decision Tree Learning.
Que 3.1. | Describe the basic terminology used in decision trees.
Answer
Basic terminology used in decision trees is :
1. Root node : It represents the entire population or sample, and this further gets divided into two or more homogeneous sets.
2. Splitting : It is a process of dividing a node into two or more sub-nodes.
3. Decision node : When a sub-node splits into further sub-nodes, then it is called a decision node.
Fig. 3.1.1. A decision tree, showing the root node, splitting, decision nodes, branches/sub-trees and terminal nodes.
4. Leaf/terminal node : Nodes that do not split are called leaf or terminal nodes.
5. Pruning : When we remove sub-nodes of a decision node, this process is called pruning. This process is the opposite of the splitting process.
6. Branch/sub-tree : A sub-section of the entire tree is called a branch or sub-tree.
7. Parent and child node : A node which is divided into sub-nodes is called the parent node of the sub-nodes, whereas the sub-nodes are the children of the parent node.
Que 3.2. | Why do we use decision trees ?
Answer
1. Decision trees can be visualized, and are simple to understand and interpret.
2. They require less data preparation, whereas other techniques often require data normalization, the creation of dummy variables and the removal of blank values.
3. The cost of using the tree (for predicting data) is logarithmic in the number of data points used to train the tree.
4. Decision trees can handle both categorical and numerical data, whereas other techniques are specialized for only one type of variable.
5. Decision trees can handle multi-output problems.
6. A decision tree is a white box model, i.e., the explanation for a condition can be explained easily by Boolean logic because there are two outputs, for example yes or no.
7. Decision trees can be used even if assumptions are violated by the dataset from which the data is taken.
Que 3.3. | How can we express decision trees ?
Answer
1. Decision trees classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance.
2. An instance is classified by starting at the root node of the tree, testing
the attribute specified by this node, then moving down the tree branch
corresponding to the value of the attribute as shown in Fig. 3.3.1.
3. This process is then repeated for the subtree rooted at the new node.
4. The decision tree in Fig. 3.3.1 classifies a particular morning according to whether it is suitable for playing tennis, and returns the classification associated with the particular leaf.
Fig. 3.3.1. A decision tree for PlayTennis : the root tests Outlook (Sunny, Overcast, Rain); the Sunny branch tests Humidity (High, Normal), the Overcast branch is a Yes leaf, and the Rain branch tests Wind (Strong, Weak).
5. For example, the instance (Outlook = Rain, Temperature = Hot, ...) would be sorted down the matching branches of this decision tree and classified according to the leaf it reaches.
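The sorting procedure described above can be sketched in a few lines of Python. The dictionary encoding of the tree and the attribute names below are assumptions made for illustration; they mirror the PlayTennis tree of Fig. 3.3.1.

# Minimal sketch: classifying an instance by walking a decision tree
# encoded as nested dictionaries (hypothetical encoding of Fig. 3.3.1).
tree = {
    "attribute": "Outlook",
    "branches": {
        "Overcast": "Yes",
        "Sunny": {"attribute": "Humidity",
                  "branches": {"High": "No", "Normal": "Yes"}},
        "Rain": {"attribute": "Wind",
                 "branches": {"Strong": "No", "Weak": "Yes"}},
    },
}

def classify(node, instance):
    # Sort the instance down the tree until a leaf (a plain label) is reached.
    while isinstance(node, dict):                  # internal node
        value = instance[node["attribute"]]        # test the node's attribute
        node = node["branches"][value]             # follow the matching branch
    return node                                    # the leaf gives the classification

print(classify(tree, {"Outlook": "Rain", "Temperature": "Hot",
                      "Humidity": "High", "Wind": "Weak"}))    # -> "Yes"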
PART-2
Decision Tree Algorithm.

Commonly used decision tree algorithms are :
1. ID3 :
   a. ID3 checks whether the instances belong to the same class or not.
   b. For the instances belonging to the same class, a single name is used to denote the class; otherwise, the instances are classified on the basis of the splitting attribute.
2. C4.5 :
   a. C4.5 is an algorithm used to generate a decision tree. It is an extension of the ID3 algorithm.
   b. C4.5 generates decision trees which can be used for classification, and therefore C4.5 is referred to as a statistical classifier.
   c. It is better than the ID3 algorithm because it deals with both continuous and discrete attributes, and also with missing values and pruning of trees after construction.
3. C5.0 :
   a. C5.0 is an improvement over C4.5 because it is more efficient and builds smaller decision trees.
   b. C5.0 also uses a tree pruning process.
4. CART (Classification And Regression Trees) :
   a. Classification and regression trees are constructed by CART.
   b. The trees are built by selecting the splitting attribute at each node; CART is fast in terms of processing and supports both continuous and categorical data.
   c. The regression feature of CART can be used in forecasting the value of a dependent variable over a period of time, given a set of predictor variables.
Advantages of ID3 algorithm :
1. ID3 is used to create understandable prediction rules from the training data.
2. ID3 searches the whole dataset to create the whole tree.
3. Finding the leaf nodes enables the test data to be pruned, reducing the number of tests.
4. The calculation time of ID3 is a linear function of the product of the characteristic number and node number.
Disadvantages of ID3 algorithm :
1. For a small sample, data may be overfitted or overclassified.
2. For making a decision, only one attribute is tested at an instant, thus consuming a lot of time.
3. Classifying continuous data may prove to be expensive in terms of computation, as many trees have to be generated to see where to break the continuous sequence.
4. It is overly sensitive to features when given a large number of input values.
Advantages of C4.5 algorithm :
1. C4.5 is easy to implement.
2. C4.5 builds models that can be easily interpreted.
3. It can handle both categorical and continuous values.
4. It can deal with noise and missing value attributes.
Disadvantages of C4.5 algorithm :
1. A small variation in data can lead to different decision trees.
2. For a small training set, it does not work very well.
Advantages of CART algorithm :
1. CART can handle missing values automatically, using surrogate splits.
2. It uses a combination of continuous/discrete variables.
3. CART automatically performs variable selection.
4. CART can establish interactions among variables.
5. CART does not vary according to the monotonic transformation of the predictive variable.
Disadvantages of CART algorithm :
1. CART has unstable decision trees.
2. CART splits only by one variable.
3. It is a non-parametric algorithm.
PART-3
Inductive Bias, Inductive Inference with Decision Trees.
Que 3.8. | Explain inductive bias with inductive system.
Answer
Inductive bias :
1. Inductive bias refers to the restrictions that are imposed by the assumptions made in the learning method.
2. For example, assume that the solution to the problem of road safety can be expressed as a conjunction of a set of eight concepts.
3. This does not allow for more complex expressions that cannot be expressed as a conjunction.
4. This inductive bias means that there are some potential solutions that we cannot explore, as they are not contained within the version space we examine.
examineHe P (MOA,
Hi
a 24)
ann te wens BAO WHEY gy
eM ena
oni
yuld 110" BMT og
Machine Len ni
‘ no Wy '
at the Harner proce’ ,
‘ of eraanin at hon i
han tM bw able (0 ¢ Ananify dat E a WW had proven, ;
ener wot nl ol be able Reneralyg
inte 8
eee the ennai etimirnt ion sgovith in that ig
The indueti inn of w natant if a Che BypAtHEHeN conn jy
el tito eda he nae cient
within ite vermon pace won hinitation on (he earning nyo)
vy Tene, he indie bw me a
Inductive system :
Fig. 3.8.1. An inductive system : training examples and a new instance are supplied to the candidate elimination algorithm, which uses the hypothesis space to produce a classification of the new instance.
Que 3.9. | Explain inductive learning algorithm.
Answer
Inductive learning algorithm :
Step 1 : Divide the table T containing m examples into n sub-tables (t1, t2, ..., tn), one table for each possible value of the class attribute. (Repeat steps 2 to 8 for each sub-table.)
Step 2 : Initialize the attribute combination count j = 1.
Step 3 : For the sub-table on which work is going on, divide the attribute list into distinct combinations, each combination with j distinct attributes.
Step 4 : For each combination of attributes, count the number of occurrences of attribute values that appear under the same combination of attributes in unmarked rows of the sub-table under consideration, and at the same time do not appear under the same combination of attributes of other sub-tables. Call the first combination with the maximum number of occurrences the max-combination MAX.
Step 5 : If MAX = null, increase j by 1 and go to Step 3.
Step 6 : Mark all rows of the sub-table on which work is going on, in which the values of MAX appear, as classified.
Step 7 : Add a rule (IF attribute = value ... THEN decision) to the rule set, whose left-hand side contains the attribute names of MAX with their values separated by AND, and whose right-hand side contains the decision (class) value associated with the sub-table.
Step 8 : If all rows are marked as classified, then move on to process another sub-table and go to Step 2; else go to Step 4. If no sub-tables are available, exit with the set of rules obtained till then.
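A minimal Python sketch of these steps is given below, assuming each example is a dict and the class attribute is named "class"; the function and variable names are illustrative and not part of the source.

from itertools import combinations

def ila(rows, attributes, class_attr="class"):
    # Minimal sketch of the Inductive Learning Algorithm steps described above.
    rules = []
    for c in {r[class_attr] for r in rows}:              # Step 1: one sub-table per class value
        sub = [r for r in rows if r[class_attr] == c]
        others = [r for r in rows if r[class_attr] != c]
        marked = [False] * len(sub)
        j = 1                                             # Step 2: attribute-combination count
        while not all(marked) and j <= len(attributes):
            counts = {}
            for combo in combinations(attributes, j):     # Step 3: combinations of j attributes
                for i, r in enumerate(sub):
                    if marked[i]:
                        continue
                    values = tuple(r[a] for a in combo)
                    # Step 4: keep only value combinations never seen in other sub-tables.
                    if any(all(o[a] == v for a, v in zip(combo, values)) for o in others):
                        continue
                    counts[(combo, values)] = counts.get((combo, values), 0) + 1
            if not counts:                                # Step 5: no max-combination at this j
                j += 1
                continue
            (combo, values), _ = max(counts.items(), key=lambda kv: kv[1])
            for i, r in enumerate(sub):                   # Step 6: mark covered rows as classified
                if all(r[a] == v for a, v in zip(combo, values)):
                    marked[i] = True
            rules.append((dict(zip(combo, values)), c))   # Step 7: IF attr = value AND ... THEN c
        # Step 8: continue with the next sub-table; the rules collected so far are returned.
    return rules

For example, calling ila(weather_rows, ["Outlook", "Humidity", "Wind"]) on a hypothetical weather table would return a list of (condition, class) rules.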
Que 3.10. | Which learning algorithms are used in inductive bias ?
Answer
Learning algorithms used in inductive bias are :
1. Rote-learner :
   a. Learning corresponds to storing each observed training example in memory.
   b. Subsequent instances are classified by looking them up in memory.
   c. If the instance is found in memory, the stored classification is returned.
   d. Otherwise, the system refuses to classify the new instance.
   e. Inductive bias : There is no inductive bias.
2. Candidate-elimination :
   a. New instances are classified only in the case where all members of the current version space agree on the classification.
   b. Otherwise, the system refuses to classify the new instance.
   c. Inductive bias : The target concept can be represented in its hypothesis space.
3. FIND-S :
   a. This algorithm finds the most specific hypothesis consistent with the training examples.
   b. It then uses this hypothesis to classify all subsequent instances.
   c. Inductive bias : The target concept can be represented in its hypothesis space, and all instances are negative instances unless the opposite is entailed by its other knowledge.
PART-4
Entropy and Information Theory, Information Gain.

Que 3.11. | Explain attribute selection measures used in decision tree.
Answer
Attribute selection measures used in decision tree are :
1. Entropy :
   i. Entropy is a measure of the uncertainty associated with a random variable.
   ii. The entropy increases with the increase in uncertainty or randomness and decreases with a decrease in uncertainty or randomness.
   iii. The value of entropy ranges from 0 to 1. It is given by
        Entropy(D) = \sum_{i=1}^{m} -p_i \log_2(p_i)
        where p_i is the non-zero probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_{i,D}| / |D|.
   iv. A log function of base 2 is used because the entropy is encoded in bits 0 and 1.
2. Information gain :
   i. ID3 uses information gain as its attribute selection measure.
   ii. Information gain is the difference between the original information requirement (i.e., based on the proportion of classes) and the new requirement (i.e., obtained after the partitioning on A) :
       Gain(D, A) = Entropy(D) - \sum_{j=1}^{V} \frac{|D_j|}{|D|} Entropy(D_j)
       where
       D : a given data partition,
       A : an attribute,
       V : the number of distinct values of attribute A on which the tuples in D are partitioned.
   iii. D is split into V partitions or subsets {D_1, D_2, ..., D_V}, where D_j contains those tuples in D that have outcome a_j of A.
   iv. The attribute that has the highest information gain is chosen.
3. Gain ratio :
   i. The information gain measure is biased towards tests with many outcomes.
   ii. That is, it prefers to select attributes having a large number of values.
   iii. As each partition is pure, the information gain by such partitioning is maximal, but such partitioning cannot be used for classification.
   iv. C4.5 uses this attribute selection measure, which is an extension of information gain.
   v. Gain ratio differs from information gain, which measures the information with respect to a classification that is acquired based on some partitioning.
   vi. Gain ratio applies a kind of normalization to information gain, using a split information value defined as
       SplitInfo_A(D) = - \sum_{j=1}^{V} \frac{|D_j|}{|D|} \log_2\left(\frac{|D_j|}{|D|}\right)
   vii. The gain ratio is then defined as
       GainRatio(A) = \frac{Gain(A)}{SplitInfo_A(D)}
   viii. The attribute with the maximum gain ratio is selected as the splitting attribute.
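The three measures can be expressed compactly in Python. The sketch below assumes each tuple is a dict mapping attribute names to values; all names are illustrative.

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy(D) = sum_i -p_i * log2(p_i) over the class distribution.
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    # Gain(D, A) = Entropy(D) - sum_j |D_j|/|D| * Entropy(D_j).
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder

def gain_ratio(rows, attribute, target):
    # GainRatio(A) = Gain(A) / SplitInfo_A(D); SplitInfo is the entropy of A's value distribution.
    split_info = entropy([r[attribute] for r in rows])
    gain = information_gain(rows, attribute, target)
    return gain / split_info if split_info > 0 else 0.0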
PART-5
ID-3 Algorithm.
Que 3.12. | Explain procedure of ID3 algorithm.
OR
Describe ID-3 algorithm with an example.
AKTU 2021-22, Marks 10
Answer
ID3 (Examples, TargetAttribute, Attributes) :
1. Create a Root node for the tree.
2. If all Examples are positive, return the single-node tree Root, with label = +.
3. If all Examples are negative, return the single-node tree Root, with label = -.
4. If Attributes is empty, return the single-node tree Root, with label = most common value of TargetAttribute in Examples.
5. Otherwise begin :
   a. A <- the attribute from Attributes that best classifies Examples.
   b. The decision attribute for Root <- A.
   c. For each possible value v_i of A :
      i. Add a new tree branch below Root, corresponding to the test A = v_i.
      ii. Let Examples_{v_i} be the subset of Examples that have value v_i for A.
      iii. If Examples_{v_i} is empty, then below this new branch add a leaf node with label = most common value of TargetAttribute in Examples.
      iv. Else, below this new branch add the sub-tree ID3 (Examples_{v_i}, TargetAttribute, Attributes - {A}).
6. End.
7. Return Root.
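The procedure can be sketched as a short recursive Python function for categorical attributes. The dict-based tree representation and helper names are assumptions made for illustration, not part of the source.

from collections import Counter
from math import log2

def _entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def id3(examples, target, attributes):
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:                        # steps 2-3: all examples share one label
        return labels[0]
    if not attributes:                               # step 4: no attributes left to test
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                                     # information gain of attribute a
        rem = 0.0
        for v in {e[a] for e in examples}:
            subset = [e[target] for e in examples if e[a] == v]
            rem += len(subset) / len(examples) * _entropy(subset)
        return _entropy(labels) - rem

    best = max(attributes, key=gain)                 # step 5a: attribute that best classifies
    root = {"attribute": best, "branches": {}}       # step 5b: decision attribute for Root
    for v in {e[best] for e in examples}:            # step 5c: one branch per observed value of A
        subset = [e for e in examples if e[best] == v]
        root["branches"][v] = id3(subset, target,
                                  [a for a in attributes if a != best])
    return root

Calling id3(examples, "PlayTennis", ["Outlook", "Humidity", "Wind"]) on a dataset of the kind behind Fig. 3.3.1 would rebuild a tree of that form, assuming such a dataset is supplied.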
PART-6
Issues in Decision Tree Algorithm.
Que 3.13. | Discuss the issues related to the applications of decision trees.
OR
List out the five issues in decision tree learning.
AKTU 2021-22, Marks 10
Answer
Issues related to the applications of decision trees are :
1. Missing data :
   a. Values may have gone unrecorded, or they might be too expensive to obtain.
   b. Two problems arise :
      i. To classify an object when one of the test attributes is missing.
      ii. To modify the information gain formula when examples have unknown values for the attribute.
2. Multi-valued attributes :
   a. When an attribute has many possible values, the information gain measure gives an inappropriate indication of the attribute's usefulness.
   b. In the extreme case, we could use an attribute that has a different value for every example.
   c. Each subset of examples would then be a singleton with a unique classification, so the information gain measure would have its highest value for this attribute even though the attribute could be irrelevant or useless.
   d. One solution is to use the gain ratio.
3. Continuous and integer-valued input attributes :
   a. Attributes such as height and weight have an infinite set of possible values.
   b. Rather than generating infinitely many branches, decision tree learning algorithms find the split point that gives the highest information gain.
   c. Efficient dynamic programming methods exist for finding good split points, but it is still the most expensive part of real-world decision tree learning applications. (A Python sketch of such a split-point search is given after this list.)
4. Continuous-valued output attributes :
   a. If we are trying to predict a numerical value, such as the price of a work of art, rather than a discrete classification, then we need a regression tree.
   b. Such a tree has, at each leaf, a linear function of some subset of numerical attributes, rather than a single value.
   c. The learning algorithm must decide when to stop splitting and begin applying linear regression using the remaining attributes.
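Issue 3 above can be illustrated with a short sketch that scans candidate thresholds of a numeric attribute and keeps the one with the highest information gain; the names and data layout are assumptions for illustration.

from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def best_split_point(examples, attribute, target):
    # Candidate thresholds lie midway between consecutive observed values.
    values = sorted({e[attribute] for e in examples})
    base = entropy([e[target] for e in examples])
    best_gain, best_threshold = -1.0, None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [e[target] for e in examples if e[attribute] <= t]
        right = [e[target] for e in examples if e[attribute] > t]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(examples)
        if base - remainder > best_gain:             # keep the threshold with the highest gain
            best_gain, best_threshold = base - remainder, t
    return best_threshold, best_gain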
Que 3.14. | Describe limitations of decision tree classifier.
Answer
Limitations of decision tree classifier :
1. Prone to overfitting : CART decision trees are prone to overfitting the training data if their growth is not restricted in some way. This problem is handled by pruning the tree, which in effect regularises the model.
2. Unstable to changes in the data : Significantly different trees can be produced from training if small changes occur in the data.
3. Non-continuous : Decision trees are piece-wise functions, not smooth or continuous. The piece-wise approximation approaches a continuous function only as the tree gets deeper and more complex.
4. Unbalanced classes : Decision tree classifiers can be biased if the training data is highly dominated by certain classes.
5. Greedy algorithm : CART follows a greedy algorithm that finds only locally optimal solutions at each node in the tree.
PART-7
Instance-Based Learning.
Que 3.15. | Write a short note on instance-based learning.
Answer
1. Instance-Based Learning (IBL) is an extension of the nearest neighbour or K-NN classification algorithms.
2. IBL algorithms do not maintain a set of abstractions or models created from the instances.
3. The K-NN algorithms have large space requirements.
4. IBL algorithms also extend K-NN with a significance test to work with noisy instances, since a lot of real-life datasets have noisy training instances and K-NN algorithms do not work well with noise.
5. Instance-based learning is based on the memorization of the dataset.
6. The number of parameters is unbounded and grows with the size of the data.
7. The classification is obtained through the memorized examples.
8. The cost of the learning process is 0; all the cost is in the computation of the prediction.
9. This kind of learning is also known as lazy learning.

Que 3.16. | Explain instance-based learning representation.
Answer
Following are the instance-based learning representations :
Instance-based representation (1) :
1. The simplest form of learning is plain memorization.
2. This is a completely different way of representing the knowledge extracted from a set of instances : just store the instances themselves, and operate by relating new instances whose class is unknown to existing ones whose class is known.
3. Instead of creating rules, work directly from the examples themselves.
Instance-based representation (2) :
1. Instance-based learning is lazy, deferring the real work as long as
possible.
2. In instance-based learning, each new instance is compared with existing
ones using a distance metric, and the closest existing instance is used to
assign the class to the new one. This is also called the nearest-neighbour
classification method.
3. Sometimes more than one nearest neighbour is used, and the majority class of the closest k nearest neighbours is assigned to the new instance. This is termed the k-nearest neighbour method.
when computing the distanes between 1
1 pychidear distance may be used 9 Pxampies
vjatance of 18 aiened if he val
2 jance alues are identical, otherwise the
sah tripaes wil be ore important
3. finds of attribute weighting. To get a ae vers We need some
training set is KEY problem tribe weights fom he
Irmay not be necessary, oF desirable to store all
the
training instances
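A small sketch of such a distance function is given below; the per-attribute weights and the example values are purely hypothetical.

def weighted_distance(a, b, weights):
    # Numeric attributes contribute their (weighted) squared difference;
    # nominal attributes contribute 0 if the values are identical, otherwise 1.
    total = 0.0
    for attr, w in weights.items():
        x, y = a[attr], b[attr]
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            d = x - y
        else:
            d = 0.0 if x == y else 1.0
        total += w * d * d
    return total ** 0.5

# Hypothetical usage: Humidity is weighted twice as heavily as Outlook.
print(weighted_distance({"Outlook": "Sunny", "Humidity": 70},
                        {"Outlook": "Rain", "Humidity": 65},
                        {"Outlook": 1.0, "Humidity": 2.0}))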
Instance-based representation (4) :
1. Generally, some regions of attribute space are more stable with regard to class than others, and just a few examples are needed inside stable regions.
2. An apparent drawback to instance-based representation is that it does not make explicit the structures that are learned.
Que 3.17. | What are the performance dimensions used for instance-based learning algorithms ?
Answer
Performance dimensions used for instance-based learning algorithms are :
1. Generality :
   a. This is the class of concepts that can be described by the representation of an algorithm.
   b. IBL algorithms can PAC-learn any concept whose boundary is a union of a finite number of closed hyper-curves of finite size.
2. Accuracy : This is the classification accuracy of the algorithm.
3. Learning rate :
   a. This is the speed at which classification accuracy increases during training.
   b. It is a more useful indicator of the performance of the learning algorithm than accuracy for finite-sized training sets.
4. Incorporation costs :
   a. These are incurred while updating the concept descriptions with a single training instance.
   b. They include classification costs.
5. Storage requirement : This is the size of the concept description for IBL algorithms, which is defined as the number of saved instances used for classification decisions.
Que 3.18. | What are the functions of instance-based learning ?
Answer
Functions of instance-based learning are :
1. Similarity function :
   a. This computes the similarity between a training instance i and the instances in the concept description.
   b. Similarities are numeric-valued.
2. Classification function :
   a. This receives the similarity function's results and the classification performance records of the instances in the concept description.
   b. It yields a classification for i.
3. Concept description updater :
   a. This maintains records on classification performance and decides which instances to include in the concept description.
   b. Inputs include i, the similarity results, the classification results, and a current concept description. It yields the modified concept description.
Que 3.19. | What are the advantages and disadvantages of instance-based learning ?
Answer
Advantages of instance-based learning :
1. Learning is trivial.
2. It works efficiently.
3. It is noise resistant.
4. Rich representation, arbitrary decision surfaces.
5. Easy to understand.
Disadvantages of instance-based learning :
1. Needs lots of data.
2. Computational cost is high.
3. Restricted to x ∈ R^n.
4. Implicit weights of attributes (need normalization).
5. Needs large space for storage, i.e., requires large memory.
6. Expensive application (query) time.
PART-8
K-Nearest Neighbor Learning.
Que 3.20. | Describe K-Nearest Neighbor algorithm with steps.
Answer
1. The K-NN classification algorithm is used to decide to which class a new instance belongs, on the basis of its K nearest neighbours; when K = 1, we have the nearest neighbour algorithm.
2. K-NN classification is incremental.
3. K-NN classification does not have a training phase; all the instances are stored, and training only builds an index so that neighbours can be found quickly.
4. During testing, the K-NN classification algorithm has to find the K nearest neighbours of a new instance. This is time consuming if we do an exhaustive comparison.
5. The K nearest neighbours use the local neighbourhood to obtain a prediction.
K-NN algorithm : Let m be the number of training data samples and p be an unknown point.
1. Store the training samples in an array of data points arr[]. This means each element of this array represents a tuple (x, y).
2. For i = 0 to m - 1 : calculate the Euclidean distance d(arr[i], p).
3. Make set S of the K smallest distances obtained. Each of these distances corresponds to an already classified data point.
4. Return the majority label among S.
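These steps translate directly into a short Python sketch; the data layout (a list of (point, label) pairs) and the example data are assumptions for illustration.

from collections import Counter
from math import dist          # Euclidean distance (Python 3.8+)

def knn_predict(training, query, k=3):
    # Steps 1-2: distance from the query point to every stored training sample.
    distances = [(dist(point, query), label) for point, label in training]
    # Step 3: the K smallest distances give the K nearest neighbours.
    nearest = sorted(distances)[:k]
    # Step 4: the majority label among the neighbours is the prediction.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical usage with four labelled points and the query (3, 7):
data = [((7, 7), "False"), ((7, 4), "False"), ((3, 4), "True"), ((1, 4), "True")]
print(knn_predict(data, (3, 7), k=3))   # -> "True"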
Que 3.21. | What are the advantages and disadvantages of K-nearest neighbor algorithm ?
Answer
Advantages of KNN :
1. Simple implementation : k-NN is easy to understand and implement, making it suitable for beginners.
2. No training required : There is no explicit training phase; the algorithm learns from the data directly during classification.
3. Non-parametric : k-NN makes no assumptions about the underlying data distribution, making it versatile.
4. Multiclass classification : It naturally extends to multiclass classification problems without modification.
5. Robust to noise : k-NN tends to perform well in the presence of noisy data and outliers.
resence of noisyN
Machine Learning S10 Mo,
se,
—™.
Disadvantages of KNN :
1. Computational complexity : Classification involves comparing new items to all training instances, making it computationally expensive for large datasets.
2. Memory intensive : The entire training dataset must be stored, which can be memory-intensive for large datasets.
3. Sensitive to feature scaling : Performance can be affected by feature scaling, as the algorithm relies on distance metrics.
4. Parameter sensitivity : Performance depends heavily on the choice of the parameter k, requiring optimal tuning.
5. Imbalanced data : It may produce biased results on datasets with imbalanced class distributions.
6. Impact of irrelevant features : All features are considered equally, including irrelevant ones, which can degrade performance.
Que 3.22. | Apply KNN to the following dataset and predict the class of the test example (A1 = 3, A2 = 7). Assume K = 3.
OR
Explain k-nearest neighbor learning algorithm with an example.
AKTU 2021-22, Marks 10
OR
What is K-nearest neighbor algorithm ? Give a suitable example.
AKTU 2022-23, Marks 10
Answer
Step 1 : Calculate the distance between the query instance and all the training samples. The coordinates of the query instance are (3, 7).
‘AKTU 2022-23, Marks 10the distance and det
or
tne’.
Decision Tren Vestn
Square distance
AUery instance (4,7,
a
Baz
(1-38 414 ap
ermine nearest neighbors based on
Rank minimum
distance
Step 4 : Usin,
“alve of query instant
: gab as peition
i majority of the category of nearest neiglLearning 11 POMCA Soy
Ai,
Mac!
A, = 3and A, = 7is included in True category,
Kenonreat neighbor algorithm : Refer Q. 3.20, Page 3-18P, Unig
——— HT Omit
PART-9
Locally Weighted Regression.
Que 3.23. | Explain locally weighted regression.
Answer
1. Model-based methods, such as neural networks and the mixture of Gaussians, use the data to build a parameterized model.
2. After training, the model is used for predictions, and the data are generally discarded.
3. In contrast, memory-based methods are non-parametric approaches that explicitly retain the training data and use it each time a prediction needs to be made.
4. Locally Weighted Regression (LWR) is a memory-based method that performs a regression around a point using only training data that are local to that point.
5. LWR was shown to be suitable for real-time control by constructing an LWR-based system that learned a difficult juggling task.
6. The LOESS (Locally Estimated Scatterplot Smoothing) model performs a linear regression on points in the data set, weighted by a kernel centered at x.
7. The kernel shape is a design parameter, for which the original LOESS model uses a tricubic kernel; here a Gaussian kernel is used :
       h_i(x) = h(x - x_i) = exp(- ||x - x_i||^2),
   where h is a smoothing parameter.
Fig. 3.23.1. A kernel centered at the query point x.
8. For brevity, we drop the argument x from h_i(x) and define n = \sum_i h_i. We can then write the estimated means and covariances as :
       \mu_x = \frac{\sum_i h_i x_i}{n},   \mu_y = \frac{\sum_i h_i y_i}{n},
       \sigma_x^2 = \frac{\sum_i h_i (x_i - \mu_x)^2}{n},   \sigma_{xy} = \frac{\sum_i h_i (x_i - \mu_x)(y_i - \mu_y)}{n}.
9. We use the estimated means, variances and covariances to express the locally weighted regression prediction :
       \hat{y}(x) = \mu_y + \frac{\sigma_{xy}}{\sigma_x^2} (x - \mu_x).
Fig. 3.23.2. The effect of kernel width : a kernel that is too wide includes points that are not local, a kernel that is too narrow excludes almost all points, and a kernel that is just right uses the local neighbourhood.
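A minimal numerical sketch of LWR is given below, assuming a Gaussian kernel and a weighted least-squares local fit; the function names and the synthetic data are illustrative only.

import numpy as np

def lwr_predict(X, y, x_query, h=1.0):
    # Weight every training point by a Gaussian kernel centred at the query point,
    # then fit a local linear model by weighted least squares.
    X = np.asarray(X, dtype=float)                  # shape (n, d)
    y = np.asarray(y, dtype=float)                  # shape (n,)
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * h ** 2))
    A = np.hstack([X, np.ones((len(X), 1))])        # add a bias column
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return float(np.append(np.asarray(x_query, dtype=float), 1.0) @ beta)

# Hypothetical 1-D usage: noisy samples of y = x**2, predicted locally at x = 1.5.
rng = np.random.default_rng(0)
xs = rng.uniform(-3, 3, size=(100, 1))
ys = xs[:, 0] ** 2 + rng.normal(0, 0.1, size=100)
print(lwr_predict(xs, ys, np.array([1.5]), h=0.5))  # close to 2.25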
PART-10
Radial Basis Function Networks.
Que 3.24. | Explain Radial Basis Function (RBF).
Answer
1. A Radial Basis Function (RBF) is a function that assigns a real value to each input from its domain (it is a real-valued function), and the value produced by the RBF is always an absolute value, i.e., it is a measure of distance and cannot be negative.
2. Euclidean distance (the straight-line distance) between two points in Euclidean space is used.
3. Radial basis functions are used to approximate functions, just as neural networks act as function approximators.
4. The following sum represents a radial basis function network :
       y(x) = \sum_{i=1}^{N} w_i \varphi(||x - x_i||)
5. The radial basis functions act as activation functions.
6. The approximant y(x) is differentiable with respect to the weights w_i, which are learned using iterative update methods common among neural networks.
Que 3.25. | Explain the architecture of a radial basis function network.
Answer
1. Radial Basis Function (RBF) networks have three layers : an input layer, a hidden layer with a non-linear RBF activation function, and a linear output layer.
2. The input can be modeled as a vector of real numbers x \in R^n. The output of the network is then a scalar function of the input vector, \varphi : R^n \to R, and is given by
       \varphi(x) = \sum_{i=1}^{n} a_i \rho(||x - c_i||)
   where n is the number of neurons in the hidden layer, c_i is the center vector for neuron i, and a_i is the weight of neuron i in the linear output neuron.
Fig. 3.25.1. Architecture of a radial basis function network : an input x is fed to the radial basis functions, whose outputs are combined by linear weights to produce the output y.
3. An input vector x is used as input to all radial basis functions, each with different parameters. The output of the network is a linear combination of the outputs from the radial basis functions.
4. Functions that depend only on the distance from a center vector are radially symmetric about that vector.
5. In the basic form, all inputs are connected to each hidden neuron.
6. The radial basis function is commonly taken to be Gaussian :
       \rho(||x - c_i||) = exp(-\beta ||x - c_i||^2)
7. The Gaussian basis functions are local to the center vector in the sense that
       \lim_{||x|| \to \infty} \rho(||x - c_i||) = 0,
   i.e., changing the parameters of one neuron has only a small effect for input values that are far away from the center of that neuron.
8. Given certain mild conditions on the shape of the activation function, RBF networks are universal approximators on a compact subset of R^n.
9. This means that an RBF network with enough hidden neurons can approximate any continuous function on a closed, bounded set with arbitrary precision.
10. The parameters a_i, c_i and \beta are determined in a manner that optimizes the fit between \varphi and the data.
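The architecture above can be sketched numerically in a few lines, with the centers fixed in advance and the linear output weights a_i obtained by least squares; this is one common training recipe, and all names and data below are illustrative.

import numpy as np

def rbf_design(X, centers, beta):
    # rho(||x - c_i||) = exp(-beta * ||x - c_i||^2) for every input / center pair.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-beta * d2)

def fit_rbf(X, y, centers, beta=1.0):
    # Solve for the linear output weights a_i by least squares, keeping the centers fixed.
    Phi = rbf_design(X, centers, beta)
    weights, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return weights

def rbf_predict(X, centers, weights, beta=1.0):
    # phi(x) = sum_i a_i * rho(||x - c_i||), the weighted sum defined above.
    return rbf_design(X, centers, beta) @ weights

# Hypothetical usage: approximate sin(x) on [0, 2*pi] with 10 Gaussian units.
X = np.linspace(0, 2 * np.pi, 200)[:, None]
y = np.sin(X[:, 0])
centers = np.linspace(0, 2 * np.pi, 10)[:, None]
w = fit_rbf(X, y, centers, beta=2.0)
print(np.max(np.abs(rbf_predict(X, centers, w, beta=2.0) - y)))  # small approximation error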
Que 3.26. | Explain instance-based learning. Compare locally weighted regression and radial basis function networks.
Answer
Instance-based learning : Refer Q. 3.15, Page 3-15P, Unit-3.
S. No. | Aspect | Locally Weighted Regression | Radial Basis Function Networks
1. | Purpose | Non-parametric regression method that fits a model locally. | Feed-forward neural network.
2. | Training | Lazy : no separate training phase; the local fit is computed at prediction time. | Trained in advance; centers and weights are learned before prediction.
3. | Number of parameters | Grows with the size of the training data. | Fixed by the chosen number of hidden neurons.
4. | Model complexity | Lower complexity. | Higher complexity.
5. | Applications | Used in robotics, control, and signal processing. | Used in function approximation, classification, and clustering tasks.
PART-11
Case-Based Learning.

Que 3.27. | Write a short note on case-based learning algorithm.
Answer
1. Case-Based Learning (CBL) algorithms take a set of training cases as input and produce a concept description as output, which can be used to generate predictions of goal feature values for subsequently presented cases.
2. The primary component of the concept description is the case base, but most CBL algorithms maintain additional related information for the purpose of generating accurate predictions (for example, settings for feature weights).
3. Current CBL algorithms assume that cases are described using a feature-value representation, where features are either predictor or goal features.
4. CBL algorithms are distinguished by their processing behaviour.
Que 3.28. | What are the functions of case-based learning algorithms ?
Answer
Functions of case-based learning algorithms are :
1. Pre-processor : This prepares the input for processing (for example, normalizing the range of numeric-valued features to ensure that they are treated with equal importance by the similarity function), generating a set of cases from the raw input.
2. Similarity :
   a. This function assesses the similarities of a given case with the previously stored cases in the concept description.
   b. Assessment may involve explicit encoding and/or dynamic computation.
   c. CBL similarity functions find a compromise along the continuum between these extremes.
3. Prediction : This function inputs the similarity assessments and generates a prediction for the value of the given case's goal feature (a classification when it is symbolic-valued).
4. Memory updating : This updates the stored case base, such as by modifying or abstracting previously stored cases, forgetting cases presumed to be noisy, or updating a feature's relevance weight settings.
Que 3.29. | Describe the case-based learning cycle with different schemes of CBL.
Answer
Case-based learning algorithm processing stages are :
1. Case retrieval : After the problem situation has been assessed, a matching case is searched for in the case base and an approximate solution is retrieved.
2. Case adaptation : The retrieved solution is adapted to fit better the requirements of the new problem.
Fig. 3.29.1. The CBL cycle : a problem leads to retrieval of a matching case, adaptation and evaluation of its solution, and updating of the case base with the confirmed solution.
3. Solution evaluation :
   a. The adapted solution can be evaluated either before the solution is applied to the problem or after the solution has been applied.
   b. In any case, if the accomplished result is not satisfactory, the retrieved solution must be adapted again or more cases should be retrieved.
4. Case-base updating : If the solution was verified as correct, the new case may be added to the case base.
Different schemes of the CBL working cycle are :
1. Retrieve the most similar case.
2. Reuse the case to attempt to solve the current problem.
3. Revise the proposed solution if necessary.
4. Retain the new solution as a part of a new case.
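The four-step cycle can be sketched in Python as below; the case representation and the helper callables (similarity, adapt, evaluate) are assumptions supplied by the caller.

def cbl_solve(case_base, problem, similarity, adapt, evaluate):
    # Retrieve: the stored case most similar to the new problem.
    retrieved = max(case_base, key=lambda case: similarity(case["problem"], problem))
    # Reuse: adapt the retrieved solution to the current problem.
    solution = adapt(retrieved["solution"], problem)
    # Revise: evaluate the proposed solution and adapt again if it is unsatisfactory.
    if not evaluate(problem, solution):
        solution = adapt(solution, problem)
    # Retain: store the confirmed solution as a new case in the case base.
    case_base.append({"problem": problem, "solution": solution})
    return solution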
Que 3.30. | What are the benefits of CBL as a lazy problem solving method ?
Answer
The benefits of CBL as a lazy problem solving method are :
1. Ease of knowledge elicitation :
a. Lazy methods can utilise easily available case or problem instances
instead of rules that are difficult to extract,
b. So, classical knowledge engineering is replaced by case acquisition
and structuring.
2. Absence of problem-solving bias :
   a. Cases can be used for multiple problem-solving purposes, because they are stored in a raw form.
   b. This is in contrast to eager methods, which can be used merely for the purpose for which the knowledge has already been compiled.
3. Incremental learning :
   a. A CBL system can be put into operation with a minimal set of solved cases furnishing the case base.
   b. The case base will then be filled with new cases, increasing the system's problem-solving ability.
   c. Besides augmentation of the case base, new indexes and clusters can be created and the existing ones can be changed.
   d. Eager learning methods, in contrast, require a special training period whenever knowledge generalisation is performed.
   e. Hence, dynamic on-line adaptation in a non-rigid environment is possible.
4. Suitability for complex and not-fully formalised solution spaces :
   a. CBL systems can be applied to an incomplete model of the problem domain; implementation involves both identifying relevant case features and furnishing, possibly, a partial case base with proper cases.
   b. Lazy approaches are appropriate for complex solution spaces, rather than eager approaches, which replace the presented data with abstractions obtained by generalisation.
5. Suitability for sequential problem solving :
   a. Sequential tasks, like those encountered in reinforcement learning problems, benefit from the storage of history in the form of a sequence of states or procedures.
   b. Such storage is facilitated by lazy approaches.
6. Ease of explanation :
   a. The results of a CBL system can be justified based upon the similarity of the current problem to the retrieved case.
   b. Because CBL results are easily traceable to precedent cases, it is also easier to analyse failures of the system.
7. Ease of maintenance : This is particularly due to the fact that CBL systems can adapt to many changes in the problem domain and the relevant environment merely by acquiring new cases.
Que 3.31. | What are the limitations of CBL ?
Answer
Limitations of CBL are :
1. Handling large case bases :
   a. High memory and storage requirements and time-consuming retrieval accompany CBL systems utilising large case bases, although the order of both is usually linear in the number of cases.
   b. These problems usually lead to increased construction costs and reduced system performance.
   c. These problems become less significant as the hardware components become faster and cheaper.
2. Dynamic problem domains :
   a. CBL systems may have difficulties in handling dynamic problem domains, where they may be unable to follow a shift in the way problems are solved, since they are strongly biased towards what has already worked.
   b. This may result in an outdated case base.
3. Handling noisy data :
   a. Parts of the problem situation may be irrelevant to the problem itself.
   b. Unsuccessful assessment of such noise may result in the same problem being unnecessarily stored numerous times in the case base because of the differences due to the noise.
   c. In turn, this implies inefficient storage and retrieval of cases.
4. Fully automatic operation :
   a. In a CBL system, the problem domain is usually not fully covered.
   b. Hence, some problem situations can occur for which the system has no solution.
   c. In such situations, CBL systems expect input from the user.
Que 3.32. | What are the applications of CBL ?
Answer
Applications of CBL are :
1. Interpretation : It is a process of evaluating situations/problems in some context (for example, HYPO for interpretation of patent laws, KICS for interpretation of building regulations, LISSA for interpretation of non-destructive test measurements).
2. Classification : It is a process of explaining a number of encountered symptoms (for example, CASEY for classification of auditory impairments, CASCADE for classification of software failures, PAKAR for causal classification of building defects, ISFER for classification of facial expressions into user-defined interpretation categories).
3. Design : It is a process of satisfying a number of posed constraints (For
example, JULIA for meal planning, CLAVIER for design of optimal
layouts of composite airplane parts, EADOCS for aircraft panels design).
4. Planning : It is a process of arranging a sequence of actions in time
(For example, BOLERO for building diagnostic plans for medical patients,
‘TOTLEC for manufacturing planning).
5. Advising : It is a process of resolving diagnosed problems (for example, DECIDER for advising students).
Que 3.33. | What are the major paradigms of machine learning ?
Answer
Major paradigms of machine learning are :
1. Rote learning :
   a. There is one-to-one mapping from inputs to stored representations.
   b. Learning is by memorization.
   c. There is association-based storage and retrieval.
2. Induction : Machine learning uses specific examples to reach general conclusions.
3. Clustering : Clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.
4. Analogy : Determine the correspondence between two different representations.
5. Discovery : Unsupervised, i.e., a specific goal is not given.
6. Genetic algorithms :
   a. Genetic algorithms are stochastic search algorithms which act on a population of possible solutions.
   b. They are probabilistic search methods, meaning that the states which they explore are not determined solely by the properties of the problems.
7. Reinforcement :
   a. In reinforcement learning, only feedback (a positive or negative reward) is given at the end of a sequence of steps.
   b. It requires assigning reward to steps by solving the credit assignment problem : which steps should receive credit or blame for a final result.
Que 3.34. | Briefly explain the inductive learning problem.
Answer
The inductive learning problem is as follows :
1. Supervised versus unsupervised learning :
   a. We want to learn an unknown function f(x) = y, where x is an input example and y is the desired output.
   b. Supervised learning implies we are given a set of (x, y) pairs by a teacher.
   c. Unsupervised learning means we are only given the xs.
   d. In either case, the goal is to estimate f.
pe tO Dein ee Learning
ae
Gi
+t
arning
4 eae of examples of te cone
5 Given example isan instanceof heonerge ey ermine
~ ept or not.
.
i 7 ‘
frit gam instances we calli a positive
1 not tical eg ame
cite concept learning by induction:
jul i it
a 9 Piven a training, set of positive and negative exa
oeetract a desritionthat wll earately Sty cheer ame
examples are positive or negative, et
vat i, learn some good estimate of function f given a training set
eraining se
b
(cl. YD (2,92) en» (en, yr)} wher
W negative)- re each y, is either + (positive) or
jain the relevs