Introduction
1.1 : Well Posed Learning Problems
Q.1 Define learning. [JNTU : Dec.-17, Marks 2]
Ans. : Learning is a phenomenon and process which
has manifestations of various aspects. Learning
process includes gaining of new symbolic knowledge
and development of cognitive skills through
instruction and practice. It is also discovery of new
facts and theories through observation and
experiment.
Q.2 Define machine learning.
Ans. : A computer program is said to learn from
experience E with respect to some class of tasks T
and performance measure P, if its performance at
tasks in T, as measured by P, improves with
experience E.
Q.3 What is the need of machine learning in this era ? [JNTU : Dec.-16, Marks 3]
Ans. : • The main goal of machine learning is to devise
learning algorithms that do the learning automatically
without human intervention or assistance.
* The machine learning paradigm can be viewed as
"programming by example.” Another goal is to
develop computational models of human learning
process and perform computer simulations.
* The goal of machine learning is to build computer
systems that can adapt and learn from their
experience.
• Machine learning algorithms can figure out how to perform important tasks by generalizing from examples, giving organizations greater insights into their data. This adaptive technology is being used by global enterprises to gain a competitive edge.
Machine learning algorithms discover the
relationships between the variables of a system
(input, output and hidden) from direct samples of
the system.
Q.4 What are T, P, E ? How do we formulate a machine learning problem ?
Ans. : • In general, to have a well-defined learning problem, we must identify these three features : the class of tasks, the measure of performance to be improved, and the source of experience.
• A Robot Driving Learning Problem
1. Task T : Driving on public, 4-lane highways using vision sensors.
2. Performance measure P : Average distance traveled before an error (as judged by a human overseer).
3. Training experience E : A sequence of images and steering commands recorded while observing a human driver.
• A Handwriting Recognition Learning Problem
1. Task T : Recognizing and classifying handwritten words within images.
2. Performance measure P : Percent of words correctly classified.
3. Training experience E : A database of handwritten words with given classifications.
• Text Categorization Problem
1. Task T : Assign a document to its content category.
2. Performance measure P : Precision and recall.
3. Training experience E : Example pre-classified documents.
Q.5 Why do we need machine learning ?
Ans. : Following are some of the reasons :
1. Some tasks cannot be defined well except by examples. For example : recognizing people.
2. Relationships and correlations can be hidden within large amounts of data. Machine learning and data mining may be able to find these relationships.
3. Human designers often produce machines that do not work as well as desired in the environments in which they are used.
4. The amount of knowledge available about certain tasks might be too large for explicit encoding by humans.
5. Environments change from time to time.
6. New knowledge about tasks is constantly being discovered by humans.
Q.6 List the phases of machine learning.
Ans. : Machine learning typically follows three phases :
1. Training : A training set of examples of correct behaviour is analysed and some representation of the newly learnt knowledge is stored. This is often in the form of rules.
2. Validation : The rules are checked and, if necessary, additional training is given. Sometimes additional test data are used, but instead a human expert may validate the rules, or some other automatic knowledge-based component may be used. The role of the tester is often called the opponent.
3. Application : The rules are used in responding to some new situation.
Q.7 What is meant by machine learning ? What is its need in today's society ? Explain successful applications of machine learning. [JNTU : Dec.-17, Marks 10]
Ans. : • Examples of successful applications of machine learning :
1. Learning to recognize spoken words.
2. Learning to drive an autonomous vehicle.
3. Learning to classify new astronomical structures.
4. Learning to play world-class backgammon.
5. Spoken language understanding : Within the context of a limited domain, determine the meaning of something uttered by a speaker to the extent that it can be classified into one of a fixed set of categories.
Face Recognition :
• The face recognition task is performed effortlessly every day; we recognize our friends, relatives and family members. We also recognize them by looking at photographs, in which they may appear in different poses, hair styles and background light, with or without makeup.
• We do it subconsciously and cannot explain how we do it. Because we cannot explain how we do it, we cannot write an algorithm for it.
• A face has some structure. It is not a random collection of pixels. It is a symmetric structure containing predefined components like the nose, mouth, eyes and ears. Every person's face is a pattern composed of a particular combination of these features. By analyzing sample face images of a person, a learning program captures the pattern specific to that person and uses it to recognize whether a new real face or new image belongs to this specific person or not.
• A machine learning algorithm creates an optimized model of the concept being learned based on data or past experience.
Q.8 Explain the difference between machine learning and data mining.
Ans. :

Machine learning | Data mining
In machine learning the main goal is to learn a model, which can be used to predict future events. | In data mining, the main goal is to discover new interesting information which describes the current data set.
It considers data as secondary. | It considers data as primary.
Machine learning uses relatively complex and global models. | Data mining uses simple models or local patterns.
The aim is to learn one or a few carefully defined models; these can be used to predict future events. | The aim is to find all interesting patterns which describe the data set.
Q.9 What are the ingredients of machine learning ?
Ans. : The ingredients of machine learning are as follows :
1. Tasks : The problems that can be solved with machine learning. A task is an abstract representation of a problem. The standard methodology in machine learning is to learn one task at a time. Large problems are broken into small, reasonably independent sub-problems that are learned separately and then recombined.
• Predictive tasks perform inference on the current data in order to make predictions. Descriptive tasks characterize the general properties of the data in the database.
2. Models : The output of machine learning. Different models are geometric models, probabilistic models, logical models, grouping and grading.
• The model-based approach seeks to create a custom solution tailored to each new application. Instead of having to transform your problem to fit some standard algorithm, in model-based machine learning you design the algorithm precisely to fit your problem.
• A model is just made up of a set of assumptions, expressed in a precise mathematical form. These assumptions include the number and types of variables in the problem domain, which variables affect each other, and what the effect of changing one variable is on another variable.
• Machine learning models are classified as : Geometric models, Probabilistic models and Logical models.
3. Features : The workhorses of machine learning. A good feature representation is central to achieving high performance in any machine learning task.
• Feature extraction starts from an initial set of measured data and builds derived values (features) that are intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps.
Q.10 What is the influence of information theory on machine learning ?
Ans. : Information theory contributes ideas about optimal codes and their relationship to optimal training sequences for encoding a hypothesis.
Q.11 What is meant by target function in a learning program ? [JNTU : Dec.-16]
Ans. : A target function is the function that a learning algorithm tries to find. Once an algorithm finds its target function, that function can be used to predict results. The function can then be used to find output data related to inputs for real problems where, unlike training sets, outputs are not included.
Q.12 Explain basic design issues and approaches of machine learning.
Ans. : Designing a Learning System
Goal : Design a system to learn how to play checkers and enter it into the world checkers tournament.
1) Choose the training experience
2) Choose the target function
3) Choose a representation for the target function
4) Choose a function approximation algorithm
Training Experience
• How does training experience influence the performance goal ?
1. Type of feedback : Direct vs indirect.
2. Learning strategy : Have a teacher or not ? Exploration vs exploitation ?
3. Diversity of training : Is the training data representative of the task ? How many peers should we play with ? How many tactics should we try when playing with self ?
• Let us decide that our program will learn by playing with itself, and formulate the learning problem.
• Choosing the training experience :
1. Direct or indirect feedback
2. Degree of learner's control
3. Representative distribution of examples
• Learning goal : Define precisely a class of problems that forms interesting forms of learning, explore algorithms to solve such problems, and understand the fundamental structure of learning problems and processes.
• Design choice 1 : The problem of selecting the type of training experience from which our system will learn. Direct training examples : just a bunch of board states together with the correct move for each.
• Design choice 2 : Indirect training. A bunch of recorded games, where the correctness of the moves is inferred from the result of the game.
• Learning is most reliable when the training examples follow a distribution similar to that of future test examples.
Q.13 How to choose and represent the target function ?
Ans. : • It determines exactly what type of knowledge will be learned and how this will be used by the performance program.
• Choosing a representation for the target function :
1. Expressive representation for a close function approximation
2. Simple representation for simple training data and learning algorithms
V̂(b) = w0 + w1x1 + w2x2 + ... + w6x6
where
x1, x2 : number of black/red pieces on the board
x3, x4 : number of black/red kings on the board
x5, x6 : number of black/red pieces threatened (i.e. which can be captured on red's/black's next turn)
• Consider the checkers program.
• ChooseMove : B -> M, where B is any legal board state and M is a legal move (hopefully the best legal move).
• Alternatively, a function V : B -> R which maps any legal board state in B to some real value, where higher scores are assigned to better board states.
• Now use the legal moves to generate every subsequent board state and use V to pick the best one, and therefore the best legal move.
1. V(b) = 100, if b is a final board state that is won.
2. V(b) = -100, if b is a final board state that is lost.
3. V(b) = 0, if b is a final board state that is a draw.
4. V(b) = V(b'), if b is not a final state, where b' is the best final board state that can be reached starting from b and playing optimally until the end (assuming both players play optimally).
• While this recursive definition specifies a value V(b) for every board state b, this definition is not usable by our checkers player because it is not efficiently computable.
• Possible representations for the target function :
1. A large table with an entry specifying a value for each distinct board state.
2. A collection of rules that match against features of the board state.
3. A quadratic polynomial function of predefined board features.
Q.14 How to adjust the weights ?
Ans. : • Choose the weights wi to best fit the set of training examples.
• Minimize the squared error E between the training values and the values predicted by the hypothesis :
E = Σ over <b, Vtrain(b)> in training examples of (Vtrain(b) - V̂(b))²
• We require an algorithm that will incrementally refine the weights as new training examples become available, and that will be robust to errors in these estimated training values.
• Least Mean Squares (LMS) is one such algorithm.
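The following is a minimal Python sketch (my own illustration, not from the text) of the linear evaluation function V̂(b) and one LMS weight update as described above; the board feature values used below are hypothetical.

```python
# Sketch of the linear checkers evaluation function and one LMS weight update.
# features[0] == 1 plays the role of the bias term w0.

def v_hat(weights, features):
    """V^(b) = w0 + w1*x1 + ... + w6*x6."""
    return sum(w * x for w, x in zip(weights, features))

def lms_update(weights, features, v_train, lr=0.1):
    """One LMS step: w_i <- w_i + lr * (V_train(b) - V^(b)) * x_i."""
    error = v_train - v_hat(weights, features)
    return [w + lr * error * x for w, x in zip(weights, features)]

# Hypothetical board: 5 black pieces, 3 red pieces, 1 black king, 0 red kings,
# 1 black piece threatened, 2 red pieces threatened.
weights = [0.0] * 7
features = [1, 5, 3, 1, 0, 1, 2]
weights = lms_update(weights, features, v_train=100)  # training value from a won game
print(v_hat(weights, features))
```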
Q.15 Define a useful perspective on machine learning.
Ans. : One useful perspective on machine learning is that it involves searching a very large space of possible hypotheses to determine one that best fits the observed data and any prior knowledge held by the learner.
Q.16 Describe the issues in machine learning.
Ans. : Issues of machine learning are as follows :
• What learning algorithms should be used ?
• How much training data is sufficient ?
• When and how can prior knowledge guide the learning process ?
• What is the best strategy for choosing a next training experience ?
• What is the best way to reduce the learning task to one or more function approximation problems ?
• How can the learner automatically alter its representation to improve its learning ability ?
—
1.4 : Concept Learning and the General to Specific Ordering
Q.17 What is hypothesis ?
Ans. : • A hypothesis is a vector of constraints for each attribute :
1. Indicate by a "?" that any value is acceptable for this attribute.
2. Specify a single required value for the attribute.
3. Indicate by a "Ø" that no value is acceptable.
• If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive example (h(x) = 1).
Q.18 What is the inductive learning hypothesis ?
‘Ans.: Any hypothesis found to approximate the target function well over a sufficiently large set of training
examples will also approximate the target function well over other unobserved examples.
Q.19 Explain with example concept learning task.
Ans. : • Inducing general functions from specific training examples is a main issue of machine learning.
• Concept Learning : Acquiring the definition of a general category from given sample positive and negative training examples of the category.
• Concept learning can be seen as a problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples.
• The hypothesis space has a general-to-specific ordering of hypotheses, and the search can be organized by taking advantage of this naturally occurring structure over the hypothesis space.
• Concept learning : Inferring a boolean-valued function from training examples.
• An example of concept learning is the learning of the bird-concept from the given examples of birds.
• Concept learning involves determining a mapping from a set of input variables to a Boolean value. Such methods are known as inductive learning methods.
• We are trying to learn the definition of a concept from given examples.
• If a function can be found which maps training data to correct classifications, then it will also work well for unseen data. This process is known as generalization.
• Example : Learn the "days on which my friend enjoys his favorite water sport".
Example | Sky | Air Temp | Humidity | Wind | Water | Forecast | Enjoy Sport
1 | Sunny | Warm | Normal | Strong | Warm | Same | Yes
2 | Sunny | Warm | High | Strong | Warm | Same | Yes
3 | Rainy | Cold | High | Strong | Warm | Change | No
4 | Sunny | Warm | High | Strong | Cool | Change | Yes
• A set of example days, each described by six attributes. The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its attributes.
• The inductive learning hypothesis : Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
• Although the learning task is to determine a hypothesis (h) identical to the target concept c over the entire set of instances (X), the only information available about c is its value over the training examples.
• Inductive learning algorithms can at best guarantee that the output hypothesis fits the target concept over the training data.
• Lacking any further information, our assumption is that the best hypothesis regarding unseen instances is the hypothesis that best fits the observed training data. This is the fundamental assumption of inductive learning.
• Hypothesis representation (constraints on instance attributes) :
1. Any value is acceptable : represented by "?".
2. No value is acceptable : represented by "Ø".
Q.20 Illustrate general to specific ordering of hypotheses in concept learning. [JNTU : Dec.-17, Marks 5]
Ans. : • Many algorithms for concept learning organize the search through the hypothesis space by relying on a general-to-specific ordering of hypotheses.
• By taking advantage of this naturally occurring structure over the hypothesis space, we can design learning algorithms that exhaustively search even infinite hypothesis spaces without explicitly enumerating every hypothesis.
• Consider two hypotheses :
h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
• Now consider the sets of instances that are classified positive by h1 and by h2. Because h2 imposes fewer constraints on the instance, it classifies more instances as positive. In fact, any instance classified positive by h1 will also be classified positive by h2. Therefore, we say that h2 is more general than h1.
• One learning method is to determine the most specific hypothesis that matches all the training data.
• More-General-Than-Or-Equal relation : Let h1 and h2 be two boolean-valued functions defined over X. Then h1 is more-general-than-or-equal-to h2 (written h1 ≥g h2) if and only if any instance that satisfies h2 also satisfies h1.
• h1 is more-general-than h2 (h1 >g h2) if and only if h1 ≥g h2 is true and h2 ≥g h1 is false. We also say h2 is more-specific-than h1.
h1 ≥g h2 if and only if (∀x ∈ X) [(h2(x) = 1) → (h1(x) = 1)]
Fig. Q.20.1 : Lattice of hypotheses ordered from specific (top) to general (bottom)
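As a small illustration (not from the text), the sketch below encodes hypotheses as tuples of constraints, with '?' meaning any value, and tests the more-general-than-or-equal relation for the two EnjoySport hypotheses above.

```python
# Sketch: hypotheses as constraint tuples ('?' = any value), with a helper that
# tests whether h1 is more-general-than-or-equal-to h2 (h1 >=_g h2).

def covers(c1, c2):
    """Constraint c1 covers constraint c2 if c1 is '?' or both are the same value."""
    return c1 == '?' or c1 == c2

def more_general_or_equal(h1, h2):
    """h1 >=_g h2 : every instance satisfying h2 also satisfies h1."""
    return all(covers(c1, c2) for c1, c2 in zip(h1, h2))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')

print(more_general_or_equal(h2, h1))  # True  : h2 is more general than h1
print(more_general_or_equal(h1, h2))  # False : h1 is not more general than h2
```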
1.5 : FIND-S Algorithm
Q.21 Explain the key properties of FIND-S algorithm for concept learning with necessary example.
[JNTU : Dec.-17, Marks 5]
Ans. : • The key property is that for hypothesis spaces described by conjunctions of attribute constraints, FIND-S is guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples.
• Its final hypothesis will also be consistent with the negative examples provided the correct target concept is contained in H, and provided the training examples are correct.
• The FIND-S algorithm starts from the most specific hypothesis and generalizes it by considering only positive examples. As long as the hypothesis space contains a hypothesis that describes the true target concept, and the training data contains no errors, ignoring the negative examples does no harm : FIND-S still outputs the most specific hypothesis within H that is consistent with the positive examples.
• The final hypothesis will also be consistent with the negative examples if the correct target concept is contained in H, and the training examples are correct.
Example | Sky | Air Temp | Humidity | Wind | Water | Forecast | Enjoy Sport
1 | Sunny | Warm | Normal | Strong | Warm | Same | Yes
2 | Sunny | Warm | High | Strong | Warm | Same | Yes
3 | Rainy | Cold | High | Strong | Warm | Change | No
4 | Sunny | Warm | High | Strong | Cool | Change | Yes

h0 = <Ø, Ø, Ø, Ø, Ø, Ø>
h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
h2 = <Sunny, Warm, ?, Strong, Warm, Same>
h3 = <Sunny, Warm, ?, Strong, Warm, Same>
h4 = <Sunny, Warm, ?, Strong, ?, ?>
Algorithm :
• Initialize h to the most specific hypothesis in H.
• For each positive training instance x :
For each attribute constraint ai in h :
If the constraint ai is satisfied by x, then do nothing.
Else replace ai in h by the next more general constraint that is satisfied by x.
• Output hypothesis h
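A minimal Python sketch of FIND-S (my own, not from the text) on the EnjoySport data above; for conjunctive hypotheses the "next more general constraint" simply replaces a mismatching value by '?':

```python
# Sketch of FIND-S for conjunctive hypotheses over the EnjoySport data.
NONE = object()   # stands for the "no value acceptable" constraint (Ø)

def find_s(examples):
    """examples: list of (instance_tuple, label) pairs; only positives are used."""
    n = len(examples[0][0])
    h = [NONE] * n                      # most specific hypothesis in H
    for x, label in examples:
        if label != 'Yes':
            continue                    # FIND-S ignores negative examples
        for i, (c, v) in enumerate(zip(h, x)):
            if c is NONE:
                h[i] = v                # first positive example: copy its values
            elif c != v and c != '?':
                h[i] = '?'              # next more general constraint
    return tuple(h)

data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   'Yes'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   'Yes'),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes'),
]
print(find_s(data))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```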
Q.22 Describe hypothesis space search by FIND-S algorithm. [JNTU : Dec.-16, Marks 10]
Ans. : • The FIND-S algorithm illustrates one way in which the more-general-than partial ordering can be used to organize the search for an acceptable hypothesis.
• The search moves from hypothesis to hypothesis, searching from the most specific to progressively more general hypotheses along one chain of the partial ordering.
• The hypothesis space search performed by FIND-S is shown in Fig. Q.22.1.
• The search begins (h0) with the most specific hypothesis in H, then considers increasingly general hypotheses (h1 through h4) as mandated by the training examples.
• In the instance space diagram, positive training examples are denoted by "+", negative examples by "-", and instances that have not been presented as training examples are denoted by solid circles.
• At each stage the hypothesis is the most specific hypothesis consistent with the training examples observed up to this point.

h0 = <Ø, Ø, Ø, Ø, Ø, Ø>
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +    h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +      h2 = <Sunny, Warm, ?, Strong, Warm, Same>
x3 = <Rainy, Cold, High, Strong, Warm, Change>, -    h3 = <Sunny, Warm, ?, Strong, Warm, Same>
x4 = <Sunny, Warm, High, Strong, Cool, Change>, +    h4 = <Sunny, Warm, ?, Strong, ?, ?>
Fig. Q.22.1 : The hypothesis space search performed by FIND-S (instances X, hypotheses H ordered from specific to general)
1.6 : Version Space and Candidate Elimination Algorithm
Q.23 What Is version space ?
‘Ans. : * Version space : A set of all hypotheses that are consistent with the training examples.
• The version space, denoted VS_H,D, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D.
# A version space is a hierarchical representation of knowledge that enables you to keep track of all the
useful information supplied by a sequence of learning examples without remembering any of the
examples.
The version space method is a concept learning process accomplished by managing multiple models
within a version space.
Q.24 Explain characteristics of version space.
Ans. : Characteristics :
1. Concepts are represented using version spaces.
2. A version space represents all the alternative plausible descriptions of a heuristic.
3. A plausible description is one that is applicable to all known positive examples and no known negative examples.
4. A version space description consists of two complementary trees :
i. One that contains nodes connected to overly general models, and
ii. One that contains nodes connected to overly specific models.
5. Node values / attributes are discrete.
Q.25 Describe compact representation for version
spaces.
Ans. :
• Instead of enumerating all the hypotheses consistent with a training set, we can represent its most specific and most general boundaries. The hypotheses included in-between these two boundaries can be generated as needed.
• Definition : The general boundary (G) with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D.
• Definition : The specific boundary (S) with respect to hypothesis space H and training data D, is the set of minimally general (i.e., maximally specific) members of H consistent with D.
• Fig. Q.25.1 shows general and specific hypotheses.
Fig. Q.25.1 : General and specific hypotheses (most general hypothesis at one boundary, most specific hypothesis at the other, with the examples in between)
• Each specialization must be a generalization of some specific concept description. No specialization can be a specialization of another general concept description. Fig. Q.25.2 shows the boundary sets with hypotheses.

Fig. Q.25.2 : Boundary sets (G-set and S-set) with hypotheses
Q.26 What are advantages and disadvantages of the version space method ?
Ans. :
Advantages of the version space method :
1. Can describe all the possible hypotheses in the language consistent with the data.
2. Fast (close to linear).
Disadvantages of the version space method :
1. Inconsistent data (noise) may cause the target concept to be pruned.
2. Learning disjunctive concepts is challenging.
Q.27 Describe List-Then-Eliminate algorithm.
Ans. : • The List-Then-Eliminate algorithm initializes the version space to contain all hypotheses in H, then eliminates any hypothesis found inconsistent with any training example.
• The version space of candidate hypotheses thus shrinks as more examples are observed, until ideally just one hypothesis remains that is consistent with all the observed examples.
• If insufficient data is available to narrow the version space to a single hypothesis, then the algorithm can output the entire set of hypotheses consistent with the observed data.
• The List-Then-Eliminate algorithm can be applied whenever the hypothesis space H is finite. It has many advantages, including the fact that it is guaranteed to output all hypotheses consistent with the training data.
Scanned with CamScanner1s exhaustively enumerating. all hypotheses nue, —___Introduetion
san
esis spazces
+ Unfortunately, it =
“nvealistic requirement
for all but the most trivial hypoth
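A tiny sketch (mine, not from the text) of List-Then-Eliminate for the conjunctive EnjoySport hypothesis space: it enumerates every hypothesis built from the observed attribute values plus '?', then filters out those inconsistent with any training example.

```python
# Sketch of List-Then-Eliminate: enumerate a finite hypothesis space, then
# keep only the hypotheses consistent with every training example.
from itertools import product

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def list_then_eliminate(examples):
    # Build a finite hypothesis space from the values seen in the data plus '?'.
    n = len(examples[0][0])
    domains = [sorted({x[i] for x, _ in examples}) + ['?'] for i in range(n)]
    return [h for h in product(*domains)
            if all(matches(h, x) == (label == 'Yes') for x, label in examples)]

data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   'Yes'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   'Yes'),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes'),
]
for h in list_then_eliminate(data):
    print(h)   # surviving hypotheses, e.g. ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```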
Q.28 Define candidate elimination algorithm.
Ans. : The Candidate-Elimination algorithm computes the version space containing all (and only those) hypotheses from H that are consistent with an observed sequence of training examples.
Q.29 Write algorithm for Candidate-Elimination.
Ans. :
Algorithm :
Given:
© A representation language.
A set of positive and negative examples expressed in that language,
Compute + a concept description that is consistent with all the positive examples and none of the
negative examples.
Method :
• Initialize G, the set of maximally general hypotheses, to contain one element : the null description (all features are variables).
• Initialize S, the set of maximally specific hypotheses, to contain one element : the first positive example.
• Accept a new training example.
If the example is positive :
1. Generalize all the specific models to match the positive example, but ensure the following :
• The new specific models involve minimal changes.
• Each new specific model is a specialization of some general model.
• No new specific model is a generalization of some other specific model.
2. Prune away all the general models that fail to match the positive example.
If the example is negative :
1. Specialize all general models to prevent a match with the negative example, but ensure the following :
• The new general models involve minimal changes.
• Each new general model is a generalization of some specific model.
• No new general model is a specialization of some other general model.
2. Prune away all the specific models that match the negative example.
• If S and G are both singleton sets, then :
- If they are identical, output their value and halt.
- If they are different, the training cases were inconsistent. Output this result and halt.
• Otherwise, accept the next training example.
• Possible outcomes :
0 : No consistent description for the data in the language.
1 : Answer (the version space has converged).
2 or more : All consistent descriptions in the language are implicitly included.
th example. ace «t
2.30 Explain candidate elimination algorithm version 5P82 ©
Ans.:+ The candidate-Elimination algorithm computes 1 araining &*
hypotheses from H that are consistent with an observed annie car"
«Example : Learning the concept of “Japanese Eco! cade, TYPE
‘+ Features : Country of Origin, Manufacturer,
j_Ontsin
Color,
‘Example Type
Positive
Manufacturer
Honda
• Positive Example 1 : (Japan, Honda, Blue, 1980, Economy)
• Initialize G to a singleton set that includes everything :
G = { (?, ?, ?, ?, ?) }
• Initialize S to a singleton set that includes the first positive example :
S = { (Japan, Honda, Blue, 1980, Economy) }
• Negative Example 2 : (Japan, Toyota, Green, 1970, Sports)
• Specialize G to exclude the negative example :
G = { (?, Honda, ?, ?, ?), (?, ?, Blue, ?, ?), (?, ?, ?, 1980, ?), (?, ?, ?, ?, Economy) }
S = { (Japan, Honda, Blue, 1980, Economy) }
• Positive Example 3 : (Japan, Toyota, Blue, 1990, Economy)
• Prune G to exclude descriptions inconsistent with the positive example :
G = { (?, ?, Blue, ?, ?), (?, ?, ?, ?, Economy) }
• Generalize S to include the positive example :
S = { (Japan, ?, Blue, ?, Economy) }
• Negative Example 4 : (USA, Chrysler, Red, 1980, Economy)
• Specialize G to exclude the negative example (but stay consistent with S) :
G = { (?, ?, Blue, ?, ?), (Japan, ?, ?, ?, Economy) }
S = { (Japan, ?, Blue, ?, Economy) }
• Example 5 : (Japan, Honda, Red, 1990, Economy)
• This example is inconsistent with the version space : G cannot be specialized and S cannot be generalized.
• The version space collapses.
• Conclusion : No conjunctive hypothesis is consistent with the data set.
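As an illustration (my own, not from the text), the sketch below implements the positive-example update used in the trace above for conjunctive hypotheses: generalize S minimally and prune G. Specializing G on negative examples needs the attribute domains and is omitted to keep the sketch short.

```python
# Partial sketch of the Candidate-Elimination updates for conjunctive
# hypotheses ('?' = any value): the positive-example step only.

def matches(h, x):
    """h matches x if every constraint is '?' or equals the attribute value."""
    return all(c == '?' or c == v for c, v in zip(h, x))

def min_generalization(s, x):
    """Minimally generalize specific hypothesis s so that it matches positive x."""
    return tuple(c if c == v else '?' for c, v in zip(s, x))

# State after negative example 2 in the car trace above:
G = [('?', 'Honda', '?', '?', '?'), ('?', '?', 'Blue', '?', '?'),
     ('?', '?', '?', 1980, '?'), ('?', '?', '?', '?', 'Economy')]
S = [('Japan', 'Honda', 'Blue', 1980, 'Economy')]

# Positive example 3: (Japan, Toyota, Blue, 1990, Economy)
x3 = ('Japan', 'Toyota', 'Blue', 1990, 'Economy')
G = [g for g in G if matches(g, x3)]          # prune general models that fail to match
S = [min_generalization(s, x3) for s in S]    # generalize the specific models

print(G)   # [('?', '?', 'Blue', '?', '?'), ('?', '?', '?', '?', 'Economy')]
print(S)   # [('Japan', '?', 'Blue', '?', 'Economy')]
```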
1.7 : Inductive Bias
Q.31 What is an inductive bias ?
Ans. : • Consider a concept learning algorithm L for the set of instances X. Let c be an arbitrary concept defined over X, and let Dc = {<x, c(x)>} be an arbitrary set of training examples of c. Let L(xi, Dc) denote the classification assigned to the instance xi by L after training on the data Dc.
• The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples Dc the following formula holds :
(∀ xi ∈ X) [(B ∧ Dc ∧ xi) ⊢ L(xi, Dc)]
Q.32 Explain fundamental property of inductive inference.
Ans. : • A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances.
• Inductive leap : A learner should be able to generalize training data using prior assumptions in order to classify unseen instances.
• The generalization is known as an inductive leap, and our prior assumptions are the inductive bias of the learner.
• The inductive bias (prior assumptions) of the Candidate-Elimination algorithm is that the target concept can be represented by a conjunction of attribute values, the target concept is contained in the hypothesis space, and the training examples are correct.
Q.33 Explain with example inductive bias.
Ans. : • The Candidate-Elimination algorithm will converge toward the true target concept provided it is given accurate training examples and provided its initial hypothesis space contains the target concept.
What if the target concept is not contained in the hypothesis space ?
• Can we avoid this difficulty by using a hypothesis space that includes every possible hypothesis ?
• How does the size of this hypothesis space influence the ability of the algorithm to generalize to unobserved instances ?
• How does the size of this hypothesis space influence the number of training examples that must be observed ?
• Suppose we have restricted the hypothesis space to include only conjunctions of attribute values. Because of this restriction, the hypothesis space is unable to represent even simple disjunctive target concepts such as "Sky = Sunny or Sky = Cloudy".
• Consider three training examples for this disjunctive concept : <Sunny, Warm, Normal, Strong, Cool, Change> Yes ; <Cloudy, Warm, Normal, Strong, Cool, Change> Yes ; <Rainy, Warm, Normal, Strong, Cool, Change> No.
• From the first two examples : S2 : <?, Warm, Normal, Strong, Cool, Change>
• This is inconsistent with the third example, and there are no hypotheses consistent with these three examples. PROBLEM : We have biased the learner to consider only conjunctive hypotheses. We require a more expressive hypothesis space.
* The obvious solution to the problem of assuring that the target concept is in the hypothesis space H
is to provide a hypothesis space capable of representing every teachable concept.
Q.34 Which are the three learning algorithms from weakest to strongest bias ?
Ans. : • ROTE-LEARNER : Learning corresponds simply to storing each observed training example in memory. Subsequent instances are classified by looking them up in memory. If the instance is found in memory, the stored classification is returned. Otherwise, the system refuses to classify the new instance. Inductive bias : No inductive bias.
* CANDIDATE-ELIMINATION : New instances are classified only in the case where all members of the
current version space agree on the classification. Otherwise, the system refuses to classify the new
instance. Inductive Bias : the target concept can be represented in its hypothesis space,
*FIND-S : This algorithm, described earlier, finds the most specific hypothesis consistent with the
training examples. It then uses this hypothesis to classify all subsequent instances. Inductive Bias: the
target concept can be represented in its hypothesis space, and all instances are negative instances
unless the opposite is entailed by its other knowledge.
1.8 : Decision Tree Learning : Introduction and Representation
Q.35 What is decision tree ?
Ans. : • Decision tree learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree.
• A decision tree is a tree where each node represents a feature (attribute), each link (branch) represents a decision (rule) and each leaf represents an outcome (categorical or continuous value).
• A decision tree or a classification tree is a tree in which each internal node is labeled with an input feature. The arcs coming from a node labeled with a feature are labeled with each of the possible values of the feature.
Q.36 What are the nodes of decision tree ?
Ans. : A decision tree has two kinds of nodes : internal (decision) nodes and leaf nodes.
• Build a decision tree for classifying positive or negative instances of a concept.
• Supervised learning, batch processing of training examples, using a preference bias.
• A decision tree is a tree where :
a. Each non-leaf node is associated with an attribute (feature).
b. Each leaf node has associated with it a classification (+ or -).
c. Each arc has associated with it one of the possible values of the attribute at the node from which the arc is directed.
• An internal node denotes a test on an attribute. Branches represent outcomes of the test. Leaf nodes represent class labels or class distributions.
• A decision tree is a flow-chart-like tree structure, where each node denotes a test on an attribute value, each branch represents an outcome of the test, and tree leaves represent classes or class distributions. Decision trees can easily be converted to classification rules.
Q.38 Write decision tree algorithm.
Ans. : To generate a decision tree from the training tuples of data partition D.
Input :
1. Data partition (D)
2. attribute_list
3. Attribute_selection_method
Method :
1. Create a node N;
2. If the tuples in D are all of the same class C then
3. Return N as a leaf node labeled with the class C;
4. If attribute_list is empty then
5. Return N as a leaf node labeled with the majority class in D;
6. Apply Attribute_selection_method (D, attribute_list) to find the best splitting_criterion;
7. Label node N with the splitting_criterion;
8. Then attribute_list = attribute_list - splitting_attribute;
9. For (each outcome j of splitting_criterion)
10. Let Dj be the set of data tuples in D satisfying outcome j;
11. If Dj is empty then attach a leaf labeled with the majority class in D to node N;
12. Else attach the node returned by Generate_decision_tree (Dj, attribute_list) to node N;
13. End of for loop
14. Return N;
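The pseudocode above is generic; as a practical illustration (my own sketch, assuming scikit-learn is available, not part of the text), the same recursive "select attribute, split, recurse" procedure is what a library decision tree performs when it is fitted:

```python
# Minimal example of building a decision tree classifier with scikit-learn.
from sklearn.tree import DecisionTreeClassifier

# Toy data (hypothetical): each row is [weather, parents_visiting, money] as integers.
X = [[0, 1, 1], [0, 0, 1], [1, 1, 0], [2, 1, 0], [2, 0, 1]]
y = ['cinema', 'tennis', 'cinema', 'cinema', 'stay-in']   # class labels

clf = DecisionTreeClassifier(criterion='entropy')   # information-gain style splits
clf.fit(X, y)

print(clf.predict([[0, 1, 1]]))   # predicted class for a new day
```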
Q.39 Describe characteristics of appropriate problems for decision tree learning.
Ans. : • Decision tree learning is generally best suited to problems with the following characteristics :
1. Instances are represented by attribute-value pairs : a fixed set of attributes, and the attributes take a small number of disjoint possible values.
2. The target function has discrete output values. Decision tree learning is appropriate for a boolean classification, but it easily extends to learning functions with more than two possible output values.
3. Disjunctive descriptions may be required. Decision trees naturally represent disjunctive expressions.
4. The training data may contain errors. Decision tree learning methods are robust to errors, both errors in classifications of the training examples and errors in the attribute values that describe these examples.
5. The training data may contain missing attribute values. Decision tree methods can be used even when some training examples have unknown values.
6. Decision tree learning has been applied to problems such as learning to classify medical patients by their disease and equipment malfunctions by their cause.
Q.40 Define the following :
a. Input variable b. Leaf node c. Internal nodes d. Depth
Ans. :
a. Input variable : Each member of the set {x1, x2, ..., xn} is called an input variable.
b. Leaf node : A node without further branches is
called a leaf node. The leaf nodes return class
Scanned with CamScannerlabels and, in some implementations, they return
the probability scores.
€. Internal nodes are the decision or test points.
Each internal node refers to an input variable or
an attribute. The top internal node is called the
root.
4. Depth : The depth of a node is the minimum
number of steps required to reach the node from
the root.
Q41 List the advantages and disadvantages of
decision tree.
‘Ans. : Advantages :
1. Rules are simple and easy to understand.
2. Decision trees can handle both nominal and
numerical attributes.
3. Decision trees are capable of handling datasets
that may have errors.
4. Decision trees are capable of handling datasets
that may have missing values.
5. Decision trees are considered to be a
nonparametric method.
6. Decision trees are self-explanatory.
Disadvantages :
1. Most of the algorithms require that the target
attribute will have only discrete values.
2. Some problem are difficult to solve like XOR.
3. Decision trees are less appropriate for estimation
tasks where the goal is to predict the value of a
continuous attribute.
4. Decision trees are prone to errors in classification problems with many classes and a relatively small number of training examples.
Q.42 How to evaluate a decision tree ?
Ans. : • A decision tree can help you make tough decisions by laying out the possible paths and outcomes, but only if it is built and evaluated correctly. Decision trees are models of possible decisions and their outcomes, with the possible outcomes in a tree drawn as "branches" off the decision nodes. A decision tree is evaluated as follows :
• First, review the decision tree with domain experts to determine if the decision rules are sound.
• Next, look at the depth and nodes of the tree. Having too many layers and obtaining nodes with few members might be signs of overfitting.
• In overfitting, the model fits the training set well, but it performs poorly on the new samples in the testing set.
• Overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship.
• Overfitting is when a classifier fits the training data too tightly. Such a classifier works well on the training data but not on independent test data. It is a general problem that plagues all machine learning methods.
• Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations.
« To prevent over-fitting we have several options :
1. Restrict the number of adjustable parameters the
network has - e.g. by reducing the number of
hidden units, or by forcing connections to share
the same weight values.
2. Stop the training early, before it has had time to
learn the training data too well.
3. Add some form of regularization term to the error/cost function to encourage smoother network mappings.
4. Add noise to the training patterns to smear out
the data points.
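As an illustration of checking for overfitting (my own sketch, assuming scikit-learn; the data set and depth values are arbitrary), compare accuracy on the training set with accuracy on a held-out test set; a large gap suggests the tree fits the training data too tightly:

```python
# Sketch: detect overfitting by comparing training and test accuracy of a
# decision tree, then limit its depth to regularize it.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (None, 3):                      # unrestricted tree vs. depth-limited tree
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(depth,
          tree.score(X_train, y_train),      # training accuracy
          tree.score(X_test, y_test))        # test accuracy; a big gap signals overfitting
```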
1.9 : Basic Decision Tree Learning Algorithm
Q.43 Define information gain.
Ans.:¢ Entropy measures the impurity of a
collection. Information Gain is defined in terms of
Entropy.
Information gain tells us how important a given
attribute of the feature vectors is.
Scanned with CamScannerGain(S, A) = Entropy(s)—
Sy
[5s lentopy(s,
vevatuesiay [S|
where Values (A)
for attribute A
which attribute A has
Q.44 What Is the role of information gain in
decision tree leaning ?
) is the set of all possible values
and Sy is the subset of $ for
value v
USP LINTU : Dee.-16, Marks-3)
Ans. + © Information gain measures how well a given
attribute separates the training examples according to
their target classification
*1D3 uses this information gain measure to select
among the candidate attributes at
each step while
growing the tree,
+ Entropy is used for measuring information gain,
+ Putting together a decision tree is all a matter of
choosing which attribute to test at each node in the
tree.
+ We shall define a measure called information gain
‘which will be used to decide which attribute to test
at each node.
‘Information gain is itself calculated using a measure
called entropy, which we first define for the case of
a binary decision problem and then define for the
general case
Q.45 Explain Gini index and entropy of decision tree algorithm.
Ans. : • One of the decision tree algorithms is CART (Classification and Regression Tree). When the decision or target variable is categorical, the decision tree is a classification tree; when the decision or target variable is continuous, the decision tree is a regression tree. CART can be used for building both classification and regression decision trees. The impurity measure used in building a decision tree by CART is the Gini index. The decision tree built by the CART algorithm is always a binary decision tree.
*Gini impurity is a measure of how often a
randomly chosen element from the set would be
incorrectly labeled if it was randomly labeled
according to the distribution of labels in the subset,
* Gini index, entropy and twoing rule are some of
the frequently used impurity measures.
• Gini index for a given node t :
Gini(t) = 1 - Σj [ p(j | t) ]²
• Maximum of (1 - 1/nc), where nc is the number of classes, when records are equally distributed among all classes, implying maximal impurity.
• Minimum of 0 when all records belong to one class, implying complete purity.
• Entropy at a given node t :
Entropy(t) = - Σj p(j | t) log p(j | t)
• Maximum (log nc) when records are equally distributed among all classes (maximal impurity).
• Minimum (0.0) when all records belong to one class (maximal purity).
* Entropy is the only function that satisfies all of the
following three properties
1. When node is pure, measure should be zero
2. When impurity is maximal (Le. all classes
equally likely), measure should be maximal
3. Measure should obey multistage property
• When a node p is split into k partitions (children), the quality of the split is computed as a weighted sum :
Gini_split = Σ (i = 1 to k) (ni / n) · Gini(i)
where nj = number of records at child i, and
n = number of records at node p.
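A small sketch (mine, not from the text) of the Gini index and the weighted split quality defined above:

```python
# Sketch of the Gini index and the weighted Gini of a split.
from collections import Counter

def gini(labels):
    """Gini(t) = 1 - sum_j p(j|t)^2 for the class labels at node t."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(children):
    """Weighted Gini of a split: sum_i (n_i / n) * Gini(child_i)."""
    n = sum(len(child) for child in children)
    return sum(len(child) / n * gini(child) for child in children)

parent = ['+', '+', '+', '-', '-', '-']
print(gini(parent))                                    # 0.5 for a 50/50 node
print(gini_split([['+', '+', '+'], ['-', '-', '-']]))  # 0.0 : a pure split
```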
+ A problem with all impurity measures is that they
depend only on the number of (training) patterns of
different classes on either side of the hyperplane.
Thus, if we change the class regions without
changing the effective areas of class regions on
Scanned with CamScannerinit) ay
ine) =
[Bem
Fig, 45.4
either side of a hyperplane, the impurity measure
of the hyperplane will not change.
* Thus the impurity measures do not really capture
the geometric structure of class distributions, Also,
all the algorithms need to optimize on some
average of impurity of the child nodes and often it
is not clear what kind of average is proper.
Q.46 Which attribute is best ? How to select the best attribute ?
Ans. : • We would like to select the attribute that is most useful for classifying examples.
Information gain measures how well a given
attribute separates the training examples according
to their target classification,
© 1D3 uses this information gain measure to select
among the candidate attributes at each. step while
growing the tree.
‘mn order to define information gain precisely, we
use a measure commonly used in information
theory, called entropy
© Entropy characterizes the impurity of an arbitrary
collection of examples.
• Putting together a decision tree is all a matter of choosing which attribute to test at each node in the tree. We shall define a measure called information gain which will be used to decide which attribute to test at each node. Information gain is itself calculated using a measure called entropy, which we first define for the case of a binary decision problem and then define for the general case.
• Given a binary categorization, C, and a set of examples, S, for which the proportion of examples categorized as positive by C is p+ and the proportion of examples categorized as negative by C is p-, then the entropy of S is :
Entropy(S) = - p+ log2(p+) - p- log2(p-)
where S is a sample of training examples, p+ is the proportion of positive examples and p- is the proportion of negative examples.
‘* Imagine having a set of boxes with some balls in. If
all the balls were in a single box, then this would
be nicely ordered, and it would be extremely easy
to find a particular ball.
«If, however, the balls were distributed amongst the
boxes, this would not be so nicely ordered, and it
might take quite a while to find a particular ball.
‘© If we were going to define a measure based on this
notion of purity, we would want to be able to
calculate a value for each box based on the number
of balls in it, then take the sum of these as the
overall measure.
© We would want to reward two situations: nearly
empty boxes, and boxes with nearly all the balls in.
This is the basis for the general entropy measure,
which is defined as follows.
• Given an arbitrary categorization, C, into categories c1, ..., cn, and a set of examples, S, for which the proportion of examples in ci is pi, then the entropy of S is :
Entropy(S) = - Σ (i = 1 to n) pi log2(pi)
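A short sketch (my own, not from the text) of the entropy and Gain(S, A) formulas above, with examples stored as (attributes, category) pairs:

```python
# Sketch of entropy and information gain as defined above.
from collections import Counter
from math import log2

def entropy(examples):
    """Entropy(S) = -sum_i p_i log2 p_i over the categories appearing in S."""
    n = len(examples)
    counts = Counter(category for _, category in examples)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def information_gain(examples, attribute):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    n = len(examples)
    subsets = {}
    for attrs, category in examples:
        subsets.setdefault(attrs[attribute], []).append((attrs, category))
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(examples) - remainder

# Toy data (hypothetical): attribute 0 is Weather; categories are activities.
S = [(('Sunny',), 'Cinema'), (('Sunny',), 'Tennis'),
     (('Windy',), 'Cinema'), (('Rainy',), 'Cinema')]
print(entropy(S))              # entropy of the whole set
print(information_gain(S, 0))  # gain of splitting on Weather
```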
Q.47 Explain ID3 algorithm.
‘Ans. : © The calculation for information gain is the
most difficult part of this algorithm,
* ID3 performs a search whereby the search states are
decision trees and the operator involves adding a
node to an existing tree. It uses information gain to
measure the attribute to put in each node, and
performs a greedy search using this measure of
worth,
• Given a set of examples S categorised in categories ci, choose as the root node the attribute which scores the highest for information gain relative to S.

Fig. Q.47.1 : A partially constructed decision tree. Attribute A scores highest for Gain(S, A) and becomes the root; on one branch, attribute B scores highest for Gain(SA, B); a branch whose examples all belong to one category c ends in a leaf node for category c; a branch with no examples having that value gets a default leaf node d, where d is the category containing the most members of the parent's examples.

Q.48 Construct a decision tree using the following instances :
Ans. : Entropy(S) = - p_cinema log2(p_cinema) - p_tennis log2(p_tennis) - p_shopping log2(p_shopping) - p_stay-in log2(p_stay-in)
= -(6/10) log2(6/10) - (2/10) log2(2/10) - (1/10) log2(1/10) - (1/10) log2(1/10)
= 0.4422 + 0.4644 + 0.3322 + 0.3322 = 1.571
and we need to determine the best of :
Gain(S, weather) = 1.571 - (|S_sun|/10) · Entropy(S_sun) - (|S_wind|/10) · Entropy(S_wind) - (|S_rain|/10) · Entropy(S_rain)
= 1.571 - (0.3) · (0.918) - (0.4) · (0.81125) - (0.3) · (0.918) = 0.70
Gain(S, parents) = 1.571 - (|S_yes|/10) · Entropy(S_yes) - (|S_no|/10) · Entropy(S_no)
= 1.571 - (0.5) · 0 - (0.5) · 1.922 = 1.571 - 0.961 = 0.61
Gain(S, money) = 1.571 - (|S_rich|/10) · Entropy(S_rich) - (|S_poor|/10) · Entropy(S_poor)
= 1.571 - (0.7) · (1.842) - (0.3) · 0 = 1.571 - 1.2894 = 0.2816
This means that the first node in the decision tree will be the weather attribute. From the weather node, we draw a branch for each of the values that weather can take : sunny, windy and rainy (Fig. Q.48.1).
• Now we look at the first branch. S_sunny = {W1, W2, W10}. This is not empty, so we do not put a default categorization leaf node here.
• The categorizations of W1, W2 and W10 are Cinema, Tennis and Cinema respectively. As these are not all the same, we cannot put a categorization leaf node here, so we put an attribute node here.
• The second branch is not empty, and its examples do not all belong to the same class, so an attribute node is put there as well, left blank for now. The same is done with the third branch; hence the partial tree looks like this :
• In effect, we are now interested only in the sunny part of the table, the Weekend examples W1, W2 and W10. Hence we can calculate :
Gain(S_sunny, parents) = 0.918 - (|S_yes|/|S_sunny|) · Entropy(S_yes) - (|S_no|/|S_sunny|) · Entropy(S_no)
= 0.918 - (2/3) · 0 - (1/3) · 0 = 0.918
Gain(S_sunny, money) = 0.918 - (|S_rich|/|S_sunny|) · Entropy(S_rich) - (|S_poor|/|S_sunny|) · Entropy(S_poor)
= 0.918 - (3/3) · (0.918) - (0/3) · 0 = 0.918 - 0.918 = 0
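The arithmetic above can be checked with a few lines of Python (my own sketch; the class counts are taken from the calculation shown, since the instance table itself is not reproduced here):

```python
# Sketch: verify the entropy and information-gain arithmetic shown above.
from math import log2

def entropy(counts):
    """Entropy of a node given the list of class counts at that node."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

# Whole data set: 6 Cinema, 2 Tennis, 1 Shopping, 1 Stay-in.
print(entropy([6, 2, 1, 1]))                      # ~1.571

# Gain(S, weather): sunny/windy/rainy subsets of sizes 3, 4, 3 with the
# subset entropies used above (0.918, 0.81125, 0.918).
gain_weather = 1.571 - 0.3 * 0.918 - 0.4 * 0.81125 - 0.3 * 0.918
print(round(gain_weather, 2))                     # ~0.70
```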
1.10 : Hypothesis Space Search in Decision Tree Learning
Q.49 Discuss hypothesis space search in decision tree learning.
Ans. : • The hypothesis space searched by ID3 is the set of possible decision trees.
• ID3 performs a simple-to-complex, hill-climbing search through this hypothesis space.
• It begins with the empty tree, then considers progressively more elaborate hypotheses in search of a decision tree that correctly classifies the training data.
• The information gain measure guides the hill-climbing search.
• Hypothesis space : Set of possible decision trees.
• Search method : Simple-to-complex hill-climbing search; only a single current hypothesis is maintained, unlike the candidate-elimination method, and no backtracking is performed.