Unit 3
Definition of learning, Forms of learning, Learning by taking advice,
Learning in problem solving, Induction learning
Prof. Pranjal Pandit
Contents
• Knowledge representation
• Types of knowledge
• Rules, Rule-based expert systems.
• Inference: Backward chaining, Forward chaining, Rule value approach,
Inference engine.
• Planning: Goal Tree, Non-linear planning, Hierarchical planning, Goal
stack planning
• Definition of learning, Forms of learning, Learning by taking advice,
Learning in problem solving, Induction learning
• Expert systems - Architecture of expert systems, Roles of expert
systems, Knowledge Acquisition
Learning
A standard, widely used formal definition (Tom Mitchell, 1997) is:
A computer program is said to learn from experience E with respect to some class of
tasks T and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E.
Plain language: learning = improvement on a task produced by experience/data,
measured by some performance metric.
Example: a spam filter (T = classify emails, P = classification accuracy) improves as it sees
labeled emails (E).
Key ingredients in the definition:
• Task (T) — what the system is expected to do (classification, control, planning).
• Experience (E) — data, interactions, demonstrations, rewards, etc.
• Performance measure (P) — accuracy, average reward, time to solution, etc.
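A minimal sketch tying T, E, and P together in code, assuming scikit-learn is available; the toy emails and labels are invented for illustration:

# T: classify emails as spam/ham; E: labeled emails; P: accuracy.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["win a free prize now", "meeting at 10am tomorrow",
          "free prize offer click now", "project report attached"]  # part of E
labels = [1, 0, 1, 0]                                               # spam = 1

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)            # learning from experience E
print(model.score(emails, labels))   # P: accuracy, measured here on the training set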
Forms of learning
1. By type of feedback
A) Supervised learning — labeled examples (input → correct output).
Example: image classification, regression.
B) Unsupervised learning — no labels; discover structure (clusters,
manifolds).
Example: k-means clustering (sketched after this list), PCA.
C) Reinforcement learning (RL) — scalar reward signal obtained through
interaction; learn a policy to maximize long-run reward.
Example: game playing, robotic control.
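For contrast with the supervised spam example above, a minimal unsupervised sketch, assuming scikit-learn; the 2-D points are invented so that two clusters are obvious:

import numpy as np
from sklearn.cluster import KMeans

# No labels are given: k-means discovers the grouping on its own.
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],   # cluster near (1, 1)
              [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])  # cluster near (5, 5)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)   # e.g. [1 1 1 0 0 0]: group membership, learned without labels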
2. By interaction style
• Batch learning — learner sees a dataset and trains offline.
• Online (incremental) learning — model updates continuously as new data
arrives (see the sketch after this list).
• Active learning — learner queries an oracle (asks for labels) for the
most informative instances.
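A minimal online-learning sketch, assuming a recent scikit-learn (SGDClassifier supports incremental updates via partial_fit); the data stream is simulated:

import numpy as np
from sklearn.linear_model import SGDClassifier

# Online learning: update the model one mini-batch at a time,
# rather than training once on a full offline dataset.
rng = np.random.default_rng(0)
clf = SGDClassifier(loss="log_loss")          # logistic regression via SGD
classes = np.array([0, 1])                    # must be declared up front
for _ in range(100):                          # simulated stream of mini-batches
    X = rng.normal(size=(8, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # simple synthetic labeling rule
    clf.partial_fit(X, y, classes=classes)
print(clf.predict([[1.0, 1.0], [-1.0, -1.0]]))  # typically [1 0]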
3. By representation and method
• Instance-based (lazy) — store examples and use them at query time
(k-NN; sketched after this list).
• Model-based (parametric) — build a compact model (linear regression,
neural network).
• Symbolic vs subsymbolic — rules/logic vs neural nets.
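A minimal instance-based (lazy) learner in plain Python; the toy 2-D examples are invented:

from collections import Counter

def knn_predict(train, query, k=3):
    """Lazy prediction: no model is built in advance; the stored
    examples are consulted only when a query arrives."""
    # train: list of ((x1, x2), label) pairs
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

examples = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_predict(examples, (0.5, 0.5)))   # "A": majority vote of stored neighbors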
4. Hybrid and advanced forms
A) Semi-supervised learning — mix of labeled and unlabeled data.
B) Self-supervised learning — create supervisory signals from the data
itself (predict missing parts).
C) Imitation learning / Learning from demonstration (LfD)
— learn policies from demonstrated trajectories.
D) Transfer learning / Meta-learning — reuse knowledge
from prior tasks to speed learning on new ones.
5. By goal
• Classification / Regression — predict labels or continuous values.
• Clustering / Dimensionality reduction — discover structure.
• Policy learning / Planning — produce actions or plans.
Learning by taking advice
The learner receives advice (hints, rules, demonstrations, corrective feedback)
from a teacher/mentor/oracle and uses it to speed or guide learning.
Modes of advice
• Demonstrations: teacher shows correct behavior (e.g., driving traces →
imitation learning / behavior cloning).
• Corrective feedback / evaluative advice: teacher gives “good/bad” signals
for actions (e.g., TAMER-style human feedback).
• Hints / constraints / rules: symbolic advice like “avoid region X” or
“variable Y is important”.
• Policy advice / demonstrations for bootstrapping: initial policy from
teacher, then refine via RL.
Algorithms / approaches
• Behavioral cloning: treat demonstrations as supervised learning (state
→ action); see the sketch after this list.
• Inverse Reinforcement Learning (IRL): infer the teacher’s reward
function from demonstrations.
• Interactive RL / Reward shaping: use teacher’s evaluative feedback
to shape rewards or policies.
• DAgger (Dataset Aggregation): iteratively combines the learner's own
trajectories with teacher corrections to avoid compounding errors.
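A minimal behavioral-cloning sketch, assuming scikit-learn; the one-dimensional states, three discrete actions, and teacher demonstrations are all invented for illustration:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Demonstrations are (state, action) pairs; the policy is learned
# as an ordinary supervised classifier over them.
demo_states  = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0]])
demo_actions = np.array([0, 0, 1, 2, 2])    # teacher's action in each state

policy = DecisionTreeClassifier().fit(demo_states, demo_actions)
print(policy.predict([[-1.5], [1.5]]))      # imitate the teacher: [0 2]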
Advantages
• Faster learning; avoids random/exploratory mistakes.
• Transfers human expertise directly.
• Useful when exploration is costly/dangerous.
Pitfalls & cautions
• Bias / suboptimal advice: poor advice can mislead the learner.
• Over-reliance: learner may fail to generalize beyond advice scope.
• Inconsistency: conflicting advice complicates learning (needs mechanisms
to weigh trust).
Practical example
A human steers a drone for a few flights (demonstrations); the agent clones
the behavior, then refines it with RL using a simulator.
Learning in problem solving
Learning can be integrated with classical search and problem-solving to make future
solutions faster or better.
Main ideas
• Learn better heuristics: use past solved problems to learn a heuristic function
h(state) that guides search (A*, IDA*). Example: learning pattern databases for
the 15-puzzle.
• Learn macro-operators / chunks: create higher-level actions (macros) that
collapse repeated subplans into single operators to speed future planning.
• Explanation-based learning (EBL): from a solved example + domain theory,
derive a general rule that makes solving similar problems trivial.
• Case-based reasoning (CBR): store solved problem cases and adapt their
solutions to new similar problems.
• RL for problem solving: learn policies that map problem states to actions (e.g.,
Q-learning for maze navigation; sketched after this list).
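A minimal tabular Q-learning sketch in plain Python; the five-state corridor "maze", actions, and rewards are invented for illustration:

import random

random.seed(0)
# Corridor of states 0..4, goal = 4. Actions: 0 = step left, 1 = step right.
# Reward 1 only on reaching the goal, 0 otherwise.
Q = {(s, a): 0.0 for s in range(5) for a in (0, 1)}
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):                                  # episodes of experience
    s = 0
    while s != 4:
        if random.random() < eps:                     # explore
            a = random.choice((0, 1))
        else:                                         # exploit current estimates
            a = max((0, 1), key=lambda act: Q[(s, act)])
        s2 = max(0, s - 1) if a == 0 else min(4, s + 1)
        r = 1.0 if s2 == 4 else 0.0
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)])
        s = s2

print([max((0, 1), key=lambda act: Q[(s, act)]) for s in range(4)])
# greedy learned policy: [1, 1, 1, 1], i.e. always step right toward the goal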
Example (8-puzzle):
• Experience: solve many random puzzles via search.
• Learn: estimate of distance-to-goal for common subpatterns (heuristic
table).
• Result: future search cut drastically because heuristic is more informed.
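A minimal sketch of building such a heuristic table, here by exhaustive breadth-first search backwards from the goal. This is feasible for the full 8-puzzle; real pattern databases apply the same idea to abstracted subpatterns of larger puzzles:

from collections import deque

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)          # 0 marks the blank tile

def neighbours(state):
    """All states reachable by sliding one tile into the blank."""
    i = state.index(0)
    r, c = divmod(i, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            j = nr * 3 + nc
            s = list(state)
            s[i], s[j] = s[j], s[i]
            yield tuple(s)

# BFS from the goal gives the exact distance-to-goal for every
# reachable state: a perfect heuristic table for later searches.
dist = {GOAL: 0}
queue = deque([GOAL])
while queue:
    s = queue.popleft()
    for n in neighbours(s):
        if n not in dist:
            dist[n] = dist[s] + 1
            queue.append(n)

print(len(dist))                            # 181440 = 9!/2 reachable states
print(dist[(1, 2, 3, 4, 5, 6, 0, 7, 8)])    # 2 (blank slides right twice)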
Why it helps
• Reduces search branching; speeds solution time; may enable solving
problems that were infeasible before.
Tradeoffs
• Time + memory spent to learn/store heuristics or cases.
• Risk of over-specialization to seen problems (less generalization).
Induction learning
• What is induction?
Induction is the process of forming general rules/hypotheses from
specific observed examples. In machine learning we typically induce
models (hypotheses) that generalize from training examples to
unseen instances.
Contrast with deduction
• Deduction: apply general rules to derive conclusions about specific
cases.
• Induction: infer general rules from specific observations.
Core components
• Hypothesis space (H): all candidate functions/models the learner
considers.
• Training examples: labeled instances used to evaluate hypotheses.
• Search/selection mechanism: algorithm that finds a good hypothesis
(e.g., minimize training error + regularization).
• Inductive bias: assumptions (e.g., simplicity, smoothness) that let us
prefer some hypotheses over others so generalization is possible.
Classic algorithms & ideas
• Decision trees (ID3/C4.5): induce a tree structure from examples using information
gain (see the sketch after this list).
• Linear models: learn weights to fit data (perceptron, logistic regression).
• Nearest neighbour: a lazy form of induction — use stored examples to classify new points.
• Version space learning: maintain the set of hypotheses consistent with the examples,
bounded by its most general and most specific members. (Useful as a conceptual model.)
• Statistical learning theory / PAC: formalizes conditions under which induction
generalizes (sample complexity, VC dimension).
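A minimal sketch of the information-gain calculation at the heart of ID3, in plain Python; the toy records are invented (they anticipate the isBird illustration below):

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Reduction in label entropy from splitting on attr (ID3's criterion)."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(r[attr] for r in rows):
        sub = [l for r, l in zip(rows, labels) if r[attr] == value]
        gain -= (len(sub) / n) * entropy(sub)
    return gain

rows = [{"feathers": True,  "swims": False},
        {"feathers": True,  "swims": True},
        {"feathers": False, "swims": True},
        {"feathers": False, "swims": False}]
labels = [True, True, False, False]            # isBird
print(info_gain(rows, labels, "feathers"))     # 1.0: perfectly informative split
print(info_gain(rows, labels, "swims"))        # 0.0: uninformative split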
Bias-variance & overfitting
• Overfitting: hypothesis fits training data too closely and fails to generalize.
• Underfitting: hypothesis too simple to capture patterns.
• Principles to manage: regularization, cross-validation, choosing model complexity,
more data.
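A quick way to see overfitting in practice, assuming scikit-learn: compare training fit against cross-validated accuracy on noisy synthetic data (invented for illustration):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
y[rng.random(200) < 0.2] ^= 1                 # flip 20% of labels (noise)

deep = DecisionTreeClassifier()               # unconstrained: memorizes the noise
print(deep.fit(X, y).score(X, y))             # 1.0 training accuracy
print(cross_val_score(deep, X, y, cv=5).mean())   # typically noticeably lower
print(cross_val_score(DecisionTreeClassifier(max_depth=2), X, y, cv=5).mean())
# the depth-limited (regularized) tree tends to generalize better on this data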
Simple illustration
Suppose we want to learn isBird(x) from attributes:
• Examples: {(feathers=True) → Bird=True, (feathers=False, swims=True) →
Bird=False, ...}
• Induction might produce rule: isBird(x) := has_feathers(x).
Algorithmic approach: search the hypothesis space of logical rules or decision trees;
pick the rule with the best generalization.
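A minimal sketch of that search, over a hypothetical hypothesis space containing only single-attribute rules:

# Search the rule space "isBird(x) := attr(x)" and keep the rule
# that makes the fewest errors on the (invented) training examples.
examples = [({"feathers": True,  "swims": False}, True),
            ({"feathers": False, "swims": True},  False),
            ({"feathers": True,  "swims": True},  True)]

def errors(attr):
    return sum(x[attr] != label for x, label in examples)

best = min(["feathers", "swims"], key=errors)
print(f"isBird(x) := {best}(x)   ({errors(best)} training errors)")
# -> isBird(x) := feathers(x)   (0 training errors)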
Key points about induction
• It always requires bias — no purely “data-only” method can generalize without
assumptions.
• Quality of induced model judged by generalization (test performance), not
training fit alone.
• Inductive methods range from symbolic (rules, trees) to statistical (probabilistic
models, neural nets).