Machine Learning
Introduction
What is Learning?
Learning denotes changes in the system that are adaptive in the sense that they
enable the system to do the same task or tasks drawn from the same population
more effectively the next time. -- Simon, 1983
Learning is making useful changes in our minds. -- Minsky, 1985
Learning is constructing or modifying representations of what is being experienced.
-- McCarthy, 1968
Learning is improving automatically with experience. -- Mitchell, 1997
CS 8751 ML & KDD 2
Can machines really learn?
• Definitions of “learning” from dictionary:
To get knowledge of by study, experience, or being taught
To become aware by information or from observation
(Difficult to measure)
To commit to memory
To be informed of, ascertain; to receive instruction
(Trivial for computers)
Operational definition:
Things learn when they change their behavior in a way that makes them
perform better in the future.
(Does a slipper learn?)
Does learning imply intention?
witten&eibe
Why Machine Learning?
• Data, Data, DATA!!!
• Examples
• World wide web
• Human genome project
• Business data (WalMart sales “baskets”)
• Idea: sift heap of data for nuggets of knowledge
• Some tasks beyond programming
• Example: driving
• Idea: learn by doing/watching/practicing (like humans)
• Customizing software
• Example: web browsing for news information
• Idea: observe user tendencies and incorporate
Why Machine Learning?
• Solving tasks that require a system to be adaptive
• Speech, face, or handwriting recognition
• Environment changes over time
• Understanding human and animal learning
• How do we learn a new language? Recognize people?
• Some tasks are best shown by demonstration
• Driving a car, or landing an airplane
• Objective of Real Artificial Intelligence:
• “If an intelligent system – brilliantly designed, engineered and
implemented – cannot learn not to repeat its mistakes, it is not as
intelligent as a worm or a sea anemone or a kitten.” (Oliver
Selfridge)
What is Machine Learning?
• Optimize a performance criterion using example data or past
experience.
• Role of Statistics: Inference from a sample
• Role of Computer Science: Efficient algorithms to
• Solve the optimization problem
• Represent and evaluate the model for inference
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
What is Machine Learning?
• A computer program M is said to learn from experience E with
respect to some class of tasks T and performance measure P, if its
performance as measured by P on tasks in T in an environment Z
improves with experience E.
• Example:
• T: Cancer diagnosis
• E: A set of diagnosed cases
• P: Accuracy of diagnosis on new cases
• Z: Noisy measurements, occasionally misdiagnosed training cases
• M: A program that runs on a general purpose computer; the learner
Defining a Learning Problem
Learning = improving with experience at some task
• improve over task T
• with respect to performance measure P
• based on experience E
Ex 1: Learn to play checkers
T: play checkers
P: % of games won
E: opportunity to play self
Ex 2: Sell more CDs
T: sell CDs
P: # of CDs sold
E: different locations/prices of CD
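The T / P / E framing can be made concrete with a toy sketch. Everything in it is an assumption made for illustration and does not come from the lecture: the task (decide whether an integer is at or above a hidden cutoff), the helper names `learn_threshold` and `accuracy`, and the hand-picked examples. It only shows how P can be measured as E grows; in general, improvement at every step is not guaranteed.

```python
# Toy illustration of "improving with experience" (all names hypothetical).
# T: decide whether an integer x is at or above a hidden cutoff
# P: fraction of test inputs classified correctly
# E: labeled examples (x, label) shown to the learner

HIDDEN_CUTOFF = 60  # assumed ground truth; the learner never reads this


def true_label(x):
    return 1 if x >= HIDDEN_CUTOFF else 0


def learn_threshold(examples):
    """Place the threshold midway between the largest negative
    and the smallest positive example seen so far."""
    largest_negative = max(x for x, label in examples if label == 0)
    smallest_positive = min(x for x, label in examples if label == 1)
    return (largest_negative + smallest_positive) / 2


def accuracy(threshold, test_inputs):
    """P: share of test inputs whose predicted label matches the true label."""
    correct = sum((1 if x >= threshold else 0) == true_label(x) for x in test_inputs)
    return correct / len(test_inputs)


# Hand-picked experience, labeled consistently with the hidden cutoff.
experience = [(10, 0), (90, 1), (40, 0), (70, 1), (55, 0), (62, 1)]
test_inputs = range(100)

accuracies = []
for n in (2, 4, 6):
    threshold = learn_threshold(experience[:n])
    accuracies.append(accuracy(threshold, test_inputs))
    print(f"E = first {n} examples, threshold = {threshold}, P = {accuracies[-1]:.2f}")
```

With these particular examples, P rises from 0.90 to 0.95 to 0.99 as the learner sees two, four, and six examples.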
Kinds of Learning
• Based on the information available
• Association
• Supervised Learning
• Classification
• Regression
• Reinforcement Learning
• Unsupervised Learning
• Semi-supervised Learning
• Based on the role of the learner
• Passive Learning
• Active Learning
Classification
Learn a method for predicting the instance class
from pre-labeled (classified) instances
Many approaches:
Regression,
Decision Trees,
Bayesian,
Neural Networks,
...
Given a set of points from known classes,
what is the class of a new point?
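As a minimal sketch of "predict the class of a new point from pre-labeled instances", the snippet below uses a nearest-neighbor rule. Note that nearest-neighbor is not one of the approaches listed above; it is swapped in here only because it is the shortest way to show the fit-then-predict idea. The points, class names, and the `predict` helper are made up for illustration.

```python
import math

# Hypothetical pre-labeled instances: ((x, y), class)
labeled = [
    ((1, 1), "blue"), ((2, 1), "blue"), ((1, 2), "blue"),
    ((5, 5), "green"), ((6, 5), "green"), ((5, 6), "green"),
]


def predict(point, labeled_points):
    """Return the class of the labeled instance closest to `point`."""
    _nearest_point, nearest_class = min(
        labeled_points, key=lambda item: math.dist(point, item[0])
    )
    return nearest_class


print(predict((2, 2), labeled))  # closest labeled points are blue
print(predict((5, 4), labeled))  # closest labeled point is green
```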
Linear Regression
w0 + w1 x + w2 y >= 0
Regression computes wi from data to minimize squared error to ‘fit’ the data
Not flexible enough
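The rule w0 + w1 x + w2 y >= 0 can be fit by least squares as sketched below. This is one possible reading of the slide, under stated assumptions: the two classes are encoded as targets +1 and -1, the weights come from solving the normal equations, and the data points and helper names (`solve`, `fit`, `classify`) are invented for the example.

```python
# Least-squares fit of w0 + w1*x + w2*y to class targets +1 / -1.
# Pure Python; data and helper names are illustrative assumptions.


def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - tail) / M[r][r]
    return w


def fit(points, targets):
    """Minimize sum_i (t_i - (w0 + w1*x_i + w2*y_i))**2 via the normal equations."""
    rows = [(1.0, x, y) for x, y in points]
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return solve(AtA, Atb)


def classify(w, point):
    """Apply the linear rule: +1 if w0 + w1*x + w2*y >= 0, else -1."""
    x, y = point
    return 1 if w[0] + w[1] * x + w[2] * y >= 0 else -1


points = [(3, 3), (4, 3), (3, 4), (4, 4), (0, 0), (1, 0), (0, 1), (1, 1)]
targets = [1, 1, 1, 1, -1, -1, -1, -1]

w = fit(points, targets)
print("weights:", w)
print("predictions:", [classify(w, p) for p in points])
```

For this symmetric data the fitted boundary works out to x + y = 4, a single straight line. That is the sense in which the method is "not flexible enough": classes that cannot be separated by one line cannot be fit this way.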
Classification: Decision Trees
if X > 5 then blue
else if Y > 3 then blue
else if X > 2 then green
else blue
(Figure: decision regions in the X-Y plane; X axis marked at 2 and 5.)
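The tree above translates directly into nested conditionals. The function name `classify` is a hypothetical choice; the thresholds and colors are taken from the rules as written.

```python
def classify(x, y):
    """Walk the decision tree from the slide and return the predicted class."""
    if x > 5:
        return "blue"
    elif y > 3:
        return "blue"
    elif x > 2:
        return "green"
    else:
        return "blue"


for point in [(6, 0), (1, 4), (3, 1), (1, 1)]:
    print(point, classify(*point))
```

Reading the branches together, the only green region is 2 < X <= 5 with Y <= 3; everything else is blue.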
Classification: Neural Nets
Can select more complex regions
Can be more accurate
Can also overfit the data: find patterns in random noise
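Overfitting can be seen in miniature with a model that is as flexible as possible. The sketch below is not a neural network: it uses a plain lookup table that memorizes its training data, chosen as a stand-in because it needs no training procedure. The labels are coin flips, so there is nothing real to learn; the data, the seed, and the helper names are assumptions for illustration.

```python
import random
from collections import Counter

random.seed(0)  # arbitrary seed so the run is repeatable

# Labels are coin flips: there is no real pattern linking input to label.
train = [(i, random.choice(["blue", "green"])) for i in range(100)]
test = [(i, random.choice(["blue", "green"])) for i in range(100, 200)]

# "Model": remember every training input exactly; for unseen inputs,
# fall back to the most common training label.
memory = dict(train)
fallback = Counter(label for _, label in train).most_common(1)[0][0]


def predict(x):
    return memory.get(x, fallback)


def accuracy(data):
    return sum(predict(x) == label for x, label in data) / len(data)


train_accuracy = accuracy(train)
test_accuracy = accuracy(test)
print("training accuracy:", train_accuracy)
print("test accuracy:", test_accuracy)
```

The memorizer scores perfectly on the data it has seen, while its accuracy on new data should hover near chance. The apparent "pattern" it found was only noise.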
Analysis/Prediction Problems
• Which kinds of direct-mail customers buy?
• What products will/won’t customers buy?
• What changes will cause a customer to leave a bank?
• What are the characteristics of a gene?
• Does a picture contain an object (does a picture of space contain a
meteorite -- especially one heading towards us)?
• … Lots more
CS 8751 ML & KDD Chapter 1 Introduction 16
Credit Risk Analysis
Data:
ProfitableCustomer  CustId  YearsCredit  LoanBalance  Income  OwnHouse  OtherDelinqAccts  MaxBillingCyclesLate
No                  103     9            2400         52,000  Yes       2                 3
Yes                 231     3            500          36,000  No        0                 1
Yes                 42      15           0            90,000  Yes       0                 0
…
Rules that might be learned from data:
IF Other-Delinquent-Accounts > 2, AND
Number-Delinquent-Billing-Cycles > 1
THEN Profitable-Customer? = No [Deny Credit Application]
IF Other-Delinquent-Accounts == 0, AND
((Income > $30K) OR (Years-of-Credit > 3))
THEN Profitable-Customer? = Yes [Accept Application]
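The two rules can be applied to the three sample records as in the sketch below. It assumes that "Number-Delinquent-Billing-Cycles" in the rules refers to the MaxBillingCyclesLate field and that "$30K" means 30,000 in the units of Income; the function name `apply_rules` and the `None` result for "no rule fires" are illustrative choices.

```python
# The three sample customers from the data above (fields used by the rules,
# plus the recorded label for comparison).
records = [
    {"CustId": 103, "YearsCredit": 9, "Income": 52000,
     "OtherDelinqAccts": 2, "MaxBillingCyclesLate": 3, "ProfitableCustomer": "No"},
    {"CustId": 231, "YearsCredit": 3, "Income": 36000,
     "OtherDelinqAccts": 0, "MaxBillingCyclesLate": 1, "ProfitableCustomer": "Yes"},
    {"CustId": 42, "YearsCredit": 15, "Income": 90000,
     "OtherDelinqAccts": 0, "MaxBillingCyclesLate": 0, "ProfitableCustomer": "Yes"},
]


def apply_rules(r):
    """Return "No", "Yes", or None when neither rule fires."""
    # Assumption: Number-Delinquent-Billing-Cycles maps to MaxBillingCyclesLate.
    if r["OtherDelinqAccts"] > 2 and r["MaxBillingCyclesLate"] > 1:
        return "No"  # deny credit application
    if r["OtherDelinqAccts"] == 0 and (r["Income"] > 30000 or r["YearsCredit"] > 3):
        return "Yes"  # accept application
    return None


predictions = [apply_rules(r) for r in records]
for r, p in zip(records, predictions):
    print(r["CustId"], "recorded:", r["ProfitableCustomer"], "rule says:", p)
```

Under these assumptions, customers 231 and 42 are matched by the second rule, while customer 103 (two other delinquent accounts) is covered by neither rule, which shows that a learned rule set need not decide every case.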