Introduction to Machine Learning

(CSCI-UA.0480-002)

David Sontag
New York University

Slides adapted from Luke Zettlemoyer, Pedro Domingos, and Carlos Guestrin
Logistics
•  Class webpage:
–  [Link]
–  Sign up for mailing list!

•  Office hours:
–  Tuesday 3:30-4:30pm and by appointment.
–  715 Broadway, 12th floor, Room 1204

•  Grader: Jinglun Dong
–  Email: jinglundong@[Link]
Evaluation
•  About 7 homeworks (50%)
–  Both theory and programming
–  See collaboration policy on class webpage

•  Midterm & final exam (45%)

•  Course participation (5%)


Prerequisites
•  Basic algorithms (CS 310)
–  Dynamic programming, algorithmic analysis
•  Linear algebra (Math 140)
–  Matrices, vectors, systems of linear equations
–  Eigenvectors, matrix rank
–  Singular value decomposition
•  Multivariable calculus (Math 123)
–  Derivatives, integration, tangent planes
–  Lagrange multipliers
•  Probability (Math 233 or 235)
Source Materials
Optional textbooks:

•  C. Bishop, Pattern Recognition and Machine Learning, Springer, 2007

•  K. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012
A Few Quotes
•  “A breakthrough in machine learning would be worth ten Microsofts” (Bill Gates, Chairman, Microsoft)

•  “Machine learning is the next Internet” (Tony Tether, former director, DARPA)

•  “Machine learning is the hot new thing” (John Hennessy, President, Stanford)

•  “Web rankings today are mostly a matter of machine learning” (Prabhakar Raghavan, former Dir. Research, Yahoo)

•  “Machine learning is going to result in a real revolution” (Greg Papadopoulos, former CTO, Sun)

•  “Machine learning is today’s discontinuity” (Jerry Yang, former CEO, Yahoo)
What is Machine Learning?
(by examples)
Classification

from data to discrete classes


Spam filtering

data → prediction: Spam vs. Not Spam
Object detection

Example training images for each orientation

©2009 Carlos Guestrin


Weather prediction
Regression

predicting a numeric value


Stock market
Weather prediction revisited

Temperature

72° F
Ranking

comparing items
Web search
Given image, find similar images

[Link]
Collaborative Filtering
Recommendation systems
Machine learning competition with a $1 million prize
Clustering

discovering structure in data


Clustering Data: Group similar things
Clustering images

Set of Images

[Goldberger et al.]
Clustering web search results
Embedding

visualizing data
Embedding images

•  Images have thousands or millions of pixels.

•  Can we give each image a coordinate, such that similar images are near each other?

©2009 Carlos Guestrin


[Saul & Roweis ‘03]
Embedding words

[Joseph Turian]
Embedding words (zoom in)

[Joseph Turian]
Structured prediction

from data to structured outputs


Speech recognition
Natural language processing

I need to hide a body


noun, verb, preposition, …
Growth of Machine Learning
•  Machine learning is preferred approach to
–  Speech recognition, Natural language processing
–  Computer vision
–  Medical outcomes analysis
–  Robot control
–  Computational biology
–  Sensor networks
–  …
•  This trend is accelerating
–  Improved machine learning algorithms
–  Improved data capture, networking, faster computers
–  Software too complex to write by hand
–  New sensors / IO devices
–  Demand for self-customization to user, environment
Supervised Learning: find f
•  Given: Training set {(xi, yi) | i = 1 … n}
•  Find: A good approximation to f : X → Y
Examples: what are X and Y ?
•  Spam Detection
–  Map email to {Spam, Not Spam}
•  Digit recognition
–  Map pixels to {0,1,2,3,4,5,6,7,8,9}
•  Stock Prediction
–  Map new, historic prices, etc. to ℝ (the real numbers)
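The supervised setup above can be sketched in a few lines. The memorizing "learner" below is hypothetical and only illustrates the interface — given labeled (x, y) pairs, return a function f from X to Y; real algorithms must generalize to unseen x rather than memorize.

```python
# Minimal sketch of supervised learning as "find f : X -> Y" from a
# training set {(x_i, y_i)}. The learner here simply memorizes the
# training pairs -- an illustrative stand-in, not a real algorithm.

def learn(training_set):
    """Return a function f approximating the x -> y mapping."""
    memory = dict(training_set)       # memorize every labeled example
    default = training_set[0][1]      # arbitrary fallback for unseen x
    return lambda x: memory.get(x, default)

# Spam detection: X = emails (strings), Y = {"Spam", "Not Spam"}
train = [("WIN $$$ NOW", "Spam"), ("meeting at 3pm", "Not Spam")]
f = learn(train)
print(f("WIN $$$ NOW"))   # recovers the training label: Spam
```

The same interface covers the other examples on this slide: for digit recognition X is pixel grids and Y is {0, …, 9}; for stock prediction Y is ℝ.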
Example: Spam Filter
•  Input: email
•  Output: spam/ham
•  Setup:
–  Get a large collection of example emails, each labeled spam or ham
–  Note: someone has to hand label all this data!
–  Want to learn to predict labels of new, future emails
•  Features: The attributes used to make the ham / spam decision
–  Words: FREE!
–  Text Patterns: $dd, CAPS
–  Non-text: SenderInContacts
–  …

Example emails:

“Dear Sir. First, I must solicit your confidence in this transaction, this is by virture of its nature as being utterly confidencial and top secret. …”

“TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT "REMOVE" IN THE SUBJECT. 99 MILLION EMAIL ADDRESSES FOR ONLY $99”

“Ok, I know this is blatantly OT but I'm beginning to go insane. Had an old Dell Dimension XPS sitting in the corner and decided to put it to use, I know it was working pre being stuck in the corner, but when I plugged it in, hit the power nothing happened.”
Example: Digit Recognition
•  Input: images / pixel grids
•  Output: a digit 0-9
•  Setup:
–  Get a large collection of example images, each labeled with a digit
–  Note: someone has to hand label all this data!
–  Want to learn to predict labels of new, future digit images
•  Features: The attributes used to make the digit decision
–  Pixels: (6,8)=ON
–  Shape Patterns: NumComponents, AspectRatio, NumLoops
–  …

[Example digit images shown in figure: 0, 1, 2, 1, ??]
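Pixel indicator features like "(6,8)=ON" can be read directly off the image grid. The sketch below assumes a binary 0/1 pixel representation; the function name and tiny example image are illustrative, not from the slides.

```python
# Hypothetical pixel-feature extractor: represent an image as a 2D 0/1
# grid and emit one indicator per ON pixel, as in "Pixels: (6,8)=ON".

def pixel_features(grid):
    """Map a 2D 0/1 pixel grid to a dict of 'pixel(r,c)=ON' indicators."""
    return {f"pixel({r},{c})=ON": True
            for r, row in enumerate(grid)
            for c, val in enumerate(row) if val == 1}

# A tiny 3x3 "image" with two pixels on (a stub for a real digit image)
tiny = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
print(sorted(pixel_features(tiny)))  # ['pixel(0,1)=ON', 'pixel(1,1)=ON']
```

Shape features such as NumComponents or NumLoops would be computed from the same grid with more involved image processing.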
Important Concepts
•  Data: labeled instances, e.g. emails marked spam/ham
–  Training set
–  Held-out set (sometimes called Validation set)
–  Test set
•  Features: attribute-value pairs which characterize each x
•  Experimentation cycle
–  Select a hypothesis f to best match training set
–  (Tune hyperparameters on held-out or validation set)
–  Compute accuracy on test set
–  Very important: never peek at the test set!
•  Evaluation
–  Accuracy: fraction of instances predicted correctly
•  Overfitting and generalization
–  Want a classifier which does well on test data
–  Overfitting: fitting the training data very closely, but not generalizing well
–  We’ll investigate overfitting and generalization formally in a few lectures

[Figure: data split into Training Data, Held-Out Data, and Test Data]
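The three-way split described above might be implemented as follows. The split fractions and function name are illustrative assumptions; the only rule taken from the slide is that the test portion is touched exactly once, at final evaluation.

```python
import random

# Sketch of the training / held-out / test split: train on one portion,
# tune hyperparameters on the held-out (validation) portion, and never
# peek at the test portion until the final accuracy measurement.

def split(data, train_frac=0.6, heldout_frac=0.2, seed=0):
    data = data[:]                       # copy; leave caller's list intact
    random.Random(seed).shuffle(data)    # shuffle before splitting
    n_train = int(len(data) * train_frac)
    n_heldout = int(len(data) * heldout_frac)
    train = data[:n_train]
    heldout = data[n_train:n_train + n_heldout]
    test = data[n_train + n_heldout:]    # never peek until the very end!
    return train, heldout, test

train, heldout, test = split(list(range(10)))
print(len(train), len(heldout), len(test))   # 6 2 2
```

Reporting accuracy on data the learner never saw is what distinguishes generalization from memorization, which is exactly the overfitting concern above.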
A Supervised Learning Problem
•  Consider a simple, Boolean dataset:
–  f : X → Y
–  X = {0,1}^4
–  Y = {0,1}

Dataset: (shown in figure)

•  Question 1: How should we pick the hypothesis space, the set of possible functions f?

•  Question 2: How do we find the best f in the hypothesis space?
Most General Hypothesis Space
Consider all possible boolean functions over four input features!

Dataset: (shown in figure)

•  2^16 possible hypotheses

•  2^9 are consistent with our dataset

•  How do we choose the best one?
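The counts above can be checked by brute force. A boolean function over X = {0,1}^4 is a 16-entry truth table, so there are 2^16 hypotheses, and each labeled example halves the consistent set. The slide's dataset is only shown in a figure, so the sketch below substitutes a hypothetical dataset of 7 labeled inputs, which leaves 2^(16-7) = 2^9 consistent functions.

```python
from itertools import product

# Enumerate every boolean function f : {0,1}^4 -> {0,1} as a 16-bit
# truth table and count how many agree with a labeled dataset.
# The dataset here is a hypothetical stand-in for the slide's figure:
# 7 of the 16 possible inputs, labeled by XOR of the first two bits.

inputs = list(product([0, 1], repeat=4))          # all 16 possible inputs
dataset = {x: x[0] ^ x[1] for x in inputs[:7]}    # 7 assumed examples

consistent = 0
for table in product([0, 1], repeat=16):          # all 2**16 functions
    f = dict(zip(inputs, table))                  # truth table -> function
    if all(f[x] == y for x, y in dataset.items()):
        consistent += 1

print(2 ** 16, consistent)   # 65536 512, i.e. 2^16 total, 2^9 consistent
```

With 2^9 hypotheses fitting the data equally well, the data alone gives no way to choose among them — which is the slide's point.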
A Restricted Hypothesis Space
Consider all conjunctive boolean functions.

Dataset: (shown in figure)

•  16 possible hypotheses

•  None are consistent with our dataset

•  How do we choose the best one?
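The conjunctive case can be enumerated the same way: with four features there are 2^4 = 16 conjunctions, one per subset of features (including the always-true empty conjunction). The slide's dataset is again only in a figure, so the sketch below uses a hypothetical one chosen so that, as on the slide, no conjunction fits: any conjunction true on some input is also true on the all-ones input, so labeling (1,1,1,1) as 0 alongside a positive example rules them all out.

```python
from itertools import combinations

# Enumerate all 16 conjunctive boolean functions over 4 features (one per
# subset of features) and count how many are consistent with a dataset.

features = range(4)
dataset = {(1, 1, 1, 1): 0,   # hypothetical labels: a conjunction true on
           (0, 1, 0, 0): 1,   # the positive example is also true on
           (0, 0, 1, 1): 0}   # (1,1,1,1), so no conjunction fits both

hypotheses = [set(s) for r in range(5) for s in combinations(features, r)]

def conj(subset, x):
    """Evaluate the conjunction of the given features on input x."""
    return int(all(x[i] == 1 for i in subset))

fits = [h for h in hypotheses
        if all(conj(h, x) == y for x, y in dataset.items())]
print(len(hypotheses), len(fits))   # 16 hypotheses, 0 consistent
```

Restricting the hypothesis space makes learning tractable but risks ruling out every function consistent with the data — the trade-off the next lectures take up.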
