Lecture 2.2 Example Data Preparation Feature Engineering

The document provides an overview of machine learning, explaining its goal of learning patterns from examples and generalizing them to new instances. It distinguishes between supervised and unsupervised learning, detailing the processes involved in model fitting, including data splitting, tuning, and evaluation. Additionally, it discusses various algorithms used in supervised learning and emphasizes the importance of avoiding overfitting while maximizing model performance.

Uploaded by revaldochetie092

Feature Engineering

What is Machine Learning?

Simple definition: How machines learn rules from examples.

Goal of any machine learning:

• Learn patterns from examples

• Be able to generalize them to new examples

Supervised and unsupervised machine learning:

→ In both cases, learning is achieved through examples!

What is Machine Learning?
Formal definition: A program is said to learn from experience E with respect to task T and performance measure P if its performance on task T, as measured by P, improves with experience E.

#  Task                              Experience                   Performance measure
1  Recognize handwritten digits      Set of digits with labels    Percent of correct recognitions
2  Predict length of hospital stay   Patient histories            Mean prediction error
3  Recommend Netflix shows           Viewing histories            # users viewing show
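The performance measure P is what turns "improves with experience" into something you can check. Below is a minimal sketch of the first two measures in the table. Two assumptions apply: "mean prediction error" is read here as mean absolute error, and all values are hypothetical toy numbers.

```python
from statistics import mean

def percent_correct(predicted, actual):
    """Task 1 style measure: share of predictions that match the label, in percent."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)

def mean_prediction_error(predicted, actual):
    """Task 2 style measure (assumed to be mean absolute error)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return mean(abs(p - a) for p, a in zip(predicted, actual))

# Hypothetical toy values, not from the lecture.
digits_pred = [3, 7, 1, 1, 9]
digits_true = [3, 7, 7, 1, 9]
print(percent_correct(digits_pred, digits_true))  # 80.0

stay_pred = [4.0, 2.5, 6.0]  # predicted days in hospital
stay_true = [5.0, 2.0, 6.0]  # actual days in hospital
print(mean_prediction_error(stay_pred, stay_true))  # 0.5
```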
Some motivating examples:

• Early detection of disease outbreaks
• Identifying vulnerable buildings for retrofit
• Predicting transport demand
• Preventing violent crime
• Reducing CO2 emissions
• Targeting fire risk inspections

Acknowledgment: D. Neill, Machine Learning for Cities, CUSP NYU

An approach to model fitting
Five main steps:

1. Use Case & Data: Determine the question of interest and get informative data.
2. Model training: Split the data into training and test sets. Fit the model to the training set.
3. Tune (calibrate): Tune the model parameters.
4. Predict: Use the tuned model to form predictions about your test set.
5. Evaluate: Compare the predictions with the actual values for the test set.

Variables of interest are either categorical, handled by classification, or numerical, handled by regression.
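The type of the target decides whether a task is classification or regression. One way to see this is with the simplest possible "model" for each case, often used as a baseline that a real model should beat. This is a sketch only. The function name `fit_baseline` and all data are hypothetical.

```python
from collections import Counter
from statistics import mean

def fit_baseline(y_train):
    """Return a constant 'model' suited to the target type (hypothetical helper).

    Categorical target (classification): predict the most frequent class.
    Numerical target (regression): predict the mean of the training targets.
    """
    if all(isinstance(v, str) for v in y_train):
        constant = Counter(y_train).most_common(1)[0][0]
    else:
        constant = mean(y_train)
    return lambda features: constant  # ignores the inputs on purpose

# Hypothetical training targets, not from the lecture.
mode_model = fit_baseline(["bike", "car", "bike", "walk", "bike"])  # categorical
stay_model = fit_baseline([3, 5, 4])                                # numerical (days)

print(mode_model({"age": 30}))  # bike
print(stay_model({"age": 30}))  # mean of the training targets
```

A fitted classifier or regressor is worth deploying only if it beats a baseline like this on the test set.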
Unsupervised Learning
• The only thing we have is input data.
• Labels are not provided by a supervisor.

What does an unsupervised algorithm do?

Extract patterns in the data.
Create clusters whose members are similar (based on some set of measurements).

→ Example: Take raw data on visitors to my website.

→ Segment them into groups that share the same characteristics; target ads.
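Clustering can be illustrated with k-means, which is one common clustering algorithm; the lecture does not name a specific method. This minimal sketch rests on assumptions: each visitor is described by two hypothetical features (pages per visit, minutes on site), and the data are toy values. No labels are used anywhere.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each visitor joins the nearest centroid.
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, labels

# Hypothetical visitors: (pages per visit, minutes on site).
visitors = [(1, 2), (2, 1), (1, 1), (10, 12), (11, 10), (12, 11)]
centroids, labels = kmeans(visitors, k=2)
print(labels)  # casual browsers and heavy users land in different clusters
```

The number of clusters `k` is chosen by the analyst. Here it is simply assumed to be 2.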
Supervised Learning
• The machine learns from examples that have already been labelled.
• Each example has input values (attributes) and an output value.

Example: A spam classifier learns rules from this training set of labelled emails.

Goal:
→ Use the known output values to learn how the inputs relate to the output.
→ Predict the output value of new examples.

Image credit: Géron, Hands-On Machine Learning
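The spam example can be made concrete with a small classifier trained on labelled emails. The lecture does not say which algorithm the spam classifier uses, so this sketch swaps in multinomial naive Bayes with add-one (Laplace) smoothing. The emails and function names are hypothetical.

```python
import math
from collections import Counter

def train_naive_bayes(emails):
    """emails: list of (text, label) pairs -> (word counts per label, label counts, vocabulary)."""
    word_counts = {}
    label_counts = Counter()
    for text, label in emails:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for this text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(n / total)                # prior P(label)
        for w in text.lower().split():
            score += math.log((counts[w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labelled training set.
training_set = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train_naive_bayes(training_set)
print(predict(model, "claim your free money"))  # spam
print(predict(model, "agenda for lunch"))       # ham
```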

Supervised Learning algorithms

Linear regression
• Models the output as a linear combination of the inputs
→ Fast to train, effective on high-dimensional data.
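For a single input, fitting a linear regression by ordinary least squares has a closed form: w = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − w·x̄. The slide describes the general multi-input case, so this one-input sketch is a simplification. The data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ w * x + b with a single input."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    w = sxy / sxx            # slope
    b = y_bar - w * x_bar    # intercept
    return w, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # hypothetical data lying exactly on y = 2x + 1
w, b = fit_line(xs, ys)
print(w, b)  # 2.0 1.0
```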

Support Vector Machines

• Learns a decision boundary (linear or non-linear)
→ Suits complex, medium-size datasets

Decision trees and Random Forest

• Builds flow-chart-style rules that maximize information gain
→ High predictive power, requires little data preparation (e.g. no feature scaling).
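Information gain is the drop in entropy (measured in bits) achieved by splitting the data on an attribute. A tree grows by choosing, at each node, the split with the highest gain. This is a minimal sketch on hypothetical commuter records; the attribute names are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Parent entropy minus the size-weighted entropy of the children after the split."""
    parent = [row[target] for row in rows]
    gain = entropy(parent)
    for value in {row[attribute] for row in rows}:
        child = [row[target] for row in rows if row[attribute] == value]
        gain -= len(child) / len(rows) * entropy(child)
    return gain

# Hypothetical commuters, not from the lecture.
rows = [
    {"lives_near": "yes", "age_group": "young", "mode": "bike"},
    {"lives_near": "yes", "age_group": "older", "mode": "bike"},
    {"lives_near": "no", "age_group": "young", "mode": "car"},
    {"lives_near": "no", "age_group": "older", "mode": "car"},
]
print(information_gain(rows, "lives_near", "mode"))  # 1.0
print(information_gain(rows, "age_group", "mode"))   # 0.0
```

Here `lives_near` separates the classes perfectly, so a tree would split on it first.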

Neural networks
• Algorithms inspired by structure and function of the brain.
• Scalable, highly accurate on tasks like image recognition.
Building a model

Use Case & Data: Build a labeled dataset for the question of interest.

Model training: Split training and test data

When fitting ML algorithms, it is common to separate the data into training and test sets, e.g. in a 70/30 ratio:

• Training set (70% of records): build the model on the training set.
• Test set (30% of records): evaluate the model on the test set.

Image credit: D. Ziganto “Standard Deviations” blog
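A 70/30 split comes down to shuffling the records and then holding out 30% of them. This is a minimal standard-library sketch. The helper name `split_records` and the seed are illustrative choices, and the records are placeholders.

```python
import random

def split_records(records, test_fraction=0.3, seed=42):
    """Shuffle, then hold out test_fraction of the records as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

records = list(range(100))  # stand-ins for 100 labeled examples
train, test = split_records(records)
print(len(train), len(test))  # 70 30
```

In practice, scikit-learn's `sklearn.model_selection.train_test_split` does the same job, with `test_size=0.3` and `random_state` for reproducibility.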

Complexity vs. accuracy
We can build models of lower or higher complexity by
changing their hyper-parameters.
Aim for the ‘sweet spot’ that maximizes performance but
avoids overfitting.

* Overfitting: a complex model that memorizes the training set (including the noise in it) but fails to generalize to new data.
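Complexity and overfitting can be illustrated with k-nearest-neighbour regression, which stands in here for any model with a complexity hyper-parameter. With k = 1 the model reproduces every training target, noise included, so its training error is zero. The sketch rests on two assumptions. First, the data are hypothetical: a true signal y = x plus alternating ±1 noise. Second, k is chosen on a separate validation set, leaving the test set untouched for the final evaluation.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x (ties: earlier points first)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mean_abs_error(train, points, k):
    """Mean absolute error of k-NN predictions on the given (x, y) points."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in points) / len(points)

# Hypothetical training data: signal y = x plus alternating +1/-1 noise.
train = [(x, x + (1 if x % 2 == 0 else -1)) for x in range(10)]
# Hypothetical held-out validation points, separate from the test set.
validation = [(x + 0.5, x + 0.5) for x in range(9)]

for k in (1, 2, 3):
    print(f"k={k}  train MAE={mean_abs_error(train, train, k):.3f}  "
          f"validation MAE={mean_abs_error(train, validation, k):.3f}")

best_k = min((1, 2, 3), key=lambda k: mean_abs_error(train, validation, k))
print("chosen k:", best_k)  # chosen k: 2
```

The k = 1 model scores perfectly on the data it memorized but not on the validation points. That gap is the signature of overfitting.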
Make predictions

With the model tuned and fitted to the training data, we can predict outcomes for the test set, check that its performance is satisfactory, and then deploy it.

Figure: Object detection in images

Image: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2016)
EXAMPLE: Predicting mode of transport and music taste
Decision tree model for transport planning
Scenario: The World Bank has hired a cohort of 100 new staff, who start this
summer. GSD needs to decide how many bike racks or parking spaces to build for
them.

Attributes (X_1 … X_N) and target variable (y)

From this training set, construct a set of rules to predict the mode of transport for unseen examples.
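Once the rules exist, the planning question becomes a counting problem: predict a mode for each new staff member, then tally the predictions. In this sketch the rules are hand-written stand-ins for what a fitted decision tree might produce. The attributes `commute_km` and `owns_car`, the thresholds, and the synthetic cohort are all hypothetical.

```python
from collections import Counter

def predict_mode(person):
    """Hypothetical if/else rules of the kind a fitted decision tree produces."""
    if person["commute_km"] <= 5:
        return "walk"
    if person["commute_km"] <= 15:
        return "car" if person["owns_car"] else "bike"
    return "car" if person["owns_car"] else "transit"

# Synthetic placeholder cohort of 100 new staff (not real data).
new_staff = [{"commute_km": (i * 7) % 30, "owns_car": i % 3 == 0} for i in range(100)]

demand = Counter(predict_mode(p) for p in new_staff)
print("bike racks:", demand["bike"], "parking spaces:", demand["car"])
```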
Decision tree model for transport planning
Scenario: The World Bank has hired a talented cohort of 100 new staff, who start after Thanksgiving. GSD needs to decide how many bike racks or parking spaces to build for them.

The new cohort has a mix of home states and ages, and a high enjoyment of Netflix.
