CS229 Machine Learning Notes: Day 2
Andrew Ng (Lecture Notes, Autumn 2018)
Notes by Hardik Mahajan
May 30, 2025
Contents
1 Introduction to Linear Regression
   1.1 Why Linear Regression?
   1.2 Context: Housing Price Example
2 Supervised Learning Framework
   2.1 Notation
3 Linear Regression Model
   3.1 Single Feature
   3.2 Multiple Features
   3.3 Vectorized Form
4 Cost Function
   4.1 Visualization
5 Gradient Descent
   5.1 Goal
   5.2 Algorithm
   5.3 Learning Rate (α)
   5.4 Convergence
6 Batch vs. Stochastic Gradient Descent
   6.1 Batch Gradient Descent
   6.2 Stochastic Gradient Descent
   6.3 Mini-Batch Gradient Descent
   6.4 Choosing Algorithms
7 Normal Equations
   7.1 Purpose
   7.2 Advantages
   7.3 Disadvantages
   7.4 Cost Function in Matrix Form
   7.5 Derivation Outline
   7.6 Visualization
8 Practical Tips
9 Additional Insights
10 Resources
1 Introduction to Linear Regression
Linear regression is a foundational supervised learning algorithm, ideal for introducing core
machine learning concepts such as cost functions, optimization, and model fitting. It predicts
continuous outputs, making it suitable for applications like forecasting house prices, stock
prices, or temperatures. This lecture, part of Stanford’s CS229 (Autumn 2018) by Andrew Ng,
covers linear regression, batch and stochastic gradient descent, and the normal equations,
laying the groundwork for advanced algorithms.
1.1 Why Linear Regression?
• Simplicity: Easiest algorithm to understand optimization and model design.
• Practicality: Widely used for continuous predictions.
• Contrast with Classification: Regression predicts continuous outputs (e.g., steering
angles), while classification predicts discrete outputs (e.g., spam/not spam).
1.2 Context: Housing Price Example
The lecture uses a dataset from Craigslist (Portland, Oregon) to motivate linear regression:
• Features: House size in square feet (𝑥).
• Output: Price in thousands of dollars (𝑦).
• Goal: Fit a straight line to predict price from size.
• Insight: Assumes a linear relationship, which may oversimplify real-world data but serves
as a starting point.
2 Supervised Learning Framework
Supervised learning involves learning from labeled data to map inputs (𝑥) to outputs (𝑦). The
process is:
1. Collect a training set: {(x^{(i)}, y^{(i)})}_{i=1}^{m}.
2. Feed to a learning algorithm.
3. Output a hypothesis ℎ(𝑥).
4. Predict 𝑦 for new 𝑥.
2.1 Notation
• 𝑚: Number of training examples.
• 𝑛: Number of features.
• x^{(i)}: Feature vector for the i-th example ((n + 1)-dimensional, with x_0 = 1).
• y^{(i)}: Output for the i-th example.
• θ: Parameter vector ((n + 1)-dimensional).
• Insight: Standardized notation scales to complex models like neural networks.
3 Linear Regression Model
3.1 Single Feature
The hypothesis for one feature is:
h(x) = \theta_0 + \theta_1 x
where 𝑥 is the house size, and ℎ(𝑥) is the predicted price.
3.2 Multiple Features
For multiple features (e.g., size x_1, bedrooms x_2):
h(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2
Compact form:
h(x) = \sum_{j=0}^{n} \theta_j x_j, \qquad x_0 = 1
3.3 Vectorized Form
h(x) = \theta^T x, \qquad \theta, x \in \mathbb{R}^{n+1}
Insight: Vectorization simplifies coding and computation.
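Example (a minimal NumPy sketch of the vectorized hypothesis; the variable names theta and x are illustrative, with x[0] = 1 as the intercept term):

import numpy as np

# Hypothetical parameters and feature vector for the housing example.
theta = np.array([100.0, 0.15])   # [theta_0, theta_1]: intercept and price per square foot
x = np.array([1.0, 2104.0])       # [x_0 = 1, size in square feet]

h = theta @ x                     # h(x) = theta^T x, a single dot product
print(h)                          # predicted price in thousands of dollars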
4 Cost Function
The cost function measures prediction error:
J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
• The factor 1/2 simplifies derivatives.
• Squared error penalizes large deviations.
• Why Squared?: Convex (single minimum), assumes Gaussian noise (explained in later
lectures).
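Example (a sketch of J(θ) in vectorized form, anticipating the matrix notation of Section 7.4; the toy numbers come from the scatter-plot placeholder below and are illustrative):

import numpy as np

def cost(theta, X, y):
    """J(theta) = (1/2) * sum of squared prediction errors."""
    residuals = X @ theta - y            # h_theta(x^(i)) - y^(i) for every example
    return 0.5 * residuals @ residuals   # half the sum of squared residuals

# Toy design matrix (first column is x_0 = 1) and prices in $1000s.
X = np.array([[1.0, 2104.0],
              [1.0, 1416.0]])
y = np.array([400.0, 300.0])
print(cost(np.array([100.0, 0.15]), X, y))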
4.1 Visualization
A scatter plot shows house sizes vs. prices with a fitted line. (Placeholder: Imagine a scatter
plot with points at (2104, 400), (1416, 300), etc., and a red line 𝑦 = 0.15𝑥 + 100.)
5 Gradient Descent
5.1 Goal
Minimize 𝐽 (𝜃) iteratively.
5.2 Algorithm
1. Initialize 𝜃 (e.g., zeros).
2. Update:
\theta_j := \theta_j - \alpha \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
3. Repeat until convergence.
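Example (a minimal NumPy sketch of the loop above; it assumes a feature-scaled design matrix X with an x_0 = 1 column, and the names alpha and num_iters are illustrative):

import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, num_iters=1000):
    theta = np.zeros(X.shape[1])          # step 1: initialize theta to zeros
    for _ in range(num_iters):            # step 3: repeat (here, a fixed number of iterations)
        errors = X @ theta - y            # h_theta(x^(i)) - y^(i) for all i
        gradient = X.T @ errors           # summed over all m examples, one entry per theta_j
        theta = theta - alpha * gradient  # step 2: simultaneous update of every theta_j
    return theta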
5.3 Learning Rate (𝛼)
• Choosing 𝛼: Start with 0.01 (features scaled to [−1, 1]). Try 0.01, 0.03, 0.1, 0.3.
• Monitoring: If 𝐽 (𝜃) increases, reduce 𝛼; if slow, increase 𝛼.
• Insight: Feature scaling ensures consistent 𝛼.
5.4 Convergence
Linear regression has a convex 𝐽 (𝜃), ensuring a global minimum. A contour plot shows the
path of 𝜃 converging to the minimum. (Placeholder: Imagine an elliptical contour plot with a
red path from (0.5, 0.5) to (0.1, 0.05).)
6 Batch vs. Stochastic Gradient Descent
6.1 Batch Gradient Descent
• Uses all 𝑚 examples per update.
• Pros: Precise, converges to global minimum.
• Cons: Slow for large 𝑚 (e.g., millions).
• Use Case: Small datasets (hundreds/thousands).
6.2 Stochastic Gradient Descent
• Updates per example (see the sketch after this list):
\theta_j := \theta_j - \alpha \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
• Pros: Fast for large datasets (terabytes).
• Cons: Noisy path, oscillates near minimum.
• Enhancement: Decrease α over time (e.g., α_t = α_0 / (1 + k t)).
• Stopping: Stop when 𝐽 (𝜃) plateaus.
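Example (one stochastic pass over the training set with the decaying learning rate above; alpha0, k, and the shuffling scheme are illustrative choices):

import numpy as np

def sgd_epoch(theta, X, y, alpha0=0.01, k=0.001):
    """One pass over the data, updating theta after each example."""
    for t, i in enumerate(np.random.permutation(X.shape[0])):
        alpha = alpha0 / (1.0 + k * t)        # alpha_t = alpha_0 / (1 + k t)
        error = X[i] @ theta - y[i]           # residual on example i only
        theta = theta - alpha * error * X[i]  # update using a single example
    return theta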
6.3 Mini-Batch Gradient Descent
Uses small batches (e.g., 100 examples). Balances speed and stability. Common in practice.
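Example (a mini-batch variant of the same loop, with batch size 100 as in the text; the shuffling-and-slicing scheme is one common choice):

import numpy as np

def minibatch_epoch(theta, X, y, alpha=0.01, batch_size=100):
    order = np.random.permutation(X.shape[0])
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]            # indices of one mini-batch
        gradient = X[idx].T @ (X[idx] @ theta - y[idx])  # gradient over the batch only
        theta = theta - alpha * gradient
    return theta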
6.4 Choosing Algorithms
• Small datasets: Batch.
• Large datasets: Stochastic or mini-batch.
• Insight: Stochastic solutions are often "good enough."
7 Normal Equations
7.1 Purpose
Solve for optimal 𝜃 in one step:
\theta = (X^T X)^{-1} X^T y
where 𝑋 is the 𝑚 × (𝑛 + 1) design matrix, and 𝑦 is the 𝑚 × 1 label vector.
7.2 Advantages
• Single step, no iterations.
• Exact global minimum.
7.3 Disadvantages
• Expensive for large n (O(n^3) for the matrix inversion).
• Non-invertible 𝑋 𝑇 𝑋: Use pseudo-inverse or remove redundant features.
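Example (a sketch of the closed-form solve; np.linalg.solve avoids forming the inverse explicitly, and np.linalg.pinv is the pseudo-inverse fallback mentioned above):

import numpy as np

def normal_equation(X, y):
    A = X.T @ X
    b = X.T @ y
    try:
        return np.linalg.solve(A, b)   # solves X^T X theta = X^T y
    except np.linalg.LinAlgError:
        return np.linalg.pinv(X) @ y   # fallback when X^T X is singular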
7.4 Cost Function in Matrix Form
J(\theta) = \frac{1}{2} (X\theta - y)^T (X\theta - y)
7.5 Derivation Outline
1. Expand 𝐽 (𝜃).
2. Take derivative w.r.t. 𝜃 using matrix derivatives.
3. Set to zero: X^T X \theta = X^T y.
4. Solve: \theta = (X^T X)^{-1} X^T y.
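Filling in steps 2-4, a compact version of the derivation (using the matrix-calculus identities \nabla_\theta (\theta^T A \theta) = 2 A \theta for symmetric A and \nabla_\theta (b^T \theta) = b):

\begin{align*}
J(\theta) &= \tfrac{1}{2}\left(\theta^T X^T X \theta - 2 y^T X \theta + y^T y\right) \\
\nabla_\theta J(\theta) &= X^T X \theta - X^T y = 0 \\
\Rightarrow\quad X^T X \theta &= X^T y \quad\Rightarrow\quad \theta = (X^T X)^{-1} X^T y
\end{align*}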
7.6 Visualization
A line plot compares the convergence speed of batch gradient descent, stochastic gradient descent, and the normal equations. (Placeholder: Imagine a plot with batch decreasing gradually, stochastic oscillating, and the normal equations jumping to the minimum in one step.)
8 Practical Tips
• Feature Scaling: Normalize features to [−1, 1] (see the sketch after this list).
• Debugging: Plot 𝐽 (𝜃) vs. iterations.
• Implementation: Use Python/NumPy or scikit-learn.
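Example (one simple way to scale raw features to roughly [−1, 1] via mean normalization; apply it before adding the x_0 = 1 column, and note that scikit-learn's preprocessing module offers similar utilities):

import numpy as np

def scale_features(X_raw):
    """Mean-normalize each feature column so values fall roughly in [-1, 1]."""
    mean = X_raw.mean(axis=0)
    span = X_raw.max(axis=0) - X_raw.min(axis=0)  # assumes no constant columns
    return (X_raw - mean) / span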
9 Additional Insights
• Why Linear Regression Matters: Foundation for neural networks, generalized linear
models.
• Limitations: Assumes linearity, sensitive to outliers.
• Enhancement: Use robust regression for outliers.
• Future Topics: Generalized linear models, neural networks.
10 Resources
• CS229 lecture notes for derivations.
• Problem Set 1 for matrix derivative practice.
• Discussion sections for calculus refreshers.