Deep Learning Assignment 1: Logistic Regression - Solutions
Problem 1: Sigmoid Function and Basic Computations (15 Points)
Part A (5 points) - Sigmoid Values
Formula: σ(z) = 1/(1 + e^(-z))
(i) σ(2.5)
σ(2.5) = 1/(1 + e^(-2.5)) = 1/(1 + 0.082) = 1/1.082 = 0.924
(ii) σ(-1.8)
σ(-1.8) = 1/(1 + e^(1.8)) = 1/(1 + 6.050) = 1/7.050 = 0.142
(iii) σ(0)
σ(0) = 1/(1 + e^0) = 1/(1 + 1) = 1/2 = 0.5
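These values are easy to sanity-check numerically. A minimal sketch in Python (my choice of language; the assignment does not prescribe one):

```python
import math

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

for z in (2.5, -1.8, 0.0):
    print(f"sigma({z}) = {sigmoid(z):.3f}")  # 0.924, 0.142, 0.500
```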
Part B (5 points) - Derivative Values
Formula: σ'(z) = σ(z)(1 - σ(z))
(i) σ'(1.2)
First: σ(1.2) = 1/(1 + e^(-1.2)) = 1/(1 + 0.301) = 1/1.301 = 0.769
Then: σ'(1.2) = 0.769 × (1 - 0.769) = 0.769 × 0.231 = 0.178
(ii) σ'(-0.5)
First: σ(-0.5) = 1/(1 + e^(0.5)) = 1/(1 + 1.649) = 1/2.649 = 0.378
Then: σ'(-0.5) = 0.378 × (1 - 0.378) = 0.378 × 0.622 = 0.235
(iii) Maximum of σ'(z)
σ'(z) = σ(z)(1 - σ(z)) is maximized when σ(z) = 0.5, since p(1 - p) is a downward-opening parabola in p with its vertex at p = 1/2
This occurs at z = 0, the only point where σ(z) = 0.5
Maximum value: σ'(0) = 0.5 × 0.5 = 0.25
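A quick numerical check of the derivative values (again a sketch, not part of the required solution):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # sigma'(z) = sigma(z)(1 - sigma(z))

for z in (1.2, -0.5, 0.0):
    print(f"sigma'({z}) = {sigmoid_prime(z):.3f}")  # 0.178, 0.235, 0.250
```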
Part C (5 points) - Solving for z
(i) σ(z) = 0.75
0.75 = 1/(1 + e^(-z))
0.75(1 + e^(-z)) = 1
0.75 + 0.75e^(-z) = 1
0.75e^(-z) = 0.25
e^(-z) = 1/3
-z = ln(1/3) = -ln(3) = -1.099
z = 1.099
(ii) σ(z) = 0.1
0.1 = 1/(1 + e^(-z))
0.1(1 + e^(-z)) = 1
0.1 + 0.1e^(-z) = 1
0.1e^(-z) = 0.9
e^(-z) = 9
-z = ln(9) = 2ln(3) = 2.197
z = -2.197
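Both parts of Part C invert the sigmoid; the closed form is the logit, z = ln(p/(1 - p)). A minimal check:

```python
import math

def logit(p):
    # Inverse sigmoid: solves sigma(z) = p for z.
    return math.log(p / (1.0 - p))

print(f"{logit(0.75):.3f}")  #  1.099
print(f"{logit(0.1):.3f}")   # -2.197
```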
Problem 2: Logistic Regression Forward Propagation (20 Points)
Given:
w = [1.5, 2.3, -0.8]ᵀ, b = -0.5
Part A (10 points) - Single Email
Email features: x = [0.6, 1, 0.4]ᵀ
(i) Linear combination z
z = wᵀx + b = 1.5×0.6 + 2.3×1 + (-0.8)×0.4 + (-0.5)
z = 0.9 + 2.3 - 0.32 - 0.5 = 2.38
(ii) Predicted probability ŷ
ŷ = σ(2.38) = 1/(1 + e^(-2.38)) = 1/(1 + 0.093) = 1/1.093 = 0.915
(iii) Classification with threshold 0.5
Since ŷ = 0.915 > 0.5, classify as SPAM
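The same forward pass in code (a sketch; NumPy is my choice here, not required by the assignment):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.5, 2.3, -0.8])
b = -0.5
x = np.array([0.6, 1.0, 0.4])

z = w @ x + b                 # w^T x + b
y_hat = sigmoid(z)
print(f"z = {z:.2f}, y_hat = {y_hat:.3f}")    # z = 2.38, y_hat = 0.915
print("SPAM" if y_hat > 0.5 else "NOT SPAM")  # SPAM
```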
Part B (10 points) - Three Emails
Email 1: x⁽¹⁾ = [0.2, 0, 0.8]ᵀ
z⁽¹⁾ = 1.5×0.2 + 2.3×0 + (-0.8)×0.8 + (-0.5) = 0.3 + 0 - 0.64 - 0.5 = -0.84
ŷ⁽¹⁾ = σ(-0.84) = 1/(1 + e^(0.84)) = 1/(1 + 2.316) = 0.302
Email 2: x⁽²⁾ = [1.1, 1, 0.3]ᵀ
z⁽²⁾ = 1.5×1.1 + 2.3×1 + (-0.8)×0.3 + (-0.5) = 1.65 + 2.3 - 0.24 - 0.5 = 3.21
ŷ⁽²⁾ = σ(3.21) = 1/(1 + e^(-3.21)) = 1/(1 + 0.040) = 0.961
Email 3: x⁽³⁾ = [0.0, 0, 1.2]ᵀ
z⁽³⁾ = 1.5×0 + 2.3×0 + (-0.8)×1.2 + (-0.5) = 0 + 0 - 0.96 - 0.5 = -1.46
ŷ⁽³⁾ = σ(-1.46) = 1/(1 + e^(1.46)) = 1/(1 + 4.306) = 0.188
Most likely spam: Email 2 with ŷ⁽²⁾ = 0.961
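Stacking the three emails as rows of a matrix X turns Part B into a single vectorized computation, which is how logistic regression is implemented in practice. A sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.5, 2.3, -0.8])
b = -0.5
X = np.array([[0.2, 0.0, 0.8],   # email 1
              [1.1, 1.0, 0.3],   # email 2
              [0.0, 0.0, 1.2]])  # email 3

y_hat = sigmoid(X @ w + b)       # all three probabilities at once
print(np.round(y_hat, 3))        # [0.302 0.961 0.188]
print("most likely spam: email", np.argmax(y_hat) + 1)  # email 2
```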
Problem 3: Cost Function Calculation (15 Points)
Given Loss Function: L(ŷ, y) = -y log(ŷ) - (1-y) log(1-ŷ), where log denotes the natural logarithm (as in all numerical values below)
Part A (10 points) - Individual Losses
Example 1: ŷ⁽¹⁾ = 0.9, y⁽¹⁾ = 1
L⁽¹⁾ = -1×log(0.9) - 0×log(0.1) = -log(0.9) = 0.105
Example 2: ŷ⁽²⁾ = 0.2, y⁽²⁾ = 0
L⁽²⁾ = -0×log(0.2) - 1×log(0.8) = -log(0.8) = 0.223
Example 3: ŷ⁽³⁾ = 0.7, y⁽³⁾ = 1
L⁽³⁾ = -1×log(0.7) - 0×log(0.3) = -log(0.7) = 0.357
Example 4: ŷ⁽⁴⁾ = 0.4, y⁽⁴⁾ = 0
L⁽⁴⁾ = -0×log(0.4) - 1×log(0.6) = -log(0.6) = 0.511
Part B (5 points) - Average Cost
J = (1/4) × (L⁽¹⁾ + L⁽²⁾ + L⁽³⁾ + L⁽⁴⁾)
J = (1/4) × (0.105 + 0.223 + 0.357 + 0.511) = 0.299
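These four losses and their average can be computed in one vectorized expression (a sketch; np.log is the natural log, matching the values above):

```python
import numpy as np

y_hat = np.array([0.9, 0.2, 0.7, 0.4])
y = np.array([1, 0, 1, 0])

losses = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(np.round(losses, 3))         # [0.105 0.223 0.357 0.511]
print(f"J = {losses.mean():.3f}")  # J = 0.299
```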
Problem 4: Gradient Computation and Parameter Updates (25 Points)
Given:
x = [2.1, -1.3]ᵀ, y = 1
w = [0.4, -0.7]ᵀ, b = 0.2
α = 0.3
Part A (10 points) - Forward Propagation
(i) Calculate z
z = wᵀx + b = 0.4×2.1 + (-0.7)×(-1.3) + 0.2 = 0.84 + 0.91 + 0.2 = 1.95
(ii) Compute ŷ
ŷ = σ(1.95) = 1/(1 + e^(-1.95)) = 1/(1 + 0.142) = 1/1.142 = 0.876
(iii) Calculate loss
L = -y×log(ŷ) - (1-y)×log(1-ŷ) = -1×log(0.876) - 0×log(0.124) = 0.132
Part B (10 points) - Gradients
(i) ∂L/∂z
∂L/∂z = ŷ - y = 0.876 - 1 = -0.124
(ii) ∂L/∂w₁
∂L/∂w₁ = (ŷ - y) × x₁ = (-0.124) × 2.1 = -0.260
(iii) ∂L/∂w₂
∂L/∂w₂ = (ŷ - y) × x₂ = (-0.124) × (-1.3) = 0.161
(iv) ∂L/∂b
∂L/∂b = ŷ - y = -0.124
Part C (5 points) - Parameter Updates
(i) New w₁
w₁_new = w₁ - α × (∂L/∂w₁) = 0.4 - 0.3 × (-0.260) = 0.4 + 0.078 = 0.478
(ii) New w₂
w₂_new = w₂ - α × (∂L/∂w₂) = -0.7 - 0.3 × 0.161 = -0.7 - 0.048 = -0.748
(iii) New b
b_new = b - α × (∂L/∂b) = 0.2 - 0.3 × (-0.124) = 0.2 + 0.037 = 0.237
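The whole forward-backward-update cycle of Problem 4 fits in a few lines. In the sketch below the final w₂ comes out as -0.749 rather than -0.748, because the hand solution rounds ŷ to three decimals along the way:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = np.array([2.1, -1.3]), 1
w, b = np.array([0.4, -0.7]), 0.2
alpha = 0.3

y_hat = sigmoid(w @ x + b)   # forward pass
dz = y_hat - y               # dL/dz = y_hat - y
dw, db = dz * x, dz          # dL/dw, dL/db

w, b = w - alpha * dw, b - alpha * db
print(np.round(w, 3), round(b, 3))  # [ 0.478 -0.749] 0.237
```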
Problem 5: Multiple Training Examples (20 Points)
Given Data:
Example    x⁽ⁱ⁾             y⁽ⁱ⁾    ŷ⁽ⁱ⁾
1          [1.0, 0.5]ᵀ      1       0.8
2          [-0.5, 1.2]ᵀ     0       0.3
3          [0.8, -0.3]ᵀ     1       0.6
Part A (10 points) - Cost Function
(i) Individual Losses
L⁽¹⁾ = -1×log(0.8) - 0×log(0.2) = -log(0.8) = 0.223
L⁽²⁾ = -0×log(0.3) - 1×log(0.7) = -log(0.7) = 0.357
L⁽³⁾ = -1×log(0.6) - 0×log(0.4) = -log(0.6) = 0.511
(ii) Average Cost
J = (1/3) × (0.223 + 0.357 + 0.511) = 0.364
Part B (10 points) - Average Gradients
(i) ∂J/∂w₁
∂J/∂w₁ = (1/3) × [(0.8-1)×1.0 + (0.3-0)×(-0.5) + (0.6-1)×0.8]
= (1/3) × [(-0.2)×1.0 + (0.3)×(-0.5) + (-0.4)×0.8]
= (1/3) × [-0.2 - 0.15 - 0.32] = (1/3) × (-0.67) = -0.223
(ii) ∂J/∂w₂
∂J/∂w₂ = (1/3) × [(0.8-1)×0.5 + (0.3-0)×1.2 + (0.6-1)×(-0.3)]
= (1/3) × [(-0.2)×0.5 + (0.3)×1.2 + (-0.4)×(-0.3)]
= (1/3) × [-0.1 + 0.36 + 0.12] = (1/3) × 0.38 = 0.127
(iii) ∂J/∂b
∂J/∂b = (1/3) × [(0.8-1) + (0.3-0) + (0.6-1)]
= (1/3) × [-0.2 + 0.3 - 0.4] = (1/3) × (-0.3) = -0.1
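Vectorized, the averaged gradients of Part B reduce to one matrix product (a sketch; the rows of X are the examples):

```python
import numpy as np

X = np.array([[1.0, 0.5],
              [-0.5, 1.2],
              [0.8, -0.3]])
y = np.array([1, 0, 1])
y_hat = np.array([0.8, 0.3, 0.6])

dz = y_hat - y               # per-example (y_hat - y)
dw = X.T @ dz / len(y)       # averaged gradient w.r.t. w
db = dz.mean()               # averaged gradient w.r.t. b
print(np.round(dw, 3), round(db, 3))  # [-0.223  0.127] -0.1
```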
Problem 6: Complete Logistic Regression Implementation (20 Points)
Given:
Training data (x = study hours, y = pass label): (0.7, 1), (0.3, 0)
Initial: w = 0.5, b = 0
Learning rate: α = 0.4
Part A (15 points) - One Complete Iteration
For Example 1: (x⁽¹⁾, y⁽¹⁾) = (0.7, 1)
(i) Calculate z⁽¹⁾
z⁽¹⁾ = w×x⁽¹⁾ + b = 0.5×0.7 + 0 = 0.35
(ii) Compute ŷ⁽¹⁾
ŷ⁽¹⁾ = σ(0.35) = 1/(1 + e^(-0.35)) = 1/(1 + 0.705) = 1/1.705 = 0.587
(iii) Gradients for Example 1
∂L⁽¹⁾/∂w = (ŷ⁽¹⁾ - y⁽¹⁾) × x⁽¹⁾ = (0.587 - 1) × 0.7 = -0.289
∂L⁽¹⁾/∂b = ŷ⁽¹⁾ - y⁽¹⁾ = 0.587 - 1 = -0.413
For Example 2: (x⁽²⁾, y⁽²⁾) = (0.3, 0)
(i) Calculate z⁽²⁾
z⁽²⁾ = w×x⁽²⁾ + b = 0.5×0.3 + 0 = 0.15
(ii) Compute ŷ⁽²⁾
ŷ⁽²⁾ = σ(0.15) = 1/(1 + e^(-0.15)) = 1/(1 + 0.861) = 1/1.861 = 0.537
(iii) Gradients for Example 2
∂L⁽²⁾/∂w = (ŷ⁽²⁾ - y⁽²⁾) × x⁽²⁾ = (0.537 - 0) × 0.3 = 0.161
∂L⁽²⁾/∂b = ŷ⁽²⁾ - y⁽²⁾ = 0.537 - 0 = 0.537
Parameter Updates:
(i) Average Gradients
∂J/∂w = (1/2) × [(-0.289) + 0.161] = (1/2) × (-0.128) = -0.064
∂J/∂b = (1/2) × [(-0.413) + 0.537] = (1/2) × 0.124 = 0.062
(ii) Updated Parameters
w_new = w - α × (∂J/∂w) = 0.5 - 0.4 × (-0.064) = 0.5 + 0.026 = 0.526
b_new = b - α × (∂J/∂b) = 0 - 0.4 × 0.062 = -0.025
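All of Part A as runnable code (a sketch in plain Python; no framework assumed):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [(0.7, 1), (0.3, 0)]   # (x, y) pairs
w, b, alpha = 0.5, 0.0, 0.4

# Accumulate per-example gradients, then average over the batch.
dw = db = 0.0
for x, y in data:
    y_hat = sigmoid(w * x + b)
    dw += (y_hat - y) * x
    db += (y_hat - y)
dw /= len(data)
db /= len(data)

w -= alpha * dw               # gradient-descent update
b -= alpha * db
print(f"w = {w:.3f}, b = {b:.3f}")  # w = 0.526, b = -0.025
```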
Part B (5 points) - Prediction
For a student with 0.5 study hours:
z = w_new × 0.5 + b_new = 0.526 × 0.5 + (-0.025) = 0.263 - 0.025 = 0.238
ŷ = σ(0.238) = 1/(1 + e^(-0.238)) = 1/(1 + 0.788) = 1/1.788 = 0.559
The probability that the student will pass is 0.559, or 55.9%.
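And the Part B prediction with the updated parameters (continuing the sketch above; w and b rounded as in the hand solution):

```python
import math

z = 0.526 * 0.5 + (-0.025)       # z = 0.238
p = 1.0 / (1.0 + math.exp(-z))
print(f"P(pass) = {p:.3f}")      # 0.559
```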