Deep Learning Assignment 01
Deep Learning Essentials
Question 1: Exploring Neural Network Architectures
Neural networks have advanced beyond simple fully connected architectures. Two
commonly used advanced models are Convolutional Neural Networks (CNNs) and
Recurrent Neural Networks (RNNs).
1. Convolutional Neural Networks (CNNs)
● Definition: CNNs are specialized neural networks used primarily for image
processing tasks.
● How They Work:
o CNNs use convolutional layers that apply filters (kernels) to an image to
detect patterns like edges, textures, and complex features.
o Pooling layers (e.g., max pooling) help reduce data dimensionality, making
computations more efficient.
o Unlike traditional neural networks, CNNs do not fully connect every neuron;
instead, each filter attends to local spatial features (see the code sketch after this list).
● Difference from Fully Connected Networks:
o CNNs take advantage of spatial structure in images, making them more
efficient by reducing parameters.
o Fully connected networks treat every pixel as an unrelated input feature, which does
not preserve spatial relationships in images.
● Real-World Applications:
o Image classification – Used in self-driving cars, medical image analysis,
and facial recognition.
o Object detection – Used in security surveillance and autonomous vehicles.
o Style transfer & image generation – Used in AI-generated artwork and
deepfake technology.
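To make the convolution-and-pooling pipeline above concrete, here is a minimal sketch using PyTorch (the framework choice, layer sizes, 32x32 input, and 10-class head are all illustrative assumptions, not part of the assignment):

```python
import torch
import torch.nn as nn

# Minimal CNN sketch: two convolution + pooling stages, then a classifier head.
class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # filters detect edges/textures
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pooling halves spatial size
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper filters: complex features
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # assumes 32x32 input

    def forward(self, x):
        x = self.features(x)                  # local, weight-shared feature extraction
        return self.classifier(x.flatten(1))  # flatten, then classify

# A batch of four 32x32 RGB images -> four vectors of class scores.
logits = SmallCNN()(torch.randn(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 10])
```

Note the parameter saving the bullets describe: each 3x3 filter has only a handful of weights shared across the whole image, whereas a fully connected layer over the raw pixels would need one weight per pixel per neuron.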
2. Recurrent Neural Networks (RNNs)
● Definition: RNNs are designed to handle sequential data by maintaining a
hidden state that carries past information.
● How They Work:
o Unlike traditional networks, RNNs have loops, allowing them to retain
memory of past inputs.
o This makes them useful for problems where context matters, such as
language processing (see the sketch after this list).
● Difference from Fully Connected Networks:
o Fully connected networks treat each input as independent, while RNNs
maintain dependencies across sequences.
o RNNs are best suited for tasks that require time-series memory, such as
speech or text prediction.
● Real-World Applications:
o Speech recognition – Used in voice assistants like Siri and Google Assistant.
o Machine translation – Used in systems like Google Translate to translate between languages.
o Stock price prediction – Used in financial forecasting.
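A minimal sketch of the recurrent loop, again in PyTorch (all sizes are illustrative assumptions). The same cell is applied at every time step, and the hidden state carries information forward:

```python
import torch
import torch.nn as nn

# The hidden state is passed from step to step, so each output depends
# on everything the network has seen so far in the sequence.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

seq = torch.randn(2, 5, 8)    # batch of 2 sequences, 5 time steps, 8 features each
outputs, h_n = rnn(seq)       # outputs: the hidden state at every time step
print(outputs.shape)          # torch.Size([2, 5, 16])
print(h_n.shape)              # torch.Size([1, 2, 16]) -- final hidden state
```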
Question 2: Beyond Sigmoid - Activation Functions in
Neural Networks
Activation functions are essential in deep learning because they introduce non-linearity,
allowing neural networks to model complex relationships. Two widely used activation
functions beyond Sigmoid are ReLU and Tanh.
1. Rectified Linear Unit (ReLU)
● Definition: f(x) = max(0, x)
● How It Works (a numeric sketch follows this section):
o If the input value is positive, it passes through unchanged.
o If the input value is negative, it becomes zero.
● Advantages:
o Mitigates the vanishing gradient problem seen with Sigmoid and Tanh, since the
gradient is 1 for all positive inputs.
o Computationally cheap (a single comparison per value), making it faster than
Sigmoid or Tanh.
● Common Usage:
o Used in almost all modern deep neural networks for tasks like image
recognition, object detection, and deep reinforcement learning.
● Limitation:
o Dying ReLU problem – neurons whose inputs stay negative always output zero,
receive zero gradient, and may stop updating permanently.
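A small numeric sketch of ReLU (NumPy is an assumed convenience here; the function itself is just max(0, x)):

```python
import numpy as np

def relu(x):
    # Positive inputs pass through unchanged; negative inputs become zero.
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # [0.  0.  0.  1.5 3. ]
```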
2. Hyperbolic Tangent (Tanh)
● Definition: f(x) = (e^x − e^(−x)) / (e^x + e^(−x))
● How It Works (see the sketch after this section):
o Outputs values between -1 and 1, making it centered around zero.
o Helps in cases where both negative and positive inputs carry information.
● Advantages:
o Typically converges faster than Sigmoid because its zero-centered outputs keep
gradients balanced.
o Lets networks learn patterns with both positive and negative values.
● Common Usage:
o Frequently used in Recurrent Neural Networks (RNNs) due to better gradient
flow.
● Limitation:
o Still saturates for large |x|, so it suffers from vanishing gradients, although less severely than Sigmoid.
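A matching sketch for Tanh, computed directly from the definition and checked against NumPy's built-in:

```python
import numpy as np

def tanh(x):
    # (e^x - e^(-x)) / (e^x + e^(-x)): zero-centered, bounded in (-1, 1).
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(tanh(x))     # approaches -1 and 1 at the extremes, exactly 0 at the center
print(np.tanh(x))  # matches the built-in implementation
```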
Question 3: Exploring Loss Functions
Loss functions measure how well a neural network’s predictions match actual values.
Two commonly used loss functions are Mean Squared Error (MSE) and Cross-Entropy
Loss.
1. Mean Squared Error (MSE)
● Formula: MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²
● Usage: Used in regression problems, where predictions are continuous values.
● Why It’s Suitable:
o Squaring penalizes larger errors more heavily and gives a smooth, differentiable objective for gradient-based optimization (see the worked example after this list).
● Real-World Applications:
o Used in predicting house prices, weather forecasting, and stock market
trends.
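A worked example of the MSE formula (the sample values below are made up for illustration):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of squared differences: larger errors are penalized quadratically.
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 4.0])
print(mse(y_true, y_pred))  # (0.25 + 0.0 + 2.25) / 3 = 0.8333...
```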
2. Cross-Entropy Loss (for Multi-Class Classification)
● Formula: L = −Σ_i y_i log(ŷ_i)
● Usage: Used in classification problems where multiple categories exist.
● Why It’s Suitable:
o Pairs naturally with softmax outputs: it directly penalizes assigning low probability to the true class (see the worked example after this list).
● Real-World Applications:
o Used in image classification (e.g., identifying objects in an image).
o Used in spam detection, sentiment analysis, and language modeling.
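A worked example of cross-entropy for a single sample (the one-hot target and predicted probabilities are made up for illustration):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # -sum_i y_i * log(y_hat_i); eps guards against log(0).
    return -np.sum(y_true * np.log(y_pred + eps))

y_true = np.array([0.0, 1.0, 0.0])  # one-hot target: the true class is index 1
y_pred = np.array([0.1, 0.7, 0.2])  # softmax-style predicted probabilities
print(cross_entropy(y_true, y_pred))  # -log(0.7) ≈ 0.357
```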