
Deep Learning Notes - January 2025

Neural Networks & Deep Learning

From Perceptrons to Deep Architectures - My Learning Journey

Big Picture: Neural Networks are inspired by the human brain, using
layers of interconnected "neurons" to learn complex patterns. Deep
Learning = Neural Networks with many layers!

1. The Biological Inspiration

Just like our brain has ~86 billion neurons connected by synapses, artificial
neural networks have nodes (neurons) connected by weights. The magic
happens when these simple units work together!

Key Insight: Each neuron does something simple, but together they
can approximate any continuous function to arbitrary accuracy (Universal Approximation Theorem)

2. The Perceptron - Where It All Began

Single Perceptron Model:

Output = activation(Σ(wi × xi) + bias)

where: wi = weights, xi = inputs, bias = threshold adjustment

Input Layer Perceptron Output

x1 ----w1----\
\
x2 ----w2----- [Σ → f()] → y
/
x3 ----w3----/
+
bias
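
A quick sketch of this computation in NumPy with a hard step activation; the AND-gate weights below are just an illustration, not something from the notes:

import numpy as np

def perceptron(x, w, bias):
    # weighted sum of inputs plus bias, then a hard step activation
    z = np.dot(w, x) + bias
    return 1 if z > 0 else 0

# Example: weights that make the perceptron behave like a logical AND
w = np.array([1.0, 1.0])
bias = -1.5
print(perceptron(np.array([1, 1]), w, bias))   # 1
print(perceptron(np.array([1, 0]), w, bias))   # 0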

Limitations of Single Perceptron:


Can only solve linearly separable problems

XOR problem exposed this limitation!

Solution? Stack multiple layers → Multi-Layer Perceptron (MLP)

3. Anatomy of a Neural Network

Essential Components:

Input Layer - Raw features (pixels, words, numbers)

Hidden Layers - Where the learning happens

Output Layer - Final predictions

Weights & Biases - The parameters we learn

Activation Functions - Add non-linearity

Simple Neural Network Architecture:

[784 inputs] → [128 neurons] → [64 neurons] → [10 classes]
  (Input)     (Hidden Layer 1)  (Hidden Layer 2)   (Output)

4. Activation Functions - Adding Non-linearity

Without activation functions, even deep networks would just be linear transformations!

Common Activation Functions:

1. ReLU (Rectified Linear Unit) - My go-to for hidden layers!

f(x) = max(0, x)

Pros: Simple, fast, no saturation (and no vanishing gradient) for positive inputs
Cons: Dead neurons problem (units stuck at zero output)

2. Sigmoid - Classic, outputs between 0 and 1


f(x) = 1 / (1 + e^(-x))

Use case: Binary classification output layer
Issue: Vanishing gradients in deep networks

3. Tanh - Centered around zero

f(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Better than sigmoid for hidden layers

4. Softmax - For multi-class output

f(xi) = e^xi / Σ(e^xj)

Outputs probability distribution
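
A one-screen sketch comparing these four in PyTorch (the input values are arbitrary):

import torch

x = torch.tensor([-2.0, -0.5, 0.0, 1.0, 3.0])

print(torch.relu(x))             # negatives clipped to 0
print(torch.sigmoid(x))          # squashed into (0, 1)
print(torch.tanh(x))             # squashed into (-1, 1)
print(torch.softmax(x, dim=0))   # non-negative, sums to 1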

5. Forward Propagation

The journey of data through the network:

1. Input data enters

2. Multiply by weights, add bias

3. Apply activation function

4. Pass to next layer

5. Repeat until output

Layer output: a[l] = activation(W[l] × a[l-1] + b[l])
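
A minimal sketch of that recurrence for the 784 → 128 → 64 → 10 example from section 3, with randomly initialized weights (the 0.01 scaling is just to keep activations small):

import torch

torch.manual_seed(0)

W1, b1 = torch.randn(128, 784) * 0.01, torch.zeros(128)
W2, b2 = torch.randn(64, 128) * 0.01, torch.zeros(64)
W3, b3 = torch.randn(10, 64) * 0.01, torch.zeros(10)

x = torch.randn(784)                  # a[0]: the input
a1 = torch.relu(W1 @ x + b1)          # a[1] = activation(W[1] × a[0] + b[1])
a2 = torch.relu(W2 @ a1 + b2)         # a[2]
logits = W3 @ a2 + b3                 # a[3]: one score per class
probs = torch.softmax(logits, dim=0)  # turn scores into probabilities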

6. Backpropagation - The Learning Magic

The Chain Rule is Everything!


Backprop = Computing gradients using chain rule + Gradient descent

Steps in Backpropagation:

1. Forward Pass: Compute predictions

2. Calculate Loss: How wrong were we?

3. Backward Pass: Compute gradients using chain rule

4. Update Weights: W = W - α × ∂L/∂W

Weight Update Rule:
W[l] = W[l] - α × ∂L/∂W[l]
b[l] = b[l] - α × ∂L/∂b[l]
where α = learning rate
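
A tiny sketch of one full cycle (forward, loss, backward, update) using PyTorch autograd on a single weight matrix; the data here is made up:

import torch

W = torch.randn(3, 2, requires_grad=True)   # parameters to learn
x = torch.randn(2)                          # made-up input
y_true = torch.randn(3)                     # made-up target
alpha = 0.01                                # learning rate

y_pred = W @ x                              # 1. forward pass
loss = ((y_true - y_pred) ** 2).mean()      # 2. calculate loss (MSE)
loss.backward()                             # 3. backward pass fills W.grad

with torch.no_grad():
    W -= alpha * W.grad                     # 4. W = W - α × ∂L/∂W
    W.grad.zero_()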

7. Loss Functions

For Regression:

MSE : L = (1/n) × Σ(y_true - y_pred)²

MAE : L = (1/n) × Σ|y_true - y_pred|

For Classification:

Binary Cross-Entropy : -Σ(y×log(p) + (1-y)×log(1-p))

Categorical Cross-Entropy : -Σ(y×log(p))
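
All four map onto PyTorch built-ins; a sketch with arbitrary numbers:

import torch
import torch.nn.functional as F

y_pred = torch.tensor([2.5, 0.0, 2.0])
y_true = torch.tensor([3.0, -0.5, 2.0])
print(F.mse_loss(y_pred, y_true))          # (1/n) × Σ(y_true - y_pred)²
print(F.l1_loss(y_pred, y_true))           # (1/n) × Σ|y_true - y_pred|

p = torch.tensor([0.9, 0.2, 0.7])          # predicted probabilities
y = torch.tensor([1.0, 0.0, 1.0])          # binary labels
print(F.binary_cross_entropy(p, y))

logits = torch.tensor([[1.2, 0.3, -0.5]])  # raw scores for 3 classes
target = torch.tensor([0])                 # true class index
print(F.cross_entropy(logits, target))     # softmax + categorical cross-entropy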

8. Optimization Algorithms

Gradient Descent is great, but we can do better!

Evolution of Optimizers:

SGD (Stochastic Gradient Descent) - The classic

Momentum - Adds velocity to updates

RMSprop - Adaptive learning rates

Adam - Combines momentum + RMSprop (my favorite!)

Adam Update:
m = β1×m + (1-β1)×gradient
v = β2×v + (1-β2)×gradient²
W = W - α × m/(√v + ε)
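
No need to hand-code these; torch.optim has them all. A sketch of one update step with a stand-in linear model (the model and data are placeholders, not part of the notes):

import torch
import torch.nn as nn

model = nn.Linear(4, 1)                        # stand-in model
x, y = torch.randn(8, 4), torch.randn(8, 1)    # made-up batch

optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
# alternatives: torch.optim.SGD(..., momentum=0.9), torch.optim.RMSprop(...)

loss = nn.functional.mse_loss(model(x), y)
loss.backward()          # compute gradients
optimizer.step()         # apply the Adam update above
optimizer.zero_grad()    # reset gradients for the next batch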
9. Regularization Techniques

Fighting Overfitting: Great training accuracy but poor test accuracy? Time to regularize!

Key Techniques:

L1/L2 Regularization - Add penalty to loss function

Dropout - Randomly "turn off" neurons during training

Early Stopping - Stop when validation loss increases

Batch Normalization - Normalize inputs to each layer

Data Augmentation - Create more training data
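
Most of these are one-liners in PyTorch; a sketch of where each hook lives (layer sizes reuse the earlier 784 → 128 → 10 example):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 128),
    nn.BatchNorm1d(128),   # batch normalization before the non-linearity
    nn.ReLU(),
    nn.Dropout(0.2),       # randomly zeroes 20% of activations during training
    nn.Linear(128, 10),
)

# L2 regularization ("weight decay") is a flag on the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)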

10. Deep Learning Architectures

Convolutional Neural Networks (CNNs)

Specialized for images - use convolution operations

CNN Architecture:
Input (Image) → Conv → Pool (Feature Maps) → Conv → Pool (Reduced) → Flatten → Dense → Output (Classification)
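
A hedged sketch of that pattern in PyTorch, sized for 28×28 grayscale images (the channel counts are illustrative):

import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),    # feature maps: 16 × 28 × 28
    nn.ReLU(),
    nn.MaxPool2d(2),                               # reduced: 16 × 14 × 14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),   # 32 × 14 × 14
    nn.ReLU(),
    nn.MaxPool2d(2),                               # 32 × 7 × 7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                     # classification over 10 classes
)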

Recurrent Neural Networks (RNNs)

For sequential data - have memory!

Vanilla RNN - Simple but suffers from vanishing gradients

LSTM - Long Short-Term Memory (gates solve gradient problem)

GRU - Gated Recurrent Unit (simpler than LSTM)
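
A minimal LSTM usage sketch (batch of 8 sequences, 20 timesteps, 32 features each; all sizes are made up):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
x = torch.randn(8, 20, 32)

output, (h_n, c_n) = lstm(x)   # output: (8, 20, 64); h_n is the final hidden state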

Transformer Architecture

The revolution in NLP - "Attention is all you need"

Self-attention mechanism allows the model to focus on relevant parts of the input
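
A minimal self-attention sketch using PyTorch's built-in multi-head attention, where the same sequence supplies queries, keys, and values (sizes are made up):

import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)
x = torch.randn(8, 20, 64)     # 8 sequences, 20 tokens, 64-dim embeddings

out, weights = attn(x, x, x)   # out: (8, 20, 64); weights: (8, 20, 20) attention map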
11. Training Tips & Tricks

Personal Best Practices:

Start with a small network, gradually increase complexity

Always monitor training AND validation loss

Learning rate is crucial - try 0.001 as starting point

Batch size affects convergence - powers of 2 work well

Save checkpoints regularly!
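
Checkpointing, for instance, is just a save/load pair (the model here is a stand-in and the filename is arbitrary):

import torch
import torch.nn as nn

model = nn.Linear(784, 10)                                   # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

torch.save({"model": model.state_dict(),
            "optimizer": optimizer.state_dict()}, "checkpoint.pt")

# ...later, to resume training:
state = torch.load("checkpoint.pt")
model.load_state_dict(state["model"])
optimizer.load_state_dict(state["optimizer"])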

12. Common Problems & Solutions

Vanishing Gradients:

Use ReLU instead of sigmoid/tanh

Proper weight initialization (Xavier/He)

Batch normalization

Exploding Gradients:

Gradient clipping

Proper weight initialization

Lower learning rate
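
Gradient clipping in particular is a one-liner between backward() and step(); a sketch with a stand-in model (max_norm=1.0 is an assumption, not a recommendation from the notes):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)                    # stand-in model
loss = model(torch.randn(4, 10)).sum()
loss.backward()

# rescale all gradients so their combined norm never exceeds max_norm
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)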

Overfitting:

More data!

Dropout layers

L1/L2 regularization

Reduce model complexity

13. PyTorch Implementation Snippet


Simple Neural Network in PyTorch:

import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)   # input -> hidden layer 1
        self.fc2 = nn.Linear(128, 64)    # hidden layer 1 -> hidden layer 2
        self.fc3 = nn.Linear(64, 10)     # hidden layer 2 -> output (10 classes)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)   # regularization from section 9

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)                  # raw logits; pair with CrossEntropyLoss
        return x
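
A quick usage sketch for the class above (the batch here is random data, just to show the shapes):

import torch

model = SimpleNN()
x = torch.randn(32, 784)              # batch of 32 flattened 28×28 images
logits = model(x)                     # shape: (32, 10)
probs = torch.softmax(logits, dim=1)  # per-class probabilities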

14. Hyperparameter Tuning

Hyperparameters to Tune (in order of importance):

1. Learning rate

2. Number of layers & neurons

3. Batch size

4. Dropout rate

5. Activation functions

6. Optimizer choice
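
A minimal sketch of sweeping the single most important one, the learning rate, on a toy problem (grid values and toy data are arbitrary):

import torch
import torch.nn as nn

def train_briefly(lr, steps=100):
    torch.manual_seed(0)
    model = nn.Linear(10, 1)                        # toy model
    x, y = torch.randn(64, 10), torch.randn(64, 1)  # toy data
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

for lr in (0.1, 0.01, 0.001, 0.0001):
    print(lr, train_briefly(lr))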

15. My Learning Resources


Deep Learning by Ian Goodfellow (the bible!)

[Link] courses - practical approach

3Blue1Brown neural network series - visual intuition

Papers With Code - latest research

PyTorch tutorials - hands-on practice

Final Thoughts:
Neural networks seemed like magic at first, but they're just clever math!
The key is understanding the fundamentals - forward prop, backprop,
and gradient descent. Everything else builds on these concepts.

"Deep learning is not a black box - it's a very complex but understandable
system of simple operations"
