Deep Learning
(CO1)
1. Explain the architecture and functioning of the Neocognitron. (10 marks)
The Neocognitron is a pioneering artificial neural network architecture proposed by Kunihiko
Fukushima in the late 1970s, designed for robust pattern recognition tasks like handwritten
character identification. [1]
Biological Inspiration: It models the hierarchy of the visual cortex, alternating layers of simple (S) cells and complex (C) cells. [1]
Layer-by-Layer Structure (AlexNet)
Input Layer: Accepts RGB images of size 224×224×3 (the original paper used 224×224, but 227×227 is common in implementations). [3] [4]
Convolutional Layers:
1. Conv1: 96 filters of size 11×11, stride 4, followed by ReLU and max pooling.
2. Conv2: 256 filters of size 5×5, stride 1, followed by ReLU and max pooling.
3. Conv3: 384 filters of size 3×3, stride 1, followed by ReLU.
4. Conv4: 384 filters of size 3×3, stride 1, followed by ReLU.
5. Conv5: 256 filters of size 3×3, stride 1, followed by ReLU and max pooling. [5] [4] [3]
Fully Connected Layers:
6. FC1: 4096 neurons, ReLU, dropout.
7. FC2: 4096 neurons, ReLU, dropout.
8. FC3: 1000 neurons (for ImageNet), softmax output. [4] [2] [3]
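Below is a minimal PyTorch-style sketch of the layer stack listed above, assuming 227×227 RGB inputs and the 1000-class ImageNet output; the padding values follow the common AlexNet implementation and are an assumption here, not taken from the text.

```python
import torch
import torch.nn as nn

# Sketch of the AlexNet-style layer stack described above (assumes 227x227 RGB input).
alexnet_like = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),          # Conv1
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),        # Conv2
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),       # Conv3
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),       # Conv4
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),       # Conv5
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), nn.Dropout(0.5),       # FC1
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),              # FC2
    nn.Linear(4096, 1000),                                          # FC3 (softmax is applied in the loss)
)

x = torch.randn(1, 3, 227, 227)       # one dummy RGB image
print(alexnet_like(x).shape)          # -> torch.Size([1, 1000])
```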
Summary
AlexNet's architecture—deep convolutional layers, ReLU activations, dropout, and GPU training
—enabled it to learn complex features directly from data, outperforming traditional machine
learning approaches that relied on manual feature engineering. This model set the stage for
modern deep learning in computer vision.
⁂
Comparison: LeNet-5 vs. AlexNet
Let's break down the design and application differences between LeNet-5 and AlexNet, two
landmark convolutional neural network (CNN) architectures.
1. Design Differences
Depth: LeNet-5 has about 7 layers; AlexNet has 8 learned layers (5 convolutional + 3 fully connected).
Activations: LeNet-5 uses tanh/sigmoid; AlexNet uses ReLU.
Pooling: LeNet-5 uses average pooling; AlexNet uses max pooling.
Input: LeNet-5 takes 32×32 grayscale images; AlexNet takes 224×224 RGB images.
LeNet-5 is a shallow network designed for simple tasks like digit recognition (MNIST
dataset). It uses basic activations and average pooling.
AlexNet is much deeper and wider, designed for large-scale image classification
(ImageNet). It introduced ReLU activations, max pooling, dropout, and GPU training for
scalability and speed. [9] [10] [11] [12] [13] [14] [15]
2. Application Differences
Dataset Size: LeNet-5 targets small datasets (MNIST: 60,000 images), while AlexNet targets large ones (ImageNet: 1.2 million images).
LeNet-5 is ideal for simple, small-scale problems where speed and efficiency are important.
AlexNet excels in complex, large-scale problems, handling color images and learning rich
hierarchical features, but requires much more computational power and time.
3. Key Innovations
AlexNet improved upon LeNet-5 by:
Using ReLU for faster training.
Employing max pooling for better feature selection.
Adding dropout and normalization to reduce overfitting.
Leveraging GPUs for large-scale data.
4. Summary
LeNet-5: Simple, efficient, best for small grayscale image tasks.
AlexNet: Deep, powerful, best for large, complex color image tasks, and set the stage for
modern deep learning in computer vision.
⁂
1. One-Hot Encoding
Definition: Each word is represented by a binary vector. The vector's length equals the
vocabulary size. Only one position (corresponding to the word) is set to 1; all others are 0.
[16] [17]
Example:
Suppose our vocabulary is:
['cat', 'dog', 'apple', 'car']
cat → [1, 0, 0, 0] [16]
dog → [0, 1, 0, 0] [16]
apple → [0, 0, 1, 0] [16]
car → [0, 0, 0, 1] [16]
Usage: Simple, easy to implement, but does not capture meaning or relationships between
words.
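A small sketch of one-hot encoding for the four-word vocabulary above, using plain NumPy; the helper name one_hot is just illustrative.

```python
import numpy as np

vocab = ['cat', 'dog', 'apple', 'car']
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a binary vector of length |vocab| with a single 1 at the word's index."""
    vec = np.zeros(len(vocab), dtype=int)
    vec[word_to_index[word]] = 1
    return vec

print(one_hot('dog'))   # [0 1 0 0]
print(one_hot('car'))   # [0 0 0 1]
```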
2. Word2Vec
Definition: Each word is mapped to a dense, low-dimensional vector (embedding) learned
from context in large text corpora. Words with similar meanings have similar vectors.
Example:
Suppose Word2Vec produces 3-dimensional vectors for our vocabulary (illustrative values):
cat → [0.82, 0.11, 0.45]
dog → [0.79, 0.15, 0.48]
apple → [0.10, 0.92, 0.30]
car → [0.05, 0.20, 0.88]
Usage: Captures semantic relationships (e.g., 'cat' and 'dog' vectors are close), enabling
models to understand word meaning and context.
Summary Table
One-Hot Encoding: sparse binary vector of vocabulary length; does not capture meaning; 'cat' → [1, 0, 0, 0].
Word2Vec: dense, low-dimensional learned vector; captures meaning; 'cat' → e.g. [0.82, 0.11, 0.45].
Key Points
One-hot encoding is simple but limited: no information about word similarity or meaning.
Word2Vec provides rich, learned representations that reflect word relationships and
context.
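If the gensim library is available, a Word2Vec model can be trained as sketched below; the tiny corpus and the 3-dimensional vector_size are illustrative assumptions, since real embeddings are trained on large corpora with far more dimensions.

```python
# Assumes the gensim library is installed (pip install gensim).
from gensim.models import Word2Vec

# Toy corpus (illustrative only; real models are trained on millions of sentences).
sentences = [
    ['the', 'cat', 'chased', 'the', 'dog'],
    ['the', 'dog', 'chased', 'the', 'cat'],
    ['i', 'drove', 'the', 'car'],
    ['i', 'ate', 'an', 'apple'],
]

model = Word2Vec(sentences, vector_size=3, window=2, min_count=1, seed=1)

print(model.wv['cat'])                      # dense 3-dimensional vector for 'cat'
print(model.wv.similarity('cat', 'dog'))    # cosine similarity between the two embeddings
```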
⁂
Localist Representation
Definition: Each concept (e.g., a word) is represented by a single unit (neuron or vector
position) that is uniquely associated with that concept. The meaning of each unit is
interpretable on its own.
Example: One-hot encoding is a classic localist representation. For a vocabulary of four
words—['cat', 'dog', 'apple', 'car']—the word 'dog' is represented as [0, 1, 0, 0].
Only the second position is '1', directly indicating 'dog'. Each word has its own unique
position; no overlap or shared meaning between units.
Distributed Representation
Definition: Each concept is represented by a pattern of activity across multiple units. Each
unit participates in representing many concepts, and the meaning of a unit depends on the
activity of others. The representation is typically dense and lower-dimensional than the
vocabulary size.
Example: Word2Vec embeddings are distributed representations. For the same vocabulary,
'dog' might be represented as a dense vector such as [0.79, 0.15, 0.48] (illustrative values).
Here, each value contributes to the meaning, and similar words (like 'cat') will have similar
vectors, reflecting semantic relationships. No single position uniquely identifies 'dog';
instead, the pattern as a whole does.
Localist: Each unit has a clear, standalone meaning. No overlap between words.
Distributed: Meaning is spread across units. Similar words have similar patterns, enabling
models to generalize and capture relationships.
Summary
Localist representations (like one-hot encoding) are simple and interpretable, but do not
capture relationships between words.
Distributed representations (like Word2Vec) are dense, share units among concepts, and
reflect semantic similarity, making them powerful for modern NLP tasks.
⁂
5. Hierarchical Abstraction
Deep networks use distributed representations to build up multiple levels of abstraction.
Early layers might capture simple features (edges, colors), while deeper layers combine
these into complex concepts (objects, meanings). [31] [32]
7. What is ImageNet and ILSVRC? Describe their role in deep learning evolution. (10 marks)
What is ImageNet?
ImageNet is a large-scale, structured image database containing over 14 million annotated
images across more than 20,000 categories. It was created to provide researchers with a
comprehensive resource for training and evaluating computer vision algorithms, especially for
object recognition and classification tasks. [39] [40] [41] [42]
What is ILSVRC?
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual competition that
began in 2010, using a subset of ImageNet with 1,000 object categories. Researchers compete
to develop algorithms that can classify and detect objects in images with the highest accuracy.
ILSVRC quickly became the gold standard for benchmarking computer vision models. [43] [41] [42]
ImageNet and ILSVRC have been instrumental in the evolution of deep learning, providing the
data and competitive environment needed to push the boundaries of computer vision and
artificial intelligence. [43] [42] [41]
⁂
3. Contextual Understanding
Sequence Models: RNNs and transformers process sequences of words, learning how word
meaning changes with context. For example, the word "bank" in "river bank" vs. "money
bank" gets different representations depending on surrounding words. [52] [51]
Self-Attention: Transformers use self-attention to weigh the importance of each word in a
sentence, enabling nuanced understanding of meaning and relationships.
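As a concrete illustration of self-attention, here is a minimal NumPy sketch of scaled dot-product attention over a toy sequence; the random Q/K/V projection matrices stand in for learned weights.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # how strongly each word attends to every other word
    weights = softmax(scores, axis=-1)           # attention weights; each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                      # 4 "words", each an 8-dimensional embedding
Wq = rng.normal(size=(8, 8))
Wk = rng.normal(size=(8, 8))
Wv = rng.normal(size=(8, 8))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape, attn.shape)                     # (4, 8) (4, 4)
```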
4. Training Process
Supervised Learning: Models are trained on labeled data (e.g., sentiment labels, translation
pairs), learning representations that help solve specific tasks.
Unsupervised Learning: Models can also learn from unlabeled data by predicting missing
words (e.g., BERT's masked language modeling), discovering general language patterns.
[49] [51]
L1 Regularization
Definition: Adds a penalty proportional to the sum of the absolute values of the weights: $\Omega(w) = \|w\|_1 = \sum_i |w_i|$.
Effect: Encourages sparsity in the model by driving some weights exactly to zero. This can
act as a form of automatic feature selection, as irrelevant features are effectively removed
from the model.
Mathematical Impact: The gradient update rule is modified by adding a constant (the sign
of each weight), which can push weights to zero if they are not strongly supported by the
data.
Behavior: L1 regularization leads to sparse solutions, where many weights are exactly zero,
simplifying the model and potentially improving interpretability.
Comparison Table
L1: penalty term $\lambda \sum_i |w_i|$; typical effect: sparse weights (many exactly zero); use case: feature selection, interpretable models.
L2 (weight decay): penalty term $\lambda \sum_i w_i^2$; typical effect: smoothly shrinks all weights toward zero; use case: when all features are useful and stability matters.
Summary
L2 regularization (weight decay) shrinks weights smoothly, reducing model complexity and
variance.
L1 regularization induces sparsity, setting many weights to zero and performing feature
selection.
Both are essential tools for building deep learning models that generalize well to new data.
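A short NumPy sketch, under the simple assumption of a single weight vector and a fixed data-loss gradient, showing how the L1 and L2 penalty terms modify a gradient-descent update:

```python
import numpy as np

w = np.array([0.8, -0.05, 0.0, 1.5])      # current weights (illustrative)
grad = np.array([0.1, 0.02, -0.03, 0.2])  # gradient of the data loss (illustrative)
lr, lam = 0.1, 0.5                        # learning rate and regularization strength

# L2 (weight decay): gradient of lam * sum(w_i^2) is 2*lam*w -> shrinks every weight smoothly.
w_l2 = w - lr * (grad + 2 * lam * w)

# L1: gradient of lam * sum(|w_i|) is lam * sign(w) -> pushes weakly supported weights toward exactly zero.
w_l1 = w - lr * (grad + lam * np.sign(w))

print("L2 update:", w_l2)
print("L1 update:", w_l1)
```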
⁂
Dropout Regularization
Dropout: Randomly "drops out" (sets to zero) a fraction of neurons during each training
iteration. This means the network learns to be robust to missing information and cannot rely
on any single neuron or path. [61] [60] [58]
How It Works: Dropout is implemented as a layer that randomly disables a set percentage
of neurons in each forward pass. During inference, all neurons are used, but their outputs
are scaled to account for the training-time dropout rate.
Effect: Dropout acts like training an ensemble of many smaller networks and averaging their
predictions, which improves generalization and reduces overfitting. [58] [61]
Pros and Cons
L2 Regularization: Pros: spreads weights across features, improves stability. Cons: does not produce sparse models.
Summary
L1 and L2 directly penalize large weights, controlling model complexity and improving
generalization. L1 is best for feature selection; L2 is best for stability and when all features
are useful.
Dropout introduces randomness by dropping neurons, forcing the network to learn
redundant, robust representations. It's especially effective in deep, complex networks.
In practice, Dropout is often combined with L2 for best results. [62] [60] [58]
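A minimal NumPy sketch of inverted dropout on a layer's activations; rescaling by 1/(1-p) at training time (instead of scaling at inference) is one common convention and an assumption here.

```python
import numpy as np

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: zero out a fraction p of units and rescale the survivors."""
    if not training:
        return activations                              # all neurons are used at inference time
    mask = np.random.rand(*activations.shape) >= p      # keep each unit with probability 1 - p
    return activations * mask / (1.0 - p)               # rescale so the expected activation is unchanged

h = np.ones((2, 6))                # toy activations from a hidden layer
print(dropout(h, p=0.5))           # roughly half the entries become 0, the rest become 2.0
print(dropout(h, training=False))  # unchanged at inference
```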
⁂
7. Adding Noise
Injects random noise (e.g., salt-and-pepper, Gaussian) to make the model robust to
noisy inputs.
Example: Adding white and black dots to a scanned document image.
8. Color Jittering and Saturation
Randomly alters color properties like hue, saturation, and intensity.
Example: Changing the color tone of a fruit image to simulate different ripeness levels.
9. Perspective and Affine Transformations
Alters the viewpoint or geometry of the image.
Example: Tilting a building image to mimic different camera angles. [65]
10. Blurring and Sharpening
Applies filters to make images less or more detailed.
Example: Blurring a face image to simulate motion or sharpening a landscape photo.
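A small NumPy sketch of two of the augmentations above (Gaussian noise and a simple intensity jitter standing in for color/brightness jittering), assuming pixel values in [0, 255]:

```python
import numpy as np

def add_gaussian_noise(img, std=10.0):
    """Add zero-mean Gaussian noise to simulate sensor noise."""
    noisy = img + np.random.normal(0.0, std, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def jitter_intensity(img, low=0.7, high=1.3):
    """Randomly scale pixel intensities, a crude stand-in for brightness/saturation jitter."""
    factor = np.random.uniform(low, high)
    return np.clip(img * factor, 0, 255).astype(np.uint8)

img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)  # dummy RGB image
augmented = jitter_intensity(add_gaussian_noise(img))
print(augmented.shape, augmented.dtype)   # (32, 32, 3) uint8
```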
Key Benefits
Prevents overfitting by stopping at the optimal point.
Reduces training time and computational cost.
Simple to implement with most deep learning frameworks. [74] [71] [72]
Summary Table
Restore best model: use the weights from the epoch with the best (lowest) validation loss.
In summary: Early stopping is a practical, effective way to prevent overfitting and improve
model generalization by halting training at the right moment based on validation performance.
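A sketch of an early-stopping loop with a patience counter; train_one_epoch and evaluate are hypothetical placeholders for your own training and validation routines.

```python
import copy

def fit_with_early_stopping(model, train_one_epoch, evaluate, max_epochs=100, patience=5):
    """Stop training once validation loss has not improved for `patience` consecutive epochs."""
    best_loss, best_model, epochs_without_improvement = float('inf'), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)                 # one pass over the training data (placeholder)
        val_loss = evaluate(model)             # loss on the held-out validation set (placeholder)
        if val_loss < best_loss:
            best_loss = val_loss
            best_model = copy.deepcopy(model)  # remember the best model seen so far
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                print(f"Early stopping at epoch {epoch}, best val loss {best_loss:.4f}")
                break
    return best_model                          # restore the model from the best epoch
```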
⁂
Summary Table
RMSProp: keeps a moving average of squared gradients; adapts a per-parameter learning rate; typical use: RNNs, noisy data.
Adam: keeps moving averages of both the gradients and their squares (with bias correction), adding momentum; adapts a per-parameter learning rate; typical use: most deep learning tasks.
In practice: Both RMSProp and Adam are preferred over vanilla SGD for training deep neural
networks, as they speed up convergence and handle complex, high-dimensional loss surfaces
more effectively.
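A compact NumPy sketch of the RMSProp and Adam update rules for a single parameter vector; the hyperparameter values are the commonly used defaults and are an assumption here.

```python
import numpy as np

def rmsprop_step(w, grad, state, lr=1e-3, beta=0.9, eps=1e-8):
    """RMSProp: divide the step by a moving average of squared gradients."""
    state['s'] = beta * state['s'] + (1 - beta) * grad**2
    return w - lr * grad / (np.sqrt(state['s']) + eps)

def adam_step(w, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum on the gradient plus a squared-gradient average, with bias correction."""
    state['t'] += 1
    state['m'] = beta1 * state['m'] + (1 - beta1) * grad          # first moment (momentum)
    state['v'] = beta2 * state['v'] + (1 - beta2) * grad**2       # second moment (RMSProp-style)
    m_hat = state['m'] / (1 - beta1**state['t'])                  # bias correction
    v_hat = state['v'] / (1 - beta2**state['t'])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

w = np.zeros(3)
grad = np.array([0.5, -0.2, 0.1])
print(rmsprop_step(w, grad, {'s': np.zeros(3)}))
print(adam_step(w, grad, {'m': np.zeros(3), 'v': np.zeros(3), 't': 0}))
```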
⁂
Impact of Xavier and He Initialization on Deep Model Training
Weight initialization is a crucial step in deep learning, directly affecting how well and how
quickly a neural network trains. Poor initialization can lead to vanishing or exploding gradients,
making deep models hard to optimize. Two widely used strategies—Xavier (Glorot)
initialization and He (Kaiming) initialization—were developed to address these issues for
different activation functions.
Xavier Initialization
Purpose: Designed for layers with sigmoid or tanh activations.
Method: Weights are initialized so that the variance of activations remains constant across
layers, preventing gradients from vanishing or exploding.
Formula: For a layer with $n_{in}$ inputs and $n_{out}$ outputs, weights are drawn from:
Uniform: $w \sim U\left(-\sqrt{\frac{6}{n_{in}+n_{out}}},\ \sqrt{\frac{6}{n_{in}+n_{out}}}\right)$
Normal: $w \sim N(0, \sigma^2)$ with $\sigma^2 = \frac{2}{n_{in}+n_{out}}$
Effect: Maintains signal flow, allowing gradients to propagate effectively in deep networks
with tanh/sigmoid activations. [84] [85] [86] [87]
He Initialization
Purpose: Tailored for layers with ReLU or its variants.
Method: Weights are initialized with a higher variance to compensate for ReLU's tendency
to zero out half the inputs.
Formula: For a layer with $ n_{in} $ inputs:
Normal: $w \sim N(0, \sigma^2)$ with $\sigma^2 = \frac{2}{n_{in}}$
Effect: Prevents "dying ReLU" and keeps gradients robust, enabling stable and fast training
in deep networks with ReLU activations. [88] [89] [86]
Summary
Xavier: for sigmoid/tanh; addresses vanishing gradients; variance scale $\frac{1}{n_{in}}$ (simplified) or $\frac{2}{n_{in}+n_{out}}$.
He: for ReLU and its variants; prevents dying ReLU; variance scale $\frac{2}{n_{in}}$.
In practice: Match your initialization to your activation function—Xavier for tanh/sigmoid, He for
ReLU. This simple choice can dramatically improve training stability and final model accuracy.
[86] [87] [88] [84]
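A NumPy sketch of sampling a weight matrix with Xavier (uniform) and He (normal) initialization for a layer with n_in inputs and n_out outputs, following the formulas above; layer sizes are illustrative.

```python
import numpy as np

def xavier_uniform(n_in, n_out):
    """Xavier/Glorot uniform: U(-sqrt(6/(n_in+n_out)), +sqrt(6/(n_in+n_out)))."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return np.random.uniform(-limit, limit, size=(n_in, n_out))

def he_normal(n_in, n_out):
    """He/Kaiming normal: variance 2/n_in, i.e. standard deviation sqrt(2/n_in)."""
    std = np.sqrt(2.0 / n_in)
    return np.random.normal(0.0, std, size=(n_in, n_out))

W_tanh_layer = xavier_uniform(256, 128)   # e.g. for a tanh layer
W_relu_layer = he_normal(256, 128)        # e.g. for a ReLU layer
print(W_tanh_layer.std(), W_relu_layer.std())
```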
⁂
Short Notes
a) Semi-supervised Learning
Semi-supervised learning is a machine learning approach that combines a small amount of
labeled data with a large amount of unlabeled data during training. The main motivation is that
labeled data can be expensive or time-consuming to obtain, while unlabeled data is often
abundant and easy to collect. Semi-supervised learning leverages the structure and patterns in
the unlabeled data to improve learning accuracy, especially when labeled examples are scarce.
How it works: The model is first trained on the labeled data, then uses the unlabeled data to
refine its understanding of the data distribution. Techniques include self-training, co-
training, and graph-based methods.
Benefits: Improves generalization, reduces the need for large labeled datasets, and can
achieve better performance than purely supervised or unsupervised methods in many real-
world scenarios.
Example: In image classification, a model might use a few labeled images of cats and dogs,
along with thousands of unlabeled images, to learn to distinguish between the two more
effectively.
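A short pseudocode-style Python sketch of self-training, one of the techniques named above; train, predict_proba, and the 0.95 confidence threshold are hypothetical placeholders, not a specific library's API.

```python
def self_training(model, labeled, unlabeled, train, predict_proba, rounds=3, threshold=0.95):
    """Repeatedly fit on labeled data, then adopt confident predictions on unlabeled data as pseudo-labels."""
    for _ in range(rounds):
        train(model, labeled)                           # fit on the current labeled set (placeholder)
        still_unlabeled = []
        for x in unlabeled:
            probs = predict_proba(model, x)             # dict of class -> probability (placeholder)
            best_class = max(probs, key=probs.get)
            if probs[best_class] >= threshold:          # confident prediction -> pseudo-label it
                labeled.append((x, best_class))
            else:
                still_unlabeled.append(x)
        unlabeled = still_unlabeled
    return model
```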
b) Multi-task Learning
Multi-task learning (MTL) is a machine learning paradigm where a single model is trained to
perform multiple related tasks simultaneously, rather than training separate models for each
task. [91] [92] [93] [94] [95] [96]
How it works: MTL models typically share some layers (feature extractors) across all tasks,
while having task-specific output layers ("heads"). The shared layers learn representations
useful for all tasks, while the heads specialize for each task's output.
Benefits:
Improved generalization: By learning from multiple tasks, the model captures more
robust and general features, reducing overfitting.
Data efficiency: Shared representations allow the model to make better use of limited
labeled data for each task.
Reduced model complexity: One model handles several tasks, saving computational
resources.
Enhanced stability: Knowledge transfer between tasks can help compensate for
challenges in individual tasks.
Example: In natural language processing, a single neural network might be trained to
perform both part-of-speech tagging and named entity recognition on the same text,
sharing most of the network's parameters but having separate output layers for each task.
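A minimal PyTorch sketch of this hard parameter sharing: a shared trunk with two task-specific heads; the layer sizes and tag counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoTaskModel(nn.Module):
    """Shared feature extractor with separate output heads for two related tasks."""
    def __init__(self, in_dim=64, hidden=128, n_pos_tags=17, n_entity_tags=9):
        super().__init__()
        self.shared = nn.Sequential(                     # layers shared by both tasks
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.pos_head = nn.Linear(hidden, n_pos_tags)    # part-of-speech tagging head
        self.ner_head = nn.Linear(hidden, n_entity_tags) # named-entity recognition head

    def forward(self, x):
        features = self.shared(x)
        return self.pos_head(features), self.ner_head(features)

model = TwoTaskModel()
x = torch.randn(5, 64)                       # 5 token embeddings (illustrative)
pos_logits, ner_logits = model(x)
print(pos_logits.shape, ner_logits.shape)    # torch.Size([5, 17]) torch.Size([5, 9])
# Training would minimize a weighted sum of the two task losses.
```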
Summary:
Semi-supervised learning leverages both labeled and unlabeled data to improve learning
when labels are scarce.
Multi-task learning trains a single model on multiple related tasks, sharing knowledge to
boost generalization, efficiency, and stability.
⁂
Optimization
Definition: Optimization refers to the mathematical process of finding the best set of model
parameters (weights and biases) that minimize a loss function on the training data. [100] [101]
Goal: The primary goal is to reduce the training error by adjusting parameters using
algorithms like gradient descent, RMSProp, or Adam. [101]
Process: Optimization algorithms iteratively update parameters to find a minimum of the
loss function, often facing challenges like local minima, saddle points, and vanishing
gradients. [100]
Scope: Focuses on the training dataset and the objective function, without direct concern
for how well the model generalizes to new data. [100]
Learning
Definition: Learning in deep networks is the broader process of inferring a suitable model
from data, aiming to capture underlying patterns and generalize to unseen examples. [100]
Goal: The ultimate goal is to minimize generalization error, ensuring the model performs well
on new, unseen data—not just the training set. [100]
Process: Learning involves not only optimization but also regularization, model selection,
and validation to avoid overfitting and underfitting. [100]
Scope: Encompasses the entire modeling pipeline, including data preprocessing,
architecture design, regularization, and evaluation.
Key Differences
Challenges: optimization contends with local minima, saddle points, and vanishing gradients; learning contends with overfitting, underfitting, and data bias.
Outcome: optimization yields the best parameters for the training data; learning yields the best model for unseen data.
Summary
Optimization is a subset of learning, concerned with finding parameters that minimize the
loss on training data.
Learning is the overall process of building a model that generalizes well, combining
optimization with strategies to prevent overfitting and ensure robustness. [101] [100]
In deep learning, successful training requires both effective optimization and thoughtful learning
strategies to achieve high performance on real-world tasks. [101] [100]
⁂
Sum of the element-wise products = 7, so the output value at this position of the feature map is 7.
⁂
1. Convolution Operation
Convolution is the process where a small matrix (called a filter or kernel) slides over the input
image and computes a dot product at each position. This operation extracts features like edges,
textures, or patterns from the image. [108] [109] [110]
Diagram Description:
Imagine a 6×6 grayscale image as a grid of numbers.
Place a 3×3 filter (kernel) over the top-left corner of the image.
Multiply each filter value by the corresponding image pixel, sum the results, and write this
value in the output feature map at the corresponding location.
Slide the filter one pixel to the right (stride 1), repeat the process, and continue until the filter
has covered the entire image.
The output is a smaller grid (feature map) that highlights where the filter detected its
pattern.
Key Terms:
Kernel/Filter: The small matrix used for feature extraction.
Stride: How many pixels the filter moves at each step.
Padding: Adding extra pixels around the image to control output size.
Receptive Field: The region of the input image the filter covers at each step.
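A NumPy sketch of the valid (no padding), stride-1 convolution walked through above, applied to a 6×6 toy image and a 3×3 filter; strictly speaking this computes cross-correlation, which is what CNN libraries implement.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image and take a dot product at each position (no padding)."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)   # element-wise multiply, then sum
    return out

image = np.arange(36).reshape(6, 6)              # toy 6x6 "grayscale image"
edge_filter = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])             # simple vertical-edge detector
print(conv2d(image, edge_filter).shape)          # (4, 4) feature map
```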
2. Pooling Operation
Pooling is used to reduce the spatial size of feature maps, making the network more efficient
and robust to small translations in the input. The most common type is max pooling. [109] [108]
Diagram Description:
Take a feature map (e.g., 4×4 grid).
Place a 2×2 window over the top-left corner.
Find the maximum value in that window and write it in the output.
Slide the window two pixels to the right (stride 2), repeat, and continue for the whole feature
map.
The output is a smaller grid (e.g., 2×2), where each value is the maximum from its window.
Types of Pooling:
Max Pooling: Takes the maximum value in each window.
Average Pooling: Takes the average value in each window.
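A NumPy sketch of 2×2 max pooling with stride 2 on a 4×4 feature map, matching the walk-through above.

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Take the maximum of each size x size window, moving `stride` pixels at a time."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 0],
                 [1, 8, 3, 4]])
print(max_pool2d(fmap))   # [[6. 4.] [8. 9.]]
```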
Summary Table
Convolution: extracts features; a small filter slides over the input, computing dot products to build a feature map.
Pooling: reduces spatial size; a window slides over the feature map, keeping the max (or average) of each region.
In practice:
Convolution layers learn filters to detect features.
Pooling layers reduce feature map size, making the network faster and less sensitive to
small changes in the input.
⁂
Types of Pooling Layers in CNNs and Their Effects
Pooling layers are essential components in Convolutional Neural Networks (CNNs), used to
reduce the spatial dimensions of feature maps while retaining important information. This
process helps make models faster, more efficient, and more robust to variations in input images.
Let's explore the main types of pooling layers and their effects: [116] [117] [118]
1. Max Pooling
How it works: Divides the input feature map into non-overlapping regions (e.g., 2×2) and
selects the maximum value from each region as the output.
Effect: Retains the most prominent features, discards less important details, and provides
translation invariance (the output remains stable even if the input shifts slightly). [117] [118]
Use case: Most common pooling method in image recognition and object detection.
2. Average Pooling
How it works: Computes the average value of each region in the input feature map.
Effect: Produces smoother, more generalized feature maps by reducing the impact of
outliers and noise. [118] [119]
Use case: Useful when input features are noisy or when a more generalized representation
is needed.
3. Global Pooling
How it works: Applies max or average pooling over the entire spatial dimension of the
feature map, reducing each feature map to a single value. [120] [121] [117]
Effect: Drastically reduces dimensionality, often used before fully connected layers for
classification tasks.
Use case: Common in architectures like Global Average Pooling (GAP) before the output
layer.
4. Stochastic Pooling
How it works: Randomly selects a value from each pooling region based on a probability
distribution derived from the region's values. [122] [117]
Effect: Adds randomness, which can help regularize the model and improve generalization.
Use case: Less common, but can be useful for certain regularization needs.
Summary Table
Max Pooling: maximum of each region; keeps the most prominent features, adds translation invariance.
Average Pooling: mean of each region; smoother, more generalized feature maps.
Global Pooling: max/mean over all values; reduces each feature map to a single value per channel.
Stochastic Pooling: value sampled in proportion to its magnitude; adds a regularizing randomness.
Pooling layers are chosen based on the task and desired properties. Max pooling is most
common, but other types can be more suitable for specific needs or architectures.
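A one-function NumPy sketch of global average pooling, the operation typically used just before the classifier in GAP-style architectures; the feature-map shape is illustrative.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average over the spatial dimensions; input shape (channels, H, W) -> output shape (channels,)."""
    return feature_maps.mean(axis=(1, 2))

fmaps = np.random.rand(256, 7, 7)        # 256 feature maps of size 7x7
print(global_average_pool(fmaps).shape)  # (256,)
```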
⁂
1. Image Segmentation
Task: Assign a class label to every pixel in an image.
Output: A 2D grid (same size as the input image) where each cell contains a class label
(e.g., background, car, person).
Structure: The output is a structured map, not a single value. Neighboring pixels often have
related labels, capturing spatial structure.
Application: Medical imaging (tumor segmentation), autonomous driving (road/lane
detection).
2. Object Detection
Task: Identify and locate multiple objects in an image.
Output: A set of bounding boxes, each with a class label and coordinates (x, y, width,
height).
Structure: The output is a list of structured records, each describing an object and its
position.
Application: Surveillance, robotics, self-driving cars.
3. Other Examples
Pose Estimation: Predicts coordinates of keypoints (e.g., joints in a human body),
outputting a structured set of points.
Instance Segmentation: Combines detection and segmentation, outputting a mask for each
detected object.
Summary Table
Image Segmentation: dense per-pixel class map (an H×W grid of labels).
Object Detection: list of bounding boxes, each with a class label and (x, y, width, height).
Pose Estimation: structured set of keypoint coordinates.
Instance Segmentation: one mask per detected object.
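A small Python sketch of the output structures described above, using plain NumPy arrays and dictionaries; the shapes and field names are illustrative assumptions, not a specific library's API.

```python
import numpy as np

H, W, NUM_CLASSES = 240, 320, 3

# Image segmentation: one class label per pixel (same spatial size as the input image).
segmentation_map = np.random.randint(0, NUM_CLASSES, size=(H, W))   # e.g. 0=background, 1=car, 2=person

# Object detection: a list of structured records, one per detected object.
detections = [
    {"label": "car",    "box": (40, 60, 120, 80),  "score": 0.92},  # box = (x, y, width, height)
    {"label": "person", "box": (200, 30, 50, 140), "score": 0.87},
]

# Pose estimation: a structured set of keypoint coordinates for one person.
keypoints = {"left_elbow": (110, 95), "right_elbow": (150, 97), "head": (130, 40)}

print(segmentation_map.shape, len(detections), len(keypoints))
```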
1. BCS714A-module-1-textbook.pdf
2. https://en.wikipedia.org/wiki/AlexNet
3. https://viso.ai/deep-learning/alexnet/
4. https://www.digitalocean.com/community/tutorials/popular-deep-learning-architectures-alexnet-vgg-googlenet
5. http://d2l.ai/chapter_convolutional-modern/alexnet.html
6. https://www.geeksforgeeks.org/deep-learning/difference-between-alexnet-and-googlenet/
7. https://www.geeksforgeeks.org/machine-learning/ml-getting-started-with-alexnet/
8. https://www.kaggle.com/code/blurredmachine/alexnet-architecture-a-complete-guide
9. https://massedcompute.com/faq-answers/?question=What+are+the+key+differences+between+LeNet+and+AlexNet+in+terms+of+architecture+and+applications%3F
10. https://www.geeksforgeeks.org/machine-learning/convolutional-neural-network-cnn-architectures/
11. https://pabloinsente.github.io/the-convolutional-network
12. https://www.iieta.org/download/file/fid/182194
13. https://www.cse.iitm.ac.in/~miteshk/CS7015/Slides/Teaching/pdf/Lecture11.pdf
14. https://www.youtube.com/watch?v=QJVKIHyQzWU
15. https://www.kaggle.com/code/samuelcortinhas/a-piece-of-history-lenet-5-alexnet-from-scratch
16. https://www.geeksforgeeks.org/nlp/one-hot-encoding-in-nlp/
17. https://www.educative.io/answers/one-hot-encoding-of-text-data-in-natural-language-processing
18. https://eavelardev.github.io/gcp_courses/nlp_on_gcp/text_representation/one_hot_encoding_and_bag_of_words.html
19. https://www.geeksforgeeks.org/machine-learning/ml-one-hot-encoding/
20. https://www.cloudskillsboost.google/course_templates/40/video/534085
21. https://www.youtube.com/watch?v=2d8iP2_cS-U
22. https://www.youtube.com/watch?v=4l_ybHoKK_4
23. https://ntanmayee.github.io/articles/2017/09/15/distributed-vs-distributional.html
24. https://med.libretexts.org/Bookshelves/Pharmacology_and_Neuroscience/Computational_Cognitive_Neuroscience_3e_(O'Reilly_and_Munakata)/03:_Networks/3.03:_Categorization_and_Distributed_Representations
25. https://www.cs.toronto.edu/~lczhang/321/notes/notes07.pdf
26. https://pmc.ncbi.nlm.nih.gov/articles/PMC3576056/
27. https://www.biorxiv.org/content/10.1101/2023.02.01.526470v2.full-text
28. https://www.sciencedirect.com/science/article/abs/pii/S0925231218307902
29. https://www.tandfonline.com/doi/full/10.1080/23273798.2016.1267782
30. https://deepai.org/machine-learning-glossary-and-terms/distributed-representation
31. https://www.oreilly.com/content/how-neural-networks-learn-distributed-representations/
32. https://rinuboney.github.io/2015/10/18/theoretical-motivations-deep-learning.html
33. https://stanford.edu/~jlmcc/papers/PDP/Chapter3.pdf
34. https://arxiv.org/abs/2312.17285
35. https://www.sciencedirect.com/science/article/abs/pii/S0950705124002703
36. https://www.dbs.ifi.lmu.de/Lehre/DLAI/WS18-19/script/07_representation.pdf
37. https://www.sciencedirect.com/science/article/abs/pii/S0020025522000585
38. http://www.cs.toronto.edu/~bonner/courses/2014s/csc321/lectures/lec5.pdf
39. https://www.image-net.org/about.php
40. https://www.historyofdatascience.com/imagenet-a-pioneering-vision-for-computers/
41. https://deepai.org/machine-learning-glossary-and-terms/imagenet
42. BCS714A-module-1-textbook.pdf
43. https://viso.ai/deep-learning/imagenet/
44. https://www.pinecone.io/learn/series/image-search/imagenet/
45. https://en.wikipedia.org/wiki/ImageNet
46. https://www.image-net.org
47. https://www.kaggle.com/getting-started/149448
48. https://journals.sagepub.com/doi/full/10.1177/20539517211035955
49. https://viso.ai/deep-learning/representation-learning/
50. https://en.wikipedia.org/wiki/Deep_learning
51. https://www.ibm.com/think/topics/deep-learning
52. https://deepgram.com/ai-glossary/representation-learning
53. https://www.geeksforgeeks.org/deep-learning/introduction-deep-learning/
54. https://www.sciencedirect.com/science/article/pii/S1532046420302653
55. https://onlinecourses.nptel.ac.in/noc25_cs22/preview
56. BCS714A-module-2-textbook.pdf
57. https://www.e2enetworks.com/blog/regularization-in-deep-learning-l1-l2-dropout
58. https://www.linkedin.com/pulse/understanding-regularization-techniques-l1-l2-dropout-joshua-cox-aiguc
59. https://towardsdatascience.com/l1-vs-l2-regularization-in-machine-learning-differences-advantages-and-how-to-apply-them-in-72eb12f102b5/
60. https://www.skillcamper.com/blog/the-role-of-regularization-in-deep-learning-models
61. https://www.geeksforgeeks.org/deep-learning/dropout-regularization-in-deep-learning/
62. https://massedcompute.com/faq-answers/?question=What+are+the+differences+between+dropout+and+L1+and+L2+regularization%3F
63. https://encord.com/blog/data-augmentation-guide/
64. https://research.aimultiple.com/data-augmentation-techniques/
65. https://www.ultralytics.com/blog/the-ultimate-guide-to-data-augmentation-in-2025
66. https://blog.roboflow.com/data-augmentation/
67. https://www.ccslearningacademy.com/what-is-data-augmentation/
68. https://viso.ai/computer-vision/image-data-augmentation-for-computer-vision/
69. https://aws.amazon.com/what-is/data-augmentation/
70. https://www.sciencedirect.com/science/article/pii/S2590005622000911
71. https://www.geeksforgeeks.org/deep-learning/using-early-stopping-to-reduce-overfitting-in-neural-networks/
72. https://www.geeksforgeeks.org/machine-learning/regularization-by-early-stopping/
73. https://milvus.io/ai-quick-reference/what-is-early-stopping
74. https://www.linkedin.com/pulse/real-world-ml-early-stopping-deep-learning-guide-olamendy-turruellas-pip9c
75. https://github.com/phuongpho/early-stopping
76. https://www.machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/
77. https://studyglance.in/dl/display.php?tno=12&topic=Early-stopping
78. https://codesignal.com/learn/courses/modeling-the-iris-dataset-with-tensorflow/lessons/implementing-early-stopping-in-tensorflow-to-prevent-overfitting
79. https://towardsdatascience.com/understanding-deep-learning-optimizers-momentum-adagrad-rmsprop-adam-e311e377e9c2/
80. https://community.deeplearning.ai/t/difference-between-rmsprop-and-adam/310187
81. https://joiv.org/index.php/joiv/article/view/1818
82. https://www.digitalocean.com/community/tutorials/intro-to-optimization-momentum-rmsprop-adam
83. https://www.kaggle.com/code/harpdeci/intuitive-explanation-of-sgd-adam-and-rmsprop
84. https://cs230.stanford.edu/section/4/
85. https://www.geeksforgeeks.org/deep-learning/xavier-initialization/
86. https://businessanalytics.substack.com/p/weight-initialization-in-neural-networks
87. https://365datascience.com/tutorials/machine-learning-tutorials/what-is-xavier-initialization/
88. https://stackoverflow.com/questions/48641192/xavier-and-he-normal-initialization-difference
89. https://www.deeplearning.ai/ai-notes/initialization/
90. https://en.wikipedia.org/wiki/Weight_initialization
91. https://www.geeksforgeeks.org/deep-learning/introduction-to-multi-task-learningmtl-for-deep-learning/
92. https://www.infosysbpm.com/glossary/multi-task-learning.html
93. https://www.v7labs.com/blog/multi-task-learning-guide
94. https://studyglance.in/dl/display.php?tno=11&topic=Multi-Task-Learning
95. https://milvus.io/ai-quick-reference/how-does-multitask-learning-work-in-deep-learning
96. https://codefinity.com/blog/What-is-Multi-task-Learning
97. https://www.jmlr.org/papers/volume17/15-242/15-242.pdf
98. https://www.sciencedirect.com/science/article/abs/pii/S0010482522012045
99. https://arxiv.org/abs/2404.18961
100. http://d2l.ai/chapter_optimization/optimization-intro.html
101. https://www.geeksforgeeks.org/deep-learning/optimization-rule-in-deep-neural-networks/
102. https://aws.amazon.com/compare/the-difference-between-deep-learning-and-neural-networks/
103. https://arxiv.org/abs/2007.14166
104. https://www.worldscientific.com/doi/10.1142/S0218001420520138
105. https://www.ibm.com/think/topics/ai-vs-machine-learning-vs-deep-learning-vs-neural-networks
106. https://www.reddit.com/r/deeplearning/comments/1dgkut0/why_are_neural_networks_optimized_instead_of_just/
107. BCS714A-module-3-textbook.pdf
108. https://learnopencv.com/understanding-convolutional-neural-networks-cnn/
109. https://viso.ai/deep-learning/convolution-operations/
110. https://en.wikipedia.org/wiki/Convolutional_neural_network
111. https://www.geeksforgeeks.org/machine-learning/introduction-convolution-neural-network/
112. https://poloclub.github.io/cnn-explainer/
113. https://towardsdatascience.com/convolutional-neural-network-cnn-architecture-explained-in-plain-english-using-simple-diagrams-e5de17eacc8f/
114. https://www.sciencedirect.com/topics/computer-science/convolution-operation
115. https://developer.nvidia.com/discover/convolutional-neural-network
116. https://www.geeksforgeeks.org/deep-learning/cnn-introduction-to-pooling-layer/
117. https://www.deepchecks.com/glossary/pooling-layers-in-cnn/
118. https://www.linkedin.com/pulse/pooling-cnn-types-its-use-priyanka-yadav-5innc
119. https://www.nature.com/articles/s41598-024-51258-6
120. https://www.baeldung.com/cs/neural-networks-pooling-layers
121. https://www.digitalocean.com/community/tutorials/pooling-in-convolutional-neural-networks
122. https://en.wikipedia.org/wiki/Pooling_layer
123. https://www.janbasktraining.com/tutorials/deep-learning-structured-outputs/
124. https://www.scribd.com/document/851025794/4-Structured-outputs-Data-types
125. https://cookbook.openai.com/examples/structured_outputs_intro
126. https://www.geeksforgeeks.org/deep-learning/convolutional-neural-network-cnn-in-machine-learning/
127. https://huggingface.co/docs/inference-providers/en/guides/structured-output
128. https://python.langchain.com/docs/concepts/structured_outputs/
129. https://www.upgrad.com/blog/basic-cnn-architecture/
130. https://towardsdatascience.com/structured-outputs-and-how-to-use-them-40bd86881d39/
131. https://community.openai.com/t/structured-outputs-deep-dive/930169