Neural Network Weights & MLP Explained
weight_decay = 0.0001
Resize:
Image size: 72 × 72
Patch size: 6 × 6
Patches per image: 144
Elements per patch: 108 (6 × 6 × 3)
a. data_augmentation
Shape: (3,)
<tf.Variable 'mean:0' shape=(3,) dtype=float32, numpy=array([129.30385, 124.06998, 112.43418], dtype=float32)>
single weight value: 129.30385
Shape: (3,)
<tf.Variable 'variance:0' shape=(3,) dtype=float32, numpy=array([4647.1533, 4276.0635, 4958.714], dtype=float32)>
single weight value: 4647.1533
Shape: ()
<tf.Variable 'count:0' shape=() dtype=int64, numpy=51200000>
single weight value: 51200000
b. patch_encoder
Shape: (108, 64)
<tf.Variable 'patch_encoder/dense/kernel:0' shape=(108, 64) dtype=float32, numpy=…>
c. layer_normalization
Shape: (64,)
<tf.Variable 'layer_normalization/gamma:0' shape=(64,) dtype=float32, numpy=…>
single weight value: 0.9268312
Shape: (64,)
<tf.Variable 'layer_normalization/beta:0' shape=(64,) dtype=float32, numpy=…>
single weight value: -0.0042783115
d. multi_head_attention
Shape: (64, 4, 64)
<tf.Variable 'multi_head_attention/query/kernel:0' shape=(64, 4, 64) dtype=float32, numpy=…>
single weight value: 0.049906813
Shape: (4, 64)
<tf.Variable 'multi_head_attention/query/bias:0' shape=(4, 64) dtype=float32, numpy=…>
single weight value: -0.007588707
Shape: (64, 4, 64)
<tf.Variable 'multi_head_attention/key/kernel:0' shape=(64, 4, 64) dtype=float32, numpy=…>
single weight value: 0.036903888
Shape: (4, 64)
<tf.Variable 'multi_head_attention/key/bias:0' shape=(4, 64) dtype=float32, numpy=…>
single weight value: 1.3176156e-06
Shape: (64, 4, 64)
<tf.Variable 'multi_head_attention/value/kernel:0' shape=(64, 4, 64) dtype=float32, numpy=…>
single weight value: -0.012122304
Shape: (4, 64)
<tf.Variable 'multi_head_attention/value/bias:0' shape=(4, 64) dtype=float32, numpy=…>
single weight value: 0.023671346
Shape: (4, 64, 64)
<tf.Variable 'multi_head_attention/attention_output/kernel:0' shape=(4, 64, 64) dtype=float32, numpy=…>
single weight value: -0.06731634
Shape: (64,)
<tf.Variable 'multi_head_attention/attention_output/bias:0' shape=(64,) dtype=float32, numpy=…>
single weight value: -0.023092788
e. layer_normalization_1
Shape: (64,)
<tf.Variable 'layer_normalization_1/gamma:0' shape=(64,) dtype=float32, numpy=…>
single weight value: 1.0274497
Shape: (64,)
<tf.Variable 'layer_normalization_1/beta:0' shape=(64,) dtype=float32, numpy=…>
single weight value: -0.004183028
f. dense_1
Shape: (64, 128)
<tf.Variable 'dense_1/kernel:0' shape=(64, 128) dtype=float32, numpy=…>
single weight value: 0.0751641
Shape: (128,)
<tf.Variable 'dense_1/bias:0' shape=(128,) dtype=float32, numpy=…>
single weight value: -0.021522796
dense_2
Shape: (128, 64)
<tf.Variable 'dense_2/kernel:0' shape=(128, 64) dtype=float32, numpy=…>
single weight value: -0.10457891
Shape: (64,)
<tf.Variable 'dense_2/bias:0' shape=(64,) dtype=float32, numpy=…>
single weight value: -0.024975048
g. layer_normalization_2
Shape: (64,)
<tf.Variable 'layer_normalization_2/gamma:0' shape=(64,) dtype=float32, numpy=…>
single weight value: 1.0829282
Shape: (64,)
<tf.Variable 'layer_normalization_2/beta:0' shape=(64,) dtype=float32, numpy=…>
single weight value: 0.0019328239
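For reference, a listing like the one above can be produced by looping over a Keras model's weights. A minimal sketch, using a tiny stand-in model rather than the full ViT:

from tensorflow import keras
from tensorflow.keras import layers

# Stand-in model; in the notes above the weights come from the ViT itself.
model = keras.Sequential([keras.Input(shape=(3,)), layers.Dense(4), layers.LayerNormalization()])

for w in model.weights:
    print("Shape:", w.shape)
    print(w)                                # the full tf.Variable repr, as shown above
    flat = w.numpy().reshape(-1)
    print("single weight value:", flat[0])  # one example element
    print("=" * 40)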
You
def mlp(x, hidden_units, dropout_rate):
    for units in hidden_units:
        x = layers.Dense(units, activation=keras.activations.gelu)(x)
        x = layers.Dropout(dropout_rate)(x)
    return x
Gemini
It looks like you're defining a Multi-Layer Perceptron (MLP) block. This is a common
component in various neural network architectures, including Transformers. Here's a
breakdown of the code and how it works:
import tensorflow.keras as keras
from tensorflow.keras import layers

def mlp(x, hidden_units, dropout_rate):
    """Multi-Layer Perceptron (MLP) block.

    Args:
        x: Input tensor.
        hidden_units: A list of integers, specifying the number of units in each hidden layer.
        dropout_rate: Float between 0 and 1, specifying the dropout rate.

    Returns:
        Output tensor after passing through the MLP block.
    """
    for units in hidden_units:
        x = layers.Dense(units, activation=keras.activations.gelu)(x)
        x = layers.Dropout(dropout_rate)(x)
    return x
Explanation:
1. Input (x): The MLP block takes a tensor x as input. This could be the output of a
previous layer, like the patch embeddings in a Vision Transformer.
2. Hidden Layers: The for loop iterates through the list of hidden_units, creating a Dense layer for each.
o layers.Dense(units): Creates a fully connected layer with the specified number of units.
o activation=keras.activations.gelu: Applies the Gaussian Error Linear Unit (GELU) activation function to introduce non-linearity. GELU is a common choice in modern architectures.
o layers.Dropout(dropout_rate): Adds a dropout layer to prevent overfitting by randomly dropping out a fraction of the activations during training.
3. Output: After passing through all the hidden layers, the final tensor x is returned as
the output of the MLP block.
Example usage:
# Example input tensor
input_tensor = keras.Input(shape=(64,))  # Assume input dimension is 64
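The example above is cut off; a minimal completion, with hypothetical hidden_units of [128, 64] and a dropout rate of 0.1:

output_tensor = mlp(input_tensor, hidden_units=[128, 64], dropout_rate=0.1)
model = keras.Model(inputs=input_tensor, outputs=output_tensor)
model.summary()  # the final feature dimension is 64, the last entry of hidden_units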
Convolutional layers: These layers apply filters to the input, which can reduce or
increase the spatial dimensions of the data.
Pooling layers: These layers downsample the input, reducing its spatial dimensions.
Dense layers: These layers perform matrix multiplications, which can change the
number of features in the data.
Reshaping layers: These layers explicitly change the shape of the data.
In your code, the Patches layer extracts patches from the input image, changing its shape
from (image_height, image_width, channels) to (num_patches, patch_size * patch_size *
channels). The PatchEncoder layer then projects these patches to a lower-dimensional
space, changing the shape to (num_patches, projection_dim). The subsequent Transformer
and MLP layers further transform the data, leading to different shapes at each layer.
These shape changes are essential for the network to learn hierarchical representations of the
data. By progressively transforming the data, the network can extract increasingly complex
and abstract features, ultimately leading to better performance on the task at hand.
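To make the Patches shape change concrete, here is a sketch of a patch-extraction layer in the style of the Keras ViT example (the exact Patches implementation isn't shown in these notes, so this is an assumption built on tf.image.extract_patches):

import tensorflow as tf
from tensorflow.keras import layers

class Patches(layers.Layer):
    """Splits a batch of images into flattened, non-overlapping patches."""
    def __init__(self, patch_size):
        super().__init__()
        self.patch_size = patch_size

    def call(self, images):
        batch_size = tf.shape(images)[0]
        patches = tf.image.extract_patches(
            images=images,
            sizes=[1, self.patch_size, self.patch_size, 1],
            strides=[1, self.patch_size, self.patch_size, 1],
            rates=[1, 1, 1, 1],
            padding="VALID",
        )
        # Each patch is flattened to patch_size * patch_size * channels values.
        patch_dims = patches.shape[-1]
        return tf.reshape(patches, [batch_size, -1, patch_dims])

# A 72x72x3 image yields (72/6)^2 = 144 patches of 6*6*3 = 108 elements each.
print(Patches(6)(tf.zeros((1, 72, 72, 3))).shape)  # (1, 144, 108)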
*Here's a breakdown of how Transformer and MLP layers in your code impact the shape of
data as it flows through the network:
Transformer Layers:
1. Layer Normalization: Doesn't change the shape. It normalizes the activations across
features for each input to stabilize training.
2. Multi-Head Attention: The core of the Transformer. It computes attention weights
between different patches, but the output shape remains the same as the input
(num_patches, projection_dim). Think of it as re-weighting the importance of each
patch based on its relationship to other patches.
3. Skip Connection 1: Adds the output of the attention layer to the original encoded
patches. Again, no change in shape.
4. Layer Normalization 2: Another normalization step, no shape change.
5. MLP (within Transformer): This is where shape changes can occur. Your code uses
two dense layers within the MLP. The first layer expands the features
to projection_dim * 2, and the second layer brings it back down
to projection_dim.
6. Skip Connection 2: Adds the MLP output to the result of the first skip connection,
maintaining the (num_patches, projection_dim) shape.
MLP Head:
1. Dense Layers: Your MLP head has two dense layers. These can change the number of features. The first layer expands to 2048, the second reduces to 1024 (a short code sketch follows this list).
2. Final Dense Layer: This layer maps the 1024 features to the number of output
classes (100 in your case).
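A sketch of how those head shapes arise, reusing the mlp helper from earlier and a stand-in input for the final patch representation (the 0.5 dropout rates are assumptions):

from tensorflow import keras
from tensorflow.keras import layers

# Stand-in for the output of the last Transformer block: 144 patches x 64 features.
encoded_patches = keras.Input(shape=(144, 64))
representation = layers.Flatten()(encoded_patches)          # (None, 144 * 64) = (None, 9216)
representation = layers.Dropout(0.5)(representation)
features = mlp(representation, hidden_units=[2048, 1024], dropout_rate=0.5)  # -> (None, 1024)
logits = layers.Dense(100)(features)                         # (None, 100) class scores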
Key Points:
1. Patch Extraction:
The Patches layer divides the input image into non-overlapping patches of a fixed size (6x6 in your case).
Each patch is treated as an individual unit of information.
2. Patch Encoding:
The PatchEncoder layer flattens each patch into a vector and applies a linear
transformation (matrix multiplication) to project it to a lower-dimensional space.
This step essentially extracts relevant features from each patch.
3. Transformer Layers:
4. MLP Layers:
The MLP layers (both within the Transformer and in the final classifier head) perform
further transformations on the patch representations through matrix multiplications
and non-linear activation functions.
These operations are applied to the entire set of patch features, not individual pixels.
Key Point: The model works with patches and their relationships, not individual pixels. This
allows it to capture higher-level patterns and structures within the image, which is crucial for
tasks like image classification.
Think of it this way: instead of focusing on the color of each pixel, the model tries to
understand the content of each patch (e.g., a part of a car, a piece of sky) and how these
patches relate to each other to form the overall image.
how the patches
After the image is split into patches, each patch is flattened into a vector and linearly
projected to a lower-dimensional embedding space using a Dense layer
(the PatchEncoder). This turns each patch into a 1D vector representing its features.
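A sketch of what such a PatchEncoder typically looks like (the exact code isn't shown in these notes; this follows the standard Keras ViT example pattern, which also adds a learned position embedding):

import tensorflow as tf
from tensorflow.keras import layers

class PatchEncoder(layers.Layer):
    """Projects each flattened patch to projection_dim and adds a position embedding."""
    def __init__(self, num_patches=144, projection_dim=64):
        super().__init__()
        self.num_patches = num_patches
        self.projection = layers.Dense(units=projection_dim)  # per-patch 108 -> 64
        self.position_embedding = layers.Embedding(
            input_dim=num_patches, output_dim=projection_dim
        )

    def call(self, patches):
        positions = tf.range(start=0, limit=self.num_patches, delta=1)
        # (None, 144, 108) -> (None, 144, 64), plus a learned embedding per position.
        return self.projection(patches) + self.position_embedding(positions)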
These patch embeddings are then fed into a series of Transformer blocks. Each block consists
of:
1. Layer Normalization: To stabilize the activations.
2. Multi-Head Attention: This allows the model to weigh the importance of different
patches in relation to each other for each patch. It's like each patch is "looking" at the
other patches to gather contextual information.
3. Skip Connection: The output of the attention is added to the original input of the
block.
4. Layer Normalization: Another normalization step.
5. MLP (Multi-Layer Perceptron): A feedforward network applied to each patch
independently to further process the features.
6. Skip Connection: The output of the MLP is added to the output of the previous skip
connection.
This process of Transformer blocks is repeated multiple times, allowing the model to learn
complex relationships between patches and build a rich representation of the image.
Finally, the output of the last Transformer block is flattened, passed through dropout, and fed into the MLP classifier head (2048 → 1024 units) and a final Dense layer that produces the 100 class scores.
All of these operations (projection, attention, MLPs) are essentially matrix multiplications
and additions, allowing for efficient processing on hardware like GPUs.
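Putting the six steps above together, one Transformer block in this style looks roughly like the following sketch (assuming projection_dim = 64, 4 attention heads, a 0.1 dropout, and the mlp helper defined earlier):

from tensorflow.keras import layers

def transformer_block(encoded_patches, projection_dim=64, num_heads=4):
    # 1. Layer normalization.
    x1 = layers.LayerNormalization(epsilon=1e-6)(encoded_patches)
    # 2. Multi-head self-attention; output shape stays (num_patches, projection_dim).
    attention_output = layers.MultiHeadAttention(
        num_heads=num_heads, key_dim=projection_dim, dropout=0.1
    )(x1, x1)
    # 3. Skip connection 1.
    x2 = layers.Add()([attention_output, encoded_patches])
    # 4. Layer normalization 2.
    x3 = layers.LayerNormalization(epsilon=1e-6)(x2)
    # 5. MLP: expand to projection_dim * 2, then project back down to projection_dim.
    x3 = mlp(x3, hidden_units=[projection_dim * 2, projection_dim], dropout_rate=0.1)
    # 6. Skip connection 2.
    return layers.Add()([x3, x2])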
*Inside the multi-head self-attention block in your code, there are 4 main components (or
"layers"):
1. Linear projections for queries, keys, and values: These are dense layers that
transform the input embeddings into separate query, key, and value representations.
o Mathematical relationship: Each linear projection is a matrix
multiplication: Output = Input * Weight + Bias.
2. Scaled dot-product attention: This layer computes the attention weights by taking
the dot product of queries and keys, scaling it, applying softmax to get probabilities,
and finally weighting the values.
o Mathematical relationship: Attention(Q, K, V) = softmax(Q*K^T /
sqrt(d_k)) * V, where d_k is the dimensionality of keys.
3. Concatenation: The outputs from the multiple attention heads are concatenated into a
single tensor.
o Mathematical relationship: Simple concatenation along the feature dimension.
4. Final linear layer: This layer projects the concatenated attention outputs to the
desired output dimension.
o Mathematical relationship: Again, a matrix multiplication: Output = Input
* Weight + Bias.
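The second component (scaled dot-product attention) can be made concrete with a few lines of NumPy: a toy attention over 3 patches with 4-dimensional queries, keys, and values (random illustrative inputs):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # (num_patches, num_patches)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                                        # weighted sum of the values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
output = scaled_dot_product_attention(Q, K, V)
print(output)   # shape (3, 4): one re-weighted value vector per patch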
In the context of patches_2 (Patches) (None, 144, 108), the None represents the
batch dimension. It signifies that the layer can process an arbitrary number of input samples
(images) in a batch.
The remaining dimensions, 144 and 108, specify the shape of the output patches for each
image in the batch.
*The math behind resizing an image from 32x32 to 72x72 using bilinear interpolation
involves calculating the pixel values in the larger image based on a weighted average of the
surrounding pixels in the original image.
Here's a breakdown of the process:
1. Scaling Factor:
Calculate the scaling factors in both horizontal (x) and vertical (y) directions:
o scale_x = new_width / original_width = 72 / 32 = 2.25
o scale_y = new_height / original_height = 72 / 32 = 2.25
2. Mapping Coordinates:
For each pixel (x', y') in the resized image, find the corresponding location (x, y) in
the original image:
o x = x' / scale_x
o y = y' / scale_y
3. Neighboring Pixels:
Since the calculated (x, y) coordinates might not be integers, identify the four nearest pixels in the original image that surround this point:
o x1 = floor(x)
o x2 = ceil(x)
o y1 = floor(y)
o y2 = ceil(y)
4. Calculating Weights:
Determine the weights for each of the four neighboring pixels based on their distance
from the calculated (x, y) point:
o w1 = (x2 - x) * (y2 - y)
o w2 = (x - x1) * (y2 - y)
o w3 = (x2 - x) * (y - y1)
o w4 = (x - x1) * (y - y1)
5. Weighted Average:
Calculate the value of the pixel (x', y') in the resized image by taking a weighted
average of the four neighboring pixels:
o pixel_value(x', y') = w1 * pixel_value(x1, y1) + w2 *
pixel_value(x2, y1) + w3 * pixel_value(x1, y2) + w4 *
pixel_value(x2, y2)
Note:
If the calculated (x, y) coordinate falls exactly on a pixel in the original image, the weight for that pixel will be 1 and the weights for all other pixels will be 0. (Implementations typically take x2 = x1 + 1 and y2 = y1 + 1 rather than ceil, so the weight formulas stay well defined in this case.)
The floor and ceil functions round down and up to the nearest integer, respectively.
In essence, bilinear interpolation smoothly blends the values of neighboring pixels to
create a more visually appealing resized image, avoiding the blocky appearance that can
occur with simpler methods like nearest neighbor interpolation.
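A direct NumPy translation of those steps for a single-channel image (a sketch for illustration; library implementations such as tf.image.resize handle coordinate mapping and borders with slightly different conventions):

import numpy as np

def bilinear_resize(img, new_h, new_w):
    """Resize a 2D array with bilinear interpolation, following the steps above."""
    old_h, old_w = img.shape
    scale_y, scale_x = new_h / old_h, new_w / old_w
    out = np.zeros((new_h, new_w), dtype=np.float32)
    for yp in range(new_h):
        for xp in range(new_w):
            # 2. Map the output pixel back into the original image (clamped to the border).
            x = min(xp / scale_x, old_w - 1.0)
            y = min(yp / scale_y, old_h - 1.0)
            # 3. The four neighbouring pixels.
            x1 = min(int(np.floor(x)), old_w - 2); x2 = x1 + 1
            y1 = min(int(np.floor(y)), old_h - 2); y2 = y1 + 1
            # 4. Weights from the distances to (x, y).
            w1 = (x2 - x) * (y2 - y)
            w2 = (x - x1) * (y2 - y)
            w3 = (x2 - x) * (y - y1)
            w4 = (x - x1) * (y - y1)
            # 5. Weighted average of the four neighbours.
            out[yp, xp] = (w1 * img[y1, x1] + w2 * img[y1, x2] +
                           w3 * img[y2, x1] + w4 * img[y2, x2])
    return out

small = np.arange(32 * 32, dtype=np.float32).reshape(32, 32)
print(bilinear_resize(small, 72, 72).shape)   # (72, 72)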
*Let's break down the math behind each layer in your data augmentation pipeline:
1. layers.Normalization():
Math: This layer calculates the mean and standard deviation of your training data and
uses them to normalize the pixel values.
o For each pixel: normalized_pixel = (pixel - mean) /
standard_deviation
Purpose: Normalization helps improve model convergence and performance by
ensuring that all features (pixel values in this case) have a similar scale.
2. layers.Resizing(image_size, image_size):
Math: As discussed earlier, this layer uses bilinear interpolation (by default) to resize
images to the specified image_size.
o It involves calculating weighted averages of neighboring pixels to determine
the values in the resized image.
Purpose: Resizing ensures that all images have a consistent size, which is often
necessary for inputting them into a neural network.
3. [Link]("horizontal"):
Math: No complex math here. It randomly flips images horizontally with a 50%
probability.
Purpose: Introduces variability in the dataset, making the model more robust to
variations in object orientation.
4. layers.RandomRotation(factor=0.02):
Math: Rotates the image randomly within a range determined by the factor. The rotation angle is sampled from a uniform distribution over [-factor * 2π, +factor * 2π] radians; the factor is interpreted as a fraction of a full rotation.
o It uses rotation matrices and interpolation to calculate the pixel values in the
rotated image.
Purpose: Helps the model generalize to different object orientations.
5. layers.RandomZoom(height_factor=0.2, width_factor=0.2):
Math: Randomly zooms in or out on the image. The zoom factors are sampled from a
uniform distribution within the specified ranges ([1 - height_factor, 1 +
height_factor] and [1 - width_factor, 1 + width_factor]).
o It involves interpolation to calculate the pixel values in the zoomed image.
Purpose: Increases the model's robustness to variations in object scale.
In Summary:
This data augmentation pipeline combines multiple transformations to create variations of the
original images. These variations help the model learn more general features and improve its
ability to handle real-world data that might have different orientations, sizes, and lighting
conditions. The math behind these transformations involves techniques like normalization,
interpolation, rotation matrices, and random sampling.
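Assembled as code, the pipeline described above looks like this (a sketch using standard Keras preprocessing layers, with image_size = 72 as in the summary above):

from tensorflow import keras
from tensorflow.keras import layers

image_size = 72
data_augmentation = keras.Sequential(
    [
        layers.Normalization(),                   # (pixel - mean) / sqrt(variance)
        layers.Resizing(image_size, image_size),  # bilinear resize to 72 x 72
        layers.RandomFlip("horizontal"),          # 50% chance of a horizontal mirror
        layers.RandomRotation(factor=0.02),       # small random rotation
        layers.RandomZoom(height_factor=0.2, width_factor=0.2),  # zoom in/out by up to 20%
    ],
    name="data_augmentation",
)
# The mean/variance/count seen in the weight dump above are computed from the
# training images before training, e.g.: data_augmentation.layers[0].adapt(x_train)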
1. Rotation Range:
In this case, factor=0.02 means the image will be randomly rotated by an angle between -0.02 × 2π and +0.02 × 2π radians, since the factor is interpreted as a fraction of a full rotation.
2. Radians to Degrees:
To understand the rotation in degrees, convert radians to degrees with degrees = radians × 180 / π; 0.02 × 2π radians ≈ 7.2°, so the rotation range is roughly ±7.2°.
3. Rotation Matrix:
The actual rotation is performed using a rotation matrix. For a 2D rotation by an angle θ
(theta), the matrix is:
[ cos(θ)  -sin(θ) ]
[ sin(θ)   cos(θ) ]
This matrix is used to transform the coordinates of each pixel in the image.
4. Interpolation:
After rotating the image, some pixel values may not align perfectly with the original grid.
Interpolation methods (e.g., bilinear, nearest neighbor) are used to estimate the values of
these pixels based on the surrounding pixels.
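For instance, rotating a coordinate by the maximum angle implied by factor=0.02 (a small NumPy illustration, not the Keras implementation itself):

import numpy as np

theta = 0.02 * 2 * np.pi                     # maximum rotation angle, about 7.2 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
point = np.array([35.5, 0.0])                # a pixel coordinate relative to the image centre
print(R @ point)                             # where that pixel lands after the rotation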
RandomFlip:
1. Probability:
The layer randomly decides whether to flip the image or not with a probability of 0.5 (50%).
You could think of this as a coin toss – heads it flips, tails it doesn't.
2. Transformation Matrix:
Although not strictly necessary for a simple horizontal flip, you can represent this operation with a transformation matrix. A horizontal flip can be achieved by multiplying the image's pixel coordinates with the following matrix:
[ -1  0 ]
[  0  1 ]
This matrix negates the x-coordinate of each pixel, effectively mirroring it across the vertical
axis.
3. Pixel Remapping:
In practice, the flipping operation is usually implemented by directly remapping the pixel
indices. For an image with width w, the pixel at column x is moved to column w - 1 - x.
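In NumPy, that remapping is just a reversed column index (tiny illustration):

import numpy as np

img = np.arange(12).reshape(3, 4)   # a tiny 3 x 4 "image"
flipped = img[:, ::-1]              # the pixel at column x moves to column w - 1 - x
print(flipped)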
RandomZoom:
layers.RandomZoom(height_factor=0.2, width_factor=0.2) in TensorFlow's Keras API applies a random zoom to images during data augmentation. Here's a breakdown of the math involved:
1. Zoom Range:
height_factor=0.2 means the zoom factor for the height will be randomly chosen between 1 - 0.2 = 0.8 and 1 + 0.2 = 1.2.
Similarly, width_factor=0.2 means the zoom factor for the width will be randomly chosen
between 0.8 and 1.2.
2. Scaling Matrix:
[ zx  0 ]
[  0  zy ]
where zx and zy are the sampled width and height zoom factors; pixel coordinates are scaled by this matrix.
3. Interpolation:
When zooming in, new pixel values need to be generated within the zoomed area.
When zooming out, some pixels from the original image will be removed.
Interpolation methods (e.g., bilinear, nearest neighbor) are used to estimate pixel values in
both cases.
4. Cropping or Padding:
If the image is larger than desired, it's cropped to the original size.
If the image is smaller, it's padded (usually with zeros) to match the original size.
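A small NumPy illustration of sampling the zoom factors and applying the scaling matrix to a coordinate (not the Keras implementation itself):

import numpy as np

rng = np.random.default_rng(0)
zx = rng.uniform(0.8, 1.2)           # width zoom factor for width_factor=0.2
zy = rng.uniform(0.8, 1.2)           # height zoom factor for height_factor=0.2
S = np.array([[zx, 0.0],
              [0.0, zy]])
point = np.array([10.0, 20.0])       # an (x, y) pixel coordinate
print(S @ point)                     # where that coordinate maps under the zoom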
*This code defines a function that creates a multilayer perceptron (MLP). An MLP is a type of artificial
neural network with multiple layers.
o The for loop iterates through the hidden_units list, creating a dense layer with the specified number of units for each element in the list.
o Each dense layer performs a linear transformation on the input data: output = activation(dot(input, weights) + bias).
o activation is the activation function, which introduces non-linearity. In this case, it's GELU (Gaussian Error Linear Unit).
o Dropout: layers.Dropout(dropout_rate)(x) randomly zeroes a fraction of the activations after each dense layer during training.
The code defines an MLP by stacking multiple dense layers with GELU activation and dropout. The
number of layers and units in each layer are determined by the hidden_units argument.