Deep Learning Report


Deep Learning

Prepared by: Ali Babadoustsani and Ramazan Yıldız

Abstract: This report provides a comprehensive overview of deep learning, a
transformative branch of artificial intelligence. It explores the core principles, including
artificial neural networks and their ability to process vast datasets efficiently. The report
delves into pivotal architectures such as Convolutional Neural Networks (CNNs),
Recurrent Neural Networks (RNNs), and Transformers, highlighting their applications
across diverse fields, from image recognition to natural language processing. Additionally,
it discusses the evolution of deep learning, from early theoretical foundations to
breakthroughs enabled by advanced computing power and data availability. By integrating
theoretical insights with practical implementations, this report underscores deep
learning’s role in addressing complex challenges and driving innovation in various
industries.

Table of Contents:

1. Introduction to Deep Learning


1.1. What is Deep Learning?
1.2. Key Features of Deep Learning

2. Historical Evolution
2.1. Foundational Exploration (1950s–1980s)
2.2. Revival and Theoretical Advances (1990s–2000s)
2.3. Data and Hardware Revolution (2000s)
2.4. Breakthrough Architectures (2010s–Present)

3. Deep Learning Architectures


3.1. Artificial Neural Networks
3.2. Convolutional Neural Networks (CNNs)
3.3. Recurrent Neural Networks (RNNs)
3.4. Transformers

4. How Deep Learning Works


4.1. Mathematical Foundations
4.2. Learning Paradigms: Supervised, Unsupervised, and Reinforcement Learning
4.3. Training Mechanisms and Challenges

5. Applications of Deep Learning


5.1. Image and Video Processing
5.2. Natural Language Processing (NLP)
5.3. Autonomous Systems
5.4. Healthcare and Diagnostics

6. Future Perspectives
6.1. Emerging Trends in Deep Learning
6.2. Ethical and Societal Implications
6.3. Innovations in Computational Techniques

7. References

Introduction to Deep Learning

Deep learning represents a transformative approach within the broader field of artificial
intelligence (AI). By leveraging artificial neural networks loosely inspired by the structure
and function of the human brain, deep learning excels at analyzing vast amounts of data to
uncover patterns and make decisions.

As a subset of machine learning, deep learning distinguishes itself by automating feature
discovery and by performing strongly on diverse data types, including both structured and
unstructured data.

Key Features of Deep Learning:


1. Artificial Neural Networks: Deep learning mimics the human brain by utilizing
artificial neural networks, which consist of multiple layers designed to process and
learn from data hierarchically.

2. Large-Scale Data Processing: It is particularly effective at processing vast
datasets, identifying patterns that are often imperceptible to traditional algorithms.

3. Complex Task Handling: Deep learning powers advanced applications such as:

o Image recognition

o Language translation

o Speech synthesis

o Autonomous driving

4. Autonomous Feature Learning: Unlike traditional machine learning, deep learning
does not require manual intervention in feature extraction, making it highly efficient
and adaptive.

5. Scalability: Its performance scales significantly with larger datasets and higher
computational power, benefiting from advancements in GPU and TPU technology.

Evolution and Importance

Deep learning’s development is rooted in early AI research but gained momentum due to
technological advancements:

• 1950s-1980s: Foundational Exploration: During this period, researchers began
exploring the concept of artificial neural networks. Frank Rosenblatt’s development
of the Perceptron in 1958 was a milestone, introducing a model capable of simple
pattern recognition. However, limitations in computational power and the inability to
train multi-layer networks hindered progress.

• 1990s-2000s: Revival and Theoretical Advances: The backpropagation algorithm,
popularized in the 1980s, allowed for the training of deeper networks. This period
saw the rise of support vector machines and advancements in hardware, which set
the stage for practical implementations of deep learning.

• 2000s: Data and Hardware Revolution: Increased computational power, thanks to
GPUs, and the availability of massive datasets (e.g., ImageNet) catalyzed deep
learning’s progress. Frameworks like TensorFlow and PyTorch, released in the
mid-2010s, later made it easier for researchers to experiment and deploy models.

• 2010s-Present: Breakthrough Architectures: Key innovations such as
Convolutional Neural Networks (CNNs) for image processing, Recurrent Neural
Networks (RNNs) for sequence modeling, and Transformers for natural language
processing revolutionized the field. These architectures enabled groundbreaking
applications, including real-time language translation and generative models like
GPT and DALL-E.

Why Does Deep Learning Matter?

Deep learning enables machines to achieve human-like capabilities in various domains,
driving advancements across industries. Here are some key reasons why deep learning is
critical:

1. Understanding: It can extract meaningful insights from complex data formats such
as text, images, and audio. This capability powers applications like sentiment
analysis, medical imaging diagnostics, and audio transcription.

2. Decision-Making: By generating predictions and recommendations, deep learning
supports decision-making processes in real-time scenarios such as fraud
detection, stock market analysis, and personalized content delivery.

3. Automation: Deep learning reduces the need for human intervention in data-driven
tasks, enabling automation in areas like robotic process automation, autonomous
vehicles, and smart home systems.

This foundation positions deep learning as a pivotal technology shaping industries from
healthcare to finance and beyond. It continues to unlock new possibilities by tackling
complex challenges with unprecedented accuracy and efficiency.

How Does Deep Learning Work?

Deep learning operates through artificial neural networks that process data across multiple
layers. Each layer extracts progressively complex features from the input, enabling the
system to handle intricate patterns and relationships effectively. Below is a detailed
breakdown of its operational principles:

Mathematical Approach:

1. Forward Propagation:

o Each neuron’s output is calculated as:

$$z = \sum_{i=1}^{n} w_i \cdot x_i + b$$

Here, $z$ is the weighted sum of inputs, $w_i$ represents the weights, $x_i$ is the
input, and $b$ is the bias.

o An activation function $f(z)$ is applied: $a = f(z)$

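2. Loss Calculation:

o The loss measures the difference between predicted outputs and actual
targets. Common loss functions include:

o Mean Squared Error, averaged over $m$ samples with targets $y_i$ and predictions
$\hat{y}_i$:

$$L = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2$$

o Cross-Entropy Loss, summed over $C$ classes:

$$L = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)$$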

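3. Backpropagation:

o Gradients of the loss function with respect to weights are computed using
the chain rule:

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w}$$

o This derivative is used to update the weights:

$$w = w - \eta \cdot \frac{\partial L}{\partial w}$$

Here, $\eta$ is the learning rate.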
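
The three steps above can be traced for a single neuron. The following is a minimal sketch and not the authors' code: it assumes a sigmoid activation and a squared error on one training sample, and the starting weights, input, target, and learning rate are illustrative values chosen here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """Weighted sum z = sum_i w_i * x_i + b, followed by the activation a = f(z)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return z, sigmoid(z)

def train_step(w, b, x, y, eta):
    """One gradient-descent update for the squared error L = (y - a)^2."""
    z, a = forward(w, b, x)
    dL_da = -2.0 * (y - a)        # dL/da
    da_dz = a * (1.0 - a)         # da/dz for the sigmoid
    delta = dL_da * da_dz         # dL/dz by the chain rule
    new_w = [wi - eta * delta * xi for wi, xi in zip(w, x)]  # dz/dw_i = x_i
    new_b = b - eta * delta                                  # dz/db = 1
    return new_w, new_b, (y - a) ** 2

# Illustrative values (assumptions, not taken from the report)
w, b = [0.5, -0.3], 0.1
x, y = [1.0, 2.0], 1.0

losses = []
for _ in range(50):
    w, b, loss = train_step(w, b, x, y, eta=0.5)
    losses.append(loss)

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Each pass computes the output, measures the loss, and moves the weights against the gradient, so the recorded loss shrinks over the 50 updates.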

Learning Paradigms:

Deep learning employs different approaches depending on the task and data:

1. Supervised Learning:

o Models are trained on labeled data to learn input-output mappings.

o Example applications include image classification, speech recognition, and
sentiment analysis.

2. Unsupervised Learning:

o Models identify patterns and structures in unlabeled data.

o Example applications include clustering, anomaly detection, and
dimensionality reduction techniques such as PCA and autoencoders.

3. Reinforcement Learning:
o Models learn through trial-and-error by interacting with dynamic
environments and receiving rewards or penalties.

o Example applications include robotics, gaming, and autonomous navigation.

Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs) are a specialized class of deep learning models
designed to process grid-like data structures, such as images and videos. By leveraging
convolutional operations, CNNs effectively capture spatial hierarchies and patterns within
the data, making them the backbone of computer vision applications like image
recognition, object detection, and medical imaging analysis.

How CNNs Work

A CNN typically consists of multiple layers designed to progressively extract features from
input data. These layers include convolutional layers, activation functions, pooling layers,
and fully connected layers.

1. Convolutional Layer
The convolutional layer is the core building block of a CNN. It applies filters (kernels)
to the input data to detect local patterns such as edges, textures, or shapes. The
mathematical operation performed is:

$$z_{i,j}^{k} = (x * w^{k})_{i,j} + b^{k}$$

Where:

o $z_{i,j}^{k}$ is the output of the $k$-th filter at position $(i, j)$,

o $x$ is the input data (e.g., an image),

o $w^{k}$ is the $k$-th filter (a small matrix of weights),

o $*$ denotes the convolution operation,

o $b^{k}$ is the bias term for the $k$-th filter.

The result of this operation is a feature map, which highlights regions in the input that
correspond to the learned pattern of the filter.

2. Activation Function
After convolution, an activation function, typically ReLU (Rectified Linear Unit), is
applied to introduce non-linearity:

$$\text{ReLU}(z) = \max(0, z)$$
This ensures that the network can model complex, non-linear relationships in the data.

3. Pooling Layer
Pooling layers reduce the spatial dimensions of feature maps, making the model
computationally efficient and less prone to overfitting. The most common pooling
method is max pooling:

$$z_{i,j} = \max\{ z_{m,n} \}$$

Here, $z_{m,n}$ represents the values within a pooling window, and the maximum value is
selected as the output.

4. Fully Connected Layer


In the final stages of a CNN, fully connected layers are used to combine extracted
features and make predictions. These layers flatten the feature maps and pass the
data through a standard feedforward neural network.
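
The first three layer types can be illustrated on a tiny image. The sketch below is written in plain Python for readability and rests on assumptions made here: a "valid" convolution without padding, a stride of 1, a non-overlapping 2x2 pooling window, and a hand-made vertical-edge filter rather than a learned one. As in most deep learning libraries, the filter is applied without flipping (cross-correlation).

```python
def conv2d(x, w, b=0.0):
    """Slide filter w over image x (no padding, stride 1) and add the bias b."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + m][j + n] * w[m][n] for m in range(kh) for n in range(kw)) + b
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Apply max(0, z) to every value of a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [[max(fmap[i + m][j + n] for m in range(size) for n in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

# 5x5 image: dark on the left, bright on the right
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]  # responds where intensity rises left to right

feature_map = relu(conv2d(image, vertical_edge))
pooled = max_pool(feature_map)

print(feature_map)
print(pooled)
```

The feature map is non-zero only in the column where the dark-to-bright edge sits, and pooling halves each spatial dimension while keeping that response.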

Strengths and Challenges

CNNs excel at detecting spatial features in images, making them indispensable for
computer vision tasks. By sharing parameters (filters) across spatial locations, they
significantly reduce the number of learnable parameters compared to fully connected
networks, improving generalization and computational efficiency. However, CNNs can be
computationally expensive for very large images or datasets, and they may struggle with
invariance to rotations or scale changes in the input.

Applications of CNNs

CNNs are widely used in numerous fields:

• Image Recognition and Classification: Identifying objects, faces, or scenes in
images. Applications include security systems, photo tagging, and content
moderation.

• Medical Imaging: Analyzing X-rays, CT scans, and MRIs to detect abnormalities
such as tumors.

• Autonomous Vehicles: Processing camera input to identify obstacles, road signs,
and lane markings for navigation.

• Natural Language Processing: Extracting features from text or sentences in tasks
like sentence classification.

Convolutional Neural Networks have revolutionized the field of computer vision by
introducing architectures capable of learning hierarchical spatial features. While newer
architectures like Vision Transformers (ViTs) are gaining popularity for certain tasks, CNNs
remain a cornerstone of deep learning, especially for image-related applications.

Recurrent Neural Networks (RNNs)

Recurrent Neural Networks (RNNs) are a type of neural network architecture designed to
process sequential data, where the order of the data points is essential. RNNs are capable
of maintaining an internal memory by leveraging feedback connections, allowing them to
use information from previous steps to inform the current computation. This makes them
particularly suited for tasks such as natural language processing, time series forecasting,
and speech recognition.

How RNNs Work

At each time step $t$, an RNN takes an input vector $x_t$ and updates its hidden state $h_t$,
which serves as the network's memory of past inputs. The hidden state is computed as
follows:

$$h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$$

Here:

• $h_t$ is the hidden state at time step $t$,

• $W_{xh}$ is the weight matrix for the input-to-hidden connections,

• $W_{hh}$ is the weight matrix for the hidden-to-hidden connections,

• $h_{t-1}$ is the hidden state from the previous time step,

• $b_h$ is the bias term,

• $\tanh$ is the activation function that introduces non-linearity.

The output $y_t$ at each time step is then calculated based on the hidden state:

$$y_t = W_{hy} h_t + b_y$$

Where:

• $y_t$ is the output at time step $t$,

• $W_{hy}$ is the weight matrix for the hidden-to-output connections,

• $b_y$ is the output bias term.
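
The recurrence can be unrolled by hand for a few time steps. The following is a minimal sketch, assuming one input feature, two hidden units, and one output; the weight values are made up for illustration and are not trained.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h, W_hy, b_y):
    """h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h);  y_t = W_hy h_t + b_y."""
    pre = [a + r + c for a, r, c in zip(matvec(W_xh, x_t), matvec(W_hh, h_prev), b_h)]
    h_t = [math.tanh(p) for p in pre]
    y_t = [o + c for o, c in zip(matvec(W_hy, h_t), b_y)]
    return h_t, y_t

# Hypothetical toy parameters: 1 input, 2 hidden units, 1 output
W_xh = [[0.5], [-0.5]]
W_hh = [[0.1, 0.0], [0.0, 0.1]]
b_h = [0.0, 0.0]
W_hy = [[1.0, -1.0]]
b_y = [0.0]

h = [0.0, 0.0]            # initial hidden state
outputs = []
for x in [1.0, 0.0, 0.0]:  # a single pulse followed by silence
    h, y = rnn_step([x], h, W_xh, W_hh, b_h, W_hy, b_y)
    outputs.append(y[0])

print(outputs)
```

The input is non-zero only at the first step, yet the later outputs are still non-zero because the hidden state carries the pulse forward. With these small recurrent weights the trace fades quickly, which is the same effect that makes long-range dependencies hard for a plain RNN.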

Strengths and Challenges

RNNs excel at capturing short-term dependencies in sequences. For example, in language
modeling, they can predict the next word in a sentence based on the context provided by
previous words. However, they struggle with long-term dependencies due to the vanishing
gradient problem, where gradients diminish as they are propagated backward through
many time steps during training. This makes it difficult for the network to learn from earlier
inputs in long sequences.

Variants to Address Limitations

To address the challenges of standard RNNs, advanced variants such as Long Short-Term
Memory (LSTM) networks and Gated Recurrent Units (GRUs) have been introduced. These
architectures incorporate gating mechanisms to control the flow of information. For
instance, LSTMs use three gates (input, forget, and output) to manage what
information to add, forget, or output from their memory cells, enabling them to capture
long-term dependencies effectively.

Applications of RNNs

RNNs are widely used in natural language processing tasks such as machine translation,
sentiment analysis, and text generation. In time series analysis, they are employed for
tasks like stock price prediction and weather forecasting. Additionally, in speech
recognition, RNNs process audio signals sequentially to transcribe spoken language into
text.

Despite their limitations with long sequences, RNNs have been foundational in advancing
deep learning for sequential data. While newer architectures like Transformers have
surpassed them in many tasks, RNNs remain a crucial concept and are still used in
applications where sequential modeling is essential.

Transformers

Transformers have revolutionized the field of natural language processing (NLP) and
sequential data tasks. Unlike traditional recurrent neural networks (RNNs), which process
data sequentially, transformers use self-attention mechanisms that allow them to consider
all elements of a sequence simultaneously. This enables them to capture long-range
dependencies more efficiently, making them highly effective for tasks like language
modeling, text generation, translation, and even image processing.

How Transformers Work

Transformers rely heavily on a mechanism called self-attention, which allows the model to
weigh the importance of different words (or elements) in a sequence relative to each other,
regardless of their positions. This is done through multiple layers of attention and
feedforward neural networks. The architecture consists of two main parts: the encoder
and the decoder.

1. Self-Attention Mechanism

The self-attention mechanism computes attention scores for each word in a sequence with
respect to every other word. For each word in the input, self-attention computes a weighted
sum of all words, where the weights reflect how much focus the model should place on
each word relative to the current word. The formula for computing the attention score is as
follows:

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$$

Where:

o $Q$ is the query matrix, derived from the input sequence,

o $K$ is the key matrix, also derived from the input sequence,

o $V$ is the value matrix, which represents the actual data to be processed,

o $d_k$ is the dimension of the key vector, used for scaling.

The result of this operation is a weighted sum of the values, which is passed to the next
layer.
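
Scaled dot-product attention is short enough to write out directly. The sketch below uses plain Python lists as matrices; the toy Q, K, and V values are assumptions chosen so that each query matches exactly one key, and they are not derived from real embeddings.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for matrices given as lists of rows."""
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in K]
              for q_row in Q]
    weights = [softmax(row) for row in scores]
    out = [[sum(w * v_row[c] for w, v_row in zip(w_row, V)) for c in range(len(V[0]))]
           for w_row in weights]
    return out, weights

# Toy sequence of two tokens with 2-dimensional queries, keys and values
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]

out, weights = attention(Q, K, V)
print(weights)
print(out)
```

Each row of `weights` sums to 1, and each output row is a blend of the value vectors that leans toward the value whose key best matches the query.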

2. Positional Encoding

Since transformers do not inherently capture the order of the sequence, positional
encodings are added to the input embeddings to inject information about the position of
each element in the sequence. The positional encoding $PE$ is typically computed using
sine and cosine functions of different frequencies:

$$PE_{(pos, 2i)} = \sin\left(\frac{pos}{10000^{2i/d}}\right)$$

$$PE_{(pos, 2i+1)} = \cos\left(\frac{pos}{10000^{2i/d}}\right)$$

Where:

o $pos$ is the position of the word in the sequence,

o $i$ is the dimension index,

o $d$ is the dimensionality of the embedding.

These encodings are added to the input embeddings to provide the model with information
about the relative positions of words in the sequence.
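
The sinusoidal encoding can be tabulated with a few lines of code. This is a minimal sketch; the function name and the small sizes (3 positions, 4 dimensions) are illustrative choices made here.

```python
import math

def positional_encoding(num_positions, d):
    """Return a num_positions x d table of sinusoidal position encodings."""
    table = [[0.0] * d for _ in range(num_positions)]
    for pos in range(num_positions):
        for even in range(0, d, 2):                  # even plays the role of 2i
            angle = pos / (10000 ** (even / d))
            table[pos][even] = math.sin(angle)       # PE(pos, 2i)
            if even + 1 < d:
                table[pos][even + 1] = math.cos(angle)   # PE(pos, 2i+1)
    return table

pe = positional_encoding(num_positions=3, d=4)
for row in pe:
    print([round(v, 4) for v in row])
```

Low dimension indices oscillate quickly with position and high indices slowly, so every position receives a distinct vector that is added to its token embedding.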

3. Multi-Head Attention

In transformers, the self-attention mechanism is applied in parallel multiple times using
different sets of weights, a process called multi-head attention. This allows the model to
capture different types of relationships and dependencies within the data. The results from
each head are concatenated and linearly transformed to produce the final attention
output.

4. Feedforward Neural Networks

After the self-attention layer, each attention output is passed through a position-wise
feedforward neural network, which consists of two layers with a ReLU activation in
between. This enables the model to learn complex, non-linear transformations of the
attention outputs.

5. Encoder-Decoder Architecture

The transformer model is typically split into an encoder and a decoder:

o The encoder processes the input sequence and creates a representation of
the data using the self-attention mechanism and feedforward networks.

o The decoder generates the output sequence, attending to the encoder's
output and the previously generated tokens to produce the next token in the
sequence.

The encoder and decoder are stacked in multiple layers, allowing the model to learn
hierarchical representations of the input and output sequences.

Strengths and Challenges

Transformers are highly efficient at capturing long-range dependencies due to their self-
attention mechanism, which allows them to process all parts of the sequence
simultaneously. This makes them more parallelizable than RNNs, enabling faster training
and more effective handling of long sequences. Additionally, transformers can scale better
as the data size increases, making them highly effective for large datasets.

However, transformers can be computationally expensive, especially for long sequences,
as the self-attention mechanism requires $O(n^2)$ time complexity with respect to the
sequence length $n$. This has led to various optimizations, such as sparse attention
mechanisms, that aim to reduce computational complexity while maintaining
performance.

Applications of Transformers

Transformers have transformed the field of natural language processing and beyond:

• Language Modeling and Text Generation: Models like GPT and BERT are based on
transformers and have been used for a wide range of NLP tasks, including text
generation, summarization, and question answering.

• Machine Translation: Transformers, particularly in the form of models like the
original Transformer (Vaswani et al., 2017), have surpassed RNN-based models in
translation tasks.

• Text Classification and Sentiment Analysis: Transformers are used for classifying
text data, such as detecting sentiment in social media posts or categorizing news
articles.

• Image Processing: Vision transformers (ViTs) have applied the transformer
architecture to image data, showing competitive results compared to CNNs in tasks
like image classification and object detection.

Transformers have redefined the landscape of deep learning by enabling more efficient
handling of sequential data, especially in NLP. Their ability to model complex relationships
in data has led to state-of-the-art results in various fields, making them the go-to
architecture for many modern AI systems.

This project focuses on handwritten character recognition using neural networks,
specifically trained on the EMNIST dataset. The code begins by setting up the environment,
clearing any previous variables or figures, and loading the pre-trained neural network from
a file named [Link]. The necessary folders, such as those containing test images,
are added to the path to ensure smooth access to resources.

The first step is preprocessing the input image, which is read from a file called [Link].
This image, initially in RGB format, is displayed to show its original state. It is then
converted into a grayscale image, simplifying the data for further processing. After this,
adaptive thresholding is applied to transform the grayscale image into a binary format,
where the text appears white on a black background. Noise is reduced by removing small
objects containing fewer than 30 pixels. The modified image is displayed, and bounding
boxes are drawn around the connected components, visually segmenting the individual
characters or regions of interest.
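
The noise-removal and bounding-box steps can be sketched independently of MATLAB. The Python code below is an illustration written for this report's description, not the project's actual code. It starts from an already binarized image (1 = text, 0 = background), assumes 8-connectivity between pixels, and sorts boxes left to right. The function names and the tiny test image are hypothetical.

```python
from collections import deque

def connected_components(binary):
    """Group 8-connected pixels with value 1; returns one pixel list per component."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:                      # breadth-first flood fill
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and binary[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                components.append(pixels)
    return components

def bounding_boxes(binary, min_pixels=30):
    """Drop components with fewer than min_pixels pixels; return (top, left, bottom, right)."""
    boxes = []
    for pixels in connected_components(binary):
        if len(pixels) >= min_pixels:
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return sorted(boxes, key=lambda box: box[1])  # left-to-right order

# Tiny example: one 4-pixel blob and two isolated noise pixels
binary = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
]
print(bounding_boxes(binary, min_pixels=3))
```

On a real scan the threshold would be the 30 pixels mentioned above. The example uses 3 only because the test image is so small.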

Next, the segmented regions are processed to extract individual characters. Each
character is resized to a standard dimension of 128x128 pixels and smoothed using a
Gaussian filter to remove any irregularities. It is then further resized to 20x20 pixels and
padded symmetrically to create a uniform size of 28x28 pixels, matching the input
requirements of the neural network. These preprocessed character images are saved as
individual files in a folder named segmentedImages.
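
The final padding step takes each 20x20 character to the 28x28 network input. The sketch below assumes the padding is a border of zeros (background) of equal width on every side, which gives 4 pixels per side; the report does not state the fill value, so that part is an assumption, and the helper name is hypothetical.

```python
def pad_symmetric(img, target=28, fill=0):
    """Center img inside a target x target canvas filled with `fill`."""
    h, w = len(img), len(img[0])
    top = (target - h) // 2
    left = (target - w) // 2
    padded = [[fill] * target for _ in range(target)]
    for r in range(h):
        for c in range(w):
            padded[top + r][left + c] = img[r][c]
    return padded

glyph = [[1] * 20 for _ in range(20)]   # stand-in for a resized 20x20 character
canvas = pad_symmetric(glyph)

print(len(canvas), len(canvas[0]))      # 28 28
```

Centering the glyph inside a margin mirrors how EMNIST and MNIST samples are laid out, which is presumably why the preprocessing resizes to 20x20 before padding.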

Once the characters are prepared, they are fed into the neural network for recognition.
Each character image is reshaped into a vector format suitable for the network’s input
layer. The network processes the image and outputs a probability distribution across all
possible character classes. The character with the highest probability is selected as the
predicted output. This label is then mapped to the corresponding character (either a digit,
an uppercase letter, or a selected lowercase letter) using the imageLabeler function. This
function ensures consistency with the EMNIST dataset’s label structure.

Finally, the detected characters are concatenated to form the complete text, which is
displayed in the MATLAB command window. This marks the culmination of the process,
where a handwritten image is successfully converted into digital text using a combination
of image processing and neural network techniques. The project showcases the practical
application of neural networks in solving real-world problems like handwritten character
recognition, integrating multiple steps seamlessly from preprocessing to final output.
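
The last two steps, picking the most probable class and joining the characters, can be sketched as follows. The implementation of imageLabeler is not shown in this report, so the mapping below is an assumption: it follows the 47-class EMNIST Balanced ordering (digits, uppercase letters, then the lowercase letters a, b, d, e, f, g, h, n, q, r, t). The helper names and the one-hot test vectors are hypothetical.

```python
# Assumed label order: EMNIST Balanced (47 classes)
EMNIST_BALANCED = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghnqrt"

def predict_char(probabilities, labels=EMNIST_BALANCED):
    """Return the character whose class has the highest probability."""
    best = max(range(len(probabilities)), key=lambda k: probabilities[k])
    return labels[best], probabilities[best]

def recognize(all_probabilities):
    """Concatenate the predicted characters in segmentation order."""
    return "".join(predict_char(p)[0] for p in all_probabilities)

def one_hot(index, size=len(EMNIST_BALANCED)):
    """Stand-in for a network output that is fully confident in one class."""
    return [1.0 if k == index else 0.0 for k in range(size)]

print(recognize([one_hot(17), one_hot(18)]))
```

A real network output would be a softer distribution. Keeping the winning probability alongside the character, as `predict_char` does, makes it possible to flag low-confidence characters.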

The net object represents the neural network and contains fields that define its
architecture, parameters, and functions. It includes general information about the network,
such as the version, name, efficiency metrics, and any user-defined metadata stored in the
userdata field.

[Figure: visualization of the neural network in the [Link] file, produced with Python code]

The network's architecture is defined by several parameters, including the number of input
nodes (numInputs), layers (numLayers), and output nodes (numOutputs). Additionally, it
specifies delays, such as input delays (numInputDelays), layer delays (numLayerDelays),
and feedback delays (numFeedbackDelays). The total number of weights in the network is
indicated by the numWeightElements field.

Connections and weights are a critical part of the network's structure. These include the
bias configuration (biasConnect), input connections (inputConnect), connections between
layers (layerConnect), and output connections (outputConnect). The weights applied to
inputs (inputWeights) and within layers (layerWeights) are also detailed, along with the bias
values (biases) for each layer.

The network's functions and parameters are equally important. It uses a specific training
algorithm (trainFcn) with associated training parameters (trainParam). The network's
performance is evaluated using a performance function (performFcn) and its parameters
(performParam). Other functions include the adaptation function (adaptFcn), which
adjusts weights during training, the data division method (divideFcn) for splitting datasets,
and the initialization function (initFcn) for setting initial weights and biases. The gradient
function (gradientFcn) is responsible for calculating gradients during optimization.

Finally, the network contains additional elements like the input weights matrix (IW), which
connects inputs to layers, and the layer weights matrix (LW), which connects layers to each
other. Bias values for each layer are stored in the b field. Visualization tools for plotting
network behaviour are provided by the plotFcns field. These components collectively
describe the structure and functionality of the net object.

References:

[Link]
recognition

[Link]
