Advanced Natural Language Processing
Lecture 2: Basics of Deep Learning
陈冠华 CHEN Guanhua
Department of Statistics and Data Science
Content
• Introduction
• History
• Model
• Optimization
• Training
• Coding
Data Analyses
• To create a function that maps an input X to an output Y
• Examples: machine translation (X = source sentence, Y = its translation), text classification (X = a document, Y = its label)
• To create such a system, we can use
• Manual creation of rules
• Machine learning from paired data <X, Y>
Machine Learning
• Statistical approach
• Generalized linear model, linear regression, logistic regression
• Gaussian mixture models
• Support vector machine (SVM)
• Decision trees, random forests
• Deep learning approach
• Modeling with different deep neural networks
Why Didn’t They Work Before?
• Datasets too small
• For machine translation, neural models are not really better until you have 1M+ parallel sentences (and really need a lot more)
• Optimization not well understood
• Good initialization
• Momentum-based and adaptive optimizers (Adagrad/Adam) work best out of the box
• Other innovations
• Word embedding
• Dropout, layer normalization, residual connection
• Large-scale computing system
Deep Learning
• Modeling with deep neural networks
• Optimized on big data
• Research contributions can come at every stage of the pipeline:
• Design a task
• Collect/create the dataset
• Design an evaluation metric
• Design a model
• Optimize a model
• Analyze and understand the model and results
Deep Learning Algorithm Sketch
• Create a model and define a loss
• For each example
• Forward process: calculate the result (prediction & loss) of that example
• if training
• Perform back propagation
• Update parameters
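A minimal PyTorch skeleton of this loop (a sketch; model, dataloader, loss_fn, and optimizer are hypothetical names assumed to be defined elsewhere):

for x, y in dataloader:
    pred = model(x)            # forward process: prediction
    loss = loss_fn(pred, y)    # forward process: loss
    optimizer.zero_grad()      # clear gradients from the previous step
    loss.backward()            # back propagation
    optimizer.step()           # update parameters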
Deep Learning Algorithm Sketch
Forward Pass: Data Input → Neural Network → Model Output, compared with the Ground Truth (Golden Label) to compute the Training Loss
Backward Pass: Training Loss → Gradient → Optimizer Update
Python files:
• [Link]
• [Link]
• [Link]
• [Link]
• [Link]
Dataset Split
When creating a system, use three sets of data
• Training Set
• Generally larger dataset, used during system design, creation, and learning of
parameters.
• Development/Validation Set
• Smaller dataset for testing different design decisions ("hyper-parameters").
• Test Set
• Dataset reflecting the final test scenario; do not use it for making design decisions.
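For instance, a random split with PyTorch (a sketch; the 80/10/10 ratio and toy data are illustrative assumptions):

import torch
from torch.utils.data import TensorDataset, random_split

# 1000 toy examples: 16 random features, binary labels.
dataset = TensorDataset(torch.randn(1000, 16), torch.randint(0, 2, (1000,)))

# Training / development (validation) / test split.
train_set, dev_set, test_set = random_split(
    dataset, [800, 100, 100],
    generator=torch.Generator().manual_seed(0),  # reproducible split
)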
Data is Very Important
Data matters along two dimensions:
• Scale: diverse tasks, difficulties
• Quality: correctness
Deep Learning Algorithm Sketch
• Create a model and define a loss
• For each example
• Forward process: calculate the result (prediction & loss) of that example
• if training
• Perform back propagation
• Update parameters
Different Model Structures
• Feed-forward NNs
• Recurrent NNs
• Convolutional NNs
• Transformer
Feed-forward Neural Network
• The units are connected with no cycles
• The outputs from units in each layer are passed to units in the next higher layer
• No outputs are passed back to lower layers
• Fully-connected (FC) layers
Feed-forward Neural Network
[Figure: inputs $x_1, x_2, x_3$ feed hidden layer 1 (units $h_1^{(1)}$ to $h_1^{(4)}$), which feeds hidden layer 2 (units $h_2^{(1)}$ to $h_2^{(4)}$)]
Feed-forward Neural Network
[Figure: the same network, highlighting how unit $h_2^{(3)}$ is computed from all units of layer 1]

$$h_2^{(3)} = f\left(w_{3,1}^{(2)} h_1^{(1)} + w_{3,2}^{(2)} h_1^{(2)} + w_{3,3}^{(2)} h_1^{(3)} + w_{3,4}^{(2)} h_1^{(4)}\right)$$

Non-linearity (activation function) $f$: e.g., sigmoid or ReLU
Feed-forward Neural Network for Classification
• Use softmax to turn the output scores into a probability distribution (see the sketch below)
• Neural networks are difficult to optimize: SGD is only guaranteed to converge to a local minimum, so initialization and the choice of optimizer matter a lot!
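A minimal sketch of such a classifier in PyTorch (all layer sizes are illustrative):

import torch
import torch.nn as nn

# Two fully-connected layers with a non-linearity in between.
model = nn.Sequential(
    nn.Linear(3, 4),   # input (3 features) -> hidden layer (4 units)
    nn.ReLU(),
    nn.Linear(4, 5),   # hidden layer -> scores for 5 classes (logits)
)

x = torch.randn(1, 3)                     # one input example
probs = torch.softmax(model(x), dim=-1)   # softmax -> probability distribution
print(probs.sum())                        # tensor(1.) up to rounding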
Activation Functions
• Add non-linearities into neural networks
• Allow the networks to learn more powerful functions
• Sigmoid: $f(x) = \frac{1}{1+e^{-x}}$; ReLU: $f(x) = \max(x, 0)$
Activation Functions
• GeLU (Gaussian Error Linear Unit): $f(x) = x \cdot \Phi(x)$, where $\Phi$ is the standard Gaussian CDF
• Used in GPT-3, BERT, and many other models
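The three activations side by side (a quick sketch using PyTorch's built-in implementations):

import torch
import torch.nn.functional as F

x = torch.linspace(-3, 3, 7)
print(torch.sigmoid(x))   # squashes values into (0, 1)
print(F.relu(x))          # zeroes out negative values
print(F.gelu(x))          # smooth, ReLU-like: x * Phi(x)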
Loss Functions
• Given a labeled example $(x, y)$, we use a neural network to estimate the conditional probability $p(y \mid x)$ and predict the label as $\hat{y} = \arg\max_y p(y \mid x)$
• We compute how close our prediction is to the true label $y$ with a loss function
• Classification: cross-entropy
• Regression: L1 loss, L2 loss (a.k.a. mean squared error)
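Both loss families in PyTorch (a sketch; note that nn.CrossEntropyLoss takes raw logits, not probabilities):

import torch
import torch.nn as nn

# Classification: cross-entropy over raw logits.
logits = torch.randn(2, 5)        # 2 examples, 5 classes
labels = torch.tensor([1, 3])     # true class indices
ce_loss = nn.CrossEntropyLoss()(logits, labels)

# Regression: L1 and L2 (mean squared error) losses.
pred, target = torch.randn(2), torch.randn(2)
l1_loss = nn.L1Loss()(pred, target)
l2_loss = nn.MSELoss()(pred, target)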
Deep Learning Algorithm Sketch
• Create a model and define a loss (i.e., construct a computation graph)
• For each example
• Forward process: calculate the result (prediction & loss) of that example
• if training
• Perform back propagation
• Update parameters
Backpropagation
• Forward propagation: from the input layer to the output layer
• Back propagation: from the output layer to the input layer
Back-propagation in PyTorch
PyTorch does back-propagation for you in this one line of code: loss.backward()
A toy PyTorch example to train an NN model is sketched below.
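A minimal sketch, assuming a toy regression task (sizes and learning rate are illustrative):

import torch
import torch.nn as nn

# Toy regression data: the target is the sum of the input features.
x = torch.randn(100, 3)
y = x.sum(dim=1, keepdim=True)

model = nn.Linear(3, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(100):
    loss = loss_fn(model(x), y)   # forward pass
    optimizer.zero_grad()
    loss.backward()               # back-propagation happens in this one line
    optimizer.step()              # parameter update
print(loss.item())                # should be close to 0 after training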
Deep Learning Algorithm Sketch
• Create a model and define a loss (i.e., construct a computation graph)
• For each example
• Forward process: calculate the result (prediction & loss) of that example
• if training
• Perform back propagation
• Update parameters
Optimizer Update
• Most deep learning toolkits implement the parameter update as a single function call; in PyTorch this is optimizer.step()
• [Figure: parameter values before vs. after the optimizer update]
• The parameters can be updated with a standard SGD or Adam optimizer
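The effect on a single parameter, before and after the update (a sketch with a hypothetical one-parameter "model"):

import torch

w = torch.tensor([1.0], requires_grad=True)   # one learnable parameter
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()
loss.backward()          # w.grad is now 2*w = 2.0
print(w.data)            # before optimizer update: tensor([1.0000])
optimizer.step()         # w <- w - lr * w.grad = 1.0 - 0.1 * 2.0
print(w.data)            # after optimizer update:  tensor([0.8000])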
Standard SGD
• Standard stochastic gradient descent: $\theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L}(\theta)$, where $\eta$ is the learning rate
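Written out by hand, the update is one line (equivalent to torch.optim.SGD without momentum):

import torch

eta = 0.1                                       # learning rate
theta = torch.tensor([1.0], requires_grad=True)
loss = (theta ** 2).sum()
loss.backward()                                 # compute the gradient

with torch.no_grad():
    theta -= eta * theta.grad                   # theta <- theta - eta * grad
theta.grad.zero_()                              # reset for the next step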
Adam Optimizer
• The most standard optimization option in NLP and beyond
• Considers a rolling average of the squared gradient, $v_t$, and momentum, $m_t$:
• Momentum: $m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$
• Rolling average of the squared gradient: $v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$
• Correction of bias: $\hat{m}_t = m_t / (1 - \beta_1^t)$, $\hat{v}_t = v_t / (1 - \beta_2^t)$
• Final parameter update: $\theta_t = \theta_{t-1} - \eta \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$
Further reading: how to use the optimizer in PyTorch
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005, betas=(0.99, 0.999))
Adam Optimizer
• Gradient descent | Khan Academy
• Intuition of Adam Optimizer
• Blog: An updated overview of recent gradient descent algorithms
• (paper) Convex Optimization: Algorithms and Complexity
• Course: Optimization for Machine Learning
Learning Rate
• Learning rate schedule [another link]: typically warmup at the start of training, then decay (see the sketch below)
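A sketch of warmup followed by decay using LambdaLR (the warmup length and decay form are illustrative assumptions, loosely Transformer-style):

import torch

model = torch.nn.Linear(3, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
warmup_steps = 1000   # hypothetical choice

def lr_lambda(step):
    # Linear warmup to the base lr, then inverse-square-root decay.
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    return (warmup_steps / (step + 1)) ** 0.5

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# Call scheduler.step() once per training step, after optimizer.step().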
Tensors
• An n-dimensional array
• Widely used in neural networks
• Parameters in NNs consist of tensors of different shapes, which store both their values and their gradients (e.g., x, x.grad)
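A sketch showing a tensor carrying both a value and, after backward(), a gradient:

import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()    # y = x1^2 + x2^2
y.backward()

print(x)       # the tensor's values: tensor([2., 3.], requires_grad=True)
print(x.grad)  # dy/dx = 2x:          tensor([4., 6.])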
Tensor Operations
• Create tensors from a list, [Link]
• Matrix multiply vs. element-wise matrix multiply
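Both operations in code (a sketch):

import torch

a = torch.tensor([[1., 2.], [3., 4.]])   # create a tensor from a nested list
b = torch.ones(2, 2)

print(a @ b)   # matrix multiply (same as torch.matmul(a, b))
print(a * b)   # element-wise multiply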
Efficiency Tricks: Mini-batching
• On modern hardware, 10 operations of size 1 are much slower than 1 operation of size 10
• Mini-batching combines smaller operations into one big one
• Padding: examples in a batch can have different lengths, so shorter ones are padded to a common length (see the sketch below)
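Padding in code: shorter sequences are padded so the whole batch fits in one rectangular tensor (a sketch using pad_sequence; the token ids are illustrative):

import torch
from torch.nn.utils.rnn import pad_sequence

# Three "sentences" of different lengths.
seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]

batch = pad_sequence(seqs, batch_first=True, padding_value=0)
print(batch)   # shape (3, 3); the shorter rows are padded with 0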
Deep Learning Algorithm Sketch
Forward Pass: Data Input → Neural Network → Model Output, compared with the Ground Truth (Golden Label) to compute the Training Loss
Backward Pass: Training Loss → Gradient → Optimizer Update
Python files:
• [Link]
• [Link]
• [Link]
• [Link]
• [Link]
Different Learning Paradigms
• Supervised/unsupervised learning
• Self-supervised learning
• Transfer learning
• Few-shot/zero-shot learning
Further reading: Stanford STATS214 / CS229M: Machine Learning Theory
Thank you