CSE465
Lecture 2
Linear Algebra and Probability Review
CSE465: Pattern Recognition and Neural Network
Sec: 3
Faculty: Silvia Ahmed (SvA)
Summer 2025
Today’s Topics
• Linear Algebra Essentials
• Vectors
• Matrices
• Dot Product
• Matrix Calculus for Deep Learning
• Scalar derivatives
• Vector derivatives
• Matrix derivatives
• DL context
• Probability Distributions for DL
• Review of basic probability concepts
• Key distributions
Why are Linear Algebra & Probability Critical for DL?
• Linear Algebra:
• The "language" of neural networks.
• Data represented as vectors and matrices.
• Network operations (transformations, weights) are matrix
multiplications.
• Probability:
• Understanding data distributions.
• Interpreting model outputs (e.g., classification probabilities).
• Formulating loss functions and regularization.
• Uncertainty modeling.
Linear Algebra Essentials
Vectors: Definition
• Ordered list of numbers (e.g., [x1, x2, x3]).
• In deep learning, vectors are commonly used to represent individual data points,
features of an input, or learned representations (embeddings).
• Definition: A vector v of dimension n can be written as:
$v = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$
• This is a column vector
[conventionally used in DL for inputs and features]
• A row vector would be $v^T = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}$
• Geometric interpretation:
• a point in n-dimensional space
• or a directed line segment from the origin to that point.
Vectors: Operations
• Vector Addition: Element-wise sum of two vectors of the same dimension.
• Scalar Multiplication: Multiplying a vector by a scalar (a single number)
scales its magnitude.
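A minimal NumPy sketch of these two operations (the vector values are arbitrary examples, not from the slides):

    import numpy as np

    v = np.array([1.0, 2.0, 3.0])    # a 3-dimensional vector (as a 1-D array)
    w = np.array([4.0, 5.0, 6.0])

    print(v + w)      # vector addition: element-wise sum -> [5. 7. 9.]
    print(2.5 * v)    # scalar multiplication: scales each component -> [2.5 5.  7.5]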
Vectors: DL context
• An image can be "flattened" into a vector of pixel values
• A word can be represented by a "word embedding" vector
• The features describing a data point (e.g., height, weight, age)
form a feature vector
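To illustrate the first point, a small sketch (using a made-up 28×28 grayscale image) of flattening an image into a feature vector:

    import numpy as np

    image = np.random.rand(28, 28)   # a toy 28x28 grayscale image
    x = image.reshape(-1)            # flatten into a 784-dimensional vector
    print(x.shape)                   # (784,)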
Matrices
• Rectangular arrays of numbers (e.g., 2x3 matrix)
• Dimensions: rows x columns
• In deep learning, matrices are predominantly used to represent collections
of data points (mini-batches), weights connecting layers in a neural
network, or learned transformations.
• Definition: A matrix A with m rows and n columns (an m×n matrix) is written as:
$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$
Special Matrices
• Identity matrix (I): A square matrix with ones on the main diagonal and
zeros elsewhere. When multiplied by another matrix, it leaves the matrix
unchanged.
• Diagonal Matrix: A square matrix where all off-diagonal elements are
zero.
• Symmetric Matrix: A square matrix where A = AT (transpose of A equals
A).
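A quick NumPy sketch of these special matrices (the entries are arbitrary examples):

    import numpy as np

    I = np.eye(3)                      # identity: ones on the main diagonal, zeros elsewhere
    D = np.diag([1.0, 2.0, 3.0])       # diagonal matrix
    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 4.0],
                  [0.0, 4.0, 5.0]])    # symmetric matrix

    print(np.allclose(I @ A, A))       # True: multiplying by I leaves A unchanged
    print(np.allclose(A, A.T))         # True: A equals its transpose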
Matrices: Operations
• Matrix Addition/Subtraction: Element-wise operations,
requiring matrices to have the same dimensions.
• Scalar Multiplication: Multiplying every element of the matrix
by a scalar.
• Matrix Transpose (AT): Swapping rows and columns of a
matrix. If A is m x n, then AT is n x m.
Matrix Multiplication
• Not element-wise! Dot product of rows and columns
• Definition: The product of two matrices A (size: m × k) and B (size: k × n) results in a matrix C (size: m × n). Each element $c_{ij}$ of C is the dot product of the i-th row of A and the j-th column of B:
$c_{ij} = \sum_{l=1}^{k} a_{il} b_{lj}$
• Rules for Multiplication: For A×B, the number of columns in A must equal
the number of rows in B.
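A short NumPy check of this definition with arbitrary small matrices; the loop computes each $c_{ij}$ from the row-times-column rule and compares it with the built-in product:

    import numpy as np

    A = np.random.rand(2, 3)   # m x k  (m=2, k=3)
    B = np.random.rand(3, 4)   # k x n  (k=3, n=4)

    C = A @ B                  # resulting m x n matrix, here 2 x 4

    # c_ij = dot product of the i-th row of A and the j-th column of B
    C_manual = np.zeros((2, 4))
    for i in range(2):
        for j in range(4):
            C_manual[i, j] = np.sum(A[i, :] * B[:, j])

    print(np.allclose(C, C_manual))   # True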
Matrix Multiplication (contd.)
• Non-Commutativity: In general, AB ≠ BA. The order of multiplication
matters.
• Geometric Interpretation: Matrix multiplication can represent various
geometric transformations such as scaling, rotation, reflection, and
projection.
• DL Context:
• The core operation within a neural network layer: y=Wx+b, where x is the input vector,
W is the weight matrix, b is the bias vector, and y is the output vector.
• Processing mini-batches: If X is a matrix where each row is a data point (or each
column, depending on convention), and W is a weight matrix, then XW or WX
processes the entire batch efficiently.
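A sketch of the layer operation y = Wx + b and its mini-batch form, with made-up sizes (4 inputs, 3 outputs, batch of 8); rows-as-data-points is just one common convention:

    import numpy as np

    W = np.random.randn(3, 4)   # weight matrix: 3 outputs x 4 inputs
    b = np.random.randn(3)      # bias vector
    x = np.random.randn(4)      # a single input vector

    y = W @ x + b               # single-example output, shape (3,)

    X = np.random.randn(8, 4)   # mini-batch: 8 data points as rows
    Y = X @ W.T + b             # whole batch processed at once, shape (8, 3)

    print(y.shape, Y.shape)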
Dot Product (Vector Inner Product)
• Definition: For two vectors $a, b \in \mathbb{R}^n$, the dot product is the scalar $a \cdot b = a^T b = \sum_{i=1}^{n} a_i b_i$.
• Geometric interpretation: $a \cdot b = \|a\| \, \|b\| \cos\theta$, where θ is the angle between the vectors, so it measures how aligned they are.
Dot Product (Vector Inner Product) (contd.)
• DL Context:
• The "weighted sum" in a neuron's activation: The dot product of input features x and
weights w (i.e., w⋅x) forms the core of many activation functions before a non-linearity
is applied.
• Attention mechanisms in advanced architectures use dot products to compute
similarity scores between different parts of the input.
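A minimal sketch of the weighted-sum idea, with arbitrary weights and inputs and a sigmoid as the example non-linearity:

    import numpy as np

    w = np.array([0.5, -1.2, 0.3])   # weights of a single neuron
    x = np.array([1.0, 2.0, -1.0])   # input features
    b = 0.1                          # bias

    z = np.dot(w, x) + b             # pre-activation: w . x + b
    a = 1.0 / (1.0 + np.exp(-z))     # sigmoid non-linearity applied afterwards
    print(z, a)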
Matrix Calculus for Deep Learning
Matrix Calculus - Introduction
• Why is this hard but crucial?
• Deep Learning involves optimizing functions (loss functions) with millions of
parameters.
• We need to compute gradients to update these parameters efficiently.
• "Backpropagation" is essentially repeated application of the chain rule.
• Review:
• Scalar Derivatives: $\frac{dy}{dx}$ (slope)
• Chain Rule: $\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}$ (how changes propagate)
Matrix Calculus - Vector Derivatives
• Gradient ($\nabla_x f(x)$): Derivative of a scalar function (f) with respect to a vector input (x).
• Result: A vector of partial derivatives:
$\nabla_x f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix}^T$
Vector Derivatives (contd.)
• Interpretation: The gradient vector points in the direction of the
steepest increase of the function 𝑓. In optimization algorithms
like gradient descent, we move in the opposite direction of the
gradient to find a minimum.
• DL Context: This is used to compute the gradients of the loss
function (a scalar value) with respect to the network's
parameters (weights and biases, which are vectors or matrices).
For instance, ∇𝑤 ℒ tells us how to adjust the weight vector w to
reduce the loss ℒ.
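A toy sketch of "move opposite to the gradient" for f(x) = x1² + x2², whose gradient is [2x1, 2x2]; the step size 0.1 and the starting point are arbitrary choices:

    import numpy as np

    def f(x):
        return x[0]**2 + x[1]**2

    def grad_f(x):
        return np.array([2 * x[0], 2 * x[1]])   # gradient of f

    x = np.array([3.0, -2.0])
    for _ in range(50):
        x = x - 0.1 * grad_f(x)   # gradient descent: step against the gradient

    print(x, f(x))                # approaches the minimum at (0, 0)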
Vector Derivatives (contd.)
• Jacobian Matrix (J): Derivative of a vector-valued function with respect to a vector input. For $f: \mathbb{R}^n \to \mathbb{R}^m$, the Jacobian is the $m \times n$ matrix with entries $J_{ij} = \frac{\partial f_i}{\partial x_j}$.
Vector Derivatives (contd.)
• DL Context: Jacobians are essential for understanding how errors propagate between layers. If we have a sequence of operations like $z = g(y)$ and $y = f(x)$, then $\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}$ (using the chain rule), where these "derivatives" are actually Jacobian matrices.
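A small numerical sketch of composing Jacobians for z = g(y), y = f(x); the two maps here are hypothetical linear functions, so their Jacobians are just the matrices A and B themselves:

    import numpy as np

    A = np.random.randn(3, 4)   # y = f(x) = A x, so dy/dx (Jacobian of f) is A
    B = np.random.randn(2, 3)   # z = g(y) = B y, so dz/dy (Jacobian of g) is B

    J = B @ A                   # chain rule: dz/dx = (dz/dy)(dy/dx), a 2 x 4 Jacobian

    x = np.random.randn(4)
    print(np.allclose(B @ (A @ x), J @ x))   # True: same overall linear map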
Matrix Derivatives
• In deep learning, we frequently need to compute the derivative of a scalar
loss function with respect to a matrix of weights.
• Derivative of a scalar with respect to a matrix: $\left( \frac{\partial \mathcal{L}}{\partial W} \right)_{ij} = \frac{\partial \mathcal{L}}{\partial W_{ij}}$, which has the same shape as W.
Backpropagation as Chain Rule Application
[Figure: a small two-layer network. Inputs $x_{i1}, x_{i2}$ feed two hidden units through layer-1 weights $w^{(1)}_{11}, w^{(1)}_{12}, w^{(1)}_{21}, w^{(1)}_{22}$ and biases $b^{(1)}_1, b^{(1)}_2$; the hidden units feed the output $\hat{y}_i$ through layer-2 weights $w^{(2)}_{11}, w^{(2)}_{21}$ and bias $b^{(2)}_1$.]
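A sketch of backpropagation for a tiny 2-2-1 network like the one in the figure, assuming a sigmoid hidden layer, a linear output, and a squared-error loss (the slide does not fix these choices, and all numbers are arbitrary):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.0])        # one input (x_i1, x_i2)
    y = 1.0                          # its target

    W1 = np.array([[0.1, 0.4],
                   [-0.3, 0.2]])     # layer-1 weights
    b1 = np.array([0.0, 0.1])        # layer-1 biases
    W2 = np.array([0.7, -0.5])       # layer-2 weights
    b2 = 0.2                         # layer-2 bias

    # forward pass
    z1 = W1 @ x + b1
    h = sigmoid(z1)
    y_hat = W2 @ h + b2
    loss = 0.5 * (y_hat - y) ** 2

    # backward pass: repeated application of the chain rule
    d_yhat = y_hat - y               # dL/dy_hat
    dW2 = d_yhat * h                 # dL/dW2
    db2 = d_yhat                     # dL/db2
    dh = d_yhat * W2                 # dL/dh
    dz1 = dh * h * (1 - h)           # through the sigmoid: dL/dz1
    dW1 = np.outer(dz1, x)           # dL/dW1
    db1 = dz1                        # dL/db1

    print(loss, dW1, db1, dW2, db2)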
Probability Distributions for Deep Learning
Probability Review - Basics
• Random Variables (RV): Outcomes of random phenomena.
• Discrete: Countable outcomes (e.g., coin flip).
• Continuous: Infinite outcomes within a range (e.g., height,
temperature).
• Probability Mass Function (PMF): For discrete RVs, it gives
the probability that the random variable takes on a specific
value. P(X = x).
• Probability Density Function (PDF): For continuous RVs, it describes the likelihood of the random variable falling within a particular range of values. The probability of X being in an interval [a, b] is $\int_a^b f(x)\,dx$.
Probability Review – Basics (contd.)
• Expectation (E[X]): The average or mean value of a random
variable.
• For discrete RV: E[X]=∑xP(X=x).
• For continuous RV: E[X]=∫xf(x)dx.
• Variance (Var[X]): A measure of how spread out the values of a
random variable are from its mean.
• $\mathrm{Var}[X] = E[(X - E[X])^2]$.
• Standard Deviation is $\sigma = \sqrt{\mathrm{Var}[X]}$
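A quick NumPy check of these definitions for a fair six-sided die (a standard example, not from the slides):

    import numpy as np

    values = np.arange(1, 7)             # possible outcomes of a fair die
    p = np.full(6, 1 / 6)                # uniform PMF

    E = np.sum(values * p)               # expectation: sum of x * P(X = x)
    Var = np.sum((values - E)**2 * p)    # variance: E[(X - E[X])^2]
    sigma = np.sqrt(Var)                 # standard deviation

    print(E, Var, sigma)                 # 3.5, ~2.9167, ~1.7078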
Key Probability Distributions for DL - Bernoulli
• Describes a single experiment with only two possible outcomes (e.g.,
success/failure, 0/1), where the probability of "success" is p.
• PMF:
$P(X = k) = \begin{cases} p & \text{if } k = 1 \\ 1 - p & \text{if } k = 0 \end{cases}$
This can also be written as $P(X = k) = p^k (1 - p)^{1-k}$
• DL context:
• Binary Classification: Output of a sigmoid layer can be interpreted as p.
• Dropout: Each neuron is independently "dropped" (set to 0) with a Bernoulli
probability.
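A minimal sketch of the dropout connection: each unit is kept with Bernoulli probability p (0.8 here is an arbitrary choice), and the 1/p rescaling follows the common "inverted dropout" convention:

    import numpy as np

    rng = np.random.default_rng(0)
    p = 0.8                          # probability of keeping a unit
    h = rng.standard_normal(10)      # some layer activations

    mask = rng.random(10) < p        # one Bernoulli(p) sample per unit
    h_dropped = h * mask / p         # inverted dropout: rescale the kept units

    print(mask.astype(int))
    print(h_dropped)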
Key Probability Distributions for DL - Categorical
• A generalization of the Bernoulli distribution for discrete random variables
with more than two possible outcomes (e.g., rolling a die, classifying an
image into one of several categories).
• It has k mutually exclusive possible outcomes, each with a specific probability $p_i$, where $\sum_{i=1}^{k} p_i = 1$.
• PMF: $P(X = i) = p_i$ for $i \in \{1, 2, \ldots, k\}$.
• DL context:
• Multi-class Classification: Output of a softmax layer gives probabilities for each
class.
• Language Modeling: Predicting the next word from a vocabulary.
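A short sketch of turning raw scores (logits) into Categorical class probabilities with a softmax; the logit values are arbitrary:

    import numpy as np

    logits = np.array([2.0, 1.0, 0.1])      # raw scores for 3 classes
    probs = np.exp(logits - logits.max())   # subtract the max for numerical stability
    probs /= probs.sum()                    # softmax: probabilities sum to 1

    print(probs, probs.sum())               # ~[0.659 0.242 0.099], 1.0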
Key Probability Distributions for DL - Normal (Gaussian)
• The "bell curve" distribution.
• Parameters: Mean (μ) and Standard Deviation (σ).
• PDF: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
• DL context:
• Weight Initialization: Often initialized from Normal distributions (e.g., Xavier, He).
• Generative Models: VAEs often model latent spaces or outputs as Gaussian.
• Loss Functions: L2 Loss (MSE) is related to maximizing likelihood under Gaussian
noise.
• Regularization: Adding Gaussian noise to inputs/activations.
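A sketch of Normal-based weight initialization; the sqrt(2 / fan_in) standard deviation is the He-style choice mentioned above, and the layer sizes are made up:

    import numpy as np

    rng = np.random.default_rng(42)
    fan_in, fan_out = 256, 128

    # zero-mean Gaussian with std = sqrt(2 / fan_in)
    W = rng.normal(loc=0.0, scale=np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

    print(W.mean(), W.std())   # close to 0 and sqrt(2/256) ≈ 0.088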
Summary
• Linear Algebra:
• Matrix Multiplication: The fundamental operation of neural networks.
• Matrix Calculus / Gradients: How neural networks learn
(backpropagation).
• Chain Rule: The mathematical backbone of backpropagation.
• Probability:
• Bernoulli: Binary outcomes, dropout.
• Categorical (Softmax): Multi-class classification.
• Normal: Weight initialization, generative models, noise.