Notation and Definitions¶
- A scalar is a simple numeric value, denoted by an italic letter: $x=3.24$
- A vector is a 1D ordered array of n scalars, denoted by a bold letter: $\mathbf{x}=[3.24, 1.2]$
- $x_i$ denotes the $i$th element of a vector, thus $x_0 = 3.24$.
- Note: some other courses use $x^{(i)}$ notation
- A set is an unordered collection of unique elements, denoted by a calligraphic capital: $\mathcal{S}=\{3.24, 1.2\}$
- A matrix is a 2D array of scalars, denoted by bold capital: $\mathbf{X}=\begin{bmatrix}
3.24 & 1.2 \\
2.24 & 0.2
\end{bmatrix}$
- $\textbf{X}_{i}$ denotes the $i$th row of the matrix
- $\textbf{X}_{:,j}$ denotes the $j$th column
- $\textbf{X}_{i,j}$ denotes the element in the $i$th row, $j$th column, thus $\mathbf{X}_{1,0} = 2.24$
- $\mathbf{X}^{n \times p}$, an $n \times p$ matrix, can represent $n$ data points in a $p$-dimensional space
- Every row is a vector that can represent a point in a $p$-dimensional space, given a basis.
- The standard basis for a Euclidean space is the set of unit vectors
- E.g. $\mathbf{X}=\begin{bmatrix} 3.24 & 1.2 \\ 2.24 & 0.2 \\ 3.0 & 0.6 \end{bmatrix}$ represents 3 points in a 2-dimensional space
- A tensor is a $k$-dimensional array of data, denoted by an italic capital: $T$
- $k$ is also called the order, degree, or rank
- $T_{i,j,k,...}$ denotes the element or sub-tensor in the corresponding position
- A set of color images can be represented by:
- a 4D tensor (sample x height x width x color channel)
- a 2D tensor (sample x flattened vector of pixel values)
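The notation above maps directly onto NumPy arrays; a minimal sketch (the array values follow the examples above, the image shapes are illustrative):

```python
import numpy as np

x = np.array([3.24, 1.2])          # vector x
X = np.array([[3.24, 1.2],
              [2.24, 0.2]])        # matrix X

print(x[0])        # x_0 -> 3.24
print(X[1])        # X_1, the row with index 1 -> [2.24 0.2 ]
print(X[:, 1])     # X_{:,1}, the column with index 1 -> [1.2 0.2]
print(X[1, 0])     # X_{1,0} -> 2.24

# A set of 10 color images as a 4D tensor: (sample, height, width, channel)
T = np.zeros((10, 32, 32, 3))
print(T.ndim)                   # order/rank k -> 4
print(T.reshape(10, -1).shape)  # each image flattened to a pixel vector -> (10, 3072)
```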

Basic operations¶
- Sums and products are denoted by capital Sigma and capital Pi:
$$\sum_{i=0}^{p} x_i = x_0 + x_1 + ... + x_p \quad \prod_{i=0}^{p} x_i = x_0 \cdot x_1 \cdot ... \cdot x_p$$
- Operations on vectors are element-wise: e.g. $\mathbf{x}+\mathbf{z} = [x_0+z_0,x_1+z_1, ... , x_p+z_p]$
- Dot product $\mathbf{w}\mathbf{x} = \mathbf{w} \cdot \mathbf{x} = \mathbf{w}^{T} \mathbf{x} = \sum_{i=0}^{p} w_i \cdot x_i = w_0 \cdot x_0 + w_1 \cdot x_1 + ... + w_p \cdot x_p$
- Matrix product $\mathbf{W}\mathbf{x} = \begin{bmatrix} \mathbf{w_0} \cdot \mathbf{x} \\ ... \\ \mathbf{w_p} \cdot \mathbf{x} \end{bmatrix}$
- A function $f(x) = y$ relates an input element $x$ to an output $y$
- It has a local minimum at $x=c$ if $f(x) \geq f(c)$ for all $x$ in an interval $(c-\epsilon, c+\epsilon)$
- It has a global minimum at $x=c$ if $f(x) \geq f(c)$ for any value of $x$
- A vector function consumes an input and produces a vector: $\mathbf{f}(\mathbf{x}) = \mathbf{y}$
- $\underset{x\in X}{\operatorname{max}}f(x)$ returns the largest value $f(x)$ over all $x \in X$
- $\underset{x\in X}{\operatorname{argmax}}f(x)$ returns the element $x \in X$ that maximizes $f(x)$
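The operations above can be sketched with NumPy (the vectors, the matrix $\mathbf{W}$, and the function $f$ here are made-up examples, not from the notes):

```python
import numpy as np

w = np.array([2.0, -1.0, 0.5])
x = np.array([3.0, 4.0, 2.0])

print(w + x)         # element-wise sum -> [5. 3. 2.5]
print(np.dot(w, x))  # dot product: 2*3 + (-1)*4 + 0.5*2 = 3.0
print(w @ x)         # same dot product via the @ operator

W = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
print(W @ x)         # matrix product: each row of W dotted with x -> [3. 8.]

f = lambda v: 10 - v ** 2      # example function f(x) = 10 - x^2
xs = np.linspace(-2, 2, 5)     # candidate inputs X = {-2, -1, 0, 1, 2}
print(np.max(f(xs)))           # max_x f(x): the largest value -> 10.0
print(xs[np.argmax(f(xs))])    # argmax_x f(x): the maximizing x -> 0.0
```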

Gradients¶
- A derivative $f'$ of a function $f$ describes how fast $f$ grows or decreases
- The process of finding a derivative is called differentiation
- Derivatives for basic functions are known
- For composite functions we use the chain rule: $F(x) = f(g(x)) \rightarrow F'(x)=f'(g(x))g'(x)$
- A function is differentiable if it has a derivative at every point of its domain
- It is continuously differentiable if $f'$ is a continuous function
- We say $f$ is smooth if it is infinitely differentiable, i.e., $f', f'', f''', ...$ all exist
- A gradient $\nabla f$ generalizes the derivative to functions of multiple variables
- It is a vector of partial derivatives: $\nabla f = \left[ \frac{\partial f}{\partial x_0}, \frac{\partial f}{\partial x_1},... \right]$
- E.g. $f=2x_0+3x_1^{2}-\sin(x_2) \rightarrow \nabla f= [2, 6x_1, -\cos(x_2)]$
- Example: $f = -(x_0^2+x_1^2)$
- $\nabla f = \left[\frac{\partial f}{\partial x_0},\frac{\partial f}{\partial x_1}\right] = \left[-2x_0,-2x_1\right]$
- Evaluated at point (-4,1): $\nabla f(-4,1) = [8,-2]$
- These are the slopes at point (-4,1) in the direction of $x_0$ and $x_1$ respectively
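The analytic gradient in the example can be checked numerically with central finite differences, a minimal sketch (`numeric_grad` is an illustrative helper, not part of the notes):

```python
import numpy as np

def f(x):
    # the example function f = -(x0^2 + x1^2)
    return -(x[0] ** 2 + x[1] ** 2)

def grad_f(x):
    # analytic gradient: [-2*x0, -2*x1]
    return np.array([-2 * x[0], -2 * x[1]])

def numeric_grad(f, x, eps=1e-6):
    # central finite differences: one partial derivative per dimension
    g = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        g[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return g

p = np.array([-4.0, 1.0])
print(grad_f(p))           # -> [ 8. -2.]
print(numeric_grad(f, p))  # approximately [ 8. -2.]
```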
*(Interactive widget: plot of the example function with a rotation slider.)*