Lecture 2. Linear models

Basics of modeling, optimization, and regularization

Joaquin Vanschoren

Notation and Definitions

  • A scalar is a simple numeric value, denoted by an italic letter: $x=3.24$
  • A vector is a 1D ordered array of n scalars, denoted by a bold letter: $\mathbf{x}=[3.24, 1.2]$
    • $x_i$ denotes the $i$th element of a vector, thus $x_0 = 3.24$.
      • Note: some other courses use $x^{(i)}$ notation
  • A set is an unordered collection of unique elements, denoted by a calligraphic capital: $\mathcal{S}=\{3.24, 1.2\}$
  • A matrix is a 2D array of scalars, denoted by a bold capital: $\mathbf{X}=\begin{bmatrix} 3.24 & 1.2 \\ 2.24 & 0.2 \end{bmatrix}$
    • $\mathbf{X}_{i}$ denotes the $i$th row of the matrix
    • $\mathbf{X}_{:,j}$ denotes the $j$th column
    • $\mathbf{X}_{i,j}$ denotes the element in the $i$th row, $j$th column, thus $\mathbf{X}_{1,0} = 2.24$
  • $\mathbf{X}^{n \times p}$, an $n \times p$ matrix, can represent $n$ data points in a $p$-dimensional space
    • Every row is a vector that can represent a point in a $p$-dimensional space, given a basis.
    • The standard basis for a Euclidean space is the set of unit vectors along each axis, e.g. $[1, 0]$ and $[0, 1]$ in 2D
  • E.g. $\mathbf{X}=\begin{bmatrix} 3.24 & 1.2 \\ 2.24 & 0.2 \\ 3.0 & 0.6 \end{bmatrix}$ represents 3 data points in a 2-dimensional space
  • A tensor is a $k$-dimensional array of data, denoted by an italic capital: $T$
    • $k$ is also called the order, degree, or rank
    • $T_{i,j,k,...}$ denotes the element or sub-tensor in the corresponding position
    • A set of color images can be represented by:
      • a 4D tensor (sample x height x width x color channel)
      • a 2D tensor (sample x flattened vector of pixel values)
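A minimal NumPy sketch of the notation above, assuming NumPy is available. NumPy indexing is 0-based, which matches the $x_0$ convention used here. The matrix is the 3-point example from above; the image tensor size (2 images of 32×32 pixels with 3 color channels) is a made-up value for illustration only.

```python
import numpy as np

# Scalar
x_scalar = 3.24

# Vector: 1D ordered array; x[0] is the first element x_0
x = np.array([3.24, 1.2])

# Set: unordered collection of unique elements (the duplicate 1.2 is dropped)
S = {3.24, 1.2, 1.2}

# Matrix: n = 3 data points (rows) in a p = 2 dimensional space (columns)
X = np.array([[3.24, 1.2],
              [2.24, 0.2],
              [3.0, 0.6]])
row_1 = X[1]       # X_1, the second row
col_0 = X[:, 0]    # X_{:,0}, the first column
elem = X[1, 0]     # X_{1,0} = 2.24

# Tensor: a hypothetical set of 2 RGB images of 32x32 pixels
T = np.zeros((2, 32, 32, 3))          # sample x height x width x color channel
order = T.ndim                        # k = 4
T_flat = T.reshape(T.shape[0], -1)    # sample x flattened pixel values: (2, 3072)
```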


Basic operations

  • Sums and products are denoted by capital Sigma and capital Pi:

$$\sum_{i=0}^{p} x_i = x_0 + x_1 + ... + x_p \quad \prod_{i=0}^{p} x_i = x_0 \cdot x_1 \cdot ... \cdot x_p$$
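A short sketch of the sum and product notation, assuming NumPy; the vector $\mathbf{x}$ (with $p = 3$) is a hypothetical example. The explicit loops mirror the index $i = 0, ..., p$ in the formulas.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # x_0, ..., x_p with p = 3

# Capital Sigma: x_0 + x_1 + ... + x_p
total = np.sum(x)
# Capital Pi: x_0 * x_1 * ... * x_p
product = np.prod(x)

# Equivalent explicit loops over i = 0, ..., p
total_loop = 0.0
product_loop = 1.0
for i in range(len(x)):
    total_loop += x[i]
    product_loop *= x[i]
```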

  • Arithmetic operations such as addition are applied to vectors element-wise: e.g. $\mathbf{x}+\mathbf{z} = [x_0+z_0,x_1+z_1, ... , x_p+z_p]$
  • Dot product $\mathbf{w}\mathbf{x} = \mathbf{w} \cdot \mathbf{x} = \mathbf{w}^{T} \mathbf{x} = \sum_{i=0}^{p} w_i \cdot x_i = w_0 \cdot x_0 + w_1 \cdot x_1 + ... + w_p \cdot x_p$
  • Matrix product $\mathbf{W}\mathbf{x} = \begin{bmatrix} \mathbf{w}_0 \cdot \mathbf{x} \\ ... \\ \mathbf{w}_p \cdot \mathbf{x} \end{bmatrix}$, where $\mathbf{w}_i$ is the $i$th row of $\mathbf{W}$
  • A function $f(x) = y$ relates an input element $x$ to an output $y$
    • It has a local minimum at $x=c$ if $f(x) \geq f(c)$ for all $x$ in some interval $(c-\epsilon, c+\epsilon)$ with $\epsilon > 0$
    • It has a global minimum at $x=c$ if $f(x) \geq f(c)$ for every $x$ in the domain of $f$
  • A vector function consumes an input and produces a vector: $\mathbf{f}(\mathbf{x}) = \mathbf{y}$
  • $\underset{x\in X}{\operatorname{max}}f(x)$ returns the largest value of $f(x)$ over all $x \in X$
  • $\underset{x\in X}{\operatorname{argmax}}f(x)$ returns the element $x \in X$ that maximizes $f(x)$
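The operations above can be sketched in NumPy (assumed available). The vectors, the matrix `W`, and the function `f` are hypothetical examples; `max` and `argmax` are taken over a finite set of candidate inputs rather than a continuous domain. Note that `np.argmax` returns the *index* of the maximum, so it is used to look up the maximizing element.

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, 5.0, 6.0])
z = np.array([0.5, 0.5, 0.5])

# Element-wise addition: [x_0+z_0, x_1+z_1, x_2+z_2]
s = x + z

# Dot product: sum_i w_i * x_i = 1*4 + 2*5 + 3*6 = 32
d = np.dot(w, x)  # equivalently w @ x

# Matrix product: element i is (row i of W) . x
W = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.0]])
Wx = W @ x  # [32, 5]

# max and argmax over a finite set of candidates
def f(v):
    # hypothetical function with its maximum value 4 at v = 2
    return 4.0 - (v - 2.0) ** 2

candidates = np.array([0.0, 1.0, 2.0, 3.0])
values = f(candidates)
max_value = values.max()                 # max f(x)
best_x = candidates[np.argmax(values)]   # argmax f(x): index -> element
```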

Gradients

  • A derivative $f'$ of a function $f$ describes how fast $f$ grows or decreases
  • The process of finding a derivative is called differentiation
    • Derivatives for basic functions are known
    • For non-basic functions we use the chain rule: $F(x) = f(g(x)) \rightarrow F'(x)=f'(g(x))g'(x)$
  • A function is differentiable if it has a derivative at every point of its domain
    • It is continuously differentiable if $f'$ is a continuous function
    • We say $f$ is smooth if it is infinitely differentiable, i.e., $f', f'', f''', ...$ all exist
  • A gradient $\nabla f$ is the derivative of a function in multiple dimensions
    • It is a vector of partial derivatives: $\nabla f = \left[ \frac{\partial f}{\partial x_0}, \frac{\partial f}{\partial x_1},... \right]$
    • E.g. $f=2x_0+3x_1^{2}-\sin(x_2) \rightarrow \nabla f= [2, 6x_1, -\cos(x_2)]$
  • Example: $f = -(x_0^2+x_1^2)$
    • $\nabla f = \left[\frac{\partial f}{\partial x_0},\frac{\partial f}{\partial x_1}\right] = \left[-2x_0,-2x_1\right]$
    • Evaluated at point (-4,1): $\nabla f(-4,1) = [8,-2]$
      • These are the slopes at point (-4,1) in the direction of $x_0$ and $x_1$ respectively
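The analytic gradients above can be checked numerically with central finite differences, $\frac{\partial f}{\partial x_i} \approx \frac{f(\mathbf{x}+h\mathbf{e}_i) - f(\mathbf{x}-h\mathbf{e}_i)}{2h}$, where $\mathbf{e}_i$ is the $i$th standard basis vector. This is a minimal sketch assuming NumPy; the helper `numerical_gradient`, the step size `h`, and the evaluation points other than $(-4, 1)$ are illustrative choices, not a library API.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Approximate the gradient of f at x with central finite differences."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e_i = np.zeros_like(x)
        e_i[i] = 1.0  # i-th standard basis vector
        grad[i] = (f(x + h * e_i) - f(x - h * e_i)) / (2 * h)
    return grad

# f = -(x_0^2 + x_1^2): analytic gradient [-2 x_0, -2 x_1], i.e. [8, -2] at (-4, 1)
f1 = lambda x: -(x[0] ** 2 + x[1] ** 2)
g1 = numerical_gradient(f1, [-4.0, 1.0])

# f = 2 x_0 + 3 x_1^2 - sin(x_2): analytic gradient [2, 6 x_1, -cos(x_2)]
f2 = lambda x: 2 * x[0] + 3 * x[1] ** 2 - np.sin(x[2])
p = np.array([1.0, 2.0, 0.5])
g2 = numerical_gradient(f2, p)
g2_exact = np.array([2.0, 6 * p[1], -np.cos(p[2])])

# Chain rule: F(x) = sin(x^2) -> F'(x) = cos(x^2) * 2x
F = lambda x: np.sin(x[0] ** 2)
dF = numerical_gradient(F, [1.5])[0]
dF_exact = np.cos(1.5 ** 2) * 2 * 1.5
```

Finite differences are only approximate; automatic differentiation libraries compute these derivatives exactly up to floating-point precision.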