Notation and Definitions¶
- A scalar is a simple numeric value, denoted by an italic letter: $x=3.24$
- A vector is a 1D ordered array of n scalars, denoted by a bold letter: $\mathbf{x}=[3.24, 1.2]$
- $x_i$ denotes the $i$th element of a vector, thus $x_0 = 3.24$.
- Note: some other courses use $x^{(i)}$ notation
- A set is an unordered collection of unique elements, denoted by a calligraphic capital: $\mathcal{S}=\{3.24, 1.2\}$
- A matrix is a 2D array of scalars, denoted by bold capital: $\mathbf{X}=\begin{bmatrix}
3.24 & 1.2 \\
2.24 & 0.2
\end{bmatrix}$
- $\textbf{X}_{i}$ denotes the $i$th row of the matrix
- $\textbf{X}_{:,j}$ denotes the $j$th column
- $\textbf{X}_{i,j}$ denotes the element in the $i$th row, $j$th column, thus $\mathbf{X}_{1,0} = 2.24$
- $\mathbf{X}^{n \times p}$, an $n \times p$ matrix, can represent $n$ data points in a $p$-dimensional space
- Every row is a vector that can represent a point in a $p$-dimensional space, given a basis.
- The standard basis for a Euclidean space is the set of unit vectors
- E.g. $\mathbf{X}=\begin{bmatrix} 3.24 & 1.2 \\ 2.24 & 0.2 \\ 3.0 & 0.6 \end{bmatrix}$ represents 3 points in a 2-dimensional space
- A tensor is a $k$-dimensional array of data, denoted by an italic capital: $T$
- $k$ is also called the order, degree, or rank
- $T_{i,j,k,...}$ denotes the element or sub-tensor in the corresponding position
- A set of color images can be represented by:
- a 4D tensor (sample x height x width x color channel)
- a 2D tensor (sample x flattened vector of pixel values)
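The notation above maps directly onto NumPy arrays; a minimal sketch (the array values follow the examples above, the image shapes are illustrative):

```python
import numpy as np

x = np.array([3.24, 1.2])          # vector x
X = np.array([[3.24, 1.2],
              [2.24, 0.2]])        # matrix X

print(x[0])        # x_0 -> 3.24
print(X[1])        # X_1, the row with index 1 -> [2.24 0.2 ]
print(X[:, 1])     # X_{:,1}, the column with index 1 -> [1.2 0.2]
print(X[1, 0])     # X_{1,0} -> 2.24

# A set of 10 color images as a 4D tensor: (sample, height, width, channel)
T = np.zeros((10, 32, 32, 3))
print(T.ndim)                   # order/rank k -> 4
print(T.reshape(10, -1).shape)  # each image flattened to a pixel vector -> (10, 3072)
```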

Basic operations¶
- Sums and products are denoted by capital Sigma and capital Pi:
$$\sum_{i=0}^{p} x_i = x_0 + x_1 + ... + x_p \quad \prod_{i=0}^{p} x_i = x_0 \cdot x_1 \cdot ... \cdot x_p$$
- Operations on vectors are element-wise: e.g. $\mathbf{x}+\mathbf{z} = [x_0+z_0,x_1+z_1, ... , x_p+z_p]$
- Dot product $\mathbf{w}\mathbf{x} = \mathbf{w} \cdot \mathbf{x} = \mathbf{w}^{T} \mathbf{x} = \sum_{i=0}^{p} w_i \cdot x_i = w_0 \cdot x_0 + w_1 \cdot x_1 + ... + w_p \cdot x_p$
- Matrix product $\mathbf{W}\mathbf{x} = \begin{bmatrix} \mathbf{w_0} \cdot \mathbf{x} \\ ... \\ \mathbf{w_p} \cdot \mathbf{x} \end{bmatrix}$
- A function $f(x) = y$ relates an input element $x$ to an output $y$
- It has a local minimum at $x=c$ if $f(x) \geq f(c)$ for all $x$ in an interval $(c-\epsilon, c+\epsilon)$
- It has a global minimum at $x=c$ if $f(x) \geq f(c)$ for any value of $x$
- A vector function consumes an input and produces a vector: $\mathbf{f}(\mathbf{x}) = \mathbf{y}$
- $\underset{x\in X}{\operatorname{max}}f(x)$ returns the largest value $f(x)$ over all $x \in X$
- $\underset{x\in X}{\operatorname{argmax}}f(x)$ returns the element $x \in X$ that maximizes $f(x)$
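The operations above can be sketched with NumPy (the vectors, the matrix $\mathbf{W}$, and the function $f$ here are made-up examples, not from the notes):

```python
import numpy as np

w = np.array([2.0, -1.0, 0.5])
x = np.array([3.0, 4.0, 2.0])

print(w + x)         # element-wise sum -> [5. 3. 2.5]
print(np.dot(w, x))  # dot product: 2*3 + (-1)*4 + 0.5*2 = 3.0
print(w @ x)         # same dot product via the @ operator

W = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
print(W @ x)         # matrix product: each row of W dotted with x -> [3. 8.]

f = lambda v: 10 - v ** 2      # example function f(x) = 10 - x^2
xs = np.linspace(-2, 2, 5)     # candidate inputs X = {-2, -1, 0, 1, 2}
print(np.max(f(xs)))           # max_x f(x): the largest value -> 10.0
print(xs[np.argmax(f(xs))])    # argmax_x f(x): the maximizing x -> 0.0
```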

Gradients¶
- A derivative $f'$ of a function $f$ describes how fast $f$ grows or decreases
- The process of finding a derivative is called differentiation
- Derivatives for basic functions are known
- For composite functions we use the chain rule: $F(x) = f(g(x)) \rightarrow F'(x)=f'(g(x))g'(x)$
- A function is differentiable if it has a derivative at every point of its domain
- It is continuously differentiable if $f'$ is a continuous function
- We say $f$ is smooth if it is infinitely differentiable, i.e., $f', f'', f''', ...$ all exist
- A gradient $\nabla f$ generalizes the derivative to functions of multiple variables
- It is a vector of partial derivatives: $\nabla f = \left[ \frac{\partial f}{\partial x_0}, \frac{\partial f}{\partial x_1},... \right]$
- E.g. $f=2x_0+3x_1^{2}-\sin(x_2) \rightarrow \nabla f= [2, 6x_1, -\cos(x_2)]$
- Example: $f = -(x_0^2+x_1^2)$
- $\nabla f = \left[\frac{\partial f}{\partial x_0},\frac{\partial f}{\partial x_1}\right] = \left[-2x_0,-2x_1\right]$
- Evaluated at point (-4,1): $\nabla f(-4,1) = [8,-2]$
- These are the slopes at point (-4,1) in the direction of $x_0$ and $x_1$ respectively
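The analytic gradient in the example can be checked numerically with central finite differences, a minimal sketch (`numeric_grad` is an illustrative helper, not part of the notes):

```python
import numpy as np

def f(x):
    # the example function f = -(x0^2 + x1^2)
    return -(x[0] ** 2 + x[1] ** 2)

def grad_f(x):
    # analytic gradient: [-2*x0, -2*x1]
    return np.array([-2 * x[0], -2 * x[1]])

def numeric_grad(f, x, eps=1e-6):
    # central finite differences: one partial derivative per dimension
    g = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        g[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return g

p = np.array([-4.0, 1.0])
print(grad_f(p))           # -> [ 8. -2.]
print(numeric_grad(f, p))  # approximately [ 8. -2.]
```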
*(Interactive widget: plot of the example function with a rotation slider.)*