Overview
- Neural architectures
- Training neural nets
  - Forward pass: tensor operations
  - Backward pass: backpropagation
- Neural network design
  - Activation functions
  - Weight initialization
  - Optimizers
- Neural networks in practice
  - Model selection
  - Early stopping
  - Memorization capacity and information bottleneck
  - L1/L2 regularization
  - Dropout
  - Batch normalization
Architecture
- Logistic regression, drawn in a different, neuro-inspired way
- Linear model: the weighted sum $z$ is the inner product of the input vector $\mathbf{x}$ and the weight vector $\mathbf{w}$, plus a bias $w_0$
- Logistic (or sigmoid) function maps $z$ to a probability in $[0,1]$
- Uses log loss (cross-entropy) and gradient descent to learn the weights (see the sketch after the formula below)
$$\hat{y}(\mathbf{x}) = \text{sigmoid}(z) = \text{sigmoid}(w_0 + \mathbf{w} \cdot \mathbf{x}) = \text{sigmoid}(w_0 + w_1 x_1 + w_2 x_2 + \ldots + w_p x_p)$$
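To make this concrete, below is a minimal NumPy sketch of logistic regression trained with batch gradient descent on the log loss $L = -\frac{1}{n}\sum_{i} \left[ y_i \log \hat{y}_i + (1-y_i) \log (1-\hat{y}_i) \right]$. The toy data, learning rate, and step count are illustrative assumptions, not from the lecture.

```python
# Minimal sketch: logistic regression trained with batch gradient descent.
# The toy data, learning rate, and step count are illustrative assumptions.
import numpy as np

def sigmoid(z):
    # Map raw scores z to probabilities in [0, 1]
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))              # 100 samples, p = 2 features
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # linearly separable toy labels

w = np.zeros(X.shape[1])                   # weight vector w
w0 = 0.0                                   # bias w0
lr, n_steps = 0.1, 500

for _ in range(n_steps):
    y_hat = sigmoid(w0 + X @ w)            # forward pass: sigmoid(w0 + w . x)
    grad_z = y_hat - y                     # dL/dz of the log loss, per sample
    w -= lr * X.T @ grad_z / len(X)        # gradient descent step on the weights
    w0 -= lr * grad_z.mean()               # ... and on the bias

eps = 1e-12                                # avoid log(0) when reporting the loss
loss = -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
print(f"log loss after training: {loss:.3f}")
```

The update is this compact because, for the sigmoid combined with the log loss, the gradient with respect to the raw score simplifies to $\partial L / \partial z = \hat{y} - y$.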