Deep Learning Loss Functions &
Transformer Use Cases - Notes
1. Binary Cross Entropy (BCE)
Binary Cross Entropy is a loss function used for binary classification problems.
Formula:
BCE = -[y * log(p) + (1 - y) * log(1 - p)]   (averaged over all samples in a batch)
Where:
y = actual class label (0 or 1)
p = predicted probability (from sigmoid, between 0 and 1)
Usage:
- Classification tasks like spam detection, tumor detection (yes/no)
- Final layer uses sigmoid activation
- Used when the output is binary, or in multi-label problems where each class probability is independent
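The formula above can be sketched in a few lines of NumPy (the function name and the epsilon clipping are illustrative choices, not from any particular library):

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Clip probabilities away from 0 and 1 to avoid log(0)
    p = np.clip(p_pred, eps, 1 - eps)
    # Mean of -[y*log(p) + (1-y)*log(1-p)] over the batch
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1, 0])          # actual labels
p = np.array([0.9, 0.1, 0.8, 0.2])  # sigmoid outputs
loss = binary_cross_entropy(y, p)
```

Here all four predictions lean toward the correct label, so the loss is small.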
2. Log Loss
Log Loss is another name for Binary Cross Entropy.
- The logarithm penalizes confident wrong predictions especially harshly.
- Formula is the same as BCE.
- Commonly used in classification competitions and benchmarks.
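To see that harsh penalty in numbers, here is a quick sketch of the loss for a true label y = 1 at a few predicted probabilities:

```python
import math

# Log loss for a positive example (y = 1) is -log(p).
# It grows slowly near correct predictions and explodes
# as the model becomes confidently wrong.
for p in [0.9, 0.5, 0.1, 0.01]:
    print(f"p = {p:<5} loss = {-math.log(p):.3f}")
```

A prediction of 0.9 costs about 0.105, while a confident wrong prediction of 0.01 costs about 4.605, roughly 44 times more.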
3. Transformers for Forecasting - Loss Functions
The choice of loss function depends on the forecasting task:
1. Regression (predicting numeric values):
- MSE (Mean Squared Error): Penalizes large errors disproportionately, since errors are squared.
- MAE (Mean Absolute Error): More robust to outliers.
2. Probabilistic Forecasting (predicting uncertainty):
- Quantile Loss
- Negative Log-Likelihood (NLL)
3. Classification (binary output like up/down):
- Binary Cross Entropy
4. Multi-class Forecasting:
- Cross Entropy
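The regression and probabilistic losses above can be sketched directly in NumPy (function names here are illustrative; forecasting libraries ship their own implementations):

```python
import numpy as np

def mse(y, yhat):
    # Mean squared error: squaring amplifies large errors
    return np.mean((y - yhat) ** 2)

def mae(y, yhat):
    # Mean absolute error: linear penalty, more robust to outliers
    return np.mean(np.abs(y - yhat))

def quantile_loss(y, yhat, q):
    # Pinball loss: asymmetric penalty, minimized by the q-th quantile
    e = y - yhat
    return np.mean(np.maximum(q * e, (q - 1) * e))

def gaussian_nll(y, mu, sigma):
    # Negative log-likelihood under a Gaussian predictive distribution
    return np.mean(0.5 * np.log(2 * np.pi * sigma ** 2)
                   + (y - mu) ** 2 / (2 * sigma ** 2))

y = np.array([10.0, 12.0, 9.0])      # actual values
yhat = np.array([11.0, 10.0, 9.5])   # point forecasts
```

Note that quantile loss at q = 0.5 is exactly half the MAE, which is why the median is the optimal point forecast under absolute error.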
Other Useful Losses:
- Huber Loss: Quadratic for small errors (like MSE) and linear for large ones (like MAE), so it is robust to outliers while keeping smooth gradients near zero.
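A minimal sketch of Huber loss, switching between the two regimes at a threshold delta (delta = 1.0 is an illustrative default):

```python
import numpy as np

def huber(y, yhat, delta=1.0):
    # Quadratic for |error| <= delta, linear beyond it;
    # the linear branch keeps gradients bounded for outliers.
    e = np.abs(y - yhat)
    quadratic = 0.5 * e ** 2
    linear = delta * (e - 0.5 * delta)
    return np.mean(np.where(e <= delta, quadratic, linear))

small = huber(np.array([0.0]), np.array([0.5]))  # quadratic region
large = huber(np.array([0.0]), np.array([3.0]))  # linear region
```

The two branches meet smoothly at |error| = delta, which is what gives Huber its continuous gradient.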
Recommendation: Start with MSE or MAE for standard forecasting tasks using transformers.