Loss Functions and Transformers Notes

The document discusses various loss functions used in deep learning, particularly Binary Cross Entropy (BCE) and Log Loss for binary classification tasks, and their formulas. It also outlines loss functions for forecasting tasks with transformers, including MSE, MAE, Quantile Loss, and Cross Entropy, recommending MSE or MAE for standard forecasting. Additionally, it mentions Huber Loss as a useful option for handling outliers.

Deep Learning Loss Functions & Transformer Use Cases - Notes


1. Binary Cross Entropy (BCE)
Binary Cross Entropy is a loss function used for binary classification problems.
Formula:
BCE = -[y * log(p) + (1 - y) * log(1 - p)]
Where:
y = actual class label (0 or 1)
p = predicted probability (from sigmoid, between 0 and 1)
Usage:
- Classification tasks like spam detection, tumor detection (yes/no)
- Final layer uses sigmoid activation
- Also used for multi-label tasks, where each output is an independent yes/no probability
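
The BCE formula above can be sketched in plain Python as follows. This is a minimal illustration, not a library implementation: the function name binary_cross_entropy and the eps clipping constant are assumptions added here so that log(0) is never evaluated, and the loss is averaged over the samples.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean BCE over paired labels (0 or 1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip p away from 0 and 1 so that log() stays finite (assumed safeguard).
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Two samples: one positive, one negative, both predicted at p = 0.5.
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
```

With both predictions at 0.5 the loss equals log(2), which is the value a model gets when it expresses no preference between the two classes.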

2. Log Loss
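Log Loss is another name for cross entropy; in the binary case it is identical to Binary Cross Entropy.
- The logarithm penalizes confident wrong predictions heavily: the loss grows without bound as the probability assigned to the true class approaches 0.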
- Formula is the same as BCE.
- Commonly used in classification competitions and benchmarks.
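
To make the penalty concrete, here is a small sketch for a sample whose true label is 1, where the per-sample loss reduces to -log(p). The helper name positive_class_loss and the chosen probabilities are illustrative assumptions.

```python
import math

def positive_class_loss(p):
    """Per-sample log loss when the true label is 1: -log(p)."""
    return -math.log(p)

# The lower the probability given to the true class, the larger the loss.
for p in (0.9, 0.5, 0.1, 0.01):
    print(f"p = {p:<4}  loss = {positive_class_loss(p):.3f}")
```

The loss rises from about 0.105 at p = 0.9 to about 4.605 at p = 0.01, so a confident mistake costs far more than a hesitant one.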

3. Transformers for Forecasting - Loss Functions


The choice of loss function depends on the forecasting task:
1. Regression (predicting numeric values):
- MSE (Mean Squared Error): Squares each error, so large errors dominate the loss; sensitive to outliers.
- MAE (Mean Absolute Error): Weights errors linearly, so it is more robust to outliers.
2. Probabilistic Forecasting (predicting uncertainty):
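- Quantile Loss (also called pinball loss): Trains the model to predict a chosen quantile, such as the 10th or 90th percentile.
- Negative Log-Likelihood (NLL): Fits the parameters of an assumed output distribution, such as a Gaussian mean and variance.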
3. Classification (binary output like up/down):
- Binary Cross Entropy
4. Multi-class Forecasting:
- Cross Entropy
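
The regression and quantile losses listed above can be sketched in plain Python as follows. The function names and the toy series are assumptions made for illustration; in practice a deep learning framework's built-in losses would normally be used inside the training loop.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of squared differences."""
    return sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute differences."""
    return sum(abs(y - f) for y, f in zip(y_true, y_pred)) / len(y_true)

def quantile_loss(y_true, y_pred, q):
    """Pinball loss for quantile q in (0, 1)."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        diff = y - f
        # Under-prediction (diff > 0) costs q * diff,
        # over-prediction (diff < 0) costs (1 - q) * |diff|.
        total += max(q * diff, (q - 1) * diff)
    return total / len(y_true)

# Toy series with one outlier in the last step (hypothetical values).
actual = [10.0, 12.0, 11.0, 50.0]
forecast = [10.0, 12.0, 11.0, 10.0]

print("MSE:", mse(actual, forecast))
print("MAE:", mae(actual, forecast))
print("Quantile loss (q=0.9):", quantile_loss(actual, forecast, 0.9))
```

On this toy series the single outlier drives MSE to 400 while MAE is only 10, which illustrates why MAE is described as more robust to outliers. With q = 0.5 the quantile loss is exactly half of MAE.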

Other Useful Losses:
- Huber Loss: Quadratic (like MSE) for errors below a threshold delta and linear (like MAE) above it, so it stays robust to outliers while keeping smooth gradients near zero.
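
A minimal sketch of Huber Loss in plain Python, assuming the common definition with a threshold parameter delta (the default of 1.0 here is an assumption, not a value from these notes):

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        err = abs(y - f)
        if err <= delta:
            # MSE-like region: 0.5 * error^2
            total += 0.5 * err ** 2
        else:
            # MAE-like region: grows linearly beyond delta
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)

print(huber_loss([0.0], [0.5]))  # small error, quadratic branch
print(huber_loss([0.0], [3.0]))  # large error, linear branch
```

The two branches meet at an error of exactly delta, which is what keeps the gradient continuous where the loss switches from quadratic to linear.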

Recommendation: Start with MSE or MAE for standard forecasting tasks using transformers.
