Learning Dynamics in Class Imbalance

Poster of "A Theoretical Analysis of the Learning Dynamics under Class Imbalance" presented at ICML 2023

A Theoretical Analysis of the Learning Dynamics under Class Imbalance

E. Francazi (a, b), M. Baity Jesi (b), A. Lucchi (c)

(a) Physics Department, EPFL, Switzerland
(b) SIAM Department, Eawag (ETH), Switzerland
(c) Department of Mathematics and Computer Science, University of Basel, Switzerland
contact: [email protected]

Why Class Imbalance is interesting

Many datasets are affected by Class Imbalance: fraud detection, spam identification, biodiversity monitoring, ...

Examples shown: Species [Kyathanahally et al. 2021], Faces [Zhang et al. 2017], Places [Wang et al. 2017], Actions [Zhang et al. 2019]

Imbalance can significantly impact performance

Performance of minority classes drops: Minority Initial Drop (MID)

The learning dynamics is delayed for imbalanced problems

How does class imbalance affect learning dynamics?

Class Imbalance causes a drop in minority class performance (MID)

This delays the learning process

GD and SGD are affected differently by Class Imbalance

MID is caused by differences in the per-class gradients

Figure legend: Majority class gradient; Minority class gradient; Whole dataset; Descent vector; Gradient of single example; Per-class full-batch gradient; Per-class full-batch normalized gradient; Per-class mini-batch normalized gradient; Orthogonal projection

Gradient Descent
Gradient is dominated by majority class contribution
Minority class contribution has negative scalar product with descent vector (per-class loss increases)

Stochastic Gradient Descent
Randomness due to batch random selection causes directional noise in gradients
Directional noise is higher for minority class (lower signal along full-batch per-class direction)
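
The Gradient Descent statement above can be checked on a small example. This is a minimal sketch, assuming a toy two-feature logistic regression that is not taken from the poster: the 90/10 class split, the cluster centres, the noise level and the learning rate are all hypothetical. It computes the per-class gradient contributions at initialization, the scalar product between the minority contribution and the full gradient, and the per-class losses before and after one GD step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 90 majority points (label 0), 10 minority points (label 1).
n_major, n_minor = 90, 10
X = np.vstack([
    np.array([1.0, 0.2]) + 0.1 * rng.standard_normal((n_major, 2)),
    np.array([1.0, -0.2]) + 0.1 * rng.standard_normal((n_minor, 2)),
])
y = np.concatenate([np.zeros(n_major), np.ones(n_minor)])


def per_class_grads(w):
    """Per-class contributions to the gradient of the mean logistic loss."""
    residual = 1.0 / (1.0 + np.exp(-(X @ w))) - y  # p_i - y_i for every example
    grads = {}
    for c in (0, 1):
        mask = y == c
        grads[c] = (residual[mask][:, None] * X[mask]).sum(axis=0) / len(y)
    return grads


def per_class_loss(w):
    """Mean logistic loss restricted to each class."""
    z = X @ w
    losses = np.logaddexp(0.0, z) - y * z
    return {c: float(losses[y == c].mean()) for c in (0, 1)}


w0 = np.zeros(2)
grads = per_class_grads(w0)
full_grad = grads[0] + grads[1]  # full-batch gradient = sum of class contributions

# (-g_minority) . (-g_full): minority contribution against the descent vector.
minority_dot = float(grads[1] @ full_grad)

lr = 0.1  # hypothetical learning rate
w1 = w0 - lr * full_grad
before, after = per_class_loss(w0), per_class_loss(w1)

print("gradient norms:", np.linalg.norm(grads[0]), np.linalg.norm(grads[1]))
print("minority contribution . descent vector:", minority_dot)
print("majority loss:", before[0], "->", after[0])
print("minority loss:", before[1], "->", after[1])
```

In this toy setup the majority contribution has the larger norm, so the step follows it, and the minority loss rises on the first update while the majority loss falls.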

Gradient Descent
Class Imbalance induces a difference in the per-class gradient norms; GD dynamics, which follow the gradient direction, will be ruled by the majority class.
Normalizing the gradient contribution from each class (PCNGD) eliminates the gap between per-class performances.

Stochastic Gradient Descent
Imbalance induces a difference in the per-class gradient directional noise; the signal along the full-batch direction (FBD) is damped more for minority class.
Equalizing per-class norms is not enough (PCNSGD); we need to equalize the projections along FBD (PCNSGD+R).
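
The per-class normalization idea for full-batch GD can be sketched as follows. This sketch assumes that a PCNGD step moves along the sum of unit-norm per-class gradients, which is one reading of the description above; the accompanying paper should be consulted for the exact update rule. The toy data, the function names (`per_class_grads`, `per_class_loss`, `pcngd_step`) and the learning rate are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 90 majority points (label 0), 10 minority points (label 1).
n_major, n_minor = 90, 10
X = np.vstack([
    np.array([1.0, 0.2]) + 0.1 * rng.standard_normal((n_major, 2)),
    np.array([1.0, -0.2]) + 0.1 * rng.standard_normal((n_minor, 2)),
])
y = np.concatenate([np.zeros(n_major), np.ones(n_minor)])


def per_class_grads(w):
    """Per-class contributions to the gradient of the mean logistic loss."""
    residual = 1.0 / (1.0 + np.exp(-(X @ w))) - y
    grads = {}
    for c in (0, 1):
        mask = y == c
        grads[c] = (residual[mask][:, None] * X[mask]).sum(axis=0) / len(y)
    return grads


def per_class_loss(w):
    """Mean logistic loss restricted to each class."""
    z = X @ w
    losses = np.logaddexp(0.0, z) - y * z
    return {c: float(losses[y == c].mean()) for c in (0, 1)}


def pcngd_step(w, lr):
    """One step along the sum of unit-norm per-class gradients (assumed PCNGD form)."""
    grads = per_class_grads(w)
    direction = sum(g / np.linalg.norm(g) for g in grads.values())
    return w - lr * direction


w = np.zeros(2)
history = [per_class_loss(w)]
for _ in range(10):
    w = pcngd_step(w, lr=0.05)  # hypothetical learning rate
    history.append(per_class_loss(w))

for c, name in ((0, "majority"), (1, "minority")):
    print(name, [round(h[c], 4) for h in history])
```

With unit-norm contributions, the first-order change of each class loss is proportional to -(1 + cos θ), where θ is the angle between the two per-class gradients, so neither class loss can increase at first order. For small steps on this toy problem both losses decrease at every iteration.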

Figure: Per-class normalization. Panels: (a) Full batch, (b) One mini batch, (c) Many mini batches

Conclusion
Class Imbalance induces differences in per-class gradients, causing a drop in minority class performance
GD: Per-class normalization allows for monotonic loss. We prove convergence
SGD: Additional directional noise must be taken into account
Directional noise explains effectiveness of methods such as oversampling (O)
arXiv:2207.00391
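
The link between directional noise and oversampling can be illustrated numerically. This is a rough sketch under stated assumptions rather than the poster's experiment: directional noise is measured here as the mean cosine between a mini-batch per-class gradient and the full-batch per-class gradient, and oversampling is approximated by drawing class-balanced mini-batches. The toy data and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 90 majority points (label 0), 10 minority points (label 1).
n_major, n_minor, sigma = 90, 10, 0.3
X = np.vstack([
    np.array([1.0, 0.2]) + sigma * rng.standard_normal((n_major, 2)),
    np.array([1.0, -0.2]) + sigma * rng.standard_normal((n_minor, 2)),
])
y = np.concatenate([np.zeros(n_major), np.ones(n_minor)])
w = np.zeros(2)  # gradients are evaluated at initialization


def class_grad(idx, c):
    """Summed logistic-loss gradient over the class-c examples among idx."""
    idx = idx[y[idx] == c]
    if idx.size == 0:
        return None  # this batch contains no example of class c
    p = 1.0 / (1.0 + np.exp(-(X[idx] @ w)))
    return ((p - y[idx])[:, None] * X[idx]).sum(axis=0)


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Full-batch per-class directions (FBD), one per class.
full = {c: class_grad(np.arange(len(y)), c) for c in (0, 1)}


def mean_alignment(sample_batch, n_batches=2000):
    """Average cosine between mini-batch and full-batch per-class gradients."""
    cos = {0: [], 1: []}
    for _ in range(n_batches):
        batch = sample_batch()
        for c in (0, 1):
            g = class_grad(batch, c)
            if g is not None:
                cos[c].append(cosine(g, full[c]))
    return {c: float(np.mean(v)) for c, v in cos.items()}


def uniform_batch(size=10):
    """Plain SGD sampling: each example is equally likely."""
    return rng.choice(len(y), size=size, replace=False)


def balanced_batch(size=10):
    """Stand-in for oversampling: half of each batch comes from each class."""
    half = size // 2
    major = rng.choice(np.flatnonzero(y == 0), size=half, replace=False)
    minor = rng.choice(np.flatnonzero(y == 1), size=half, replace=False)
    return np.concatenate([major, minor])


uniform = mean_alignment(uniform_batch)
balanced = mean_alignment(balanced_batch)
print("uniform batches  (majority, minority):", uniform[0], uniform[1])
print("balanced batches (majority, minority):", balanced[0], balanced[1])
```

Under uniform sampling a batch holds only about one minority example, so its minority gradient is less aligned with the full-batch minority direction than the majority gradient is with its own. Balanced batches raise the minority alignment, consistent with the explanation of oversampling given above.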
