DatasetEquity: Are All Samples Created Equal? In The Quest For Equity Within Datasets

This is the official implementation of the ICCV 2023 Workshop paper: DatasetEquity: Are All Samples Created Equal? In The Quest For Equity Within Datasets.

This paper presents a novel method for addressing data imbalance in machine learning. The method computes sample likelihoods from image appearance using deep perceptual embeddings and clustering, then uses these likelihoods to weight samples differently during training with a proposed Generalized Focal Loss function. This loss can be easily integrated with deep learning algorithms. Experiments on autonomous driving vision datasets, including KITTI and nuScenes, validate the method's effectiveness. The loss function improves state-of-the-art 3D object detection methods, achieving AP gains of over 200% on under-represented classes (Cyclist) in the KITTI dataset. The results indicate that the method is generalizable, complements existing techniques, and is particularly beneficial for smaller datasets and rare classes.

Getting Started

TL;DR

The concept of this paper is simple: (1) quantify the likelihood of occurrence of each sample in the training dataset; (2) compute the Generalized Focal Loss based on those likelihoods; (3) train the model with the new weighted loss function.

Generalized Focal Loss requires a loss weight for each sample, called the Dequity Weight, which can be computed as follows:

def dequity_loss_weight(p: float,
                        eta: float = 1.0,
                        gamma: float = 5.0) -> float:
    """Calculate the Dequity Weight.
    Args:
        p (float): The likelihood of the sample, in [0, 1].
        eta (float): Sets the floor of the weight; at p = 1 the weight is eta / (eta + 1).
        gamma (float): Controls how quickly the weight decays as p increases.
    Returns:
        float: The Dequity loss weight.
    """
    return (eta + (1 - p) ** gamma) / (eta + 1)
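
To get a feel for how the weight behaves, the sketch below restates the formula above as a standalone function and evaluates it at a few likelihoods. The helper name `dequity_weight` and the chosen likelihood values are ours, for illustration only. With the default parameters, a sample with p = 0 receives a weight of 1, and the weight falls toward eta / (eta + 1) = 0.5 as p approaches 1.

```python
def dequity_weight(p: float, eta: float = 1.0, gamma: float = 5.0) -> float:
    """Standalone restatement of the formula above (hypothetical helper name)."""
    return (eta + (1.0 - p) ** gamma) / (eta + 1.0)


# Illustrative likelihoods, from very rare (0.0) to very common (1.0).
likelihoods = [0.0, 0.1, 0.5, 0.9, 1.0]
weights = [dequity_weight(p) for p in likelihoods]

for p, w in zip(likelihoods, weights):
    print(f"p={p:.1f} -> w={w:.6f}")
```

Rare samples keep close to their full loss, while common samples are down-weighted but never ignored, since eta keeps the weight above zero.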

The pseudo-code for the training algorithm is as follows:

for sample in dataloader:
    # reset gradients (assumes a PyTorch-style optimizer)
    optimizer.zero_grad()
    # retrieve the sample likelihood
    p = get_sample_likelihood(sample)
    # compute the Dequity weight
    w = dequity_loss_weight(p)
    # forward pass
    y_hat = model(sample)
    # compute the loss
    loss = loss_fn(y_hat, sample)
    # compute the weighted loss: Generalized Focal Loss (Our Contribution)
    weighted_loss = w * loss
    # backward pass
    weighted_loss.backward()
    # update the model parameters
    optimizer.step()
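
The loop above weights one sample at a time. With mini-batches, the same idea needs an unreduced, per-sample loss so that each sample can be scaled by its own weight before the batch is reduced. The sketch below shows that reduction in plain Python. `weighted_batch_loss` is a hypothetical helper, and averaging over the batch is an assumption, since this README does not specify the reduction.

```python
from typing import Sequence


def weighted_batch_loss(losses: Sequence[float],
                        likelihoods: Sequence[float],
                        eta: float = 1.0,
                        gamma: float = 5.0) -> float:
    """Mean of per-sample losses, each scaled by its Dequity Weight.

    Hypothetical helper: the mean reduction is an assumption.
    """
    if len(losses) != len(likelihoods):
        raise ValueError("need exactly one likelihood per loss value")
    if not losses:
        raise ValueError("batch must not be empty")
    # Same formula as the Dequity Weight above, applied per sample.
    weights = [(eta + (1.0 - p) ** gamma) / (eta + 1.0) for p in likelihoods]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Two samples with the same raw loss: one rare (p=0.0), one common (p=1.0).
batch_loss = weighted_batch_loss([2.0, 2.0], [0.0, 1.0])
print(batch_loss)
```

In a framework such as PyTorch, this typically means constructing the loss with `reduction="none"` so that it returns one value per sample instead of a single scalar.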

Sample likelihoods are computed beforehand. Please refer to the Getting Started guide for more details.
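
As a rough illustration of what a precomputed likelihood could look like, the sketch below assumes each training sample has already been assigned to a cluster (for example, by clustering perceptual embeddings) and takes a sample's likelihood to be the fraction of samples that share its cluster. This definition is an assumption made for illustration, and `cluster_likelihoods` is a hypothetical name; the repository's actual procedure may differ.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def cluster_likelihoods(labels: Sequence[Hashable]) -> List[float]:
    """Likelihood of each sample as the relative size of its cluster.

    Assumption for illustration: likelihood = cluster size / dataset size.
    """
    counts = Counter(labels)
    n = len(labels)
    return [counts[label] / n for label in labels]


# Four samples: three fall in cluster 0, one in cluster 1.
labels = [0, 0, 0, 1]
likelihoods = cluster_likelihoods(labels)
print(likelihoods)
```

Under this assumption, the lone sample in the small cluster gets the lowest likelihood and therefore the largest Dequity Weight during training.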

Citation

If you find this work useful for your research, please cite our paper:

@inproceedings{shrivastava2023datasetequity,
  title={DatasetEquity: Are All Samples Created Equal? In The Quest For Equity Within Datasets},
  author={Shrivastava, Shubham and Zhang, Xianling and Nagesh, Sushruth and Parchami, Armin},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  year={2023}
}
