We reproduce the results for the paper, "Training Binary Neural Networks using the Bayesian Learning Rule". We make an end-to-end trainer for training Binary Neural Networks using various methods and Keras like usage. This is our entry for the ML Reproducibility Challenge 2020.
- BayesBiNN - This was the central method in the above-mentioned paper and gives a mathematically principled way of solving the discrete optimization problem in case BNNs.
- STE - This is another method for optimizing BNNs originally mentioned in the paper "Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation". This method gives a much smoother training as compared to BayesBiNN and also gives good performance on more complex tasks like Semantic Segmentation than Image Classification.
tqdm
torch
wandb #optional
torchvision
torchsummary
Pillow==7.2.2
opencv-python
pip install -r requirements.txtCurrently, our code explicitly supports MNIST, CIFAR-10 and CIFAR-100 datasets but small changes to the data loader file can extend it to other custom datasets.
We have added WandB support to monitor the training of models on a larger scale.
Use wandb_on argument to connect it to the WandB server.
python train_bayesbinn.py /path/to/config [wandb_on]
# Example python train_bayesbinn.py configs/mnist_config_bayesbinn.json wandb_on
{
"dataset": "cifar10",
"input_shape": 3,
"output_shape": 10,
"data_augmentation": true,
"epochs": 500,
"criterion": "crossentropy",
"validation_split": 0.1,
"lr_scheduler": "cosine",
"batch_size": 50,
"lr_init": 3e-4,
"lr_final": 1e-16,
"drop_prob": 0.2,
"batch_affine": false,
"model_architecture": "VGGBinaryConnect",
"mc_steps": 1,
"temperature": 1e-10,
"evaluate_steps": 0,
"momentum": 0.2
}Use wandb_on argument to connect it to the WandB server.
python train_ste.py /path/to/config [wandb_on]
# Example python train_ste.py configs/mnist_config_ste.json wandb_on
{
"dataset": "cifar10",
"input_shape": 3,
"output_shape": 10,
"data_augmentation": true,
"epochs": 500,
"criterion": "crossentropy",
"validation_split": 0.1,
"lr_scheduler": "cosine",
"batch_size": 50,
"lr_init": 0.01,
"lr_final": 1e-16,
"batch_affine": false,
"model_architecture": "VGGBinaryConnect",
"momentum": 0.2,
"grad_clip_value": 1,
"weight_clip_value": 1
}- Prateek Garg (IIT Delhi)
- Lakshya Singhal (IIT Delhi)
@misc{meng2020training,
title={Training Binary Neural Networks using the Bayesian Learning Rule},
author={Xiangming Meng and Roman Bachmann and Mohammad Emtiyaz Khan},
year={2020},
eprint={2002.10778},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{bengio2013estimating,
title={Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation},
author={Yoshua Bengio and Nicholas Léonard and Aaron Courville},
year={2013},
eprint={1308.3432},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.