
Add support for bitsandbytes #15622

Merged
sgugger merged 18 commits into huggingface:main from manuelciosici:14819-integrate-bnb
Apr 19, 2022
Conversation

manuelciosici (Contributor) commented Feb 11, 2022

What does this PR do?

Fixes #14819

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@stas00 @sgugger @TimDettmers

Status

  • Need to instrument CI to install bnb (it's a binary package, so a bit trickier than a normal dependency)
  • I have implemented a CLI parameter to support bitsandbytes
  • I have not written any documentation yet
  • I followed @TimDettmers's suggestion to override the embedding layers so their optimizer state stays in 32 bits. However, I am unsure about a couple of things:
    • Does the override need to happen before the model is loaded onto the GPU, as the official documentation describes for other overrides?
    • Are there any pitfalls in my current approach to identifying Embedding layers? It seems to work fine for RoBERTa and for GPT-2.
  • So far, I've used run_mlm.py and run_clm.py from the examples directory to check that the code runs. Using RTX A6000 GPUs, I see:
| Model      | Visible devices | Optimizer   | Per-device batch size | GPU memory                                  |
|------------|-----------------|-------------|-----------------------|---------------------------------------------|
| gpt2-large | 0               | adamw_torch | 2                     | 48638MiB / 49140MiB                         |
| gpt2-large | 0               | adamw_bnb   | 2                     | 42412MiB / 49140MiB                         |
| gpt2-large | 0,1             | adamw_torch | 1                     | 30724MiB / 49140MiB, 21040MiB / 49140MiB    |
| gpt2-large | 0,1             | adamw_torch | 2                     | OOM                                         |
| gpt2-large | 0,1             | adamw_bnb   | 1                     | 26820MiB / 49140MiB, 21042MiB / 49140MiB    |
| gpt2-large | 0,1             | adamw_bnb   | 2                     | 44458MiB / 49140MiB, 36906MiB / 49140MiB    |
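The embedding-override step above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation: the helper names `find_embedding_modules` and `make_bnb_adamw` are invented here, and the bitsandbytes calls follow the `GlobalOptimManager.register_module_override` pattern that bitsandbytes documents for pinning selected parameters to 32-bit optimizer state.

```python
# Hedged sketch: identify nn.Embedding layers and build an 8-bit AdamW that
# keeps embedding optimizer state in 32 bits. Helper names are hypothetical.
import torch.nn as nn


def find_embedding_modules(model: nn.Module) -> list[nn.Embedding]:
    """Return all nn.Embedding submodules (works for RoBERTa/GPT-2 style models)."""
    return [m for m in model.modules() if isinstance(m, nn.Embedding)]


def make_bnb_adamw(model: nn.Module, lr: float = 5e-5):
    """Build an 8-bit AdamW via bitsandbytes (an optional binary dependency)."""
    import bitsandbytes as bnb  # imported lazily; bnb may not be installed

    manager = bnb.optim.GlobalOptimManager.get_instance()
    for emb in find_embedding_modules(model):
        # 8-bit state can be unstable for embeddings, so pin theirs to 32-bit.
        manager.register_module_override(emb, "weight", {"optim_bits": 32})
    return bnb.optim.AdamW8bit(model.parameters(), lr=lr)
```

In a training loop, `make_bnb_adamw(model)` would then stand in for `torch.optim.AdamW(model.parameters(), ...)`, which is roughly what selecting the new `adamw_bnb` optimizer flag does inside the Trainer.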


Labels

WIP Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RFC: Integrating bitsandbytes 8-bit optimizer / adding Embedding Norm

8 participants