
Add support for bitsandbytes #15622

Merged
sgugger merged 18 commits into huggingface:main from manuelciosici:14819-integrate-bnb
Apr 19, 2022
Conversation

manuelciosici (Contributor) commented Feb 11, 2022

What does this PR do?

Fixes #14819

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@stas00 @sgugger @TimDettmers

Status

  • Need to instrument CI to install bnb (it's a binary package, so a bit trickier than a normal dependency)
  • I have implemented a CLI parameter to support bitsandbytes
  • I have not written any documentation yet
  • I followed @TimDettmers's suggestion to override the embedding layers so their optimizer state stays in 32 bits. However, I am unsure about a couple of things:
    • Does the override need to happen before the model is loaded onto the GPU, as the official documentation describes for other overrides?
    • Are there any pitfalls in my current approach to identifying Embedding layers? It seems to work fine for RoBERTa and for GPT-2.
  • So far, I've used run_mlm.py and run_clm.py from the examples directory to check that the code runs. Using RTX A6000 GPUs, I see:
| Model      | Visible devices | Optimizer   | Per-device batch size | GPU memory                                  |
|------------|-----------------|-------------|-----------------------|---------------------------------------------|
| gpt2-large | 0               | adamw_torch | 2                     | 48638MiB / 49140MiB                         |
| gpt2-large | 0               | adamw_bnb   | 2                     | 42412MiB / 49140MiB                         |
| gpt2-large | 0,1             | adamw_torch | 1                     | 30724MiB / 49140MiB, 21040MiB / 49140MiB    |
| gpt2-large | 0,1             | adamw_torch | 2                     | OOM                                         |
| gpt2-large | 0,1             | adamw_bnb   | 1                     | 26820MiB / 49140MiB, 21042MiB / 49140MiB    |
| gpt2-large | 0,1             | adamw_bnb   | 2                     | 44458MiB / 49140MiB, 36906MiB / 49140MiB    |
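The embedding-override step above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation: the helper names `find_embedding_modules` and `make_bnb_adamw` are invented here, and the bitsandbytes calls follow the `GlobalOptimManager.register_module_override` pattern that bitsandbytes documents for pinning selected parameters to 32-bit optimizer state.

```python
# Hedged sketch: identify nn.Embedding layers and build an 8-bit AdamW that
# keeps embedding optimizer state in 32 bits. Helper names are hypothetical.
import torch.nn as nn


def find_embedding_modules(model: nn.Module) -> list[nn.Embedding]:
    """Return all nn.Embedding submodules (works for RoBERTa/GPT-2 style models)."""
    return [m for m in model.modules() if isinstance(m, nn.Embedding)]


def make_bnb_adamw(model: nn.Module, lr: float = 5e-5):
    """Build an 8-bit AdamW via bitsandbytes (an optional binary dependency)."""
    import bitsandbytes as bnb  # imported lazily; bnb may not be installed

    manager = bnb.optim.GlobalOptimManager.get_instance()
    for emb in find_embedding_modules(model):
        # 8-bit state can be unstable for embeddings, so pin theirs to 32-bit.
        manager.register_module_override(emb, "weight", {"optim_bits": 32})
    return bnb.optim.AdamW8bit(model.parameters(), lr=lr)
```

In a training loop, `make_bnb_adamw(model)` would then stand in for `torch.optim.AdamW(model.parameters(), ...)`, which is roughly what selecting the new `adamw_bnb` optimizer flag does inside the Trainer.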


Labels

WIP Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RFC: Integrating bitsandbytes 8-bit optimizer / adding Embedding Norm

8 participants