[WIP][NLLB-MoE] Adds the moe model #22024

Merged
ArthurZucker merged 118 commits into huggingface:main from ArthurZucker:ArthurZucker/issue21300 on Mar 27, 2023
Conversation

@ArthurZucker (Collaborator) commented Mar 8, 2023

What does this PR do?

Fixes #21300
To-Dos:

  • Conversion script and original weights available here
  • Converted checkpoints and configuration file available:
    - moe-128 experts
  • Make the common tests go green
  • Implement top-2 gating mechanism (a simplified sketch follows this list)
  • Add integration tests for:
    • the routers
    • the logits
    • the generation using greedy search
  • Clean up the PR
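
For context, here is a minimal sketch of the top-2 gating idea (a simplified, hypothetical PyTorch version; the actual router in this PR additionally handles expert capacity and batch prioritized routing):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2Router(nn.Module):
    """Minimal top-2 gating sketch: each token is dispatched to its two
    highest-scoring experts, weighted by renormalized router probabilities."""

    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        # A single linear layer produces one logit per expert for each token.
        self.classifier = nn.Linear(hidden_dim, num_experts, bias=False)

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden_dim)
        router_logits = self.classifier(hidden_states)            # (B, S, E)
        router_probs = F.softmax(router_logits, dim=-1)
        top2_probs, top2_experts = router_probs.topk(2, dim=-1)   # (B, S, 2)
        # Renormalize so the two expert weights sum to 1 per token.
        top2_probs = top2_probs / top2_probs.sum(dim=-1, keepdim=True)
        return top2_experts, top2_probs
```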

@ArthurZucker ArthurZucker self-assigned this Mar 9, 2023
@ArthurZucker ArthurZucker changed the title from [NLLB-MoE] Adds the moe model to [WIP][NLLB-MoE] Adds the moe model on Mar 9, 2023
@ArthurZucker ArthurZucker marked this pull request as ready for review March 24, 2023 11:20
@ArthurZucker ArthurZucker requested a review from sgugger March 24, 2023 11:29
@sgugger (Collaborator) left a comment

Thanks a lot for adding this new model! I have a couple of comments, but it looks on track to be merged soon!

@ArthurZucker ArthurZucker merged commit 19ade24 into huggingface:main Mar 27, 2023
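
For reference, usage after this merge looks roughly like the following (a hedged sketch: the checkpoint name facebook/nllb-moe-54b and the language code fra_Latn are assumptions based on the PR description, not confirmed in this thread):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed name of the converted MoE checkpoint mentioned in the to-dos.
checkpoint = "facebook/nllb-moe-54b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

inputs = tokenizer("Hello, world!", return_tensors="pt")
# Force the decoder to start with the target language token (French here),
# as is standard for NLLB-style multilingual generation.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```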
raghavanone pushed a commit to raghavanone/transformers that referenced this pull request Apr 5, 2023
* Initial commit

* update modeling code

* update doc

* add functions necessary

* fix imports

* revert changes

* fixup

* more styling to get going

* remove standalone encoder

* update code

* styling

* fix config and model

* update code and some refactoring

* make more tests pass

* Adding NLLB-200 - MoE - 54.5B for no language left behind
Fixes huggingface#21300

* fix more common tests

* style

* update testing file

* update

* update

* Router2 doc

* update check config with sparse layer

* add dummy router

* update current conversion script

* create on the fly conversion script

* Fixup

* style

* style 2

* fix empty return

* fix return

* Update default config sparse layers

* easier to create sparse layers

* update

* update conversion script

* update modeling

* add to toctree

* styling

* make ruff happy

* update docstring

* update conversion script

* update, will break tests but implementing top2

* update

* ❗local groups are supported here

* ⚠️ Support for local groups is now removed ⚠️

This is because it would have to work with model parallelism, which we do not support

* finish simplification

* Fix forward

* style

* fixup

* Update modelling and test, refactoring

* update tests

* remove final layer norm as it is done in the FF

* routing works! Logits test added

* nit in test

* remove top1router

* style

* make sure sparse layers are tested. Had to change route_tokens a little bit

* add support for unsplit models when converting

* fixup

* style

* update tests

* update test

* REFACTOR

* encoder outputs match!

* style

* update testing

* 🎉encoder and decoder logits match 🎉

* styling

* update tests

* cleanup tests

* fix router test and CIs

* cleanup

* cleanup test styling

* fix tests

* Finally the generation tests match!

* cleanup

* update test

* style testing file

* remove script

* cleanup

* more cleanup

* nits

* update

* NLLB tokenizer is wrong and will be fixed soon

* use LongTensors

* update tests

* revert some small changes

* fix second expert sampling and batch prioritized routing

* update tests

* finish last tests

* make ruff happy

* update

* ruff again

* style

* Update docs/source/en/model_doc/nllb-moe.mdx

Co-authored-by: Sylvain Gugger <[email protected]>

* Updates based on review

* style and fix import issue

* nit

* more nits

* cleanup

* styling

* update test_seconde_expert_policy

* fix name

* last nit on the markdown examples

---------

Co-authored-by: Sylvain Gugger <[email protected]>
novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023 (same commit message as above)

Development

Successfully merging this pull request may close these issues.

Adding NLLB-200 - MoE - 54.5B for no language left behind (the 54B MoE!)
