Add new cohere2_moe model by Cyrilvallez · Pull Request #46115 · huggingface/transformers

Cyrilvallez · 2026-05-20T09:37:46Z

What does this PR do?

As per the title!

Port the Cohere2Moe (c4) model from cohere-transformers to upstream. Adds modular and generated modeling files, configuration, and unit tests. Registers Cohere2MoeConfig in CONFIG_MAPPING_NAMES, Cohere2MoeModel in MODEL_MAPPING_NAMES, and Cohere2MoeForCausalLM in MODEL_FOR_CAUSAL_LM_MAPPING. Adds "cohere2moe" -> "cohere2_moe" to SPECIAL_MODEL_TYPE_TO_MODULE_NAME so the auto-mapping resolves the module directory correctly. Co-authored-by: Cursor <[email protected]>
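The module-name override described above can be sketched as a small lookup. This is a minimal illustration, not the library's code: the one-entry table and the dash-to-underscore fallback rule are assumptions made for the example, and only the `"cohere2moe" -> "cohere2_moe"` entry comes from the commit message.

```python
# Hypothetical, trimmed-down stand-in for the auto-mapping override table.
# Only the "cohere2moe" -> "cohere2_moe" entry is taken from the commit message.
SPECIAL_MODEL_TYPE_TO_MODULE_NAME = {"cohere2moe": "cohere2_moe"}


def model_type_to_module_name(model_type: str) -> str:
    """Map a config `model_type` to the name of its module directory."""
    if model_type in SPECIAL_MODEL_TYPE_TO_MODULE_NAME:
        return SPECIAL_MODEL_TYPE_TO_MODULE_NAME[model_type]
    # Assumed default rule for this sketch: dashes become underscores.
    return model_type.replace("-", "_")


resolved = model_type_to_module_name("cohere2moe")
fallback = model_type_to_module_name("some-model")  # hypothetical model type
print(resolved, fallback)
```

Without the override entry, `"cohere2moe"` would fall through to the default rule and resolve to a directory name that does not exist.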

…e2Moe
- Remove CohereTokenizer.__init__ and replace with convert_to_native_format classmethod so the tokenizer loads fully from tokenizer.json (v5-style). Adds add_bos_token/add_eos_token defaults to match 4.56.2.6 behaviour.
- Update test_tokenization_cohere.py: disable test_tokenizer_from_extractor (no __init__ means extractor-based construction is unsupported), replace test_add_prefix_space_fast with tokenizer.json-based tests that verify pre_tokenizer/normalizer/decoder components are preserved on load.
- Add Cohere2MoeVisionIntegrationTest to test_modeling_cohere2_vision.py with forward and generate tests for Command A+ (cohere2moe backbone).

Co-authored-by: Cursor <[email protected]>

…e_position
- Rename 'input_embeds' key to 'inputs_embeds' to match create_causal_mask() signature (typo introduced in generated modeling file).
- Remove 'cache_position' from mask_kwargs; upstream create_causal_mask() does not accept this parameter (cohere-transformers-specific argument).

Result: all 6 integration tests now pass (2 skipped: flash_attn not installed).

Co-authored-by: Cursor <[email protected]>
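Both fixes above are mismatches between the keys of a kwargs dict and the callee's signature. The sketch below shows one way to surface such mismatches. The `create_causal_mask` here is a stand-in with an assumed parameter list, not the upstream function, and `unexpected_kwargs` is a hypothetical helper.

```python
import inspect


def create_causal_mask(config, inputs_embeds, attention_mask=None,
                       past_key_values=None, position_ids=None):
    """Stand-in with an assumed signature; NOT the upstream implementation."""
    return {"batch_size": len(inputs_embeds)}


def unexpected_kwargs(func, kwargs):
    """Return the keys in `kwargs` that `func` does not accept (hypothetical helper)."""
    params = inspect.signature(func).parameters
    return sorted(key for key in kwargs if key not in params)


# Keys as they were before the fix: one typo and one fork-specific argument.
old_mask_kwargs = {"config": None, "input_embeds": [[0.0]],
                   "attention_mask": None, "cache_position": [0]}
# Keys after the fix.
new_mask_kwargs = {"config": None, "inputs_embeds": [[0.0]],
                   "attention_mask": None}

rejected = unexpected_kwargs(create_causal_mask, old_mask_kwargs)
accepted = unexpected_kwargs(create_causal_mask, new_mask_kwargs)
mask = create_causal_mask(**new_mask_kwargs)
print(rejected, accepted, mask)
```

Calling the stand-in with `old_mask_kwargs` would raise a `TypeError`, which is the kind of failure the renamed and removed keys avoid.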

Replace hardcoded expected outputs in test_model_flash_attn that were written for command_a+_bf16 with the correct outputs from mhlv2_bf16_clean. Co-authored-by: Cursor <[email protected]>

rope_scaling and sliding_window_pattern are consumed in __post_init__ and stored as derived attributes (rope_parameters and layer_types) which the modeling code reads directly. Co-authored-by: Cursor <[email protected]>
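The pattern described above, where constructor inputs are turned into derived attributes during `__post_init__`, can be sketched with a plain dataclass. `ToyMoeConfig`, its defaults, and the exact derivation rules are assumptions for illustration; the real Cohere2MoeConfig may compute these values differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyMoeConfig:
    """Hypothetical config; names mirror the commit message, logic is assumed."""
    num_hidden_layers: int = 8
    sliding_window_pattern: Optional[int] = 4
    rope_scaling: Optional[dict] = None
    # Derived attributes that modeling code would read directly.
    layer_types: list = field(default_factory=list)
    rope_parameters: dict = field(default_factory=dict)

    def __post_init__(self):
        # Assumed rule: every `sliding_window_pattern`-th layer uses full attention.
        if not self.layer_types and self.sliding_window_pattern is not None:
            self.layer_types = [
                "full_attention"
                if (i + 1) % self.sliding_window_pattern == 0
                else "sliding_attention"
                for i in range(self.num_hidden_layers)
            ]
        # Assumed rule: fall back to a default RoPE description when none is given.
        if not self.rope_parameters:
            self.rope_parameters = dict(self.rope_scaling or {"rope_type": "default"})


cfg = ToyMoeConfig()
print(cfg.layer_types)
print(cfg.rope_parameters)
```

Because modeling code in this sketch would only touch `layer_types` and `rope_parameters`, a checker that looks for config attributes unused by the modeling file would flag `rope_scaling` and `sliding_window_pattern`, which is why an ignore-list entry is needed.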

HuggingFaceDocBuilderDev · 2026-05-20T09:51:26Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions · 2026-05-20T10:15:32Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, cohere2, cohere2_moe, cohere2_vision

github-actions · 2026-05-20T10:46:22Z

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=46115&sha=250645

* Add Cohere2Moe (cohere2_moe) model

  Port the Cohere2Moe (c4) model from cohere-transformers to upstream. Adds modular and generated modeling files, configuration, and unit tests. Registers Cohere2MoeConfig in CONFIG_MAPPING_NAMES, Cohere2MoeModel in MODEL_MAPPING_NAMES, and Cohere2MoeForCausalLM in MODEL_FOR_CAUSAL_LM_MAPPING. Adds "cohere2moe" -> "cohere2_moe" to SPECIAL_MODEL_TYPE_TO_MODULE_NAME so the auto-mapping resolves the module directory correctly.

  Co-authored-by: Cursor <[email protected]>
* Drop CohereTokenizer.__init__; add vision integration tests for Cohere2Moe
  - Remove CohereTokenizer.__init__ and replace with convert_to_native_format classmethod so the tokenizer loads fully from tokenizer.json (v5-style). Adds add_bos_token/add_eos_token defaults to match 4.56.2.6 behaviour.
  - Update test_tokenization_cohere.py: disable test_tokenizer_from_extractor (no __init__ means extractor-based construction is unsupported), replace test_add_prefix_space_fast with tokenizer.json-based tests that verify pre_tokenizer/normalizer/decoder components are preserved on load.
  - Add Cohere2MoeVisionIntegrationTest to test_modeling_cohere2_vision.py with forward and generate tests for Command A+ (cohere2moe backbone).

  Co-authored-by: Cursor <[email protected]>
* Fix create_causal_mask call: input_embeds -> inputs_embeds, drop cache_position
  - Rename 'input_embeds' key to 'inputs_embeds' to match create_causal_mask() signature (typo introduced in generated modeling file).
  - Remove 'cache_position' from mask_kwargs; upstream create_causal_mask() does not accept this parameter (cohere-transformers-specific argument).

  Result: all 6 integration tests now pass (2 skipped: flash_attn not installed).

  Co-authored-by: Cursor <[email protected]>
* Update flash_attn expected texts for mhlv2_bf16_clean model

  Replace hardcoded expected outputs in test_model_flash_attn that were written for command_a+_bf16 with the correct outputs from mhlv2_bf16_clean.

  Co-authored-by: Cursor <[email protected]>
* update path
* address comments
* fix deprecate
* address comments
* address comments
* update mapping
* fix ci
* fix ci
* format
* fix ci
* format
* Add Cohere2MoeConfig to check_config_attributes ignore list

  rope_scaling and sliding_window_pattern are consumed in __post_init__ and stored as derived attributes (rope_parameters and layer_types) which the modeling code reads directly.

  Co-authored-by: Cursor <[email protected]>
* add date?
* Update docs/source/en/model_doc/cohere2_moe.md
* a few improvements
* extract the router explicitly
* simplify attention
* attributes
* fix the output type
* simplify tests
* fix init weights
* revert changes to tokenizer - will be in hub config
* style
* mapping already exist, use it directly

---------

Co-authored-by: Terrencezzj <[email protected]>
Co-authored-by: Cursor <[email protected]>
Co-authored-by: bharat <[email protected]>

Terrencezzj and others added 25 commits May 13, 2026 21:01

Update flash_attn expected texts for mhlv2_bf16_clean model

4ef3f8d

Replace hardcoded expected outputs in test_model_flash_attn that were written for command_a+_bf16 with the correct outputs from mhlv2_bf16_clean. Co-authored-by: Cursor <[email protected]>

update path

8cc5807

address comments

64273d6

fix deprecate

444d8e7

address comments

bb948eb

address comments

bdf02f5

update mapping

360a7a5

fix ci

46866f2

fix ci

ee11b0d

format

b3bef79

fix ci

7e5b1ca

format

84940ab

Add Cohere2MoeConfig to check_config_attributes ignore list

4e111d4

rope_scaling and sliding_window_pattern are consumed in __post_init__ and stored as derived attributes (rope_parameters and layer_types) which the modeling code reads directly. Co-authored-by: Cursor <[email protected]>

add date?

8d13db9

Update docs/source/en/model_doc/cohere2_moe.md

c6f05c5

a few improvements

8433af0

extract the router explicitly

a8ea548

simplify attention

9bb0cae

attributes

62d0e86

fix the output type

afdd177

simplify tests

244df18

fix init weights

7152894

revert changes to tokenizer - will be in hub config

b73469b

Cyrilvallez added 2 commits May 20, 2026 19:20

Merge remote-tracking branch 'origin/main' into cohere2_moe

4ef7795

style

5c31d1a

mapping already exist, use it directly

2506452

Cyrilvallez merged commit 9188b5e into main May 20, 2026
89 of 96 checks passed

Cyrilvallez deleted the cohere2_moe branch May 20, 2026 11:18

ArthurZucker added the New model label May 20, 2026

adityasingh2400 mentioned this pull request May 21, 2026

Fix LlamaConfig rejecting explicit head_dim when hidden_size is not divisible by num_attention_heads #46140

Closed

Ace3Z mentioned this pull request May 24, 2026

Guard torch.distributed.tensor.device_mesh import in continuous_batching #46183

Closed

5 tasks

Cyrilvallez mentioned this pull request May 25, 2026

Update cohere2_moe tp_plan #46189

Merged

Cyrilvallez merged 29 commits into main from cohere2_moe


5 participants