
Add new cohere2_moe model #46115

Merged
Cyrilvallez merged 29 commits into main from cohere2_moe on May 20, 2026

@Cyrilvallez (Member) commented May 20, 2026

What does this PR do?

As per the title!

Terrencezzj and others added 25 commits May 13, 2026 21:01
Port the Cohere2Moe (c4) model from cohere-transformers to upstream.
Adds modular and generated modeling files, configuration, and unit tests.
Registers Cohere2MoeConfig in CONFIG_MAPPING_NAMES, Cohere2MoeModel in
MODEL_MAPPING_NAMES, and Cohere2MoeForCausalLM in MODEL_FOR_CAUSAL_LM_MAPPING.
Adds "cohere2moe" -> "cohere2_moe" to SPECIAL_MODEL_TYPE_TO_MODULE_NAME so
the auto-mapping resolves the module directory correctly.

Co-authored-by: Cursor <[email protected]>
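
The module-name override described in the commit above can be pictured with a small sketch. The dictionary and the `model_type_to_module_name` helper below are simplified stand-ins written for illustration: they reuse the names from the commit message but are not copied from the transformers source, and the fallback rule (hyphens become underscores) is an assumption.

```python
# Illustrative stand-in only; not the actual transformers table.
# Maps model types whose string does not match their module directory name.
SPECIAL_MODEL_TYPE_TO_MODULE_NAME = {"cohere2moe": "cohere2_moe"}


def model_type_to_module_name(model_type: str) -> str:
    """Resolve a config's model_type to the directory that holds its code."""
    if model_type in SPECIAL_MODEL_TYPE_TO_MODULE_NAME:
        return SPECIAL_MODEL_TYPE_TO_MODULE_NAME[model_type]
    # Assumed fallback: module names use underscores where model types use hyphens.
    return model_type.replace("-", "_")


# Both spellings end up pointing at the same module directory.
legacy_module = model_type_to_module_name("cohere2moe")
native_module = model_type_to_module_name("cohere2_moe")
```

Without the override entry, a checkpoint whose config says `cohere2moe` would be looked up under a directory of that name, which does not exist in this sketch.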
…e2Moe

- Remove CohereTokenizer.__init__ and replace with convert_to_native_format
  classmethod so the tokenizer loads fully from tokenizer.json (v5-style).
  Adds add_bos_token/add_eos_token defaults to match 4.56.2.6 behaviour.
- Update test_tokenization_cohere.py: disable test_tokenizer_from_extractor
  (no __init__ means extractor-based construction is unsupported), replace
  test_add_prefix_space_fast with tokenizer.json-based tests that verify
  pre_tokenizer/normalizer/decoder components are preserved on load.
- Add Cohere2MoeVisionIntegrationTest to test_modeling_cohere2_vision.py
  with forward and generate tests for Command A+ (cohere2moe backbone).

Co-authored-by: Cursor <[email protected]>
…e_position

- Rename 'input_embeds' key to 'inputs_embeds' to match create_causal_mask()
  signature (typo introduced in generated modeling file).
- Remove 'cache_position' from mask_kwargs; upstream create_causal_mask()
  does not accept this parameter (cohere-transformers-specific argument).

Result: all 6 integration tests now pass (2 skipped: flash_attn not installed).
Co-authored-by: Cursor <[email protected]>
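
To see why the two keyword fixes in the commit above matter, here is a minimal sketch. `create_causal_mask_stub` is a hypothetical stand-in with a keyword-only signature; the parameters of the real `create_causal_mask()` may differ, and only the two names discussed in the commit message (`inputs_embeds`, `cache_position`) are taken from the source.

```python
def create_causal_mask_stub(*, config, inputs_embeds, attention_mask=None):
    """Hypothetical stand-in: accepts only the keyword names listed here."""
    return {"batch": len(inputs_embeds), "has_mask": attention_mask is not None}


inputs_embeds = [[0.0, 0.0], [0.0, 0.0]]  # toy placeholder for a tensor

# Before the fix: misspelled key plus an argument the function does not accept.
bad_kwargs = {"config": None, "input_embeds": inputs_embeds, "cache_position": [0, 1]}
# After the fix: key renamed, unsupported argument dropped.
good_kwargs = {"config": None, "inputs_embeds": inputs_embeds, "attention_mask": None}

try:
    create_causal_mask_stub(**bad_kwargs)
    bad_call_error = None
except TypeError as exc:
    # Python rejects keyword names that are missing from the signature.
    bad_call_error = str(exc)

mask_info = create_causal_mask_stub(**good_kwargs)
```

Because the arguments are collected in a dict and unpacked with `**`, a misspelled key is only detected at call time, which is consistent with the problem surfacing in the integration tests.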
Replace hardcoded expected outputs in test_model_flash_attn that were
written for command_a+_bf16 with the correct outputs from mhlv2_bf16_clean.

Co-authored-by: Cursor <[email protected]>
rope_scaling and sliding_window_pattern are consumed in __post_init__
and stored as derived attributes (rope_parameters and layer_types)
which the modeling code reads directly.

Co-authored-by: Cursor <[email protected]>
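
The pattern described in the commit above, where init-only inputs are consumed in `__post_init__` and only derived attributes are stored, can be sketched with `dataclasses.InitVar`. `ToyMoeConfig` is hypothetical: the attribute names follow the commit message, but the defaults, the "every Nth layer uses full attention" rule, and the way `rope_scaling` is merged are assumptions, not the actual `Cohere2MoeConfig` logic.

```python
from dataclasses import InitVar, dataclass, field
from typing import Optional


@dataclass
class ToyMoeConfig:
    """Hypothetical config illustrating consumed inputs vs. derived attributes."""

    num_hidden_layers: int = 8
    rope_theta: float = 10000.0
    # InitVar values are passed to __post_init__ instead of becoming fields.
    sliding_window_pattern: InitVar[int] = 4
    rope_scaling: InitVar[Optional[dict]] = None
    # Derived attributes that the modeling code would read.
    layer_types: list = field(init=False)
    rope_parameters: dict = field(init=False)

    def __post_init__(self, sliding_window_pattern, rope_scaling):
        # Assumed rule: every Nth layer uses full attention, the rest sliding.
        self.layer_types = [
            "full_attention" if (i + 1) % sliding_window_pattern == 0 else "sliding_attention"
            for i in range(self.num_hidden_layers)
        ]
        # Assumed merge: start from defaults, then overlay the scaling dict.
        self.rope_parameters = {"rope_type": "default", "rope_theta": self.rope_theta}
        self.rope_parameters.update(rope_scaling or {})


cfg = ToyMoeConfig(
    num_hidden_layers=4,
    sliding_window_pattern=2,
    rope_scaling={"rope_type": "yarn", "factor": 2.0},
)
```

A checker that looks for every constructor argument being read by the modeling code would flag `rope_scaling` and `sliding_window_pattern` in a design like this, since only `layer_types` and `rope_parameters` are read afterwards; that is the kind of case an ignore list covers.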
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, cohere2, cohere2_moe, cohere2_vision


@Cyrilvallez Cyrilvallez merged commit 9188b5e into main May 20, 2026
89 of 96 checks passed
@Cyrilvallez Cyrilvallez deleted the cohere2_moe branch May 20, 2026 11:18
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request May 28, 2026
* Add Cohere2Moe (cohere2_moe) model

Port the Cohere2Moe (c4) model from cohere-transformers to upstream.
Adds modular and generated modeling files, configuration, and unit tests.
Registers Cohere2MoeConfig in CONFIG_MAPPING_NAMES, Cohere2MoeModel in
MODEL_MAPPING_NAMES, and Cohere2MoeForCausalLM in MODEL_FOR_CAUSAL_LM_MAPPING.
Adds "cohere2moe" -> "cohere2_moe" to SPECIAL_MODEL_TYPE_TO_MODULE_NAME so
the auto-mapping resolves the module directory correctly.

Co-authored-by: Cursor <[email protected]>

* Drop CohereTokenizer.__init__; add vision integration tests for Cohere2Moe

- Remove CohereTokenizer.__init__ and replace with convert_to_native_format
  classmethod so the tokenizer loads fully from tokenizer.json (v5-style).
  Adds add_bos_token/add_eos_token defaults to match 4.56.2.6 behaviour.
- Update test_tokenization_cohere.py: disable test_tokenizer_from_extractor
  (no __init__ means extractor-based construction is unsupported), replace
  test_add_prefix_space_fast with tokenizer.json-based tests that verify
  pre_tokenizer/normalizer/decoder components are preserved on load.
- Add Cohere2MoeVisionIntegrationTest to test_modeling_cohere2_vision.py
  with forward and generate tests for Command A+ (cohere2moe backbone).

Co-authored-by: Cursor <[email protected]>

* Fix create_causal_mask call: input_embeds -> inputs_embeds, drop cache_position

- Rename 'input_embeds' key to 'inputs_embeds' to match create_causal_mask()
  signature (typo introduced in generated modeling file).
- Remove 'cache_position' from mask_kwargs; upstream create_causal_mask()
  does not accept this parameter (cohere-transformers-specific argument).

Result: all 6 integration tests now pass (2 skipped: flash_attn not installed).
Co-authored-by: Cursor <[email protected]>

* Update flash_attn expected texts for mhlv2_bf16_clean model

Replace hardcoded expected outputs in test_model_flash_attn that were
written for command_a+_bf16 with the correct outputs from mhlv2_bf16_clean.

Co-authored-by: Cursor <[email protected]>

* update path

* address comments

* fix deprecate

* address comments

* address comments

* update mapping

* fix ci

* fix ci

* format

* fix ci

* format

* Add Cohere2MoeConfig to check_config_attributes ignore list

rope_scaling and sliding_window_pattern are consumed in __post_init__
and stored as derived attributes (rope_parameters and layer_types)
which the modeling code reads directly.

Co-authored-by: Cursor <[email protected]>

* add date?

* Update docs/source/en/model_doc/cohere2_moe.md

* a few improvements

* extract the router explicitly

* simplify attention

* attributes

* fix the output type

* simplify tests

* fix init weights

* revert changes to tokenizer - will be in hub config

* style

* mapping already exist, use it directly

---------

Co-authored-by: Terrencezzj <[email protected]>
Co-authored-by: Cursor <[email protected]>
Co-authored-by: bharat <[email protected]>
kashif pushed a commit to kashif/transformers that referenced this pull request Jun 1, 2026

5 participants