[Mistral3] attn_implementation not applied to vision_tower.config in Mistral3Config due to init order #40062

@starcatmeow

System Info

  • transformers version: 4.55.0
  • Platform: Linux-6.8.0-60-generic-x86_64-with-glibc2.35
  • Python version: 3.13.6
  • Huggingface_hub version: 0.34.3
  • Safetensors version: 0.6.1
  • Accelerate version: 1.9.0
  • Accelerate config: not found
  • DeepSpeed version: 0.17.4
  • PyTorch version (accelerator?): 2.7.1+cu126 (CUDA)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes
  • GPU type: NVIDIA H100 80GB HBM3

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained(
    "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    torch_dtype="bfloat16",
    attn_implementation="flash_attention_2",
)

print(model.config._attn_implementation)               # 'flash_attention_2'
print(model.vision_tower.config._attn_implementation)  # 'sdpa'

Expected behavior

Both model.config._attn_implementation and model.vision_tower.config._attn_implementation should match the passed attn_implementation argument.

Cause

In Mistral3Config, super().__init__ is called before self.vision_config is initialized.

The super().__init__ call triggers the _attn_implementation setter, which tries to propagate the value to the vision config. At that point vision_config does not exist yet, so the propagation is silently skipped and the vision tower keeps its default ('sdpa' in the reproduction above).

if subconfig is not None:
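
The init-order problem described above can be illustrated with a minimal, self-contained sketch. It does not use the transformers code: ToyBaseConfig, ToyVisionConfig, BuggyCompositeConfig, ReorderedCompositeConfig and sub_config_names are hypothetical names invented for this illustration, and the setter is only assumed to resemble the real one in that it skips sub-configs that are not set yet. Reordering the constructor is one possible way to make the propagation succeed, not necessarily the fix the library should adopt.

```python
class ToyBaseConfig:
    """Hypothetical stand-in for a base config class (not the transformers one)."""

    # Names of attributes that hold nested configs (assumed for this sketch).
    sub_config_names = ()

    def __init__(self, attn_implementation=None):
        self._attn = None
        # Assigning here goes through the property setter below.
        self.attn_implementation = attn_implementation

    @property
    def attn_implementation(self):
        return self._attn

    @attn_implementation.setter
    def attn_implementation(self, value):
        self._attn = value
        for name in self.sub_config_names:
            subconfig = getattr(self, name, None)
            # If the sub-config attribute has not been created yet,
            # this branch is skipped and nothing is propagated.
            if subconfig is not None:
                subconfig.attn_implementation = value


class ToyVisionConfig(ToyBaseConfig):
    pass


class BuggyCompositeConfig(ToyBaseConfig):
    sub_config_names = ("vision_config",)

    def __init__(self, attn_implementation=None):
        # Parent init runs the setter before vision_config exists.
        super().__init__(attn_implementation=attn_implementation)
        self.vision_config = ToyVisionConfig(attn_implementation="sdpa")


class ReorderedCompositeConfig(ToyBaseConfig):
    sub_config_names = ("vision_config",)

    def __init__(self, attn_implementation=None):
        # Create the sub-config first so the setter can reach it.
        self.vision_config = ToyVisionConfig(attn_implementation="sdpa")
        super().__init__(attn_implementation=attn_implementation)


buggy = BuggyCompositeConfig(attn_implementation="flash_attention_2")
fixed = ReorderedCompositeConfig(attn_implementation="flash_attention_2")
print(buggy.attn_implementation, buggy.vision_config.attn_implementation)
print(fixed.attn_implementation, fixed.vision_config.attn_implementation)
```

In this toy version, the first object reproduces the mismatch from the report (the parent value is set, the nested one keeps its default), while the second propagates the value because the nested config already exists when the setter runs.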
