System Info
- `transformers` version: 4.55.0
- Platform: Linux-6.8.0-60-generic-x86_64-with-glibc2.35
- Python version: 3.13.6
- Huggingface_hub version: 0.34.3
- Safetensors version: 0.6.1
- Accelerate version: 1.9.0
- Accelerate config: not found
- DeepSpeed version: 0.17.4
- PyTorch version (accelerator?): 2.7.1+cu126 (CUDA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: No
- Using GPU in script?: Yes
- GPU type: NVIDIA H100 80GB HBM3
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
```python
from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained(
    "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    torch_dtype="bfloat16",
    attn_implementation="flash_attention_2",
)
print(model.config._attn_implementation)               # 'flash_attention_2'
print(model.vision_tower.config._attn_implementation)  # 'sdpa'
```

Expected behavior
Both `model.config._attn_implementation` and `model.vision_tower.config._attn_implementation` should match the `attn_implementation` argument passed to `from_pretrained`.
Cause
In `Mistral3Config`, `super().__init__(**kwargs)` is called before `self.vision_config` is assigned:

`super().__init__(**kwargs)`

The `super().__init__` call triggers the `_attn_implementation` setter, which attempts to propagate the value to the vision config. But at that point `vision_config` does not exist yet, so the `if subconfig is not None:` guard evaluates to false and the update is silently skipped.
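The ordering bug can be reproduced without transformers at all. The sketch below uses hypothetical class names (`PretrainedConfigSketch`, `BuggyComposite`, `FixedComposite`) and is only an illustration of the mechanism, not the library's actual code: a base-class `__init__` runs a property setter that tries to propagate a value to a sub-config, and the propagation is skipped when the subclass has not assigned the sub-config yet.

```python
# Minimal sketch of the initialization-order bug (all names hypothetical).
class PretrainedConfigSketch:
    def __init__(self, **kwargs):
        # Goes through the property setter below during base-class construction.
        self._attn_implementation = kwargs.pop("attn_implementation", "sdpa")

    @property
    def _attn_implementation(self):
        return self._attn_implementation_internal

    @_attn_implementation.setter
    def _attn_implementation(self, value):
        self._attn_implementation_internal = value
        # Propagate to the sub-config -- silently skipped if it does not exist yet.
        subconfig = getattr(self, "vision_config", None)
        if subconfig is not None:
            subconfig._attn_implementation = value


class VisionConfigSketch(PretrainedConfigSketch):
    pass


class BuggyComposite(PretrainedConfigSketch):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)                  # setter runs; vision_config missing
        self.vision_config = VisionConfigSketch()   # too late: keeps its 'sdpa' default


class FixedComposite(PretrainedConfigSketch):
    def __init__(self, **kwargs):
        self.vision_config = VisionConfigSketch()   # assign the sub-config first
        super().__init__(**kwargs)                  # setter now reaches vision_config


buggy = BuggyComposite(attn_implementation="flash_attention_2")
fixed = FixedComposite(attn_implementation="flash_attention_2")
print(buggy.vision_config._attn_implementation)  # 'sdpa'
print(fixed.vision_config._attn_implementation)  # 'flash_attention_2'
```

Reordering the assignments, as in `FixedComposite`, is one way the composite config could propagate the argument; the real fix in the library may differ.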