
Qwen2.5-VL crashes when using the mcore backend #1575

@yfw

Description


Describe the bug
Qwen2.5-VL on the Megatron (mcore) path crashes after Megatron-Bridge was rebased to its main branch, with the following traceback:

```
  File "/opt/nemo-rl/3rdparty/Megatron-Bridge-workspace/Megatron-Bridge/src/megatron/bridge/models/qwen_vl/modeling_qwen25_vl.py", line 193, in forward
    position_ids, rope_deltas = self.get_rope_index(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/opt/ray_venvs/nemo_rl.models.policy.megatron_policy_worker.MegatronPolicyWorker/lib/python3.12/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 1057, in get_rope_index
    input_ids = input_ids[attention_mask[i] == 1]
                ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: too many indices for tensor of dimension 1
```

The root cause is that Megatron-Bridge main is missing this change: NVIDIA-NeMo/Megatron-Bridge@480bdc0#diff-31acdfd70c4d5550889bba3d1ada09e730c444e10dd1feccdde97abfd8b466eaL178-R180
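A minimal PyTorch sketch of why the last frame fails. The line `input_ids = input_ids[attention_mask[i] == 1]` in transformers' `get_rope_index` indexes a 1-D row of token IDs with a boolean mask. The error `too many indices for tensor of dimension 1` means `attention_mask[i]` had two or more dimensions, so the mask reaching `get_rope_index` was at least 3-D instead of the 2-D `[batch, seq]` padding mask that line expects. The 4-D `[batch, 1, seq, seq]` shape below is an assumption chosen for illustration, and `select_valid_tokens` is a hypothetical helper that mirrors only that one indexing line; it is not the transformers or Megatron-Bridge code, and it does not show what the linked commit changes.

```python
import torch


def select_valid_tokens(input_ids: torch.Tensor, attention_mask: torch.Tensor) -> list[torch.Tensor]:
    """Hypothetical helper mirroring the failing line: keep tokens where the mask is 1.

    Only works when attention_mask is a 2-D [batch, seq] padding mask, so that
    attention_mask[i] is a 1-D boolean mask matching the 1-D row of input_ids.
    """
    selected = []
    for i, row in enumerate(input_ids):  # row has shape [seq]
        selected.append(row[attention_mask[i] == 1])  # mask must also be [seq]
    return selected


batch, seq = 2, 4
input_ids = torch.arange(batch * seq).reshape(batch, seq)

# 2-D padding mask (HF-style): indexing succeeds.
padding_mask = torch.tensor([[1, 1, 1, 0], [1, 1, 1, 1]])
print([t.tolist() for t in select_valid_tokens(input_ids, padding_mask)])
# -> [[0, 1, 2], [4, 5, 6, 7]]

# Assumed 4-D [batch, 1, seq, seq] mask: attention_mask[i] is 3-D, so boolean
# indexing a 1-D row raises the same kind of IndexError as in the traceback.
mask_4d = torch.ones(batch, 1, seq, seq, dtype=torch.long)
try:
    select_valid_tokens(input_ids, mask_4d)
except IndexError as err:
    print("IndexError:", err)
```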

Steps/Code to reproduce bug

Run the `examples/configs/recipes/vlm/vlm_grpo-qwen2.5-vl-3b-instruct-clevr-1n2g-megatrontp2.v1.yaml` recipe.

Expected behavior

The recipe runs on the mcore backend without crashing in `get_rope_index`.


Metadata


Assignees: No one assigned

Labels: bug (Something isn't working), qa_rcca_done (marked by QA once the RCCA for the issue is finished)