
[bug] Quantization fails for Nemotron-3-Super in multinode setup #3499

@soonchang-ytlailabs

Description

Problem

Based on the guide, we first need to convert the HuggingFace checkpoint into a quantized Megatron checkpoint, and then convert that quantized Megatron checkpoint back to HuggingFace format.

I am modifying the step 1 direct-script-execution command, which runs 16 GPUs on a single node, to run across multiple nodes instead (I tried both 4 nodes and 2 nodes, each with 8 GPUs).

Original: https://github.com/NVIDIA-NeMo/Nemotron/blob/main/docs/nemotron/super3/quantization.md#direct-script-execution-megatron-bridge

torchrun --nproc_per_node=16 examples/quantization/ptq_generate.py \
    --hf-model-id $HF_MODEL \
    --megatron-load-path $MEGATRON_SAVE_PATH \
    --pp 2 \
    --tp 8 \
    --ep 8 \
    --trust-remote-code

Modified script for multiple nodes (change NNODES for Node count):

#!/usr/bin/env bash
set -xeuo pipefail

# 1. Master Logic
MASTER_ADDR=${MLP_WORKER_0_HOST:-$(hostname -I | awk '{print $1}')}
MASTER_PORT=6000
NNODES=4
GPUS_PER_NODE=8
export HF_HOME=/mnt/hf_hub
export HF_TOKEN=<HF_TOKEN>
export NEMO_HOME=/mnt/soonchang/nemo/cache
# Paths
HF_MODEL="/mnt/soonchang/ckpt/sft/nemo_super_megatron_bridge/sft_hf"
MEGATRON_SAVE_PATH="/mnt/soonchang/ckpt/sft/nemo_super_megatron_bridge/sft_mgt"
WORKDIR="/mnt/soonchang/train/super/Megatron-Bridge"

mkdir -p "$MEGATRON_SAVE_PATH"

# 2. Worker Launch Loop
RANK_COUNTER=1
for WORKER_IP in $(awk '{print $1}' /root/mpi_rack_hostfile); do
    if [[ "$WORKER_IP" == "$MASTER_ADDR" ]]; then
        continue
    fi
    
    echo "Launching worker node $RANK_COUNTER on IP: $WORKER_IP"

    # Double-quote the SSH command so the variables we need ($WORKDIR,
    # $NNODES, $MASTER_ADDR, $RANK_COUNTER, ...) expand locally before
    # being sent to the worker; remote-side values use single quotes.
    ssh -o StrictHostKeyChecking=no root@"${WORKER_IP}" "
        export HF_HOME='/mnt/hf_hub'
        export HF_TOKEN='<HF_TOKEN>'
        export NEMO_HOME='/mnt/soonchang/nemo/cache'
        export PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True'

        cd $WORKDIR
        torchrun --nnodes=$NNODES \
            --nproc_per_node=$GPUS_PER_NODE \
            --master-addr '$MASTER_ADDR' \
            --master-port $MASTER_PORT \
            --node-rank $RANK_COUNTER \
            examples/quantization/quantize_mtp.py \
            --hf-model-id $HF_MODEL \
            --export-quant-cfg mamba_moe_fp8_conservative \
            --megatron-save-path $MEGATRON_SAVE_PATH \
            --pp 2 \
            --tp 8 \
            --ep 8 \
            --trust-remote-code
    " &

    ((RANK_COUNTER++))
done

# 3. Launch Master Node (Rank 0)
echo "Launching Master Node (Rank 0)"

export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"
cd "$WORKDIR"
torchrun \
  --nnodes=$NNODES \
  --nproc_per_node=$GPUS_PER_NODE \
  --master-addr "$MASTER_ADDR" \
  --master-port "$MASTER_PORT" \
  --node-rank 0 \
  examples/quantization/quantize_mtp.py \
    --hf-model-id "$HF_MODEL" \
    --export-quant-cfg mamba_moe_fp8_conservative \
    --megatron-save-path "$MEGATRON_SAVE_PATH" \
    --pp 2 \
    --tp 8 \
    --ep 8 \
    --trust-remote-code

# Wait for background workers to finish
wait
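One thing I double-checked before suspecting the script itself: going from 1 node x 16 GPUs to 4 nodes x 8 GPUs changes the world size from 16 to 32, so with the same `--tp 8 --pp 2` the extra ranks should become data-parallel replicas. A quick sanity check of that arithmetic (variable names are mine, mirroring the script above):

```shell
# Sanity-check the launch geometry before running torchrun.
NNODES=4
GPUS_PER_NODE=8
TP=8   # --tp
PP=2   # --pp

WORLD_SIZE=$((NNODES * GPUS_PER_NODE))
MODEL_PARALLEL=$((TP * PP))

# torchrun starts WORLD_SIZE ranks; Megatron needs that to be a multiple
# of TP*PP, with the remainder becoming data-parallel replicas.
if (( WORLD_SIZE % MODEL_PARALLEL != 0 )); then
    echo "ERROR: world size $WORLD_SIZE is not divisible by TP*PP=$MODEL_PARALLEL" >&2
    exit 1
fi
DP=$((WORLD_SIZE / MODEL_PARALLEL))
echo "world size: $WORLD_SIZE, data-parallel replicas: $DP"
```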

Error message:

Inserted 576 quantizers
0%|          | 0/512 [00:00<?, ?it/s]Traceback (most recent call last):
File "/opt/Megatron-Bridge/src/megatron/bridge/models/decorators/torchrun.py", line 37, in wrapper
return_value = recorded_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 358, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/mnt/soonchang/train/super/Megatron-Bridge/examples/quantization/quantize_mtp.py", line 195, in main
mtq.quantize(unwrapped_model, mtq_config, ptq_forward_loop_func)
File "/opt/venv/lib/python3.12/site-packages/modelopt/torch/quantization/model_quant.py", line 232, in quantize
return calibrate(model, config.get("algorithm"), forward_loop=forward_loop)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/modelopt/torch/quantization/model_quant.py", line 103, in calibrate
apply_mode(
File "/opt/venv/lib/python3.12/site-packages/modelopt/torch/opt/conversion.py", line 418, in apply_mode
model, metadata = get_mode(m).convert(model, config, **kwargs)  # type: ignore  [call-arg]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/modelopt/torch/quantization/mode.py", line 295, in wrapped_func
return wrapped_calib_func(model, config, forward_loop, func=self.__class__._calib_func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/modelopt/torch/quantization/mode.py", line 221, in wrapped_calib_func
func(model, forward_loop=forward_loop, **kwargs)
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 121, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/modelopt/torch/quantization/model_calib.py", line 112, in max_calibrate
forward_loop(model)
File "/mnt/soonchang/train/super/Megatron-Bridge/examples/quantization/quantize_mtp.py", line 183, in <lambda>
ptq_forward_loop_func = lambda model: _hf_dataset_forward_loop_func(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/soonchang/train/super/Megatron-Bridge/examples/quantization/quantize_mtp.py", line 94, in _hf_dataset_forward_loop_func
megatron_generate(
TypeError: megatron_generate() got an unexpected keyword argument 'position_ids'
(The same traceback is printed again by a second rank.)
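The failure itself looks like a signature mismatch: the example script passes `position_ids`, but `megatron_generate` in this container does not accept it. A generic pattern I considered as a workaround (a sketch only; `call_with_supported_kwargs` is a hypothetical helper of mine, and `fake_generate` stands in for `megatron_generate`) would be to drop keyword arguments the callee does not declare:

```python
import inspect


def call_with_supported_kwargs(fn, *args, **kwargs):
    """Call fn, silently dropping keyword arguments its signature rejects.

    If fn itself takes **kwargs, everything is passed through unchanged.
    """
    params = inspect.signature(fn).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(*args, **kwargs)
    supported = {k: v for k, v in kwargs.items() if k in params}
    return fn(*args, **supported)


# Stand-in for megatron_generate, which (per the traceback) rejects position_ids.
def fake_generate(input_ids, attention_mask=None):
    return {"input_ids": input_ids, "attention_mask": attention_mask}


out = call_with_supported_kwargs(
    fake_generate, [1, 2, 3], attention_mask=[1, 1, 1], position_ids=[0, 1, 2]
)
# position_ids is dropped; the call succeeds instead of raising TypeError.
```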

Minimal repro

Same multi-node launch script as in the description above (see "Modified script for multiple nodes").

Expected behavior

Successful quantization

Affected area

area:recipe

Regression?

Not sure

Environment

Branch: super-v3
Container: nvcr.io/nvidia/nemo:26.02.nemotron_3_super
GPU: H100

Logs

Same output and traceback as in the error message above.
