
Conversation

@tohtana tohtana commented Jun 21, 2025

`TestParamPartitioningSkipInit` throws the following error.

```
====================================== short test summary info ======================================
FAILED test_zero.py::TestParamPartitioningSkipInit::test[dtype1] - RuntimeError: mat1 and mat2 must have the same dtype, but got Half and BFloat16
========= 1 failed, 204 passed, 66 skipped, 15 deselected, 5 warnings in 2305.03s (0:38:25) =========
```

The test always sets the model's dtype to `torch.bfloat16` and ignores the test parameter `dtype` whenever bfloat16 is supported. This causes a dtype mismatch when `dtype=torch.float16` is passed as the test parameter, because the data loader respects the test parameter dtype.
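For context, here is a minimal sketch of the failure mode. This is not the actual test code; `build_model` and `build_batch` are hypothetical helpers that only illustrate the dtype handling described above.

```python
import torch

def build_model(requested_dtype, bf16_supported=True):
    # Buggy behavior: the requested dtype is ignored whenever bf16 is supported.
    dtype = torch.bfloat16 if bf16_supported else requested_dtype
    return torch.nn.Linear(8, 8, dtype=dtype)

def build_batch(requested_dtype):
    # The data loader honors the requested dtype.
    return torch.randn(4, 8, dtype=requested_dtype)

model = build_model(torch.float16)  # weights end up in bfloat16
batch = build_batch(torch.float16)  # inputs stay in float16

try:
    model(batch)
except RuntimeError as err:
    # RuntimeError: mat1 and mat2 must have the same dtype, but got Half and BFloat16
    print(err)

# The fix is for the model to honor the test parameter as well
# (dtype = requested_dtype), so weights and inputs match.
```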

@tohtana tohtana requested review from loadams and tjruwase as code owners June 21, 2025 21:20
@tohtana tohtana enabled auto-merge (squash) June 23, 2025 16:26
@tohtana tohtana merged commit e049bbf into master Jun 23, 2025
9 checks passed
@tohtana tohtana deleted the tohtana/fix_zero_test_dtype_mismatch branch June 23, 2025 16:44
Antlera pushed a commit to Antlera/DeepSpeed that referenced this pull request Jun 27, 2025
lpnpcs pushed a commit to lpnpcs/DeepSpeed that referenced this pull request Jul 30, 2025
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025