It works well on multi-node, but fails on single-node.
Error:
```
    raise RuntimeError(f"NCCL error: {error_str}")
RuntimeError: NCCL error: unhandled cuda error (run with NCCL_DEBUG=INFO for details)
```
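The error message itself points at NCCL's debug logging. A minimal way to gather more detail before re-running the repro (the subsystem filter is optional; the exact failing test is left unspecified here):

```shell
# Enable NCCL debug logging, as the RuntimeError message suggests.
export NCCL_DEBUG=INFO
# Optionally narrow the log to the subsystems most relevant to setup
# failures (initialization and network/transport selection).
export NCCL_DEBUG_SUBSYS=INIT,NET
# Then re-run the repro in the same shell:
#   pytest <failing test>
```

The INFO-level log usually shows which transport (P2P, SHM, NET) NCCL picked and where initialization failed, which is the first thing to check for a single-node-only failure.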
Repro:
pytest
Remove the `pytest.skip` guard:

```python
# Skip tensor_parallel_size == 2 until we have resources in CI
if tensor_parallel_size == 2:
    pytest.skip(
        "Test requires at least three GPUs to run with tensor_parallel_size == 2 on separate clusters."
    )
```
- `world_size = tensor_parallel_size + 1` (will be fixed in feat: support non-colocated in mcore #613)
- dtensor worker
- mcore worker