[DeviceMesh] from_group raises error when passing both group and mesh

### 🐛 Describe the bug

Summary: the if-condition [here](https://github.com/pytorch/pytorch/blob/93dcb92bae5e242e065418a12e01bb87677a6d76/torch/distributed/device_mesh.py#L762) seems to be incorrect and raising an error when the mesh is a torch.Tensor and mesh.tolist() is equal to group_ranks.  I guess it is because the second part of the condition evaluates to True since mesh is a torch.Tensor (not None) and Tensor is not equal to list



To reproduce the error:

    class SomeTest(DTensorTestBase):
        @property
        def world_size(self) -> int:
            return 4

        @with_comms
        def test_init_process_group(self):
            input_pg = new_group(ranks=list(range(self.world_size)))
            device_mesh = DeviceMesh.from_group(group=input_pg, device_type="cuda", mesh= torch.arange(self.world_size))

### Versions

PyTorch version: 2.4.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 11 (bullseye) (x86_64)
GCC version: (Debian 10.2.1-6) 10.2.1 20210110
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.9.2 (default, Feb 28 2021, 17:03:44)  [GCC 10.2.1 20210110] (64-bit runtime)
Python platform: Linux-5.15.120.bsk.2-amd64-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA A800-SXM4-40GB
GPU 1: NVIDIA A800-SXM4-40GB
GPU 2: NVIDIA A800-SXM4-40GB
GPU 3: NVIDIA A800-SXM4-40GB
GPU 4: NVIDIA A800-SXM4-40GB
GPU 5: NVIDIA A800-SXM4-40GB
GPU 6: NVIDIA A800-SXM4-40GB
GPU 7: NVIDIA A800-SXM4-40GB

Nvidia driver version: 535.161.08
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   52 bits physical, 57 bits virtual
CPU(s):                          120
On-line CPU(s) list:             0-119
Thread(s) per core:              2
Core(s) per socket:              30
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           106
Model name:                      Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz
Stepping:                        6
CPU MHz:                         2294.635
BogoMIPS:                        4589.27
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       2.8 MiB
L1i cache:                       1.9 MiB
L2 cache:                        75 MiB
L3 cache:                        108 MiB
NUMA node0 CPU(s):               0-59
NUMA node1 CPU(s):               60-119
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; TSX disabled
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm md_clear arch_capabilities

Versions of relevant libraries:
[pip3] numpy==2.0.2
[pip3] torch==2.4.1
[pip3] torchaudio==2.4.1
[pip3] torchvision==0.19.1
[pip3] triton==3.0.0
[conda] Could not collect

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[DeviceMesh] from_group raises error when passing both group and mesh #137676

🐛 Describe the bug

Versions

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[DeviceMesh] from_group raises error when passing both group and mesh #137676

Description

🐛 Describe the bug

Versions

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions