
Conversation

@AnantGulati (Contributor) commented Sep 30, 2024

Motivation

This PR is an extension of #131758. As described there, these changes aim to make distributed UTs more accessible to users of all device types.

The UTs are currently specific to CUDA devices because they make explicit CUDA API calls.

To make it easier to adapt the tests for non-CUDA devices, we introduce a new class, DistributedTestBase, derived from MultiProcessTestCase, which abstracts out process group creation/deletion and other device-specific functionality.

The tests can be instantiated per device using existing utilities such as instantiate_device_type_tests, which passes the device as an argument so that the process group can be created with the appropriate backend.

New, device-generic tests can be added by deriving from this base class.

One such change is included in this PR: test_functional_api.py has been adapted, demonstrating a clean and straightforward migration path for non-CUDA devices.
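As a rough illustration of the intended usage (this is a sketch, not code from this PR: the import path for DistributedTestBase, the `create_pg` helper name, and the backend selection are assumptions), a device-agnostic test could look like:

```
# Hypothetical sketch only; DistributedTestBase's location and its create_pg
# helper are assumed here for illustration.
import torch
import torch.distributed as dist

from torch.testing._internal.common_device_type import instantiate_device_type_tests
from torch.testing._internal.common_distributed import DistributedTestBase  # assumed path
from torch.testing._internal.common_utils import run_tests


class TestCollectives(DistributedTestBase):
    def test_all_reduce(self, device):
        # The base class is expected to pick the right backend for `device`
        # (e.g. nccl for cuda, gloo for cpu) and create the process group.
        self.create_pg(device)
        t = torch.ones(4, device=device)
        dist.all_reduce(t)
        self.assertEqual(t, torch.full((4,), float(self.world_size), device=device))
        dist.destroy_process_group()


# instantiate_device_type_tests generates a per-device variant of each test and
# passes the device string in as an argument.
instantiate_device_type_tests(TestCollectives, globals(), only_for=("cpu", "cuda"))

if __name__ == "__main__":
    run_tests()
```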

CC: @kwen2501 @wconstab @XilunWu @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @rec @LucasLLC @MeetVadakkanchery @mhorowitz @pradeepfn

pytorch-bot bot commented Sep 30, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136988

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3ebea74 with merge base 8dddd45:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla bot commented Sep 30, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

pytorch-bot bot added the "oncall: distributed" label Sep 30, 2024
@AnantGulati (Contributor, Author) commented:

@pytorchbot label "topic: not user facing"

pytorch-bot bot added the "topic: not user facing" label Sep 30, 2024
@drisspg drisspg requested a review from wconstab October 1, 2024 17:16
@kwen2501 (Collaborator) left a comment:


Thanks for the PR.
I didn't look closely, but it seems there are a lot of replacements from cuda to hpu. I wonder if this PR is a demo and not for landing?

Comment on lines 83 to 84
"importerror": TestSkip(88, "Test skipped due to missing import"),
"no_hpu": TestSkip(88, "HPU is not available."),

Any reason for using the same skip code?
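One way this could be resolved (purely illustrative; the specific number below is not from the PR) is to give each skip reason its own exit code:

```
"importerror": TestSkip(88, "Test skipped due to missing import"),
"no_hpu": TestSkip(89, "HPU is not available."),  # 89 is only an example of a distinct code
```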

Comment on lines 103 to 106
backend_feature["hpu"] = {"hccl"}
backend_feature["plugin"] = set()
if TEST_HPU:
backend_feature["hpu"] = {"hccl"}

Duplicated setting.
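A de-duplicated version of the quoted lines could simply be (sketch only):

```
backend_feature["plugin"] = set()
if TEST_HPU:
    backend_feature["hpu"] = {"hccl"}
```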

Collaborator commented:

Also cc @wesbland @pavanbalaji on the name.

```
)
import operator

import habana_frameworks.torch as ht
```

I am not so sure about this direct import. Is there a better way to do it?
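One common alternative is a guarded optional import, so the module still loads on machines without the Habana stack. A sketch (the TEST_HPU flag mirrors the one used elsewhere in this diff):

```
try:
    import habana_frameworks.torch as ht
    TEST_HPU = ht.hpu.is_available()
except ImportError:
    ht = None
    TEST_HPU = False
```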

Comment on lines 109 to 116
```
if not torch.cuda.is_available():
    sys.exit(TEST_SKIPS["no_cuda"].exit_code)
if not ht.hpu.is_available():
    sys.exit(TEST_SKIPS["no_hpu"].exit_code)
```

Why removing the cuda lines?
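If the intent is to support HPU in addition to CUDA rather than instead of it, a sketch that keeps both checks could be (here `device_type` is a hypothetical variable naming the device under test, and `ht` follows the guarded-import pattern sketched above):

```
if device_type == "cuda" and not torch.cuda.is_available():
    sys.exit(TEST_SKIPS["no_cuda"].exit_code)
if device_type == "hpu" and (ht is None or not ht.hpu.is_available()):
    sys.exit(TEST_SKIPS["no_hpu"].exit_code)
```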

Comment on lines 183 to 187
```
if torch.cuda.is_available() and torch.cuda.device_count() >= x:
if ht.hpu.is_available() and ht.hpu.device_count() >= x:
```

Same.

Comment on lines 1325 to 1368
```
return torch.cuda.device_count()
return ht.hpu.device_count()
```

Same
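In the same spirit, a device-agnostic helper could dispatch on the device type instead of hard-coding either backend (sketch only; `device_type` is a hypothetical argument and `ht` comes from the guarded import sketched earlier):

```
def device_count(device_type: str) -> int:
    if device_type == "cuda":
        return torch.cuda.device_count()
    if device_type == "hpu" and ht is not None:
        return ht.hpu.device_count()
    return 0
```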

desertfire and others added 20 commits October 2, 2024 14:27
…ch#136906)

Summary: Add a device type C shim interface to support test_open_device_registration in the ABI-compatible mode.

Pull Request resolved: pytorch#136906
Approved by: https://github.com/chenyang78
Summary: Switch test_cpu_cpp_wrapper and test_cuda_cpp_wrapper to test the ABI-compatible mode only. Fixed a missing Py_NewRef issue for python 3.9.

Pull Request resolved: pytorch#136904
Approved by: https://github.com/Yoggie9477, https://github.com/chenyang78
…pytorch#136670)

Fixes pytorch#136640

Today, inductor has some logic to figure out when it needs to do broadcasting during lowering, which just checks if any of the input shapes have sizes equal to 1.

We should already have this information by the time we get to inductor, because our FakeTensor compute will have branched/guarded appropriately on whether any ops performed broadcasting.

In particular, if we have a tensor with a size value of `(64//((2048//(s3*((s2//s3)))))))`, and it happens to be equal to one (and it is used in an op that requires this dim to be broadcasted), FakeTensorProp will have generated a guard:
```
Eq((64//((2048//(s3*((s2//s3))))))), 1)
```

I chose the simplest possible way to beef up inductor's checks to know when a given size is equal to 1: loop over the existing shape env guards, and if our current size is a sympy expression on the LHS of one of our `Eq(LHS, 1)` guards, then return True.

I'm hoping for feedback on whether or not this approach is reasonable. One better option I could imagine is that our symbolic reasoning should have automatically simplified the size of our tensor down to a constant as part of evaluating that guard. I was originally going to try to do this directly in the shape env, but I ran into a few issues:

(1) I wanted to call some version of `set_replacement(expr, 1)`. But `set_replacement()` only accepts plain symbols on the LHS, not expressions

(2) in theory I could get this to work if I could rework the above expression to move everything that is not a free variable to the RHS, e.g. `Eq(s2, 32)`. It looks like our existing  `try_solve()` logic is... [not quite able](https://github.com/pytorch/pytorch/blob/main/torch/utils/_sympy/solve.py#L27) to do this generally though.

Checking the guards feels pretty simple-and-easy. Are we worried that it is too slow to iterate over all the guards? I could also cache the lookup so we only need to iterate over guards that are of the form `Eq(LHS, 1)`
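For readers picturing the approach, here is a minimal sketch of the guard scan described above (not the actual inductor code; a `shape_env.guards` list of guard objects carrying an `expr` attribute is assumed here):

```
import sympy


def statically_known_one(shape_env, size_expr) -> bool:
    # True if size_expr is literally 1, or is pinned to 1 by an Eq(LHS, 1) guard.
    if size_expr == 1:
        return True
    for guard in shape_env.guards:
        expr = guard.expr
        # Only guards of the form Eq(LHS, 1) whose LHS matches our size matter here.
        if isinstance(expr, sympy.Eq) and expr.rhs == 1 and expr.lhs == size_expr:
            return True
    return False
```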

Pull Request resolved: pytorch#136670
Approved by: https://github.com/ezyang
…ses) (pytorch#136759)

this adds a few compile time benchmarks for some disjoint paths in AOTDispatcher:

(1) inference vs training code paths
(2) "subclasses" vs "no subclasses" codepaths

Also see pytorch#136760 for a partitioner benchmark (I'm not sure why ghstack didn't display the stack nicely)

I ran locally, and got these numbers on the 4 paths:
```
collecting compile time instruction count for aotdispatcher_inference_nosubclass_cpu
compile time instruction count for iteration 0 is 11692348671
compile time instruction count for iteration 1 is 3026287204
compile time instruction count for iteration 2 is 3011467318
compile time instruction count for iteration 3 is 3004485935
compile time instruction count for iteration 4 is 3003087410
collecting compile time instruction count for aotdispatcher_training_nosubclass_cpu
compile time instruction count for iteration 0 is 6068003223
compile time instruction count for iteration 1 is 5585418102
compile time instruction count for iteration 2 is 5581856618
compile time instruction count for iteration 3 is 5581651794
compile time instruction count for iteration 4 is 5578742619
collecting compile time instruction count for aotdispatcher_inference_subclass_cpu
compile time instruction count for iteration 0 is 8634984264
compile time instruction count for iteration 1 is 8633467573
compile time instruction count for iteration 2 is 8632182092
compile time instruction count for iteration 3 is 8632056925
compile time instruction count for iteration 4 is 8632543871
collecting compile time instruction count for aotdispatcher_training_subclass_cpu
compile time instruction count for iteration 0 is 14737239311
compile time instruction count for iteration 1 is 14734346427
compile time instruction count for iteration 2 is 14736493730
compile time instruction count for iteration 3 is 14734121272
compile time instruction count for iteration 4 is 14733852882
```

Pull Request resolved: pytorch#136759
Approved by: https://github.com/laithsakka
ghstack dependencies: pytorch#136670
compile time benchmark for the min cut partitioner. I'm hoping that this is a reasonable benchmark because:

(1) it consists of a single input + many weights that are used sequentially
(2) contains a mix of recompute vs non-recomputed ops (matmul + sin)
(3) it is relatively simple

from running locally:
```
collecting compile time instruction count for aotdispatcher_partitioner_cpu
compile time instruction count for iteration 0 is 21764219181
compile time instruction count for iteration 1 is 12475020009
compile time instruction count for iteration 2 is 12463710140
compile time instruction count for iteration 3 is 12455676489
compile time instruction count for iteration 4 is 12451344330
```

Pull Request resolved: pytorch#136760
Approved by: https://github.com/ezyang
ghstack dependencies: pytorch#136670, pytorch#136759
Summary: sam_fast changes from timeout to fail_to_run after pytorch#136591, which "regressed" in a good way. Update the expected result file and continue investigating.

Pull Request resolved: pytorch#136996
Approved by: https://github.com/ezyang
Before this change, the test failed with unable-to-compile errors, as `bfloat16` requires an explicit cast.
Tested in pytorch#136987
Pull Request resolved: pytorch#136981
Approved by: https://github.com/Skylion007
…rch#136982)

Just adds instantiation of the kernels and, where needed, explicit casts.
Tested in pytorch#136987
Pull Request resolved: pytorch#136982
Approved by: https://github.com/Skylion007
ghstack dependencies: pytorch#136981
Summary: Some AOTI tensor constants may be model buffers that never need to be updated.

Differential Revision: D62777502

Pull Request resolved: pytorch#136770
Approved by: https://github.com/muchulee8
…orch#134247)

This PR supports calling `parallelize_module` from within a model definition, making the model a parallel one.

Calling `parallelize_module` is an alternative to maintaining a set of `ColumnWiseLinear`, `RowWiseLinear`, etc, while still being able to directly author a parallel model.

(The motivation for authoring a parallel model is that there may be other distributed operations, which may not be easily captured by any module, see the forward function below. Alternatively speaking, the purpose is to exploit the expressiveness of DTensor -- we need to first create DTensors before calling ops on them. Having parallelized modules in model is one way of creating DTensors.)

For example:
```
class FeedForward(nn.Module):
    def __init__(self, config: TransformerArgs) -> None:
        super().__init__()
        w1 = nn.Linear(config.dim, config.hidden_dim, bias=False)
        w2 = nn.Linear(config.hidden_dim, config.dim, bias=False)
        w3 = nn.Linear(config.dim, config.hidden_dim, bias=False)
        self.w1 = parallelize_module(w1, Colwise)
        self.w2 = parallelize_module(w2, Rowwise)
        self.w3 = parallelize_module(w3, Colwise)

    def forward(self, x: Tensor) -> Tensor:
        y: DTensor = self.w2(F.silu(self.w1(x)) * self.w3(x))
        # y is a DTensor with Partial placement; we can return it as is.
        return y
        # Or we can convert it to Replicate -- there is modeling flexibility here.
        return y.redistribute(Replicate())

with device_mesh:
    model = FeedForward(config)
    # Now model is a model parallelized onto device_mesh

y = model(x)

```

The `device_mesh` actually used for `parallelize_module` would be retrieved from the ambient context.

Calling `parallelize_module` from within the model hierarchy also avoids the need for *FQNs*, which are required in the out-of-model annotation case.

Pull Request resolved: pytorch#134247
Approved by: https://github.com/tianyu-l
benjaminglass1 and others added 13 commits October 2, 2024 14:29
…#137032)

Summary:
# Why

The arguments are filtered out because they are constants in the compiled graph, but the logger still expects a non-None type.

# What

When a filtered-out arg (None) is passed to the debug logger, log that it is a filtered-out argument instead of raising a TypeError.
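A minimal sketch of that guard (names here are illustrative, not the actual logger code):

```
def debug_log_arg(log, name, value):
    # Constant args are filtered out of the compiled graph and arrive as None;
    # note that instead of trying to introspect them.
    if value is None:
        log.debug("%s: <filtered out constant arg>", name)
        return
    log.debug("%s: shape=%s dtype=%s", name, tuple(value.shape), value.dtype)
```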

# Background

pytorch#131594

Test Plan: - execute repro from pytorch#135584 (comment) with and without the edits

Differential Revision: D63652564

Pull Request resolved: pytorch#137032
Approved by: https://github.com/angelayi
…cases (pytorch#137056)

Fixes pytorch#132950

This fixes an issue in `torch/distributed/elastic/rendezvous/etcd_store.py` where the [get method](https://github.com/pytorch/pytorch/blob/v2.4.0/torch/distributed/elastic/rendezvous/etcd_store.py#L60) does not wait as expected when no keys have been written under the store prefix yet (and therefore the store prefix key does not exist). This was because the `_try_wait_get` method would error out immediately [here](https://github.com/alenawang/pytorch/blob/main/torch/distributed/elastic/rendezvous/etcd_store.py#L179) if the prefix was not found instead of continuing to the etcd watch.

This was causing upstream issues where distributed jobs using etcd-v2 could not get past the initial rendezvous at all (details in issue pytorch#132950).

We added a test demonstrating this issue and the fix. Without the fix the test fails with `etcd.EtcdKeyNotFound: Key not found : /torch/elastic/store` instead of waiting for the first key to be written; with the fix the test waits properly.
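To make the intended behavior concrete, here is a conceptual sketch of the wait loop (not the actual etcd_store.py code; the python-etcd calls are illustrative):

```
import time

import etcd  # python-etcd, the client used by EtcdStore


def wait_for_keys(client, prefix, keys, deadline):
    # Wait until all `keys` exist under `prefix`, tolerating a missing prefix.
    while True:
        try:
            node = client.read(prefix, recursive=True)
            found = {child.key for child in node.children}
            if all(f"{prefix}/{key}" in found for key in keys):
                return
        except etcd.EtcdKeyNotFound:
            # Nothing has been written under the prefix yet -- keep waiting
            # instead of erroring out immediately.
            pass
        if time.time() > deadline:
            raise LookupError(f"Timed out waiting for keys under {prefix}")
        time.sleep(0.1)
```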

Co-authored-by: tarat44 <[email protected]>

Pull Request resolved: pytorch#137056
Approved by: https://github.com/fduwjj

Co-authored-by: tarat44 <[email protected]>
All jobs have switched to Python 3.9. These 3.8 builds are no longer necessary.

Pull Request resolved: pytorch#137141
Approved by: https://github.com/albanD
For Traceable FSDP2, the most common use case is to have `fullgraph=False` for forward pass (to allow user-level graph breaks), and `fullgraph=True` for compiled autograd backward pass (required for queue_callback support).

With `torch._dynamo.compiled_autograd=True`, previously we were not able to set a different `fullgraph` config value for the forward vs. backward pass, since `rebuild_ctx` just reuses the forward compile config as-is. This PR adds the `torch._dynamo.config.compiled_autograd_kwargs_override` config to allow forcing `fullgraph=True` for CA Dynamo tracing.

With this PR, we can remove standalone compiled autograd ctx manager usage in Traceable FSDP2 unit tests, and consolidate on using `torch._dynamo.compiled_autograd=True`.
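As a usage sketch (the flag spellings follow the description above and should be treated as assumptions):

```
import torch
import torch.nn as nn

model = nn.Linear(8, 8)

# Forward pass: allow user-level graph breaks.
compiled_model = torch.compile(model, fullgraph=False)

# Backward pass: force fullgraph=True for compiled autograd's Dynamo tracing.
torch._dynamo.config.compiled_autograd = True
torch._dynamo.config.compiled_autograd_kwargs_override = {"fullgraph": True}

out = compiled_model(torch.randn(4, 8))
out.sum().backward()
```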

Test commands:
- `pytest -rA test/distributed/_composable/fsdp/test_fully_shard_compile.py::TestFullyShardCompile::test_transformer_backend_inductor_fullgraph_True`

Pull Request resolved: pytorch#136967
Approved by: https://github.com/xmfan
…136943)

Refactor distributed test code:
- Fix TODO: Remove unused variable
- Fix doc typo
- Migrate deprecated method call `load_state_dict` and `save_state_dict`

Pull Request resolved: pytorch#136943
Approved by: https://github.com/H-Huang
…h#137073)

Summary:
We skip save_gpu_kernel if the kernel has already been saved. This gives us a more accurate Triton profiling result. The following traces show before/after the change for a benchmark of a trivial addmm:

Before:
<img width="1255" alt="Screenshot 2024-09-23 at 10 26 53 AM" src="https://github.com/user-attachments/assets/5aea05ef-6ef0-464c-8da9-17b31c97b43a">

After:
<img width="910" alt="Screenshot 2024-09-23 at 10 27 03 AM" src="https://github.com/user-attachments/assets/488b7d4f-268f-41cf-8553-cb16ceeae118">

We can see that before the change, the benchmarking includes two parts:
(1) the overhead of our triton_heuristic call, which includes the save/get and the (expensive) hash computation, and
(2) the actual computation of the Triton kernel.

We see that (1) accounts for >50% of the time, which biases kernel selection during profiling toward aten kernels over Triton kernels.

Test Plan:
Existing OSS CI
python test/inductor/test_cuda_cpp_wrapper.py


Pull Request resolved: pytorch#137073
Approved by: https://github.com/desertfire
@AnantGulati AnantGulati marked this pull request as draft October 2, 2024 11:34
@AnantGulati (Contributor, Author) commented:

Accidental push with multiple commits. Closing this PR.
I will add the changes again in a clean new PR.

@AnantGulati AnantGulati closed this Oct 2, 2024
@AnantGulati AnantGulati deleted the AnantGulati_distributed_test_case branch October 2, 2024 12:13
pytorchmergebot pushed a commit that referenced this pull request Nov 18, 2024
# Motivation
This PR is an extension of #131758. As described there, these changes aim to make distributed UTs more accessible to users of all device types.

It demonstrates a few of the changes discussed by @kwen2501 and @jgong5 in the discussion on #131758 (#131758 (comment)).

This PR contains two types of changes. The first is to the common distributed folder, where we have added a new class derived from MultiProcessTestCase that abstracts out process group creation/deletion and other device-specific functionality.

New, device-generic tests can be added by deriving from this base class.
The PR also includes other miscellaneous changes for Gaudi support.

The second changed file is test_functional_api.py, a test file in common distributed. This file is a proof of concept for how the new class can be used to write more device-agnostic distributed test cases.

The following changes have been made to test_functional_api.py:
- Functionality has been added to test non-CUDA devices, using Intel HPU as an example.
- Multiple setup steps previously required by MultiProcessTestCase have been abstracted out.
- Miscellaneous adaptations allow accelerators to be called generically, adding test skips instead of explicitly skipping when multiple GPUs are required.
- Skipifhpu flags have been added to skip a few multithreaded test cases that are not yet supported on HPUs.

NOTE: Within test_functional_api.py, some tests require multithreading functions that are not yet supported on HPUs. These have been skipped for HPU using the skipHPU decorator.

I will be raising a separate PR to improve the usability of said decorators in a device-agnostic setting, in the manner suggested by @kwen2501 in a comment on this PR.

This PR is a cleaned-up version of a previous PR (#136988), which I closed due to human error. I have addressed some of the comments made by @kwen2501 here as well.

Pull Request resolved: #138216
Approved by: https://github.com/kwen2501, https://github.com/guangyey
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024

Labels

- module: cpu (CPU specific problem, e.g., perf, algorithm)
- module: dynamo
- module: inductor
- oncall: distributed (add this issue/PR to distributed oncall triage queue)
- open source
- release notes: quantization (release notes category)
- topic: not user facing (topic category)
