Promote Kueue as the only workload scheduling solution for A3U and adopt the same in NCCL tests #3534
ankitkinra
left a comment
Do we need to delete the 2-node NCCL test?
Deleting this as well, since we have an n-node test now.
Deleted.
ankitkinra
left a comment
Approved, but I see Sam has an open comment; please take a look at it.
Addressed.
mwysokin
left a comment
I think we should make an informed decision about how to handle TAS and non-TAS Workloads. I added a comment in the relevant place.
4c19b60 to a3bfc70
Addressed.
Re-tested the changes after adding Michal's suggestions; all good.
…TAS plugin from A3U blueprints. Update Jobset based NCCL test to use Kueue
a3bfc70 to 6325681
What?
Why?
Kueue is the officially supported workload scheduler for A3 Ultra
Testing
Ran the JobSet-based NCCL test and verified the bandwidth figures.
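For readers unfamiliar with how a JobSet is routed through Kueue, here is a minimal sketch. It is not the manifest from this PR. It only illustrates the general mechanism: Kueue picks up a workload when it carries the `kueue.x-k8s.io/queue-name` label naming a LocalQueue. The JobSet name, queue name, container image, and node count are hypothetical placeholders.

```python
import json

# Label that Kueue reads to decide which LocalQueue should admit the workload.
KUEUE_QUEUE_LABEL = "kueue.x-k8s.io/queue-name"


def make_jobset_manifest(name: str, queue_name: str, num_nodes: int) -> dict:
    """Build a minimal JobSet manifest that is submitted through Kueue.

    All argument values are illustrative; the real blueprint may differ.
    """
    return {
        "apiVersion": "jobset.x-k8s.io/v1alpha2",
        "kind": "JobSet",
        "metadata": {
            "name": name,
            # Without this label the JobSet is not associated with a queue.
            "labels": {KUEUE_QUEUE_LABEL: queue_name},
        },
        "spec": {
            "replicatedJobs": [
                {
                    "name": "worker",
                    "replicas": 1,
                    "template": {
                        "spec": {
                            # One pod per node for an n-node test.
                            "parallelism": num_nodes,
                            "completions": num_nodes,
                            "template": {
                                "spec": {
                                    "restartPolicy": "Never",
                                    "containers": [
                                        {
                                            "name": "nccl-test",
                                            # Placeholder image, not a real one.
                                            "image": "example.invalid/nccl-test:placeholder",
                                        }
                                    ],
                                }
                            },
                        }
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON, which kubectl also accepts.
    print(json.dumps(make_jobset_manifest("nccl-test", "example-queue", 2), indent=2))
```

The printed JSON could be piped to `kubectl apply -f -` on a cluster where JobSet and Kueue are installed and a LocalQueue with the chosen name exists. GPU resource requests and any topology-related settings are deliberately left out of this sketch.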
Submission Checklist
NOTE: Community submissions can take up to 2 weeks to be reviewed.
Please take the following actions before submitting this pull request.