Update NCCL plugin to v1.0.3 in A3U #3594

Merged
tpdownes merged 3 commits into GoogleCloudPlatform:develop from akiki-liang0:feat/nccl-plugin-blueprint-update
Jan 28, 2025

Conversation

@akiki-liang0
Contributor

@akiki-liang0 akiki-liang0 commented Jan 27, 2025

  • update a3u slurm blueprints with NCCL plugin v1.0.3
  • update NCCL env vars in NeMo example Dockerfile

Tests:

  • 5 consecutive successful NCCL example runs
  • 5 consecutive successful NeMo example runs
  • 2 consecutive successful Ramble workload runs

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

@samskillman samskillman added the release-version-updates label Jan 27, 2025
@tpdownes
Contributor

/gcbrun

@tpdownes tpdownes self-requested a review January 28, 2025 01:01
@tpdownes tpdownes merged commit 8fd1598 into GoogleCloudPlatform:develop Jan 28, 2025
@abbas1902 abbas1902 mentioned this pull request Feb 6, 2025
Labels

release-version-updates Added to release notes under the "Version Updates" heading.

3 participants