Kueue Config Integration Tests incorporating different Accelerator types for different machines #4252

Merged: ighosh98 merged 2 commits into GoogleCloudPlatform:develop from ishitachail:develop on Jun 9, 2025

@ishitachail ishitachail commented Jun 8, 2025

This setup uses a tas-queues-template.yaml file with a special placeholder: __ACCELERATOR_TYPE__. At runtime, Ansible replaces this with the appropriate accelerator type for the machine.

For example:

On a3-megagpu machines, it becomes nvidia-h100-mega-80gb.
On a3-ultragpu machines, it becomes nvidia-h200-141gb.
On a4 machines, it becomes nvidia-b200.
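
The substitution described above can be sketched in Python. This is only an illustration of the logic, not the PR's implementation, which performs the replacement with Ansible. The names `ACCELERATOR_BY_MACHINE_PREFIX`, `accelerator_for`, and `render` are hypothetical, the template fragment is a stand-in (the real contents of tas-queues-template.yaml are not shown here), and the full machine type names in the usage line are example inputs.

```python
# Hypothetical mapping built from the three examples in the description.
# Keys are machine-type prefixes; values are the accelerator types listed above.
ACCELERATOR_BY_MACHINE_PREFIX = {
    "a3-megagpu": "nvidia-h100-mega-80gb",
    "a3-ultragpu": "nvidia-h200-141gb",
    "a4": "nvidia-b200",
}

PLACEHOLDER = "__ACCELERATOR_TYPE__"

# Illustrative stand-in for tas-queues-template.yaml (assumed content,
# not copied from the repository).
TEMPLATE = """\
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: tas-flavor
spec:
  nodeLabels:
    cloud.google.com/gke-accelerator: __ACCELERATOR_TYPE__
"""


def accelerator_for(machine_type: str) -> str:
    """Return the accelerator type for a machine type, matched by prefix."""
    for prefix, accelerator in ACCELERATOR_BY_MACHINE_PREFIX.items():
        if machine_type.startswith(prefix):
            return accelerator
    raise ValueError(f"no accelerator type known for machine type {machine_type!r}")


def render(template: str, machine_type: str) -> str:
    """Replace every placeholder occurrence with the machine's accelerator type."""
    return template.replace(PLACEHOLDER, accelerator_for(machine_type))


if __name__ == "__main__":
    # Example input; prints the fragment with nvidia-b200 substituted in.
    print(render(TEMPLATE, "a4-highgpu-8g"))
```

Raising on an unknown machine type, rather than leaving the placeholder in place, keeps an unrendered `__ACCELERATOR_TYPE__` from reaching the cluster unnoticed.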

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

@ishitachail ishitachail requested review from a team and samskillman as code owners June 8, 2025 05:17
@ishitachail ishitachail added the release-bugfix label (Added to release notes under the "Bug fixes" heading.) Jun 8, 2025

ighosh98 commented Jun 8, 2025

I see that the PR has conflicts. Make sure all conflicts are resolved correctly once you are done implementing the right solution.

Comment thread tools/cloud-build/daily-tests/tests/gke-a4.yml Outdated
@ishitachail ishitachail changed the title from "Different Kueue Config Integration Tests for different machine types" to "Kueue Config Integration Tests incorporating different Accelerator types for different machines" Jun 9, 2025

@ighosh98 ighosh98 left a comment

Fix the indentation errors that were flagged by GitHub Actions. Let's run all the integration tests once before merging the PR.

Comment thread tools/cloud-build/daily-tests/tests/gke-a4.yml Outdated
@ishitachail ishitachail force-pushed the develop branch 3 times, most recently from afb4237 to f92efdc June 9, 2025 05:34

ighosh98 commented Jun 9, 2025

@Ishita-Chail run the relevant GKE tests once to make sure they pass and nothing is missed.

@ighosh98 ighosh98 self-requested a review June 9, 2025 08:32
@ighosh98 ighosh98 merged commit c3d8310 into GoogleCloudPlatform:develop Jun 9, 2025
19 of 68 checks passed

Labels

release-bugfix Added to release notes under the "Bug fixes" heading.

3 participants