Pathways cluster config #5370

Merged
SwarnaBharathiMantena merged 21 commits into GoogleCloudPlatform:develop from FIoannides:pathways-cluster-config on Mar 27, 2026

@FIoannides (Contributor) commented Mar 18, 2026

Adds the configuration that Pathways requires at cluster creation to Cluster Toolkit.

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

- Introduces `enable_pathways` to the `gke-cluster` module to provision the `cpu-np` node pool (`n2-standard-64`) with necessary GCP scopes.

- Introduces `enable_pathways` to the `kubectl-apply` module to template and apply Kueue quotas (`cpu-user` ResourceFlavor, 480 CPU, 2000G memory) automatically.

- Adds `examples/pathways-gke.yaml` blueprint demonstrating the integration without the deprecated PathwaysJob CRD.
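
For orientation, a blueprint fragment that turns the feature on in both modules might look like the sketch below. The module paths are the ones reviewed in this PR, and `enable_pathways` is the variable name used in this description (a later commit in this thread mentions a rename, so the merged name may differ). The blueprint name, the module IDs, the `kueue.install` setting, and all omitted settings are assumptions for illustration; `examples/pathways-gke.yaml` remains the authoritative version.

```yaml
# Hypothetical fragment: required settings (network, project, region, etc.) are omitted.
blueprint_name: pathways-gke-sketch  # assumed name

deployment_groups:
- group: primary
  modules:
  - id: gke_cluster  # assumed module ID
    source: modules/scheduler/gke-cluster
    settings:
      # Per the PR description, provisions the cpu-np node pool (n2-standard-64).
      enable_pathways: true

  - id: workload_manager  # assumed module ID
    source: modules/management/kubectl-apply
    use: [gke_cluster]
    settings:
      kueue:
        install: true  # assumption: Kueue must be installed for the quotas to apply
      # Per the PR description, templates and applies the Pathways Kueue quotas.
      enable_pathways: true
```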

@github-actions bot added the `external` label (PR from external contributor) Mar 18, 2026
@gemini-code-assist (Contributor) commented

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive support for deploying and configuring Google Cloud's Pathways on GKE clusters within the toolkit. It provides a new example blueprint, integrates Pathways-specific Kueue configurations, and adds a dedicated CPU node pool, allowing users to easily provision GKE environments optimized for Pathways workloads.

Highlights

  • New Example Blueprint: Introduced a new example blueprint (pathways-gke.yaml) for deploying a GKE cluster with Pathways enabled, showcasing the integration.
  • Pathways Kueue Configuration: Added a Terraform template (pathways.yaml.tftpl) to configure Kueue ResourceFlavors and ClusterQueues specifically for Pathways workloads, enabling proper resource management.
  • Conditional Pathways Deployment: Modified the kubectl-apply module to conditionally apply the Pathways Kueue configuration based on a new enable_pathways variable, allowing flexible deployment.
  • Dedicated CPU Node Pool: Implemented a new cpu-np node pool within the gke-cluster module, which is conditionally deployed when Pathways is enabled, providing dedicated compute resources.
  • New Configuration Variable: Extended the kubectl-apply and gke-cluster modules with a new enable_pathways boolean variable to control Pathways-specific deployments and configurations.
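
As a rough picture of what the rendered Kueue configuration could contain, the sketch below uses the flavor name and quotas quoted in the PR description (`cpu-user`, 480 CPU, 2000G memory). It is not the contents of `pathways.yaml.tftpl`: the ClusterQueue name, the node label used to select the `cpu-np` pool, the open namespace selector, and the `v1beta1` API version are all assumptions.

```yaml
# Sketch only: names other than cpu-user, and the node label, are assumptions.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: cpu-user
spec:
  nodeLabels:
    cloud.google.com/gke-nodepool: cpu-np  # assumed selector for the cpu-np pool
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue  # assumed name
spec:
  namespaceSelector: {}  # assumed: admit workloads from all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: cpu-user
      resources:
      - name: cpu
        nominalQuota: 480
      - name: memory
        nominalQuota: 2000G
```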

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot (Contributor) left a comment

Code Review

The pull request introduces a new GKE Pathways cluster configuration, enabling dedicated CPU node pools and Kueue integration. The changes include a new example blueprint, a Kubernetes template for Kueue, and modifications to the kubectl-apply and gke-cluster modules to support the new enable_pathways feature. While the overall functionality is a good addition, there are opportunities to improve the configurability and flexibility of the newly introduced resources.

Comment thread modules/management/kubectl-apply/main.tf
Comment thread modules/scheduler/gke-cluster/main.tf
Comment thread modules/scheduler/gke-cluster/main.tf
Comment thread modules/scheduler/gke-cluster/main.tf Outdated
Comment thread modules/scheduler/gke-cluster/main.tf
Comment thread modules/scheduler/gke-cluster/variables.tf
@FIoannides marked this pull request as ready for review March 19, 2026 08:49
@FIoannides requested review from a team and samskillman as code owners March 19, 2026 08:49
Comment thread modules/scheduler/gke-cluster/main.tf
Comment thread modules/scheduler/gke-cluster/main.tf Outdated
@FIoannides requested a review from cboneti March 19, 2026 17:08
cboneti previously approved these changes Mar 19, 2026

@SwarnaBharathiMantena (Contributor) left a comment

I have limited knowledge of Kueue configurations, so I added a few questions to make sure nothing is missing.

Comment thread modules/management/kubectl-apply/kueue/pathways.yaml.tftpl
Comment thread modules/management/kubectl-apply/kueue/pathways.yaml.tftpl
Comment thread modules/management/kubectl-apply/kueue/pathways.yaml.tftpl
This prevents 'rendered manifests contain a resource that already exists' Helm errors when a custom Kueue template defining a ClusterQueue is passed while enable_pathways is also set to true, by automatically grouping and merging their resourceGroups.
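
One way to picture the merged result is a single ClusterQueue that carries both sets of resource groups, rather than two manifests that define the same object. The sketch below is illustrative only: the module's actual merge logic is not shown in this thread, and the ClusterQueue name, the TPU flavor, and its quota are hypothetical. Only the `cpu-user` flavor and its quotas come from the PR description.

```yaml
# Illustrative result only; not the module's rendered output.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue  # assumed name shared by both templates
spec:
  namespaceSelector: {}
  resourceGroups:
  # Group assumed to come from the user's custom Kueue template.
  - coveredResources: ["google.com/tpu"]
    flavors:
    - name: tpu-flavor  # hypothetical flavor name
      resources:
      - name: google.com/tpu
        nominalQuota: 16  # hypothetical quota
  # Group contributed when enable_pathways is true.
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: cpu-user
      resources:
      - name: cpu
        nominalQuota: 480
      - name: memory
        nominalQuota: 2000G
```

Kueue expects each resource name to be covered by at most one resource group in a ClusterQueue, so a custom template that already covers `cpu` or `memory` would presumably need its flavors folded into the same group instead of a second one.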
@FIoannides force-pushed the pathways-cluster-config branch from 7c5504c to 9175e4e March 20, 2026 16:51
Comment thread modules/management/kubectl-apply/main.tf
Comment thread modules/management/kubectl-apply/main.tf
@SwarnaBharathiMantena added the `release-module-improvements` label (Added to release notes under the "Module Improvements" heading.) Mar 24, 2026
SwarnaBharathiMantena and others added 3 commits March 24, 2026 18:03
This renaming clarifies that Pathways coordination infrastructure (cpu-np and its Kueue flavor) should only be enabled when TPU node pools are also being deployed, helping prevent misconfigurations where users might enable it on purely CPU or standalone environments.

@SwarnaBharathiMantena (Contributor) commented

/gcbrun

@SwarnaBharathiMantena (Contributor) commented

Running tests using babysit

Comment thread modules/scheduler/gke-cluster/variables.tf Outdated

@SwarnaBharathiMantena (Contributor) commented

/gcbrun

@SwarnaBharathiMantena (Contributor) commented

SUCCESS PR-test-gke go/ghpc-cb/6a2419e0-336f-4535-9c59-0dfe8302c59b
SUCCESS PR-test-gke-a2-highgpu-kueue-onspot go/ghpc-cb/67ba51bf-8b30-4bd3-8130-c00f13d768a5
SUCCESS PR-test-gke-a3-highgpu-onspot go/ghpc-cb/3d26567e-ad57-4e3b-bff7-6c5d18b20e5a
SUCCESS PR-test-gke-a3-megagpu-onspot go/ghpc-cb/16e0c812-a6e4-4383-b5c0-d2a3b7265749
SUCCESS PR-test-gke-a3-ultragpu-onspot go/ghpc-cb/3a5c5970-693a-4e3b-8821-3a8d798283f3
SUCCESS[2] PR-test-gke-a4-onspot go/ghpc-cb/ce55ecfb-de3a-4346-9133-84de42a6ba12
SUCCESS[2] PR-test-gke-g4 go/ghpc-cb/cf827603-80f6-4793-b15d-e682f2c422b6
SUCCESS PR-test-gke-h4d go/ghpc-cb/8504d053-81d3-41f1-8b72-1dcf00640020
SUCCESS PR-test-gke-h4d-onspot go/ghpc-cb/d4f430af-786e-460e-acfb-0cc061b0f6b2
SUCCESS PR-test-gke-inactive-reservation go/ghpc-cb/770f24e2-049e-46e9-931e-bf22f705f34f
SUCCESS PR-test-gke-managed-hyperdisk go/ghpc-cb/1308e052-7ea0-45f4-a631-3059fe7dcd2f
SUCCESS PR-test-gke-managed-lustre go/ghpc-cb/2d200670-1405-438c-bb89-6372a9e6d3c7
SUCCESS PR-test-gke-storage go/ghpc-cb/2dfaed49-108b-4e06-a3af-46ba3e739514
SUCCESS PR-test-gke-tpu-7x go/ghpc-cb/c322b28e-53ee-46f5-9d67-450fb40b073d
SUCCESS PR-test-gke-tpu-v6e go/ghpc-cb/2829206a-7236-4d1e-ab00-3f7afbf4141f
SUCCESS PR-test-ml-gke go/ghpc-cb/cb0077fb-3789-498f-a6d9-855820a1bc04
SUCCESS PR-test-ml-gke-e2e go/ghpc-cb/55804fa1-5f4d-4c46-b683-d1eb0442cb60
SUCCESS[2] PR-test-slurm-gke go/ghpc-cb/83596fe9-3338-4a8f-9429-52267b9f0663
------- TOTAL:18 | SUCCESS: 18

@SwarnaBharathiMantena (Contributor) left a comment

LGTM!

@SwarnaBharathiMantena enabled auto-merge (squash) March 26, 2026 01:56
Neelabh94 previously approved these changes Mar 26, 2026
auto-merge was automatically disabled March 26, 2026 10:18

Head branch was pushed to by a user without write access

@FIoannides force-pushed the pathways-cluster-config branch from 766bb48 to 28e7dbe March 26, 2026 10:18
@SwarnaBharathiMantena merged commit 561ab9c into GoogleCloudPlatform:develop Mar 27, 2026
13 of 72 checks passed
FIoannides added a commit to FIoannides/cluster-toolkit that referenced this pull request Mar 27, 2026
Co-authored-by: Swarna Bharathi Mantena <[email protected]> and Neelabh94

Labels

  • external: PR from external contributor
  • release-module-improvements: Added to release notes under the "Module Improvements" heading.

4 participants