feat: Add resource-policy accelerator_topology_mode #5393
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly improves the developer experience by automating the management of core external dependencies like Terraform and Packer. It also introduces advanced capabilities for GKE, such as the GKE Slice Controller for GPU resource optimization, and refines Slurm cluster configurations with more robust topology generation. A new example blueprint showcases high-performance GCS RAPID storage with Anywhere Cache, expanding the toolkit's storage integration options. Additionally, various internal dependencies have been updated to maintain a modern and stable codebase.
Code Review
This pull request introduces support for "Super-slicing" and the GKE Slice Controller. It includes new variables such as accelerator_topology_mode and enable_slice_controller, and updates Terraform configurations to integrate these features. Crucial preconditions have been added to ensure compatibility with GKE versions for the Slice Controller. Additionally, the Google provider versions have been updated across relevant files. The changes appear to be well-implemented and include necessary validations for the new functionalities.
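As a rough illustration of the variable wiring and validation this review refers to, the sketch below passes accelerator_topology_mode through to a workload_policy block and guards it with a precondition. It is not the module's actual code: the variable shape, resource name, region, and the form of the provider constraint are assumptions made for this example. Only the attribute names, the 7.24.0 provider version, and the rule that a mode requires both a type and a topology come from the PR description.

```hcl
terraform {
  required_version = ">= 1.3" # optional() object attributes need Terraform 1.3+

  required_providers {
    google-beta = {
      source  = "hashicorp/google-beta"
      version = ">= 7.24.0" # assumed constraint form; the PR cites 7.24.0 as the release adding the attribute
    }
  }
}

# Hypothetical variable shape; the real resource-policy module may differ.
variable "workload_policy" {
  description = "Workload policy settings (illustrative shape only)."
  type = object({
    type                      = optional(string)
    accelerator_topology      = optional(string)
    accelerator_topology_mode = optional(string)
  })
  default = {}
}

resource "google_compute_resource_policy" "policy" {
  provider = google-beta
  name     = "example-workload-policy" # placeholder name
  region   = "us-central1"             # placeholder region

  # Emit the block only when a policy type is given, and always set the type
  # explicitly rather than inferring it from the topology fields.
  dynamic "workload_policy" {
    for_each = var.workload_policy.type != null ? [1] : []
    content {
      type                      = var.workload_policy.type
      accelerator_topology      = var.workload_policy.accelerator_topology
      accelerator_topology_mode = var.workload_policy.accelerator_topology_mode
    }
  }

  lifecycle {
    # A topology mode is only meaningful together with a type and a topology.
    precondition {
      condition = (
        var.workload_policy.accelerator_topology_mode == null ||
        (var.workload_policy.type != null && var.workload_policy.accelerator_topology != null)
      )
      error_message = "accelerator_topology_mode requires both workload_policy.type and accelerator_topology."
    }
  }
}
```

With this shape, setting accelerator_topology = "16x16" and accelerator_topology_mode = "PROVISION_ONLY" (the values quoted in the PR's verification note) while leaving type unset would fail at plan time with the error message above, before any API call is made.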
Adds accelerator_topology_mode to the resource-policy module mapped to workload_policy. Includes an exclusivity precondition within the gke-node-pool module to ensure PROVISION_ONLY clusters are never incorrectly configured alongside queued provisioning.
…ake workload_policy type explicit
…tor_topology_mode
…tor_topology_mode
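The exclusivity precondition mentioned in the first commit message above could be sketched roughly as follows. This is a hedged illustration only: the accelerator_topology_mode input on the node-pool side and the terraform_data stand-in resource are assumptions for this example (the real gke-node-pool module presumably attaches the check to its own node pool resource), and treating any non-null mode as "custom" is likewise an assumption.

```hcl
terraform {
  required_version = ">= 1.4" # terraform_data is built in from Terraform 1.4
}

variable "enable_queued_provisioning" {
  description = "Enable queued provisioning (Dynamic Workload Scheduler)."
  type        = bool
  default     = false
}

# Hypothetical input: the real module may derive the mode from the attached
# resource policy instead of taking it as a direct variable.
variable "accelerator_topology_mode" {
  description = "Topology mode of the attached workload policy, e.g. PROVISION_ONLY."
  type        = string
  default     = null
}

# Stand-in resource used only to host the check in this sketch.
resource "terraform_data" "topology_guard" {
  lifecycle {
    precondition {
      # Reject the combination of queued provisioning with any explicit topology mode.
      condition     = !(var.enable_queued_provisioning && var.accelerator_topology_mode != null)
      error_message = "enable_queued_provisioning cannot be combined with accelerator_topology_mode (for example PROVISION_ONLY)."
    }
  }
}
```

Placing the rule in a lifecycle precondition rather than in a variable validation block lets it reference two inputs at once, which is why a cross-variable constraint like this one is usually expressed at the resource level.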
Merged commit 5c1001d into GoogleCloudPlatform:develop
Description
This PR introduces generic support for the accelerator_topology_mode field in the resource-policy module, enabling the use of PROVISION_ONLY for Compute Engine Super-slicing architectures. It also enforces necessary architectural constraints (like Dynamic Workload Scheduler exclusivity) without hardcoding marketing terminology into the modules.
Motivation and Context
Adds Super-slicing support to Cluster Toolkit (CT).
Key Changes
- accelerator_topology_mode: Integrated this optional variable into the resource-policy module's workload_policy block.
- workload_policy.type: Ensured explicit assignment of workload_policy.type when topologies are specified, preventing the masking of future GCP API additions.
- Added lifecycle { precondition {} } blocks to ensure both workload_policy.type and accelerator_topology are provided whenever an accelerator_topology_mode is defined.
- Updated the google-beta provider to 7.24.0 in affected modules (resource-policy and gke-node-pool) and adjusted the compiler's GoogleProviderVersion upper bound to <= 7.24.0 in pkg/config/expand.go.
  - The accelerator_topology_mode attribute was officially added to the google_compute_resource_policy resource in this specific provider release. (Terraform Google Beta v7.24.0 Release Notes)
- Added a lifecycle { precondition {} } block to the gke-node-pool module to explicitly prevent combining enable_queued_provisioning (DWS) with custom topology modes like PROVISION_ONLY.
Verification
Built ghpc and provisioned a compute resource policy with accelerator_topology = "16x16" and accelerator_topology_mode = "PROVISION_ONLY" via Terraform. Confirmed the configuration is correctly parsed, sent by the updated google-beta provider, and accepted as READY by the GCP API.
Submission Checklist
NOTE: Community submissions can take up to 2 weeks to be reviewed.
Please take the following actions before submitting this pull request.