
Update wait flag and resolving helm_release deadlock destruction error#5147

Merged
agrawalkhushi18 merged 10 commits into GoogleCloudPlatform:develop from
agrawalkhushi18:helm-destroy
Feb 12, 2026

Conversation

@agrawalkhushi18
Contributor

@agrawalkhushi18 agrawalkhushi18 commented Jan 28, 2026

This PR addresses transient kueue-webhook build failures caused by asynchronous Helm installs (wait=false) by enabling wait=true. To prevent the resulting destruction-time race condition where node pools are deleted before Helm resources, it introduces an explicit dependency on the system_node_pool to ensure a clean, sequential teardown.

Changes:

  • gke-cluster/outputs.tf: Added system_node_pool_id output to expose the node pool's unique identifier for dependency tracking.

  • kubectl-apply/variables.tf: Introduced system_node_pool_id variable to receive the dependency anchor from the cluster module.

  • kubectl-apply/main.tf: Forced wait = true for Kueue installation and passed the system_node_pool_id into the install_kueue module.

  • helm_install/main.tf: Implemented a tf_dependency_anchor using a Helm set block to force Terraform to honor sequential resource ordering during destruction.

  • GKE blueprints: Removed the explicit setting of the wait flag, since the module now forces it.
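Taken together, the changes above can be sketched roughly as follows. This is an illustrative sketch, not the exact code in the repository: the Google provider resource name and the module source path are assumptions.

```hcl
# gke-cluster/outputs.tf: expose the system node pool's ID for dependency tracking.
output "system_node_pool_id" {
  description = "Unique ID of the system node pool, used as a destroy-ordering anchor."
  value       = google_container_node_pool.system_node_pool.id # resource name assumed
}

# kubectl-apply/variables.tf: receive the dependency anchor from the cluster module.
variable "system_node_pool_id" {
  description = "System node pool ID; threads a destroy-ordering dependency into the Helm release."
  type        = string
  default     = null
}

# kubectl-apply/main.tf: force a synchronous install and pass the anchor through.
module "install_kueue" {
  source                    = "./helm_install" # path assumed
  wait                      = true
  deployment_dependency_ids = [var.system_node_pool_id]
}
```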

To clean up the orphaned resources created by the Ansible post-deploy test, a cleanup step has been added for Helm and Terraform so that gcluster destroy is handled gracefully in the build environment.

@agrawalkhushi18 agrawalkhushi18 added release-improvements Added to release notes under the "Improvements" heading. release-bugfix Added to release notes under the "Bug fixes" heading. labels Jan 28, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @agrawalkhushi18, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the stability and reliability of Kubernetes cluster deployments, particularly concerning Helm chart installations and resource teardown. By enforcing synchronous Helm installations for Kueue and introducing explicit dependency tracking using the system node pool ID, it resolves issues related to transient build failures and prevents deadlock errors during the destruction of Helm releases. The changes ensure a more predictable and robust lifecycle management for cluster components.

Highlights

  • Kueue Helm Installation Reliability: The wait flag for Kueue Helm installations has been explicitly set to true. This change addresses transient build failures by ensuring that all resources are fully ready before the installation is considered complete, preventing issues caused by asynchronous Helm installs.
  • Helm Release Destruction Order: A mechanism has been introduced to enforce explicit Terraform dependency tracking for Helm releases. This prevents destruction-time race conditions, specifically addressing scenarios where node pools might be deleted before dependent Helm resources, which previously led to deadlock errors.
  • System Node Pool ID Exposure: The system_node_pool_id is now exposed as an output from the GKE cluster module. This ID is crucial for establishing dependencies and ensuring proper resource teardown order.
  • Dependency Anchoring in Helm: A tf_dependency_anchor Helm set block has been added within the helm_release resource. This block uses the deployment_dependency_ids variable to create an implicit dependency, forcing Terraform to respect the desired sequential ordering during resource destruction.
  • GKE Blueprint Updates: Various GKE blueprints have been updated to pass the newly exposed system_node_pool_id from the cluster module to the kubectl-apply module, integrating the new dependency tracking across the infrastructure definitions.
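The dependency-anchor mechanism described above can be illustrated with a minimal helm_release sketch. Everything here other than the tf_dependency_anchor key and the deployment_dependency_ids variable (which the review names) is an assumption about how the module is wired:

```hcl
resource "helm_release" "helm_install" {
  name  = var.release_name # variable names assumed
  chart = var.chart_name
  wait  = var.wait

  # Referencing the upstream resource IDs inside a chart value makes Terraform
  # treat the node pool as a dependency of this release, so on destroy the
  # release is torn down first and the node pool only afterwards.
  set {
    name  = "tf_dependency_anchor"
    value = join(",", var.deployment_dependency_ids)
  }
}
```

The value itself is never read by the chart; it exists purely so that Terraform's dependency graph orders the destroy correctly.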
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively resolves a Helm release deadlock during destruction by enforcing wait=true for Kueue installations and creating an explicit dependency on the system node pool. The use of a tf_dependency_anchor is a solid approach to manage this race condition in Terraform. The changes are well-propagated through the various modules and examples. I've included a couple of suggestions to improve the robustness of a new output and to address an inconsistency introduced by hardcoding the wait flag, which will enhance the long-term maintainability of the module.

Comment thread modules/management/kubectl-apply/main.tf
Comment thread modules/scheduler/gke-cluster/outputs.tf
@agrawalkhushi18 agrawalkhushi18 changed the title Updating wait flag and resolving helm_release deadlock destruction error Update wait flag and resolving helm_release deadlock destruction error Jan 28, 2026
@SwarnaBharathiMantena SwarnaBharathiMantena added the release-breaking-changes Prevents "smooth" re-deploy across versions label Jan 29, 2026
@SwarnaBharathiMantena
Contributor

Added release-breaking-changes label as the default value of wait is being updated.
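The breaking change flagged here can be pictured as a one-line default flip on the module's wait variable. This is a hypothetical sketch of the shape of the change, not the repository's actual variable block:

```hcl
variable "wait" {
  description = "Wait until all chart resources are ready before marking the release successful."
  type        = bool
  default     = true # previously false; existing deployments may behave differently on re-deploy
}
```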

@agrawalkhushi18 agrawalkhushi18 marked this pull request as ready for review January 30, 2026 07:04
@agrawalkhushi18 agrawalkhushi18 requested review from a team and samskillman as code owners January 30, 2026 07:04
@agrawalkhushi18 agrawalkhushi18 marked this pull request as draft January 30, 2026 08:02
@agrawalkhushi18 agrawalkhushi18 marked this pull request as ready for review February 11, 2026 07:05
Comment thread modules/management/kubectl-apply/variables.tf
Comment thread modules/management/kubectl-apply/helm_install/variables.tf Outdated
Comment thread modules/management/kubectl-apply/variables.tf Outdated
@agrawalkhushi18 agrawalkhushi18 merged commit b50773b into GoogleCloudPlatform:develop Feb 12, 2026
16 of 85 checks passed
kadupoornima pushed a commit to kadupoornima/cluster-toolkit that referenced this pull request Feb 17, 2026
Update wait flag and resolving helm_release deadlock destruction error (GoogleCloudPlatform#5147)

All the relevant tests passed successfully on running babysit.

Labels

release-breaking-changes Prevents "smooth" re-deploy across versions release-bugfix Added to release notes under the "Bug fixes" heading. release-improvements Added to release notes under the "Improvements" heading.

3 participants