Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(helm): Add expand env template for dependsOn, fix concurrent installation #9689

Merged
merged 2 commits into from
Jan 31, 2025

Conversation

idsulik
Copy link
Contributor

@idsulik idsulik commented Jan 28, 2025

Description

  1. Fixed the releases installation order to keep the original order for each level, it needs at least to keep the old behavior, because the v2.14.0 changed the order.
  2. Fixed an issue with the dependsOn, we don't expand the chart's name. Now we use env in the dependsOn chart's name, example:
    deploy:
      helm:
        concurrency: 0 #no-limit
        releases:
          - name: my-service-{{.MYUSER}}
            remoteChart: repo/php-nginx
            upgradeOnChange: true
            version: ~4.30.0
            setValues:
              image.pullPolicy: IfNotPresent
            wait: true
            dependsOn:
              - my-cron-{{.MYUSER}}
          - name: my-cron-{{.MYUSER}}
            remoteChart: repo/php-nginx
            upgradeOnChange: true
            version: ~4.30.0
            setValues:
              image.pullPolicy: IfNotPresent
            wait: true

@idsulik idsulik requested a review from a team as a code owner January 28, 2025 07:16
@idsulik idsulik requested a review from alphanota January 28, 2025 07:16
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello everyone,

I'm currently reviewing this pull request and will provide my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly understand the changes:

This pull request, submitted by @idsulik, aims to fix an issue with the dependsOn functionality in Skaffold's Helm deployer. The problem was that environment variables in the dependsOn chart names weren't being expanded correctly.

Intent:
The primary goal is to ensure that dependsOn references correctly resolve Helm release names containing environment variables.

Changes:
The changes are focused on the helm.go and util.go files within the pkg/skaffold/deploy/helm directory.

  • pkg/skaffold/deploy/helm/helm.go: The core logic of the Helm deployer has been modified. Specifically:
    • Lines 297-301: Removed unnecessary expansion of release names within the releaseNameToRelease map. This simplifies the code and addresses the root cause of the issue.
    • Lines 313-317: The loop iterating through releases now uses the original name instead of a potentially incorrectly expanded releaseName.
    • Lines 332-338: Added expansion of the release name after the chart path is expanded, ensuring that the correct name is used during deployment.
  • pkg/skaffold/deploy/helm/util.go: The BuildDependencyGraph function was simplified by removing the unnecessary expansion of release names in lines 19-23. This removes redundant code and improves efficiency.

Summary Table:

File Description Lines Changed
helm.go Core Helm deployer logic 297-301, 313-317, 332-338
util.go BuildDependencyGraph function simplification 19-23

I'll be providing a more detailed review shortly. Thanks for your contributions!

And now, a little haiku to brighten your day:

Env vars now expand,
Helm charts deploy smoothly,
Skaffold builds soar high.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request addresses an issue with the dependsOn functionality in Helm deployments by correctly expanding environment variables in chart names. The changes are well-implemented and improve the functionality. I've summarized the Google Go Style Guide elements referenced in this review below:

  • Function and method names: Should not use a Get or get prefix unless the underlying concept uses the word "get".
  • Error handling: Wrap errors with context using fmt.Errorf.
  • Logging: Use olog for logging and provide context in log messages.
  • Variable naming: Use descriptive names like releaseName instead of just name where appropriate for clarity.

Overall, the changes are well-done and address the issue effectively. The code is clear, concise, and follows Go best practices.

@pull-request-size pull-request-size bot added size/M and removed size/S labels Jan 28, 2025
@idsulik idsulik marked this pull request as draft January 28, 2025 08:27
@idsulik idsulik changed the title fix(helm): Add expand env template for dependsOn fix(helm): Add expand env template for dependsOn, fix concurrent installation Jan 28, 2025
@idsulik idsulik force-pushed the fix-dependson branch 2 times, most recently from ac07fca to 9130a4e Compare January 28, 2025 09:32
@idsulik idsulik changed the title fix(helm): Add expand env template for dependsOn, fix concurrent installation fix(helm): Add expand env template for dependsOn, fix release order Jan 28, 2025
@idsulik idsulik force-pushed the fix-dependson branch 2 times, most recently from fbb6d4a to f82ef4c Compare January 28, 2025 09:55
@idsulik idsulik marked this pull request as ready for review January 28, 2025 10:01
@idsulik
Copy link
Contributor Author

idsulik commented Jan 28, 2025

@alphanota hi! Could you please check the PR and publish the patch release? the v2.14.0 has some issues which I fixed in this PR. Tested it locally with/without dependencies, as I see it works correctly

@idsulik
Copy link
Contributor Author

idsulik commented Jan 29, 2025

@plumpy hi! check pls the PR, it's very important because the v2.14.0 breaks the release order and it can lead to unexpected behavior

@alphanota
Copy link
Contributor

alphanota commented Jan 29, 2025

@idsulik Can you explain what you mean by "keep the original order for each level, " as far as I can tell, internally dependencies that were on the same level were stored on a hashmap so by definition they had no order.

If this is trying to fix a bug, can you describe the bug and steps to reproduce so that we can confirm? Thanks!

@idsulik
Copy link
Contributor Author

idsulik commented Jan 29, 2025

@alphanota check the tests, the releases are split level by level , if there is no "dependsOn" in the config file, then all releases will be at the 0 level and if we have dependsOn, then there will be more levels. inside each level, the releases keep the original sorting(as they appear in the config file). the important part is to keep the order if there is no "dependsOn", otherwise the installation won't be stable(in our case with the new skaffold deploy failed because we had specific order in the config file, but the new version doesn't keep the order from the config file)

// DependencyGraph represents a graph of helm release dependencies
type DependencyGraph struct {
graph map[string][]string
releases []latest.HelmRelease
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alphanota here, we save releases as they're passed into the constructor

}

// Process releases in their original order
for _, release := range g.releases {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alphanota here, we process the releases by their original order

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification, @idsulik. The comment you made is helpful in understanding the intent of the code. The current implementation in calculateDeploymentOrder iterates through g.releases to initiate the topological sort. This ensures that the original order of releases is preserved as much as possible during the sorting process. This approach is consistent with the description of maintaining the original order where possible, as stated in the function's documentation. No changes are needed here. I recommend adding a comment to the calculateDeploymentOrder function to explicitly state this behavior for better clarity. This will help future maintainers understand the design choice and avoid potential misunderstandings.

I also recommend adding a comment to the GetReleasesByLevel function to clarify that the original order of releases is preserved within each level. This is important because the function's description already mentions preserving the original order, but the implementation details aren't explicitly documented.

I'm using the Google Go Style Guide as a reference for my recommendations.

// groupReleasesByLevel groups releases by their dependency level while preserving
// the original order within each level
func (g *DependencyGraph) groupReleasesByLevel(order []string) map[int][]string {
levels := make(map[int][]string)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alphanota and, here we have a map, but the value is slice, and the slice will have the original order

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @idsulik, thanks for the clarification. Yes, the []string in the map preserves the original order because Go slices maintain insertion order. This is important for ensuring the deployment order respects the order specified in the releases list. No changes needed here. The comment is accurate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idsulik The purpose of this method is to organize the releases such that a release is installed only after all its dependencies are installed. All the releases in the same level do not depend on each other, so the order in which they are installed should not matter. Relying on releases on the same level to have a certain ordering seems to me like an anti-pattern. Can you explain why the order should matter in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the releases in the same level do not depend on each other, so the order in which they are installed should not matter

Why do you think so? They could depend on each other implicitly, and in my case, they depend on each other implicitly - first we have to install the release that comes first in the config order, then second, then third and so on

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alphanota raises a valid point about the implicit ordering assumption. While releases at the same level don't have direct dependencies within that level, the order between levels is crucial. The original order within each level is preserved to maintain the user-specified order of releases in the configuration. Changing this could lead to unexpected behavior if a release implicitly relies on another release's state or resources, even if there's no explicit dependsOn relationship. @idsulik's example of implicit dependencies highlights this scenario. The current approach ensures predictable and consistent deployment, adhering to the user's intended sequence. Therefore, I believe the current implementation is correct and the comment accurately reflects the behavior.

}
}

func TestOrderPreservationWithinLevels(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alphanota here, I wrote separate tests to check the order

@idsulik
Copy link
Contributor Author

idsulik commented Jan 29, 2025

@alphanota, you're right, I missed one place: the level-by-level iteration isn't correct because it iterates the map. I've just pushed fix for it, it extracts keys(levels), sorts them and iterates

@idsulik
Copy link
Contributor Author

idsulik commented Jan 29, 2025

the rolebinding release should be installed first, but v1.24.0 installs in random order not in the order they described in the config file

    deploy:
      helm:
        releases:
        - name: rolebinding-{{.BRANCH}}
          remoteChart: repo/config-helper
          upgradeOnChange: true
          valuesFiles:
          - ./branch/values.rolebinding.yaml
          version: ~4.1.0
        - name: redis-{{.BRANCH}}
          repo: https://charts.helm.sh/stable
          remoteChart: redis
          upgradeOnChange: true
          valuesFiles:
          - ./branch/values.redis.yaml
          version: ~10.4
          setValues:
            image.pullPolicy: IfNotPresent
            fullnameOverride: redis
          wait: true

@alphanota
Copy link
Contributor

the rolebinding release should be installed first, but v1.24.0 installs in random order not in the order they described in the config file

    deploy:
      helm:
        releases:
        - name: rolebinding-{{.BRANCH}}
          remoteChart: repo/config-helper
          upgradeOnChange: true
          valuesFiles:
          - ./branch/values.rolebinding.yaml
          version: ~4.1.0
        - name: redis-{{.BRANCH}}
          repo: https://charts.helm.sh/stable
          remoteChart: redis
          upgradeOnChange: true
          valuesFiles:
          - ./branch/values.redis.yaml
          version: ~10.4
          setValues:
            image.pullPolicy: IfNotPresent
            fullnameOverride: redis
          wait: true

This seems fine to me. rolebinding doesn't appear to be a dependency of redis so there's no need for it to be installed first? Keep in mind also the concurrency factor, they might even be installed in parallel. From what I could tell from the code pre-2.14 release, order was never guaranteed in the first place.

@idsulik
Copy link
Contributor Author

idsulik commented Jan 30, 2025

rolebinding doesn't appear to be a dependency of redis

you're wrong, the order matters, and redis depends on rolebinding, but not explicitly(I mean, we didn't use dependsOn because it was added only in v2.14.0)

Keep in mind also the concurrency factor, they might even be installed in parallel. From what I could tell from the code pre-2.14 release, order was never guaranteed in the first place.

There wasn't concurrency in the helm installation part before https://github.com/GoogleContainerTools/skaffold/blob/v2.13.2/pkg/skaffold/deploy/helm/helm.go#L265, it was added in v2.14.0

@idsulik
Copy link
Contributor Author

idsulik commented Jan 30, 2025

@alphanota Let’s assume the order is correct, but in v2.14.0, there’s another bug: when dependsOn is present, the first level will be set, but the second one won’t. This happens because errgroup is created outside the for loop, and after the first successful g.Wait(), the context gets canceled, preventing the second group of releases from being set. In this PR, I moved the errgroup creation inside the loop so that a new one is created for each group.

@idsulik
Copy link
Contributor Author

idsulik commented Jan 30, 2025

@alphanota I’ll try to explain the installation order again. Before version 2.14.0, there was no concurrency—each release was installed one after another in the order they appeared in the config. However, in v2.14.0, this order was broken. To enforce the correct order, dependsOn needed to be added, but the old behavior was disrupted, even though this was just a patch release.

In our project, the installation order of releases is crucial, and due to this change, deployments started failing after upgrading to v2.14.0. To fix this, I added the dependsOn option, but even that didn’t work because, on the second iteration (second level), errgroup canceled the context, preventing the next releases from being installed. We had to roll back to the previous version.

@alphanota
Copy link
Contributor

@idsulik Can you limit this PR to just the fix for the dependsOn bug? Lets adress the order thing in a separate PR. Thanks!

@idsulik
Copy link
Contributor Author

idsulik commented Jan 30, 2025

@alphanota done

@idsulik idsulik changed the title fix(helm): Add expand env template for dependsOn, fix release order fix(helm): Add expand env template for dependsOn, fix concurrent installation Jan 30, 2025
@idsulik
Copy link
Contributor Author

idsulik commented Jan 30, 2025

@alphanota created a separate PR for the order fix #9693

@alphanota
Copy link
Contributor

@idsulik Thanks for breaking this up, can you add unit test / and or integration test for this behavior? Thanks!

@idsulik
Copy link
Contributor Author

idsulik commented Jan 31, 2025

@alphanota pushed tests

Signed-off-by: Suleiman Dibirov <[email protected]>
@alphanota alphanota merged commit dcc573f into GoogleContainerTools:main Jan 31, 2025
11 checks passed
@idsulik idsulik deleted the fix-dependson branch January 31, 2025 19:30
@idsulik
Copy link
Contributor Author

idsulik commented Feb 4, 2025

@plumpy hi! Could you please release a patch version? because the latest version has an issue with dependsOn. Or, you're waiting for this PR #9693 ?

@plumpy
Copy link
Collaborator

plumpy commented Feb 4, 2025

Yeah, I want #9693 in there as well.

@idsulik
Copy link
Contributor Author

idsulik commented Feb 4, 2025

@plumpy great, it has just been approved

plumpy pushed a commit to plumpy/skaffold that referenced this pull request Feb 4, 2025
…tallation (GoogleContainerTools#9689)

* fix(helm): Add expand env template for dependsOn and fix concurrent installation

* tests

Signed-off-by: Suleiman Dibirov <[email protected]>

---------

Signed-off-by: Suleiman Dibirov <[email protected]>
plumpy pushed a commit to plumpy/skaffold that referenced this pull request Feb 4, 2025
…tallation (GoogleContainerTools#9689)

* fix(helm): Add expand env template for dependsOn and fix concurrent installation

* tests

Signed-off-by: Suleiman Dibirov <[email protected]>

---------

Signed-off-by: Suleiman Dibirov <[email protected]>
plumpy pushed a commit to plumpy/skaffold that referenced this pull request Feb 4, 2025
…tallation (GoogleContainerTools#9689)

* fix(helm): Add expand env template for dependsOn and fix concurrent installation

* tests

Signed-off-by: Suleiman Dibirov <[email protected]>

---------

Signed-off-by: Suleiman Dibirov <[email protected]>
plumpy added a commit that referenced this pull request Feb 4, 2025
…tallation (#9689) (#9707)

* fix(helm): Add expand env template for dependsOn and fix concurrent installation

* tests



---------

Signed-off-by: Suleiman Dibirov <[email protected]>
Co-authored-by: Suleiman Dibirov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants