
fix(recipes): aws-efa hardcodes us-west-2 ECR; should template region (and partition) #764

Description

@mchmarny

Summary

recipes/components/aws-efa/values.yaml:19 pins the EFA device plugin image to 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/aws-efa-k8s-device-plugin. AWS publishes EKS add-on images regionally, in private regional ECR repositories that EKS nodes are auto-authorized to pull from. Hardcoding us-west-2 makes the recipe wrong for clusters in any other region.

Why it matters

AWS intentionally restricts EKS operational add-ons (EFA plugin, VPC CNI, kube-proxy, etc.) to per-region private ECRs for three reasons:

  1. Network isolation — pulls go over the AWS internal backbone, not the public internet.
  2. Cost & rate limits — sidesteps Docker Hub / public-registry rate limits and avoids NAT Gateway data-transfer charges that hit cross-region pulls.
  3. Availability — the add-on image lives in the same region as the cluster. If the public internet or another region is degraded, the cluster can still pull and scale.

A non-us-west-2 cluster running this recipe today either pays cross-region NAT egress to pull from us-west-2, or fails entirely if cross-region access is blocked.

Current state

recipes/components/aws-efa/values.yaml:

image:
  repository: 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/aws-efa-k8s-device-plugin

This shows up in the committed BOM (docs/user/container-images.md, after #763) as the canonical image AICR deploys. It's misleading — every customer outside us-west-2 actually wants a different URI.

Proposed fix

Template the region at bundle time using the existing --dynamic mechanism (already documented in docs/user/cli-reference.md):

# recipes/components/aws-efa/values.yaml
image:
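  repository: 602401143452.dkr.ecr.{{ .region }}.amazonaws.com/eks/aws-efa-k8s-device-plugin

  • Default region: pick one (suggest us-east-1 as the most common starting point) or require explicit setting via --dynamic awsefa:region=<region>.
  • The account ID 602401143452 serves most regions in the standard aws partition and stays hardcoded. Some opt-in regions (for example af-south-1 and ap-east-1) publish EKS add-on images under different account IDs, so verify the target region against the AWS add-on registry list.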
  • Document the override in docs/integrator/recipe-development.md and the EKS guide.
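
To make the intended behavior concrete, here is a minimal Python sketch of the substitution, not the AICR bundler itself. The helper name render_repository, the regex-based placeholder handling, and the choice of us-east-1 as default are assumptions for illustration; the real --dynamic mechanism may resolve values differently.

```python
import re

# Matches Go-template-style placeholders such as "{{ .region }}".
# The pattern is an illustrative assumption, not the bundler's actual parser.
_PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")

REPOSITORY_TEMPLATE = (
    "602401143452.dkr.ecr.{{ .region }}.amazonaws.com"
    "/eks/aws-efa-k8s-device-plugin"
)

# Default suggested in this issue; the final choice is still open.
DEFAULT_REGION = "us-east-1"


def render_repository(template, overrides=None):
    """Fill template placeholders, letting overrides win over the default."""
    values = {"region": DEFAULT_REGION}
    values.update(overrides or {})

    def substitute(match):
        key = match.group(1)
        if key not in values:
            # Fail loudly instead of emitting a half-rendered image URI.
            raise KeyError(f"no value for template variable {key!r}")
        return values[key]

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    print(render_repository(REPOSITORY_TEMPLATE))
    print(render_repository(REPOSITORY_TEMPLATE, {"region": "eu-west-1"}))
```

Raising on an unknown variable rather than leaving the placeholder in place means a misconfigured bundle fails at render time instead of at image pull time.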

Partition-aware (deferred)

Standard AWS uses 602401143452, but GovCloud and China partitions use different account IDs and the URI shape changes (amazonaws.com.cn for China). These are rare in the AICR user base today and add notable complexity (partition is implicit from region prefix but requires a lookup table). Capture as a follow-up rather than blocking this fix:

| Partition | Account ID | URI suffix |
| --- | --- | --- |
| aws (standard) | 602401143452 | amazonaws.com |
| aws-cn (China) | 918309763551 (Beijing) / 961992271922 (Ningxia) | amazonaws.com.cn |
| aws-us-gov (GovCloud) | 013241004608 | amazonaws.com |
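
For the follow-up, the "partition is implicit from region prefix" lookup could look roughly like the Python sketch below. The function names are hypothetical, and the account ID is passed in by the caller rather than looked up, since the per-region IDs should be confirmed against AWS documentation before being encoded anywhere.

```python
# URI suffix per partition, as listed in the table above.
DNS_SUFFIX = {
    "aws": "amazonaws.com",
    "aws-cn": "amazonaws.com.cn",
    "aws-us-gov": "amazonaws.com",
}


def partition_for_region(region):
    """Infer the AWS partition from the region name prefix."""
    if region.startswith("cn-"):
        return "aws-cn"
    if region.startswith("us-gov-"):
        return "aws-us-gov"
    return "aws"


def ecr_repository(account_id, region, image="eks/aws-efa-k8s-device-plugin"):
    """Build the regional ECR repository URI for the given account and region."""
    suffix = DNS_SUFFIX[partition_for_region(region)]
    return f"{account_id}.dkr.ecr.{region}.{suffix}/{image}"


if __name__ == "__main__":
    print(ecr_repository("602401143452", "us-west-2"))
    print(ecr_repository("961992271922", "cn-northwest-1"))
```

Keeping the account ID as an explicit argument separates the two concerns: the prefix rule decides only the URI shape, while the ID table can be maintained and verified on its own.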

Also worth noting: the EFA plugin in particular has been moving to public visibility (public.ecr.aws/eks/aws-efa-k8s-device-plugin exists). Worth a sanity check before pinning the templated form — if Public ECR is now the canonical home, switching to that avoids regional templating entirely.

Acceptance criteria

  • recipes/components/aws-efa/values.yaml no longer hardcodes a region.
  • Default bundle for --service eks produces a valid image URI for the chosen default region.
  • aicr bundle --dynamic awsefa:region=<region> produces the correct per-region URI.
  • The committed BOM (docs/user/container-images.md) reflects either the templated form or a representative sample, with a note explaining region-templated images.
  • A test in pkg/bundler (or component test) verifies the templated value resolves correctly under both default and overridden region.
  • Recipe-development docs document the awsefa:region dynamic variable.
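
One way to pin down "valid image URI" in the criteria above is a shape check on the rendered value. This is a Python sketch only; the function name is hypothetical, the region pattern is an approximation of AWS region naming, and the actual test in pkg/bundler would be written in Go against the real bundler output.

```python
import re

# Expected shape: <12-digit account>.dkr.ecr.<region>.amazonaws.com[.cn]/<image>
# The region sub-pattern (e.g. us-east-1, us-gov-west-1) is an approximation.
_REPO_PATTERN = re.compile(
    r"\d{12}\.dkr\.ecr\."
    r"[a-z]{2}(?:-[a-z]+)+-\d+"
    r"\.amazonaws\.com(?:\.cn)?"
    r"/eks/aws-efa-k8s-device-plugin"
)


def is_resolved_efa_repository(uri):
    """Return True if uri is a fully rendered regional EFA plugin repository."""
    return _REPO_PATTERN.fullmatch(uri) is not None


if __name__ == "__main__":
    rendered = "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/aws-efa-k8s-device-plugin"
    unrendered = "602401143452.dkr.ecr.{{ .region }}.amazonaws.com/eks/aws-efa-k8s-device-plugin"
    print(is_resolved_efa_repository(rendered))
    print(is_resolved_efa_repository(unrendered))
```

A check like this catches the failure mode where the placeholder leaks into the bundle unrendered, which a plain string-equality test on one region would miss.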

Out of scope

  • GovCloud / China partition support (captured as a follow-up in the partition-aware section above).
  • Auditing other recipes for similar regional hardcoding — aws-efa is the only EKS add-on AICR ships today; VPC CNI and kube-proxy are EKS-managed, not AICR components. A general audit can be a follow-up.
